Classification Benchmark Results

Generated: 2026-04-02 22:12

Summary

#	Raw Input	Claude Haiku 4.5	GPT-5.4 mini	GPT-5 mini	Raptor mini
1	Need to pick up some items from the store, milk, h...	actions (95%)	actions (99%)	actions (98%)	actions (95%)
2	I need to get a birthday present for my mom	actions (92%)	actions (98%)	actions (98%)	actions (92%)
3	Study the only begotten references	study (88%)	study (92%)	study (98%)	study (92%)
4	Teaching in the saviors way for parents : higbys	people (82%)	study (82%)	actions (95%)	people (72%)
5	Squad for ai agent flow, what can we learn? https:...	ideas (90%)	ideas (72%)	ideas (95%)	ideas (88%)
6	Ai jobs and skills, where do I stand? https://yout...	ideas (85%)	journal (71%)	ideas (92%)	ideas (82%)
7	In brain app the scriptures dont show the body of ...	projects (88%)	projects (86%)	actions (95%)	projects (90%)
8	Im pretty sure AI opus/sonnet wrote the article, b...	ideas (82%)	journal (77%)	people (95%)	ideas (85%)
9	Claude scientific research https://www.anthropic.c...	ideas (88%)	ideas (67%)	ideas (95%)	ideas (85%)
10	Study enhancement? See if there are other ways of ...	projects (88%)	projects (90%)	projects (95%)	projects (88%)
11	Agent sandbox as a basis for our brain sandbox? ht...	ideas (90%)	ideas (88%)	ideas (95%)	ideas (90%)
12	Could I use this for my star trek UIs? https://git...	ideas (85%)	ideas (80%)	ideas (93%)	ideas (88%)
13	Bryce physical therapy. Research hippa compliant A...	people (85%)	actions (84%)	actions (95%)	people (82%)
14	Look into how they are using github copilot sdk an...	ideas (90%)	ideas (86%)	ideas (95%)	ideas (88%)
15	Stripe minions https://stripe.dev/blog/minions-str...	ideas (90%)	ideas (87%)	ideas (95%)	ideas (87%)
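
One way to read the table above is to separate the entries where all four models chose the same category from those where they split. The sketch below is illustrative only and is not part of the benchmark tooling. The categories are copied from the Summary table with confidences dropped, and the names `PREDICTIONS` and `split_by_agreement` are hypothetical.

```python
from collections import Counter

# Predicted category per entry, in table column order:
# Claude Haiku 4.5, GPT-5.4 mini, GPT-5 mini, Raptor mini.
# Copied from the Summary table; confidence values are omitted.
PREDICTIONS = {
    1: ["actions", "actions", "actions", "actions"],
    2: ["actions", "actions", "actions", "actions"],
    3: ["study", "study", "study", "study"],
    4: ["people", "study", "actions", "people"],
    5: ["ideas", "ideas", "ideas", "ideas"],
    6: ["ideas", "journal", "ideas", "ideas"],
    7: ["projects", "projects", "actions", "projects"],
    8: ["ideas", "journal", "people", "ideas"],
    9: ["ideas", "ideas", "ideas", "ideas"],
    10: ["projects", "projects", "projects", "projects"],
    11: ["ideas", "ideas", "ideas", "ideas"],
    12: ["ideas", "ideas", "ideas", "ideas"],
    13: ["people", "actions", "actions", "people"],
    14: ["ideas", "ideas", "ideas", "ideas"],
    15: ["ideas", "ideas", "ideas", "ideas"],
}


def split_by_agreement(predictions):
    """Return (unanimous, split) lists of entry numbers."""
    unanimous, split = [], []
    for entry, labels in sorted(predictions.items()):
        # An entry is unanimous when every model produced the same label.
        (unanimous if len(set(labels)) == 1 else split).append(entry)
    return unanimous, split


unanimous, split = split_by_agreement(PREDICTIONS)
print(f"unanimous: {len(unanimous)} of {len(PREDICTIONS)}")
for entry in split:
    # Show how the four votes were distributed for each contested entry.
    print(entry, dict(Counter(PREDICTIONS[entry])))
```

With the table's values, the models agree on 10 of the 15 entries and split on entries 4, 6, 7, 8, and 13.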

Detail

Entry 1: Need to pick up some items from the store, milk, hotdogs, and pizza

Current: actions → "Grocery shopping list"

Claude Haiku 4.5: actions (95%) — "Pick up items from store" [groceries, errands] (5152ms)

GPT-5.4 mini: actions (99%) — "Pick up store items" [groceries, errand, shopping] (9108ms)

GPT-5 mini: actions (98%) — "Pick up groceries from store" [groceries, errand, shopping] (18974ms)

Raptor mini: actions (95%) — "Pick up groceries from store" [groceries, errands] (3989ms)

Entry 2: I need to get a birthday present for my mom

Current: actions → "Get birthday present for mom"

Claude Haiku 4.5: actions (92%) — "Get birthday present for mom" [birthday, gift, family] (16249ms)

GPT-5.4 mini: actions (98%) — "Buy mom a birthday present" [gift, birthday, family] (2935ms)

GPT-5 mini: actions (98%) — "Buy mom's birthday present" [gift, family, birthday] (21559ms)

Raptor mini: actions (92%) — "Get birthday present for mom" [birthday, gift, family] (16622ms)

Entry 3: Study the only begotten references

Current: study → "Study the Only Begotten References"

Claude Haiku 4.5: study (88%) — "Study only begotten references" [scripture study, christology, only begotten] (6205ms)

GPT-5.4 mini: study (92%) — "Only Begotten references" [scripture, doctrine, jesus-christ] (4272ms)

GPT-5 mini: study (98%) — "Study only begotten references" [scripture, theology, study] (24940ms)

Raptor mini: study (92%) — "Study only begotten references" [scripture study, christology, only begotten] (6082ms)

Entry 4: Teaching in the saviors way for parents : higbys

Current: study → "Teaching Children in Savior's Way"

Claude Haiku 4.5: people (82%) — "Teaching in Savior's way - Higbys" [church, teaching, family ministry] (5982ms)

GPT-5.4 mini: study (82%) — "Teaching in the Savior's Way" [gospel-learning, parents, teaching] (5087ms)

GPT-5 mini: actions (95%) — "Teach Savior's Way to Higbys" [teaching, parents, Higbys] (21410ms)

Raptor mini: people (72%) — "Higbys: Teaching in Savior's Way" [church, teaching, family ministry] (20524ms)

Entry 5: Squad for ai agent flow, what can we learn? https://github.blog/ai-and-ml/github...

Current: study → "Learning from SQUAD AI Agent Flow"

Claude Haiku 4.5: ideas (90%) — "Learn from Squad AI agent flow" [ai agents, architecture, research] (11199ms)

GPT-5.4 mini: ideas (72%) — "Squad AI agent flow" [ai, agents, workflow] (4901ms)

GPT-5 mini: ideas (95%) — "Squad AI agent flow insights" [ai-agents, github, research] (22049ms)

Raptor mini: ideas (88%) — "Learn from Squad AI agent flow" [ai-agents, github-copilot, research] (19797ms)

Entry 6: Ai jobs and skills, where do I stand? https://youtu.be/4cuT-LKcmWs?si=CqAizAMZy0...

Current: study → "Assessing AI Skills and Career Positioning"

Claude Haiku 4.5: ideas (85%) — "Assess AI jobs and skills standing" [career, ai skills, self-assessment] (14620ms)

GPT-5.4 mini: journal (71%) — "Where I stand in AI" [ai, career, skills] (8849ms)

GPT-5 mini: ideas (92%) — "Assess AI jobs and skills" [ai, career, skills] (22360ms)

Raptor mini: ideas (82%) — "Evaluate my AI job market position" [career, ai-skills, self-assessment] (32972ms)

Entry 7: In brain app the scriptures dont show the body of the memorize items

Current: study → "Scriptures not showing in Brain app"

Claude Haiku 4.5: projects (88%) — "Brain app scripture memorize bug" [bug, brain app, scriptures] (3445ms)

GPT-5.4 mini: projects (86%) — "Fix scripture body display bug" [bug, app, scriptures] (4053ms)

GPT-5 mini: actions (95%) — "Brain app scripture display bug" [bug, brain-app, scripture] (20110ms)

Raptor mini: projects (90%) — "Brain app scripture display bug" [brain-app, bug, scriptures] (4568ms)

Entry 8: Im pretty sure AI opus/sonnet wrote the article, but im also 95% sure he used AI...

Current: people → "Uncertainty about AI's role in content creation"

Claude Haiku 4.5: ideas (82%) — "AI-written article and comment detection" [ai detection, authenticity, observation] (8761ms)

GPT-5.4 mini: journal (77%) — "AI-written article suspicion" [ai, writing, observation] (2865ms)

GPT-5 mini: people (95%) — "Suspect AI-authored article/comments" [ai-detection, authorship, dev.to] (24008ms)

Raptor mini: ideas (85%) — "AI-written article and comment detection" [ai-detection, authenticity, content-creation] (6056ms)

Entry 9: Claude scientific research https://www.anthropic.com/research/long-running-Claud...

Current: projects → "Claude AI Research Project"

Claude Haiku 4.5: ideas (88%) — "Claude long-running scientific research" [ai research, anthropic, claude] (4007ms)

GPT-5.4 mini: ideas (67%) — "Claude research review" [ai, research, claude] (1196ms)

GPT-5 mini: ideas (95%) — "Long-running Claude research" [anthropic, claude, research] (10810ms)

Raptor mini: ideas (85%) — "Claude long-running scientific research" [anthropic, claude, ai-research] (27996ms)

Entry 10: Study enhancement? See if there are other ways of digesting our study materials,...

Current: projects → "Enhance Study Material Digestion System"

Claude Haiku 4.5: projects (88%) — "Enhance study material digestion methods" [brain app, study tools, knowledge management] (5077ms)

GPT-5.4 mini: projects (90%) — "Enhance study digestion" [study, workflow, knowledge-management] (7461ms)

GPT-5 mini: projects (95%) — "Improve study materials digestion" [study, corpus, ingestion] (30090ms)

Raptor mini: projects (88%) — "Enhance study corpus digestion methods" [study-tools, knowledge-management, ai-assisted] (6545ms)

Entry 11: Agent sandbox as a basis for our brain sandbox? https://sandbox.agent-infra.com/...

Current: ideas → "Agent Sandbox for Brain Sandbox Foundation"

Claude Haiku 4.5: ideas (90%) — "Agent sandbox for brain sandbox basis" [brain app, sandbox, ai agents] (4526ms)

GPT-5.4 mini: ideas (88%) — "Agent sandbox for brain sandbox" [sandbox, agents, brain] (4432ms)

GPT-5 mini: ideas (95%) — "Agent sandbox as brain sandbox" [agent-sandbox, brain-sandbox, agent-infra] (19127ms)

Raptor mini: ideas (90%) — "Agent sandbox for brain sandbox basis" [brain-app, sandbox, agent-infra] (5222ms)

Entry 12: Could I use this for my star trek UIs? https://github.com/chenglou/pretext

Current: ideas → "Star Trek UI with Pretext"

Claude Haiku 4.5: ideas (85%) — "Pretext library for Star Trek UIs" [star trek, ui design, tools] (20291ms)

GPT-5.4 mini: ideas (80%) — "Pretext for Star Trek UIs" [ui, design, star-trek] (3498ms)

GPT-5 mini: ideas (93%) — "Pretext for Star Trek UIs" [pretext, ui, github] (10806ms)

Raptor mini: ideas (88%) — "Pretext library for Star Trek UIs" [star-trek, ui-design, tooling] (5474ms)

Entry 13: Bryce physical therapy. Research hippa compliant AI. Github copilot?

Current: projects → "HIPAA-Compliant AI Research"

Claude Haiku 4.5: people (85%) — "Bryce PT - HIPAA compliant AI" [HIPAA, ai tools, physical therapy] (4520ms)

GPT-5.4 mini: actions (84%) — "Research HIPAA-compliant AI" [ai, hipaa, research] (1338ms)

GPT-5 mini: actions (95%) — "Bryce PT and HIPAA AI research" [physical-therapy, hipaa, github-copilot] (30122ms)

Raptor mini: people (82%) — "Bryce PT: HIPAA compliant AI research" [healthcare, hipaa, ai-compliance] (5177ms)

Entry 14: Look into how they are using github copilot sdk and agents! https://github.blog/...

Current: projects → "Research GitHub Copilot Agent-Driven Development"

Claude Haiku 4.5: ideas (90%) — "GitHub Copilot SDK and agents usage" [github copilot, ai agents, sdk] (4268ms)

GPT-5.4 mini: ideas (86%) — "Copilot SDK and agents" [github-copilot, agents, ai] (1258ms)

GPT-5 mini: ideas (95%) — "GitHub Copilot SDK and agents" [github-copilot, agents, sdk] (18984ms)

Raptor mini: ideas (88%) — "GitHub Copilot SDK and agents research" [github-copilot, ai-agents, sdk] (6759ms)

Entry 15: Stripe minions https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-codin...

Current: ideas → "Stripe Minions Coding Agents"

Claude Haiku 4.5: ideas (90%) — "Stripe Minions one-shot coding agents" [ai agents, coding agents, stripe] (4051ms)

GPT-5.4 mini: ideas (87%) — "Stripe minions" [stripe, agents, coding] (3668ms)

GPT-5 mini: ideas (95%) — "Stripe Minions one-shot agents" [stripe, ai-agents, minions] (20598ms)

Raptor mini: ideas (87%) — "Stripe Minions one-shot coding agents" [ai-agents, stripe, coding-agents] (5289ms)
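
The per-model lines in this section follow a regular layout: model name, category, confidence, title, tags, and latency. A minimal sketch of parsing one such line is shown below, assuming the layout holds for every line. The `Result` type, `LINE_RE`, and `parse_line` are hypothetical names introduced for illustration, not part of the tool that produced this report.

```python
import re
from typing import NamedTuple, Optional


class Result(NamedTuple):
    model: str
    category: str
    confidence: int  # percent, as printed in the report
    title: str
    tags: list
    latency_ms: int


# Layout assumed from the Detail lines:
#   <model>: <category> (<NN>%) <dash> "<title>" [<tag>, ...] (<N>ms)
# The separator is matched as any single non-space character.
LINE_RE = re.compile(
    r'^(?P<model>.+?):\s+(?P<category>\w+)\s+\((?P<conf>\d+)%\)\s+\S\s+'
    r'"(?P<title>.*)"\s+\[(?P<tags>[^\]]*)\]\s+\((?P<ms>\d+)ms\)$'
)


def parse_line(line: str) -> Optional[Result]:
    """Parse one per-model result line; return None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tags = [t.strip() for t in match["tags"].split(",") if t.strip()]
    return Result(
        model=match["model"],
        category=match["category"],
        confidence=int(match["conf"]),
        title=match["title"],
        tags=tags,
        latency_ms=int(match["ms"]),
    )


# First model line of Entry 1 (the dash is written as an escape).
sample = (
    'Claude Haiku 4.5: actions (95%) \u2014 '
    '"Pick up items from store" [groceries, errands] (5152ms)'
)
print(parse_line(sample))
```

Lines that begin with `Current:` use a different layout and are returned as `None` by this sketch.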

Latency

Model	Avg	Min	Max
Claude Haiku 4.5	7890ms	3445ms	20291ms
GPT-5.4 mini	4328ms	1196ms	9108ms
GPT-5 mini	21063ms	10806ms	30122ms
Raptor mini	11538ms	3989ms	32972ms
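
The figures in the table above can be reproduced from the per-entry latencies listed in the Detail section. The sketch below shows the arithmetic. The averages are rounded to the nearest millisecond, which is an assumption about how the report was generated, and `LATENCIES_MS` and `summarize` are hypothetical names.

```python
from statistics import mean

# Per-entry latencies in ms for entries 1-15, copied from the Detail section.
LATENCIES_MS = {
    "Claude Haiku 4.5": [5152, 16249, 6205, 5982, 11199, 14620, 3445, 8761,
                         4007, 5077, 4526, 20291, 4520, 4268, 4051],
    "GPT-5.4 mini": [9108, 2935, 4272, 5087, 4901, 8849, 4053, 2865,
                     1196, 7461, 4432, 3498, 1338, 1258, 3668],
    "GPT-5 mini": [18974, 21559, 24940, 21410, 22049, 22360, 20110, 24008,
                   10810, 30090, 19127, 10806, 30122, 18984, 20598],
    "Raptor mini": [3989, 16622, 6082, 20524, 19797, 32972, 4568, 6056,
                    27996, 6545, 5222, 5474, 5177, 6759, 5289],
}


def summarize(samples):
    """Return (avg, min, max) in whole milliseconds."""
    # Rounding to the nearest ms is an assumption; see the note above.
    return round(mean(samples)), min(samples), max(samples)


print("Model\tAvg\tMin\tMax")
for model, samples in LATENCIES_MS.items():
    avg, low, high = summarize(samples)
    print(f"{model}\t{avg}ms\t{low}ms\t{high}ms")
```

For these samples the rounded averages match the table, so truncating instead of rounding would give the same values.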
