
Classification Benchmark Results

Generated: 2026-04-02 22:12

Summary

| # | Raw Input | Claude Haiku 4.5 | GPT-5.4 mini | GPT-5 mini | Raptor mini |
|---|---|---|---|---|---|
| 1 | Need to pick up some items from the store, milk, h... | actions (95%) | actions (99%) | actions (98%) | actions (95%) |
| 2 | I need to get a birthday present for my mom | actions (92%) | actions (98%) | actions (98%) | actions (92%) |
| 3 | Study the only begotten references | study (88%) | study (92%) | study (98%) | study (92%) |
| 4 | Teaching in the saviors way for parents : higbys | people (82%) | study (82%) | actions (95%) | people (72%) |
| 5 | Squad for ai agent flow, what can we learn? https:... | ideas (90%) | ideas (72%) | ideas (95%) | ideas (88%) |
| 6 | Ai jobs and skills, where do I stand? https://yout... | ideas (85%) | journal (71%) | ideas (92%) | ideas (82%) |
| 7 | In brain app the scriptures dont show the body of ... | projects (88%) | projects (86%) | actions (95%) | projects (90%) |
| 8 | Im pretty sure AI opus/sonnet wrote the article, b... | ideas (82%) | journal (77%) | people (95%) | ideas (85%) |
| 9 | Claude scientific research https://www.anthropic.c... | ideas (88%) | ideas (67%) | ideas (95%) | ideas (85%) |
| 10 | Study enhancement? See if there are other ways of ... | projects (88%) | projects (90%) | projects (95%) | projects (88%) |
| 11 | Agent sandbox as a basis for our brain sandbox? ht... | ideas (90%) | ideas (88%) | ideas (95%) | ideas (90%) |
| 12 | Could I use this for my star trek UIs? https://git... | ideas (85%) | ideas (80%) | ideas (93%) | ideas (88%) |
| 13 | Bryce physical therapy. Research hippa compliant A... | people (85%) | actions (84%) | actions (95%) | people (82%) |
| 14 | Look into how they are using github copilot sdk an... | ideas (90%) | ideas (86%) | ideas (95%) | ideas (88%) |
| 15 | Stripe minions https://stripe.dev/blog/minions-str... | ideas (90%) | ideas (87%) | ideas (95%) | ideas (87%) |
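
The summary can also be read as a four-model vote. The sketch below is a hypothetical helper, not part of the benchmark tooling: it copies each model's category from the summary and lists the entries where all four models agree and those where no category wins more than two of the four votes. The tie-breaking rule (the leftmost model wins) is an arbitrary assumption.

```python
from collections import Counter

# Categories from the summary, one tuple per entry, in column order:
# (Claude Haiku 4.5, GPT-5.4 mini, GPT-5 mini, Raptor mini)
PREDICTIONS = {
    1: ("actions", "actions", "actions", "actions"),
    2: ("actions", "actions", "actions", "actions"),
    3: ("study", "study", "study", "study"),
    4: ("people", "study", "actions", "people"),
    5: ("ideas", "ideas", "ideas", "ideas"),
    6: ("ideas", "journal", "ideas", "ideas"),
    7: ("projects", "projects", "actions", "projects"),
    8: ("ideas", "journal", "people", "ideas"),
    9: ("ideas", "ideas", "ideas", "ideas"),
    10: ("projects", "projects", "projects", "projects"),
    11: ("ideas", "ideas", "ideas", "ideas"),
    12: ("ideas", "ideas", "ideas", "ideas"),
    13: ("people", "actions", "actions", "people"),
    14: ("ideas", "ideas", "ideas", "ideas"),
    15: ("ideas", "ideas", "ideas", "ideas"),
}


def consensus(labels):
    """Return (label, votes) for the most common label.

    Counter.most_common orders equal counts by first appearance, so ties go
    to the model listed leftmost (an arbitrary choice for this sketch).
    """
    return Counter(labels).most_common(1)[0]


# Entries where all four models picked the same category.
unanimous = [entry for entry, labels in PREDICTIONS.items() if len(set(labels)) == 1]
# Entries where no category won more than two of the four votes.
split = [entry for entry, labels in PREDICTIONS.items() if consensus(labels)[1] <= 2]

print("unanimous:", unanimous)
print("split:", split)
```

Filtering on vote count rather than on any disagreement keeps entries such as 6 and 7, where a single model dissents, out of the split list.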

Detail

Entry 1: Need to pick up some items from the store, milk, hotdogs, and pizza

Current: actions → "Grocery shopping list"

Claude Haiku 4.5: actions (95%) — "Pick up items from store" [groceries, errands] (5152ms)

GPT-5.4 mini: actions (99%) — "Pick up store items" [groceries, errand, shopping] (9108ms)

GPT-5 mini: actions (98%) — "Pick up groceries from store" [groceries, errand, shopping] (18974ms)

Raptor mini: actions (95%) — "Pick up groceries from store" [groceries, errands] (3989ms)
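
Every result line in this section follows the same shape: model name, predicted category, confidence in parentheses, suggested title in quotes, tags in brackets, and latency in milliseconds. If the report needs to be re-analyzed, a minimal parser could look like the sketch below. It assumes exactly that layout and that model names contain no colon (titles may, as in Entry 13); `Prediction` and `parse_line` are hypothetical names, not part of the benchmark tooling.

```python
import re
from dataclasses import dataclass

# Assumed layout: <model>: <category> (<NN>%) <dash> "<title>" [<tags>] (<NNNN>ms)
# \u2014 is the dash separator used in the report.
LINE_RE = re.compile(
    r'^(?P<model>[^:]+): (?P<category>\w+) \((?P<confidence>\d+)%\) \u2014 '
    r'"(?P<title>[^"]*)" \[(?P<tags>[^\]]*)\] \((?P<latency_ms>\d+)ms\)$'
)


@dataclass
class Prediction:
    model: str
    category: str
    confidence: int  # percent, as shown in the report
    title: str
    tags: list
    latency_ms: int


def parse_line(line: str) -> Prediction:
    """Parse one result line; raise ValueError if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    tags = [t.strip() for t in m["tags"].split(",") if t.strip()]
    return Prediction(
        model=m["model"],
        category=m["category"],
        confidence=int(m["confidence"]),
        title=m["title"],
        tags=tags,
        latency_ms=int(m["latency_ms"]),
    )
```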


Entry 2: I need to get a birthday present for my mom

Current: actions → "Get birthday present for mom"

Claude Haiku 4.5: actions (92%) — "Get birthday present for mom" [birthday, gift, family] (16249ms)

GPT-5.4 mini: actions (98%) — "Buy mom a birthday present" [gift, birthday, family] (2935ms)

GPT-5 mini: actions (98%) — "Buy mom's birthday present" [gift, family, birthday] (21559ms)

Raptor mini: actions (92%) — "Get birthday present for mom" [birthday, gift, family] (16622ms)


Entry 3: Study the only begotten references

Current: study → "Study the Only Begotten References"

Claude Haiku 4.5: study (88%) — "Study only begotten references" [scripture study, christology, only begotten] (6205ms)

GPT-5.4 mini: study (92%) — "Only Begotten references" [scripture, doctrine, jesus-christ] (4272ms)

GPT-5 mini: study (98%) — "Study only begotten references" [scripture, theology, study] (24940ms)

Raptor mini: study (92%) — "Study only begotten references" [scripture study, christology, only begotten] (6082ms)


Entry 4: Teaching in the saviors way for parents : higbys

Current: study → "Teaching Children in Savior's Way"

Claude Haiku 4.5: people (82%) — "Teaching in Savior's way - Higbys" [church, teaching, family ministry] (5982ms)

GPT-5.4 mini: study (82%) — "Teaching in the Savior's Way" [gospel-learning, parents, teaching] (5087ms)

GPT-5 mini: actions (95%) — "Teach Savior's Way to Higbys" [teaching, parents, Higbys] (21410ms)

Raptor mini: people (72%) — "Higbys: Teaching in Savior's Way" [church, teaching, family ministry] (20524ms)


Entry 5: Squad for ai agent flow, what can we learn? https://github.blog/ai-and-ml/github...

Current: study → "Learning from SQUAD AI Agent Flow"

Claude Haiku 4.5: ideas (90%) — "Learn from Squad AI agent flow" [ai agents, architecture, research] (11199ms)

GPT-5.4 mini: ideas (72%) — "Squad AI agent flow" [ai, agents, workflow] (4901ms)

GPT-5 mini: ideas (95%) — "Squad AI agent flow insights" [ai-agents, github, research] (22049ms)

Raptor mini: ideas (88%) — "Learn from Squad AI agent flow" [ai-agents, github-copilot, research] (19797ms)


Entry 6: Ai jobs and skills, where do I stand? https://youtu.be/4cuT-LKcmWs?si=CqAizAMZy0...

Current: study → "Assessing AI Skills and Career Positioning"

Claude Haiku 4.5: ideas (85%) — "Assess AI jobs and skills standing" [career, ai skills, self-assessment] (14620ms)

GPT-5.4 mini: journal (71%) — "Where I stand in AI" [ai, career, skills] (8849ms)

GPT-5 mini: ideas (92%) — "Assess AI jobs and skills" [ai, career, skills] (22360ms)

Raptor mini: ideas (82%) — "Evaluate my AI job market position" [career, ai-skills, self-assessment] (32972ms)


Entry 7: In brain app the scriptures dont show the body of the memorize items

Current: study → "Scriptures not showing in Brain app"

Claude Haiku 4.5: projects (88%) — "Brain app scripture memorize bug" [bug, brain app, scriptures] (3445ms)

GPT-5.4 mini: projects (86%) — "Fix scripture body display bug" [bug, app, scriptures] (4053ms)

GPT-5 mini: actions (95%) — "Brain app scripture display bug" [bug, brain-app, scripture] (20110ms)

Raptor mini: projects (90%) — "Brain app scripture display bug" [brain-app, bug, scriptures] (4568ms)


Entry 8: Im pretty sure AI opus/sonnet wrote the article, but im also 95% sure he used AI...

Current: people → "Uncertainty about AI's role in content creation"

Claude Haiku 4.5: ideas (82%) — "AI-written article and comment detection" [ai detection, authenticity, observation] (8761ms)

GPT-5.4 mini: journal (77%) — "AI-written article suspicion" [ai, writing, observation] (2865ms)

GPT-5 mini: people (95%) — "Suspect AI-authored article/comments" [ai-detection, authorship, dev.to] (24008ms)

Raptor mini: ideas (85%) — "AI-written article and comment detection" [ai-detection, authenticity, content-creation] (6056ms)


Entry 9: Claude scientific research https://www.anthropic.com/research/long-running-Claud...

Current: projects → "Claude AI Research Project"

Claude Haiku 4.5: ideas (88%) — "Claude long-running scientific research" [ai research, anthropic, claude] (4007ms)

GPT-5.4 mini: ideas (67%) — "Claude research review" [ai, research, claude] (1196ms)

GPT-5 mini: ideas (95%) — "Long-running Claude research" [anthropic, claude, research] (10810ms)

Raptor mini: ideas (85%) — "Claude long-running scientific research" [anthropic, claude, ai-research] (27996ms)


Entry 10: Study enhancement? See if there are other ways of digesting our study materials,...

Current: projects → "Enhance Study Material Digestion System"

Claude Haiku 4.5: projects (88%) — "Enhance study material digestion methods" [brain app, study tools, knowledge management] (5077ms)

GPT-5.4 mini: projects (90%) — "Enhance study digestion" [study, workflow, knowledge-management] (7461ms)

GPT-5 mini: projects (95%) — "Improve study materials digestion" [study, corpus, ingestion] (30090ms)

Raptor mini: projects (88%) — "Enhance study corpus digestion methods" [study-tools, knowledge-management, ai-assisted] (6545ms)


Entry 11: Agent sandbox as a basis for our brain sandbox? https://sandbox.agent-infra.com/...

Current: ideas → "Agent Sandbox for Brain Sandbox Foundation"

Claude Haiku 4.5: ideas (90%) — "Agent sandbox for brain sandbox basis" [brain app, sandbox, ai agents] (4526ms)

GPT-5.4 mini: ideas (88%) — "Agent sandbox for brain sandbox" [sandbox, agents, brain] (4432ms)

GPT-5 mini: ideas (95%) — "Agent sandbox as brain sandbox" [agent-sandbox, brain-sandbox, agent-infra] (19127ms)

Raptor mini: ideas (90%) — "Agent sandbox for brain sandbox basis" [brain-app, sandbox, agent-infra] (5222ms)


Entry 12: Could I use this for my star trek UIs? https://github.com/chenglou/pretext

Current: ideas → "Star Trek UI with Pretext"

Claude Haiku 4.5: ideas (85%) — "Pretext library for Star Trek UIs" [star trek, ui design, tools] (20291ms)

GPT-5.4 mini: ideas (80%) — "Pretext for Star Trek UIs" [ui, design, star-trek] (3498ms)

GPT-5 mini: ideas (93%) — "Pretext for Star Trek UIs" [pretext, ui, github] (10806ms)

Raptor mini: ideas (88%) — "Pretext library for Star Trek UIs" [star-trek, ui-design, tooling] (5474ms)


Entry 13: Bryce physical therapy. Research hippa compliant AI. Github copilot?

Current: projects → "HIPAA-Compliant AI Research"

Claude Haiku 4.5: people (85%) — "Bryce PT - HIPAA compliant AI" [HIPAA, ai tools, physical therapy] (4520ms)

GPT-5.4 mini: actions (84%) — "Research HIPAA-compliant AI" [ai, hipaa, research] (1338ms)

GPT-5 mini: actions (95%) — "Bryce PT and HIPAA AI research" [physical-therapy, hipaa, github-copilot] (30122ms)

Raptor mini: people (82%) — "Bryce PT: HIPAA compliant AI research" [healthcare, hipaa, ai-compliance] (5177ms)


Entry 14: Look into how they are using github copilot sdk and agents! https://github.blog/...

Current: projects → "Research GitHub Copilot Agent-Driven Development"

Claude Haiku 4.5: ideas (90%) — "GitHub Copilot SDK and agents usage" [github copilot, ai agents, sdk] (4268ms)

GPT-5.4 mini: ideas (86%) — "Copilot SDK and agents" [github-copilot, agents, ai] (1258ms)

GPT-5 mini: ideas (95%) — "GitHub Copilot SDK and agents" [github-copilot, agents, sdk] (18984ms)

Raptor mini: ideas (88%) — "GitHub Copilot SDK and agents research" [github-copilot, ai-agents, sdk] (6759ms)


Entry 15: Stripe minions https://stripe.dev/blog/minions-str...

Current: ideas → "Stripe Minions Coding Agents"

Claude Haiku 4.5: ideas (90%) — "Stripe Minions one-shot coding agents" [ai agents, coding agents, stripe] (4051ms)

GPT-5.4 mini: ideas (87%) — "Stripe minions" [stripe, agents, coding] (3668ms)

GPT-5 mini: ideas (95%) — "Stripe Minions one-shot agents" [stripe, ai-agents, minions] (20598ms)

Raptor mini: ideas (87%) — "Stripe Minions one-shot coding agents" [ai-agents, stripe, coding-agents] (5289ms)
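
The `Current` line records the category each entry already has, which makes it a convenient (though not necessarily correct) baseline. The hypothetical sketch below counts how often each model matches it, with labels copied from the entries above.

```python
# Existing ("Current") category for entries 1-15, copied from the Detail section.
CURRENT = ["actions", "actions", "study", "study", "study", "study", "study",
           "people", "projects", "projects", "ideas", "ideas", "projects",
           "projects", "ideas"]

# Each model's category for entries 1-15, in the same order.
MODELS = {
    "Claude Haiku 4.5": ["actions", "actions", "study", "people", "ideas", "ideas",
                         "projects", "ideas", "ideas", "projects", "ideas", "ideas",
                         "people", "ideas", "ideas"],
    "GPT-5.4 mini": ["actions", "actions", "study", "study", "ideas", "journal",
                     "projects", "journal", "ideas", "projects", "ideas", "ideas",
                     "actions", "ideas", "ideas"],
    "GPT-5 mini": ["actions", "actions", "study", "actions", "ideas", "ideas",
                   "actions", "people", "ideas", "projects", "ideas", "ideas",
                   "actions", "ideas", "ideas"],
    "Raptor mini": ["actions", "actions", "study", "people", "ideas", "ideas",
                    "projects", "ideas", "ideas", "projects", "ideas", "ideas",
                    "people", "ideas", "ideas"],
}


def matches(predicted, baseline):
    """Count entries where the predicted category equals the baseline category."""
    if len(predicted) != len(baseline):
        raise ValueError("predicted and baseline must cover the same entries")
    return sum(p == b for p, b in zip(predicted, baseline))


for model, labels in MODELS.items():
    print(f"{model}: {matches(labels, CURRENT)}/{len(CURRENT)} match Current")
```

This measures consistency with the existing labels, not correctness: in entries 5, 9 and 14 all four models chose `ideas` over the existing label.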


Latency

| Model | Avg (ms) | Min (ms) | Max (ms) |
|---|---|---|---|
| Claude Haiku 4.5 | 7890 | 3445 | 20291 |
| GPT-5.4 mini | 4328 | 1196 | 9108 |
| GPT-5 mini | 21063 | 10806 | 30122 |
| Raptor mini | 11538 | 3989 | 32972 |
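
The latency figures can be recomputed from the per-entry timings in the Detail section. The sketch below is a hypothetical recomputation, not the benchmark's own harness; it reproduces average, minimum and maximum per model and also reports the median (not in the original table), since a few slow calls pull the average well above the typical latency for some models.

```python
from statistics import median

# Per-entry latencies (ms) for entries 1-15, copied from the Detail section.
LATENCIES_MS = {
    "Claude Haiku 4.5": [5152, 16249, 6205, 5982, 11199, 14620, 3445, 8761,
                         4007, 5077, 4526, 20291, 4520, 4268, 4051],
    "GPT-5.4 mini": [9108, 2935, 4272, 5087, 4901, 8849, 4053, 2865,
                     1196, 7461, 4432, 3498, 1338, 1258, 3668],
    "GPT-5 mini": [18974, 21559, 24940, 21410, 22049, 22360, 20110, 24008,
                   10810, 30090, 19127, 10806, 30122, 18984, 20598],
    "Raptor mini": [3989, 16622, 6082, 20524, 19797, 32972, 4568, 6056,
                    27996, 6545, 5222, 5474, 5177, 6759, 5289],
}


def summarize(samples):
    """Return (average rounded to whole ms, median, min, max) for a list of latencies."""
    avg = round(sum(samples) / len(samples))
    return avg, median(samples), min(samples), max(samples)


print(f"{'Model':<18}{'Avg':>8}{'Median':>8}{'Min':>8}{'Max':>8}")
for model, samples in LATENCIES_MS.items():
    avg, med, lo, hi = summarize(samples)
    print(f"{model:<18}{avg:>8}{med:>8}{lo:>8}{hi:>8}")
```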