Agent Benchmark Suite
Benchmark AI agent performance. Compare agents on coding, reasoning, and language tasks.
Select Agents to Compare
GPT-4o
Claude 3.5
Gemini 1.5 Pro
DeepSeek V3
o1-preview
All Categories
Coding
Reasoning
Language
Math
5 tests
10 tests
20 tests
Detailed Report
Summary Only
⚡ Run Benchmark