Agent Testing Network

50 agents continuously test every API in our directory across 10 domains. Every test is public, reproducible, and auditable. Watch them work.

Methodology

50 benchmark agents across 10 domains

499 standardized queries (simple → complex)

8,006 total benchmark tests run

Agents — 50 agents using Claude, GPT-4, Gemini, DeepSeek, and Llama model families

Queries — 499 standardized queries across 10 domains, graded by complexity

Evaluation — LLM-judged relevance, freshness, and completeness (0–5 scale)

Frequency — Every 2 hours, automated via cron
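
To make the 0–5 evaluation scale concrete, here is a minimal sketch of how one judged result could be represented and range-checked. The `JudgedResult` record and the unweighted `overall()` mean are assumptions for illustration; the page does not say how, or whether, the three criteria are combined.

```python
from dataclasses import dataclass
from statistics import mean

SCALE_MIN, SCALE_MAX = 0.0, 5.0  # the 0-5 scale stated in the methodology


@dataclass(frozen=True)
class JudgedResult:
    """One tool's scores for one query (hypothetical record shape)."""
    tool: str
    relevance: float
    freshness: float
    completeness: float

    def __post_init__(self) -> None:
        # Reject any score that falls outside the 0-5 scale.
        for name in ("relevance", "freshness", "completeness"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name}={value} is outside 0-5")

    def overall(self) -> float:
        """Unweighted mean of the three criteria (an assumption, not the site's formula)."""
        return mean((self.relevance, self.freshness, self.completeness))


# Scores taken from the Exa Search row of the devtools batch on this page.
exa = JudgedResult("Exa Search", relevance=4.0, freshness=5.0, completeness=3.0)
print(f"{exa.tool}: {exa.overall():.2f}/5")
```

Validating at construction time means a malformed judge response fails immediately instead of skewing later averages.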

Benchmarks by Domain

Recent Batch Comparisons

Batch 8712b321… (devtools): devtools regulations impacting ecosystem (4)

4 tools, Apr 27, 02:30 PM

Tool	Relevance	Freshness	Completeness	Latency	Results
Exa Search	4.0/5	5.0/5	3.0/5	287ms	10
Tavily	3.0/5	3.0/5	2.0/5	2653ms	10
Firecrawl	0.0/5	0.0/5	0.0/5	347ms	0
Jina AI	0.0/5	0.0/5	0.0/5	66ms	1

Batch b2e438c9… (e-commerce): compare top tools for seo workflows (3)

2 tools, Apr 27, 02:00 PM

Tool	Relevance	Freshness	Completeness	Latency	Results
Tavily	4.0/5	4.0/5	3.0/5	1570ms	10
Firecrawl	0.0/5	0.0/5	0.0/5	708ms	0

Batch 44151b78… (healthcare): latest healthcare changes affecting clinical (1)

3 tools, Apr 27, 01:30 PM

Tool	Relevance	Freshness	Completeness	Latency	Results
Exa Search	4.0/5	5.0/5	3.0/5	482ms	10
Tavily	4.0/5	4.0/5	3.0/5	1831ms	10
Firecrawl	0.0/5	0.0/5	0.0/5	336ms	0

Batch 99b16e94… (legal): find primary sources about compliance in legal (20)

2 tools, Apr 27, 01:00 PM

Tool	Relevance	Freshness	Completeness	Latency	Results
Exa Search	4.0/5	3.0/5	3.0/5	446ms	10
Tavily	4.0/5	3.0/5	3.0/5	2107ms	10

Batch d62939ba… (finance): compare top tools for earnings workflows (18)

2 tools, Apr 27, 12:30 PM

Tool	Relevance	Freshness	Completeness	Latency	Results
Exa Search	4.0/5	5.0/5	3.0/5	458ms	10
Tavily	3.0/5	4.0/5	2.0/5	1564ms	10
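
The comparison tables above are tab-separated, so they can be loaded and ranked with a few lines of code. The sketch below parses the devtools batch (Apr 27, 02:30 PM) and orders tools by summed score, breaking ties by latency. The ranking rule and the function names `parse_batch` and `rank` are assumptions for illustration, not the network's own scoring method.

```python
import csv
import io

# Devtools batch copied from the table above, tab-separated.
BATCH = (
    "Tool\tRelevance\tFreshness\tCompleteness\tLatency\tResults\n"
    "Exa Search\t4.0/5\t5.0/5\t3.0/5\t287ms\t10\n"
    "Tavily\t3.0/5\t3.0/5\t2.0/5\t2653ms\t10\n"
    "Firecrawl\t0.0/5\t0.0/5\t0.0/5\t347ms\t0\n"
    "Jina AI\t0.0/5\t0.0/5\t0.0/5\t66ms\t1\n"
)


def parse_batch(text: str) -> list[dict]:
    """Turn a tab-separated comparison table into typed rows."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "tool": raw["Tool"],
            # Scores look like "4.0/5"; keep the part before the slash.
            "relevance": float(raw["Relevance"].split("/")[0]),
            "freshness": float(raw["Freshness"].split("/")[0]),
            "completeness": float(raw["Completeness"].split("/")[0]),
            "latency_ms": int(raw["Latency"].removesuffix("ms")),
            "results": int(raw["Results"]),
        })
    return rows


def rank(rows: list[dict]) -> list[dict]:
    """Order by summed score (highest first), then by latency (lowest first)."""
    def key(row: dict) -> tuple:
        total = row["relevance"] + row["freshness"] + row["completeness"]
        return (-total, row["latency_ms"])
    return sorted(rows, key=key)


for row in rank(parse_batch(BATCH)):
    print(row["tool"], row["latency_ms"])
```

Ranking on score before latency matters here: Jina AI has the lowest latency in this batch (66ms) but scores 0.0 on all three criteria, so latency alone would be a misleading sort key.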

Recent Tests

Tool	Query	Latency
Exa Search	devtools regulations impacting ecosystem (4)	287ms
Tavily	devtools regulations impacting ecosystem (4)	2653ms
Tavily	compare top tools for seo workflows (3)	1570ms
Exa Search	latest healthcare changes affecting clinical (1)	482ms
Tavily	latest healthcare changes affecting clinical (1)	1831ms
Exa Search	find primary sources about compliance in legal (20)	446ms
Tavily	find primary sources about compliance in legal (20)	2107ms
Exa Search	compare top tools for earnings workflows (18)	458ms

Benchmarks by Task

Task	Events
Web Search	72,487
Web Scraping	5,992
Code Execution	24,552
Vector Search	20,695
Email Sending	10,902
Payment Processing	4,043
Data Query	9,408
Authentication	1,501
Scheduling	1,774
AI Inference	20,474
Monitoring	8,484

Reproduce These Tests

All benchmark configurations are public. Your agent can download query sets and reproduce any test with its own infrastructure.

Download: benchmark-queries.json
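
As a rough sketch of what reproducing a test could look like, the code below runs a query set through a search function you supply and records latency and result count per query. The schema of benchmark-queries.json is not shown on this page, so the `domain` and `query` keys, the inline `SAMPLE` data, and the `fake_search` placeholder are all hypothetical; adjust them to match the real file and your own tool.

```python
import json
import time
from typing import Callable

# Hypothetical stand-in for the contents of benchmark-queries.json.
# The real file's field names may differ; adjust the keys below to match.
SAMPLE = json.loads("""
[
  {"domain": "legal", "query": "find primary sources about compliance in legal"},
  {"domain": "finance", "query": "compare top tools for earnings workflows"}
]
""")


def run_queries(queries: list[dict], search: Callable[[str], list]) -> list[dict]:
    """Run each query through `search`, recording latency and result count."""
    records = []
    for item in queries:
        start = time.perf_counter()
        results = search(item["query"])
        elapsed_ms = (time.perf_counter() - start) * 1000
        records.append({
            "domain": item["domain"],
            "query": item["query"],
            "latency_ms": round(elapsed_ms),
            "results": len(results),
        })
    return records


def fake_search(query: str) -> list:
    """Placeholder for a call to your own tool; returns one dummy hit per word."""
    return query.split()


records = run_queries(SAMPLE, fake_search)
for record in records:
    print(record)
```

This covers only the Latency and Results columns. Relevance, freshness, and completeness are LLM-judged, so reproducing those scores would also require a judging step that this sketch leaves out.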
