Evaluate AI with Real Financial Tasks
Professional-grade benchmark for AI agents handling complex, real-world financial scenarios. Expert-crafted tasks, deterministic scoring, and simulated financial environments.
Expert-Crafted Tasks
500+
Financial Domains
8
Overall Score
78.1
Real-World Sourced
100%
How Top Models Perform
[Chart: top model scores, vertical axis from 0% to 100%]
Full-Stack Evaluation Infrastructure
01 Evaluation Engine
02 Mock Environments
03 Data Foundation
04 Expert Layer
Eight Dimensions of Financial Competence
Financial Planning
Retirement, education funding, cash flow projections
50 tasks
Portfolio Analysis
Asset allocation, rebalancing, risk optimization
50 tasks
Stock Research
Fundamental analysis, valuation models
50 tasks
Compliance & Tax
Tax-loss harvesting, cross-border filing
50 tasks
Credit & Mortgage
Loan comparison, amortization, refinancing
50 tasks
Insurance Analysis
Coverage evaluation, premium comparison
50 tasks
Macro & ETF
Economic indicators, sector rotation, ETF selection
50 tasks
Cross-Border Finance
FX hedging, multi-currency planning
50 tasks
How We Score

45 pts
Calculation Accuracy
DETERMINISTIC
- Numeric range validation with tolerance bands
- Three-state: correct (+pts), missing (0), wrong (−pts)
- CSV metric coverage with thresholds
- Per-metric importance weighting
- Format normalization (%, bps, currency)
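As a rough illustration of how format normalization, tolerance bands, and three-state scoring could fit together, here is a minimal Python sketch. The function names, the 1% relative tolerance, the symmetric penalty, and the choice to treat unparseable answers as wrong are all assumptions for illustration; the benchmark's actual rules are not published on this page.

```python
import re
from typing import Optional


def normalize(raw: str) -> float:
    """Convert '12.5%', '125 bps', '$1,250.00' or '1250' to a plain float.

    Percentages and basis points are both mapped to decimal fractions.
    """
    text = raw.strip().lower().replace(",", "")
    if text.endswith("%"):
        return float(text[:-1]) / 100
    if text.endswith("bps"):
        return float(text[:-3]) / 10_000
    # Drop currency symbols or codes, keeping sign, digits and decimal point.
    return float(re.sub(r"[^0-9.+-]", "", text))


def score_metric(answer: Optional[str], expected: float, points: float,
                 rel_tol: float = 0.01) -> float:
    """Three-state scoring: +points if correct, 0 if missing, -points if wrong.

    `points` carries the per-metric importance weight (assumed convention).
    """
    if answer is None or not answer.strip():
        return 0.0  # missing: no credit, no penalty
    try:
        value = normalize(answer)
    except ValueError:
        return -points  # assumption: an unparseable answer counts as wrong
    band = abs(expected) * rel_tol  # tolerance band around the golden value
    return points if abs(value - expected) <= band else -points


print(score_metric("12.5%", 0.125, points=5))    # correct
print(score_metric(None, 0.125, points=5))       # missing
print(score_metric("15%", 0.125, points=5))      # wrong
```

Normalizing before comparing means an answer given as `125 bps` and one given as `1.25%` are judged on the same scale.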
30 pts
Semantic Reasoning
LLM-ASSISTED
- Per-goal pass/fail against golden answers
- Weighted goal aggregation by importance
- Methodology verification (not just answer)
- Reasoning chain quality assessment
- Cross-validated by multiple judge models
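The weighted goal aggregation and multi-judge cross-validation above might look roughly like the following. This is a sketch under stated assumptions: the strict-majority rule, the `(passed, weight)` representation, and the linear scaling to 30 points are hypothetical, not the benchmark's documented method.

```python
def majority_pass(votes: list[bool]) -> bool:
    """A goal passes if a strict majority of judge models marked it as passed."""
    return sum(votes) * 2 > len(votes)


def semantic_score(goals: list[tuple[bool, float]], max_points: float = 30.0) -> float:
    """Scale the importance-weighted share of passed goals to max_points."""
    total_weight = sum(weight for _, weight in goals)
    if total_weight == 0:
        return 0.0
    passed_weight = sum(weight for ok, weight in goals if ok)
    return max_points * passed_weight / total_weight


# Three goals, each judged by three hypothetical judge models.
judged = [
    (majority_pass([True, True, False]), 3.0),   # high-importance goal, passed
    (majority_pass([False, False, True]), 1.0),  # low-importance goal, failed
    (majority_pass([True, True, True]), 2.0),    # medium-importance goal, passed
]
print(semantic_score(judged))  # 25.0
```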
15 pts
Evidence & Sources
DETERMINISTIC
- Citation verification against source documents
- Page-level reference accuracy
- Tool output binding (correct API calls)
- Evidence completeness and relevance
- Penalty for hallucinated sources
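One way to picture citation verification with page-level accuracy is to classify each citation against the set of known source documents. The data shapes below (a `source`/`page` dict per citation, a page count per document) are hypothetical; the real evidence manifest format is not shown on this page.

```python
def check_citations(citations: list[dict], page_counts: dict[str, int]) -> dict[str, int]:
    """Classify citations as valid, pointing at a bad page, or hallucinated."""
    result = {"valid": 0, "bad_page": 0, "hallucinated": 0}
    for citation in citations:
        pages = page_counts.get(citation["source"])
        if pages is None:
            result["hallucinated"] += 1   # source is not among the task documents
        elif 1 <= citation["page"] <= pages:
            result["valid"] += 1          # source exists and the page is in range
        else:
            result["bad_page"] += 1       # source exists but the page does not
    return result


# Hypothetical page counts for two task attachments.
sources = {"attachment_1.txt": 4, "attachment_2.xlsx": 2}
cited = [
    {"source": "attachment_1.txt", "page": 2},
    {"source": "attachment_2.xlsx", "page": 9},
    {"source": "annual_report.pdf", "page": 1},
]
print(check_citations(cited, sources))
# {'valid': 1, 'bad_page': 1, 'hallucinated': 1}
```

A check like this only confirms that a reference points somewhere real; whether the cited passage actually supports the claim would need a separate relevance step.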
10 pts
Delivery Format
DETERMINISTIC
- JSON schema validation for structured output
- Required artifact completeness check
- CSV column & data type verification
- File naming and structure compliance
- Professional formatting standards
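The artifact completeness and CSV column and data type checks can be sketched with the Python standard library alone. The required file set and the column schema here are made-up examples; which artifacts and columns a given task actually requires is an assumption.

```python
import csv
import io


def missing_artifacts(required: set[str], delivered: set[str]) -> set[str]:
    """Return the required file names that the agent did not deliver."""
    return required - delivered


def check_csv(text: str, schema: dict[str, type]) -> list[str]:
    """Return a list of problems: missing columns or cells that fail type conversion."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in schema if col not in header]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for col, typ in schema.items():
            try:
                typ(row[col])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col}={row[col]!r} is not {typ.__name__}")
    return problems


# Hypothetical requirements for a single task.
required = {"risk_assessment_report.md", "risk_matrix.xlsx", "evidence_manifest.json"}
delivered = {"risk_assessment_report.md", "evidence_manifest.json"}
print(missing_artifacts(required, delivered))  # {'risk_matrix.xlsx'}

sample = "metric,value\nvar_95,0.12\nbeta,abc\n"
print(check_csv(sample, {"metric": str, "value": float}))
# ["line 3: value='abc' is not float"]
```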
What a Real Task Looks Like
Fund Investment Risk Assessment
- assumptions.txt
- attachment_1.txt
- attachment_2.xlsx
- evidence_manifest.json
- portfolio_risk_analysis.xlsx
- risk_assessment_report.md
- risk_matrix.xlsx
=== Investment Application Overview ===
Fund A: COSCO Mining Acquisition Fund
- Size: 80 billion CNY (20 billion equity + 60 billion debt)
- Strategy: Mining resource acquisitions
- GP: COSCO Mining (state-owned enterprise)
- Proposed commitment: 2 billion CNY
Fund B: Frontier Strategic Emerging Industries PE Fund
- Size: 630 million CNY
- Strategy: Strategic emerging industries (semiconductors, new energy, advanced manufacturing)
- GP: Frontier Capital
- Proposed commitment: 600 million CNY
=== Insurance Fund Investment Restrictions ===
- Single fund investment shall not exceed 30% of fund size
- Total alternative investments shall not exceed 25% of insurance fund's available balance
- Investment terms must match insurance liability duration
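To show the kind of calculation this task calls for, the sketch below applies the 30% single-fund restriction to the two proposed commitments, using only the figures quoted in the task file. It assumes "fund size" means the total stated size (80 billion CNY for Fund A, including debt); it is not the benchmark's reference solution, and it leaves out the 25% alternative-investment cap because the insurer's available balance is not given in this excerpt.

```python
SINGLE_FUND_LIMIT = 0.30  # "shall not exceed 30% of fund size"


def commitment_share(commitment: float, fund_size: float) -> float:
    """Fraction of the fund that the proposed commitment represents."""
    return commitment / fund_size


# (proposed commitment, fund size), both in million CNY, from the task file.
funds = {
    "Fund A": (2_000, 80_000),
    "Fund B": (600, 630),
}

for name, (commitment, size) in funds.items():
    share = commitment_share(commitment, size)
    status = "within limit" if share <= SINGLE_FUND_LIMIT else "exceeds limit"
    print(f"{name}: {share:.1%} of fund size, {status}")
# Fund A: 2.5% of fund size, within limit
# Fund B: 95.2% of fund size, exceeds limit
```

On these numbers, the Fund B commitment of 600 million CNY against a 630 million CNY fund is far above the 30% ceiling, which is the sort of constraint violation an agent would be expected to flag.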
Ready to Benchmark Your Agent?
Access the public dataset, run your agent against real financial tasks, and see where it stands on the leaderboard.