Reinforcement Learning Environments for Agents

Gyms for GUI agents: web, mobile, and desktop

Sample Gyms

Real-World Training Data

Test your agents with our exclusive datasets designed for real-world tasks like financial analysis, customer service, and enterprise workflows.

Airbnb

Real Airbnb listings and walkthrough data.

121 Tasks
2K Real house data
10K Video feeds

Instagram

Instagram posts and audience analytics.

100 Tasks
20K Instagram posts
100K Real users

LinkedIn

LinkedIn profiles and company records.

110 Tasks
100K Real profiles
30K Companies

Expedia

Expedia flights and hotel inventory.

238 Tasks
10K Flights
10K Hotels

Private Benchmarks

Evaluate agents on tasks

Evaluate your agents on diverse, long-horizon tasks in controllable environments.

Airbnb

Train Travel Booking Agents.

Booking House
Family Vacation Planner
Publish House
Business Trip Planner
Scenic Drive Itinerary

Instagram

Train Social Media Agents.

Publish Photo Post
Weekly Content Scheduler
Reel Trend Optimizer
Pet KOL Daily Monitor
Smart DM Concierge

Verifier Coverage

Smart Validation for Complex Tasks

Combine data-aware and perception-driven validation layers to confirm successful task completion even in complex, multi-step scenarios.

Database Verifier

Determines whether a task is completed by inspecting the application's database. For example, when an agent clicks the 'like' button on a post, a new record is created in a database table; the verifier queries that table to confirm the record exists.
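
The 'like' check described above can be sketched in a few lines of Python. This is a minimal illustration, not GBOX's actual verifier: the `likes` table, its `user_id` and `post_id` columns, and the `like_recorded` helper are all assumptions, and an in-memory SQLite database stands in for the environment's real database.

```python
import sqlite3


def like_recorded(conn: sqlite3.Connection, user_id: int, post_id: int) -> bool:
    """Return True if a like row exists for this user and post (hypothetical schema)."""
    row = conn.execute(
        "SELECT 1 FROM likes WHERE user_id = ? AND post_id = ? LIMIT 1",
        (user_id, post_id),
    ).fetchone()
    return row is not None


# In-memory database standing in for the environment's application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id INTEGER, post_id INTEGER)")

# Before the agent acts, the verifier should report the task as incomplete.
before = like_recorded(conn, user_id=7, post_id=42)

# Simulate the side effect of the agent clicking 'like'.
conn.execute("INSERT INTO likes (user_id, post_id) VALUES (?, ?)", (7, 42))

# After the action, the same query should find the new record.
after = like_recorded(conn, user_id=7, post_id=42)

print(before, after)
```

Checking the stored record rather than the screen makes the verdict independent of rendering details, which is why this kind of check pairs well with a UI-based one.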

Fig. 01

UI Verifier

Determines whether a task is completed by observing changes in the UI, for example by dumping the screen's XML layout hierarchy with Android UI Automator, or by using computer-use agent (CUA) models such as UI-TARS or Gelato.
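
As a rough sketch of the XML-based approach, the snippet below parses a layout dump of the kind produced by `adb shell uiautomator dump` and checks whether a given element is marked as selected. The resource ID `com.example.app:id/like_button`, the sample dump, and the `element_selected` helper are hypothetical, and a real app may signal a "liked" state through a different attribute (such as `checked` or `content-desc`).

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written sample in the shape of a UI Automator dump (hypothetical app).
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node index="0" class="android.widget.FrameLayout" resource-id="" selected="false" bounds="[0,0][1080,1920]">
    <node index="0" class="android.widget.ImageView" resource-id="com.example.app:id/like_button" content-desc="Like" selected="true" bounds="[40,1500][140,1600]" />
  </node>
</hierarchy>"""


def element_selected(dump_xml: str, resource_id: str) -> bool:
    """Return True if a node with this resource-id exists and has selected="true"."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("resource-id") == resource_id:
            return node.get("selected") == "true"
    # Element not on screen: treat the task as not verified.
    return False


liked = element_selected(SAMPLE_DUMP, "com.example.app:id/like_button")
print(liked)
```

A model-based verifier would replace the attribute lookup with a judgment over a screenshot; the XML route is cheaper and deterministic when the app exposes usable attributes.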

Fig. 02

On-Premise

On-Premise Deployment with Full Customization

Install on your own servers with air-gapped security. Modify benchmarks, create custom environments, and keep all data within your infrastructure.
Fig. 03

Air-gapped runtime

Deploy the container on isolated clusters with encrypted volume mounts and zero outbound traffic.

Customizable stacks

Swap benchmark suites, inject proprietary datasets, and wire your own validators without breaking the core framework.
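
One common way to let users add validators without touching core code is a small registry. The sketch below is only an illustration of that pattern under assumed names (`register`, `run_verifiers`, and the `state` dictionary are hypothetical), not a description of GBOX's extension API.

```python
from typing import Callable, Dict, Iterable

# A verifier takes a snapshot of environment state and returns pass/fail.
Verifier = Callable[[dict], bool]

_REGISTRY: Dict[str, Verifier] = {}


def register(name: str) -> Callable[[Verifier], Verifier]:
    """Decorator that adds a verifier to the registry under a unique name."""
    def decorator(fn: Verifier) -> Verifier:
        if name in _REGISTRY:
            raise ValueError(f"verifier already registered: {name}")
        _REGISTRY[name] = fn
        return fn
    return decorator


def run_verifiers(names: Iterable[str], state: dict) -> Dict[str, bool]:
    """Run the named verifiers against one state snapshot and collect results."""
    return {name: _REGISTRY[name](state) for name in names}


# Custom validators live in user code; the core functions above stay unchanged.
@register("post_liked")
def post_liked(state: dict) -> bool:
    return state.get("like_count", 0) > 0


@register("caption_present")
def caption_present(state: dict) -> bool:
    return bool(state.get("caption", "").strip())


results = run_verifiers(["post_liked", "caption_present"], {"like_count": 1, "caption": ""})
task_passed = all(results.values())
print(results, task_passed)
```

Requiring every registered check to pass is one simple way to combine a data-level check with a UI-level one for a multi-step task.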

Enterprise governance

Integrate with SSO, audit logging, and policy engines so every experiment is compliant by design.

Accelerate your RL training with GBOX