Pokémon Emerald

1st Place — NeurIPS 2025 PokéAgent Challenge

Solving a 40-Minute Game with RL

1st Place Solution for the NeurIPS 2025 PokéAgent Challenge, Speedrun Track (Pokémon Emerald)

Junik Bae, Seoul National University

Agent Demonstrations

Watch our trained agent speedrun through Pokémon Emerald. The agent operates purely on raw pixels without any VLM at inference time.

ROUTE101_TO_OLDALE — RL agent discovers efficient routing and autonomously selects RUN to skip wild battles (2x speed)

EXIT_BIRCH_LAB — Agent learns to skip the Pokémon nickname input UI through emergent behavior (2x speed)

The Challenge

The NeurIPS 2025 PokéAgent Challenge asks: Can we build an AI agent that speedruns Pokémon Emerald?

This is a uniquely difficult problem for two reasons: the game is an extremely long-horizon task with sparse rewards, and playing it well requires both high-level game knowledge and fast, precise low-level control.

Why Existing Approaches Fall Short

LLM/VLM Agents

  • Can generate high-level plans and understand game context
  • Often suboptimal at low-level control
  • High inference latency makes real-time play impractical

RL Agents

  • Struggle with long-horizon tasks and sparse rewards
  • Can achieve near-optimal performance with sufficient training
  • Fast inference enables real-time play

Can we combine the strengths of both?

Our Approach: VLM Code Expert + Expert-Guided RL

We propose a pipeline that uses VLMs as teachers rather than players:

  1. Milestone Decomposition — Break the long-horizon game into manageable subgoals (e.g., "exit house", "reach Route 101", "defeat Roxanne")
  2. VLM Code Expert Generation — For each milestone, prompt a VLM (GPT-4o, Gemini 2.5) to generate Python code that solves the task using game-state information (a sketch of such an expert appears just after this list)
  3. Expert-Guided Policy Learning — Use the code experts to guide RL training:
    • DAgger — Collect expert demonstrations on the learner's state distribution
    • Double DQN — Reinforce successful behaviors and improve beyond the expert baseline
  4. Pure Pixel Policy — The final agent is a CNN that maps raw pixels → actions, with no VLM at inference time
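
For illustration, the sketch below shows roughly what a generated code expert might look like for the LITTLEROOT_TO_ROUTE101 milestone. The state fields, the map-id constant, and the function name are assumptions made for this example, not the actual game-state interface used in the pipeline.

```python
# Illustrative sketch of a VLM-generated code expert for one milestone.
# All state fields, the map id constant, and the action names are
# hypothetical; the real emulator/state API may differ.

from dataclasses import dataclass

@dataclass
class GameState:
    map_id: int        # id of the map the player currently stands on
    player_x: int      # tile coordinates within the current map
    player_y: int
    in_dialog: bool    # True while a text box is open

ROUTE101_MAP_ID = 0x10   # hypothetical id for Route 101

def littleroot_to_route101_expert(state: GameState) -> str:
    """Policy a VLM might emit for the LITTLEROOT_TO_ROUTE101 milestone:
    press A to clear any dialog, otherwise walk north until the player
    is on Route 101."""
    if state.in_dialog:
        return "A"          # advance or dismiss dialog boxes
    if state.map_id != ROUTE101_MAP_ID:
        return "UP"         # Route 101 lies due north of Littleroot Town
    return "UP"             # milestone reached; the next expert takes over
```

Because such an expert reads structured game state rather than pixels, it is cheap to query as a teacher during training, but it is never needed at inference time, where the pixel policy acts alone.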

This approach combines the VLM's game knowledge with RL's ability to optimize for speed, without the latency cost of running a VLM during gameplay.
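
To make the expert-guided training step concrete, here is a minimal PyTorch sketch of the pixel Q-network, DAgger-style data collection, and Double DQN targets. The environment interface (a step() returning frames, structured state, reward, and done), the network sizes, and the hyperparameters are assumptions for illustration, not the exact training setup.

```python
# Minimal sketch of expert-guided policy learning: a CNN maps raw pixels
# to Q-values, DAgger-style rollouts label the learner's own states with
# the code expert's action, and Double DQN targets drive the TD loss.
# The env interface, shapes, and hyperparameters are illustrative assumptions.

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 8  # UP, DOWN, LEFT, RIGHT, A, B, START, SELECT

class PixelQNet(nn.Module):
    """CNN that maps a stack of raw frames to one Q-value per button."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(512), nn.ReLU(), nn.Linear(512, N_ACTIONS)
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(frames))

def dagger_rollout(env, policy, expert, buffer, eps=0.1):
    """Act with the learner (epsilon-greedy) but record the expert's action
    for every visited state -- the DAgger aggregation step."""
    frames, state = env.reset()   # assumed: (pixel frames, structured state)
    done = False
    while not done:
        with torch.no_grad():
            q = policy(frames.unsqueeze(0))
        action = random.randrange(N_ACTIONS) if random.random() < eps else int(q.argmax())
        expert_action = expert(state)   # assumed to return an action index
        next_frames, next_state, reward, done = env.step(action)
        buffer.append((frames, action, expert_action, reward, next_frames, done))
        frames, state = next_frames, next_state

def imitation_loss(online, frames, expert_actions):
    """Behavior-cloning term on the DAgger labels (Q-values used as logits)."""
    return F.cross_entropy(online(frames), expert_actions)

def double_dqn_loss(online, target, batch, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net
    evaluates it, reducing the overestimation of vanilla DQN."""
    frames, actions, rewards, next_frames, dones = batch
    q = online(frames).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_actions = online(next_frames).argmax(dim=1, keepdim=True)
        next_q = target(next_frames).gather(1, next_actions).squeeze(1)
        td_target = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q, td_target)
```

In a full training loop one would mix the imitation term on the expert labels with the TD term, for example by annealing the imitation weight so that the TD objective can eventually push the policy past the expert baseline reported below; the exact weighting is not specified in this write-up.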

Results

NeurIPS 2025 PokéAgent Challenge Leaderboard

Rank | Team | Method | Time to First Gym
🥇 1st | Ours | VLM Code Expert + Expert-Guided RL | 00:40:13
🥈 2nd | Hamburg PokeRunners | PPO with recurrent network | 01:14:43
🥉 3rd | anthonys | Tool-Calling VLM Policy | 01:29:17

Quantitative Analysis

Expert-guided RL significantly outperforms both the naive RL and expert-only baselines (steps to complete each milestone; lower is better):

Milestone | Naive RL | Expert-only | Expert-guided RL
LITTLEROOT_TO_ROUTE101 | timeout | 90.15 steps (±33.7) | 55.75 steps (±12.9)
EXIT_BIRCH_LAB | timeout | 64.40 steps (±1.0) | 56.35 steps (±1.1)

Emergent Behaviors

The RL agent discovered strategies not explicitly encoded in the expert code, such as selecting RUN to escape wild battles and skipping the Pokémon nickname input UI (see the demonstrations above).

Summary

We present a knowledge-based, expert-guided reinforcement learning approach for playing Pokémon Emerald. Our method externalizes the game knowledge of a Vision-Language Model (VLM) into Python code expert policies, which serve as teachers for training pixel-based neural network agents.

The trained agent took 1st place on the NeurIPS 2025 PokéAgent Challenge Speedrun track, defeating the first gym leader (Roxanne) in 40 minutes and 13 seconds, without complex reward engineering or large-scale human demonstrations.