Solving a 40-Minute Game with RL

1st Place Solution, NeurIPS 2025 PokéAgent Challenge (Pokémon Emerald Speedrun Track)

Junik Bae, Seoul National University

Agent Demonstrations

Watch our trained agent speedrun Pokémon Emerald. The agent operates purely from raw pixels, with no VLM calls at inference time.

ROUTE101_TO_OLDALE — RL agent discovers efficient routing and autonomously selects RUN to skip wild battles (2x speed)

EXIT_BIRCH_LAB — Agent learns to skip the Pokémon nickname input UI through emergent behavior (2x speed)

The Challenge

The NeurIPS 2025 PokéAgent Challenge asks: Can we build an AI agent that speedruns Pokémon Emerald?

This is a uniquely difficult problem for two reasons: it demands extensive game knowledge and long-horizon planning, yet it also requires fast, precise low-level control in real time. Existing agent paradigms typically provide only one of the two.

Why Existing Approaches Fall Short

LLM/VLM Agents

  • Can generate high-level plans and understand game context
  • Often suboptimal at low-level control
  • High inference latency makes real-time play impractical

RL Agents

  • Struggle with long-horizon tasks and sparse rewards
  • Can achieve near-optimal performance with sufficient training
  • Fast inference enables real-time play

Can we combine the strengths of both?

Our Approach: VLM Code Expert + Expert-Guided RL

Overview of Scripted Policy Distillation (SPD). Our approach consists of three stages.

(1) Subgoal Generation: Given a long-horizon task specification, an LLM decomposes the task into sequential subgoals, each paired with an executable success-condition function success_cond(state) that determines when the subgoal is complete.
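
A minimal sketch of what such a subgoal list might look like, assuming a simple dataclass wrapper; the state fields below are illustrative assumptions, not the actual challenge API (the subgoal names are the milestones reported later):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subgoal:
    name: str
    success_cond: Callable[[dict], bool]  # executable check over the structured game state

# Hypothetical subgoals for the opening of the run; state field names are assumptions.
subgoals = [
    Subgoal(
        name="LITTLEROOT_TO_ROUTE101",
        success_cond=lambda state: state["map_name"] == "ROUTE_101",
    ),
    Subgoal(
        name="EXIT_BIRCH_LAB",
        success_cond=lambda state: state["map_name"] == "LITTLEROOT_TOWN"
        and state["party_size"] > 0,  # starter Pokémon obtained
    ),
]
```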

(2) Scripted Policy Generation: For each subgoal, the LLM generates a scripted policy that maps states to actions. The policy can invoke a VLM tool (extract_feature) to parse visual information not available in the structured state, and uses logging statements (print(log)) to record execution traces for later analysis. The policy interacts with the environment until success_cond returns true or a timeout occurs. On failure, the LLM analyzes the logged traces and revises either the policy code or the subgoal specification.
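
The execution loop for one scripted policy might look like the sketch below; env, policy_step, and the extract_feature signature are placeholder interfaces assumed for illustration:

```python
MAX_STEPS = 2000  # illustrative timeout, not the competition setting

def run_scripted_policy(env, subgoal, policy_step, extract_feature):
    state = env.reset()
    for t in range(MAX_STEPS):
        # The VLM tool reads visual information absent from the structured state,
        # e.g. which menu or dialog box is currently on screen.
        menu = extract_feature(env.screen(), query="current menu or dialog")
        action = policy_step(state, menu)
        print(f"[log] t={t} menu={menu} action={action}")  # trace for LLM failure analysis
        state = env.step(action)
        if subgoal.success_cond(state):
            return True   # subgoal reached
    return False          # timeout: the LLM revises the policy code or the subgoal spec
```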

(3) Script-Guided RL: Once all scripted policies succeed reliably, we distill them into neural network policies via supervised learning on expert trajectories, followed by reinforcement learning with expert action guidance. The resulting neural policy exhibits more efficient behavior than policies trained without distillation.
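
As a rough illustration of the two training signals, here is a PyTorch-style sketch; the actual network architecture, RL algorithm, and guidance coefficient used in the submission are not specified here, so the names below are assumptions:

```python
import torch
import torch.nn.functional as F

def bc_loss(policy_logits: torch.Tensor, expert_actions: torch.Tensor) -> torch.Tensor:
    """Stage 1: supervised distillation (behavior cloning) on scripted-expert trajectories."""
    return F.cross_entropy(policy_logits, expert_actions)

def guided_rl_loss(policy_logits: torch.Tensor,
                   expert_actions: torch.Tensor,
                   rl_loss: torch.Tensor,
                   guidance_coef: float = 0.1) -> torch.Tensor:
    """Stage 2: RL objective plus an auxiliary expert-action guidance term.

    The guidance term keeps the policy close to the scripted expert;
    the coefficient (and any annealing schedule) is an assumption.
    """
    guidance = F.cross_entropy(policy_logits, expert_actions)
    return rl_loss + guidance_coef * guidance
```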

Results

NeurIPS 2025 PokéAgent Challenge Leaderboard

Rank | Team | Method | Time to First Gym
🥇 1st | Ours | VLM Code Expert + Expert-Guided RL | 40:13
🥈 2nd | Hamburg PokeRunners | PPO with recurrent network | 01:14:43
🥉 3rd | anthonys | Tool-Calling VLM Policy | 01:29:17

Quantitative Analysis

Expert-guided RL significantly outperforms both naive RL and expert-only baselines:

Milestone | Naive RL | Expert-only | Expert-guided RL
LITTLEROOT_TO_ROUTE101 | timeout | 90.15 steps (±33.7) | 55.75 steps (±12.9)
EXIT_BIRCH_LAB | timeout | 64.40 steps (±1.0) | 56.35 steps (±1.1)

Emergent Behaviors

The RL agent discovered strategies not explicitly encoded in the expert code:

  • Selecting RUN to escape wild battles on Route 101, avoiding time lost to unnecessary encounters
  • Skipping the Pokémon nickname input UI after receiving the starter, shortening the exit from Birch's lab

Summary

We present a knowledge-based, expert-guided reinforcement learning approach for playing Pokémon Emerald. Our method externalizes the game knowledge of a Vision-Language Model (VLM) into expert policies written as Python code, which then serve as teachers for training pixel-based neural network agents.

The trained agent ranked 1st on the NeurIPS 2025 PokéAgent Challenge Speedrun track, clearing the first gym (Roxanne) in 40 minutes and 13 seconds, without complex reward engineering or large-scale human demonstrations.