Craftax Demo
Create an AI agent that plays the Craftax game, and evaluate its shortcomings using the Synth platform
Overview
This tutorial demonstrates how to evaluate an LLM agent’s shortcomings at playing Craftax, a Minecraft-inspired game environment, using the Synth SDK and platform. The agent uses a ReAct (Reasoning + Action) approach to make decisions and interact with the game world.
Game Rules and Actions
We’ll use CraftaxLM to render the game as text that the LLM can engage with.
The agent’s state and surroundings are rendered in the prompt, and at each step it can take between 1 and 8 consecutive actions to make headway, such as:
- Basic movement (up, down, left, right)
- Resource gathering and crafting
- Combat and tool usage
- Building and construction
Because the agent follows the ReAct framework, its definition is rather simple.
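As a rough illustration of the loop described above, here is a minimal sketch of a ReAct-style step: render the state as text, ask a policy for reasoning plus a list of actions, cap the list at 8, and apply the actions in order. Everything in it is hypothetical: `ToyEnv`, `ReActDecision`, `run_episode`, and `scripted_policy` are stand-ins invented for this example and are not part of Craftax, CraftaxLM, or the Synth SDK. In a real agent, the policy would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ACTIONS_PER_STEP = 8  # the tutorial allows 1 to 8 actions per step


@dataclass
class ReActDecision:
    """One ReAct turn: free-text reasoning followed by the chosen actions."""
    reasoning: str
    actions: List[str]


class ToyEnv:
    """Hypothetical stand-in for a text-rendered game (not the Craftax API)."""

    def __init__(self) -> None:
        self.position = 0

    def render(self) -> str:
        return f"You are at x={self.position}."

    def step(self, action: str) -> None:
        if action == "right":
            self.position += 1
        elif action == "left":
            self.position -= 1
        # Unknown actions are ignored in this toy environment.


def clamp_actions(actions: List[str]) -> List[str]:
    """Require at least one action and keep at most MAX_ACTIONS_PER_STEP."""
    if not actions:
        raise ValueError("the policy must return at least one action")
    return actions[:MAX_ACTIONS_PER_STEP]


def run_episode(
    env: ToyEnv,
    policy: Callable[[str], ReActDecision],
    max_steps: int,
) -> List[Tuple[str, str, List[str]]]:
    """Run the observe -> reason -> act loop and record each step."""
    history = []
    for _ in range(max_steps):
        observation = env.render()
        decision = policy(observation)
        actions = clamp_actions(decision.actions)
        for action in actions:
            env.step(action)
        history.append((observation, decision.reasoning, actions))
    return history


def scripted_policy(observation: str) -> ReActDecision:
    """Placeholder for an LLM call; always asks for too many moves right."""
    return ReActDecision(
        reasoning="The goal is to the right, so keep moving right.",
        actions=["right"] * 10,
    )
```

Calling `run_episode(ToyEnv(), scripted_policy, max_steps=3)` returns one record per step, each holding the observation, the reasoning, and the (capped) actions that were executed.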
Configuration
The config controls which model underlies the agent, how long the agent gets before its trajectory is cut off, and how many agents run at once. In this example, we’ll run the agent a handful of times to help the Synth platform identify some common failures.
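To make the three settings concrete, here is a small sketch of what such a config and a concurrent batch run could look like. The field names (`model_name`, `max_steps`, `n_agents`) and the functions `run_one_episode` and `run_batch` are assumptions made for this illustration, not the tutorial's actual config keys or Synth SDK calls, and the episode body is a placeholder that does no real work.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgentRunConfig:
    """Hypothetical config; field names are assumptions for illustration."""
    model_name: str  # which model underlies the agent
    max_steps: int   # cutoff for a single trajectory
    n_agents: int    # how many agents run at once

    def __post_init__(self) -> None:
        if self.max_steps < 1 or self.n_agents < 1:
            raise ValueError("max_steps and n_agents must be at least 1")


async def run_one_episode(agent_id: int, config: AgentRunConfig) -> Dict:
    """Placeholder episode; a real one would call the model and the game."""
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return {
        "agent_id": agent_id,
        "model": config.model_name,
        "steps": config.max_steps,
    }


async def run_batch(config: AgentRunConfig) -> List[Dict]:
    """Launch n_agents episodes concurrently and collect their results."""
    tasks = [run_one_episode(i, config) for i in range(config.n_agents)]
    return await asyncio.gather(*tasks)
```

A batch could then be started with `asyncio.run(run_batch(AgentRunConfig(model_name="example-model", max_steps=50, n_agents=4)))`, where `"example-model"` is a made-up name.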
We run a batch of agent episodes and give the Synth platform time to analyze them. Soon enough, we can find breakdowns of each trajectory, along with an analysis of which errors plague our agent most.
The agent seems to consistently struggle with obtaining wood, although it often figures it out eventually!
For the complete implementation, including game rules, agent logic, and configuration options, check out the full source code:
And follow the walkthrough here