This repo was created for a research project. It contains a simple text game and scripts to play the game via AI models in two different ways. See the blog post for more about the research and results.
Feel free to contact me with questions.
This simple game involves investing money to buy croissants. There are two versions: the regular one, and one with a "stash" function that allows you to save money for future games.
If you would like to play, examine, or modify the game itself, there are three relevant files:
- `play_console.py`, which contains the console interface for the game. Run it with no arguments to play the regular version, or with the `--enable_stash` flag for the version with the stash enabled.
- `game_model.py`, which contains the model code for running the game, but no interface. This file is shared with the AI scripts.
- `game_config.toml`, which contains all tweakable numbers for the game in a convenient config file (a loading sketch follows this list).
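The exact contents of `game_config.toml` depend on the game, but reading it from Python only needs the standard library. A minimal sketch, assuming Python 3.11+ and hypothetical key names:

```python
# Minimal sketch of reading game_config.toml (Python 3.11+ tomllib).
# The key names below are hypothetical examples, not the real config.
import tomllib

with open("game_config.toml", "rb") as f:
    config = tomllib.load(f)

# e.g. starting_money = config["starting_money"]
```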
The first method of playing with AI uses PPO to train a Reinforcement Learning model on the Croissant Game.
Required libraries: `torch`, `torchrl`, `tensordict`, `gymnasium<1.0`, `numpy`.
Run `ai_train.py` to train a model. It takes its hyperparameters, and whether to enable the stash, from `hyperparams.toml`. It saves the model checkpoint to the `checkpoints/` folder using `torch.save`. The Gymnasium environment for the game is in `game_env.py`.
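For reference, a custom Gymnasium (pre-1.0 API) environment follows the shape sketched below. This is a generic skeleton with placeholder spaces and logic, not the actual contents of `game_env.py`:

```python
# Generic sketch of a Gymnasium (<1.0) environment skeleton.
# Spaces and game logic here are placeholders, not the real croissant game env.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class CroissantEnvSketch(gym.Env):
    def __init__(self, enable_stash: bool = False):
        super().__init__()
        self.enable_stash = enable_stash
        # Placeholder spaces: a small discrete action set and a numeric state vector.
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(4, dtype=np.float32)  # placeholder initial state
        return obs, {}

    def step(self, action):
        obs = np.zeros(4, dtype=np.float32)  # placeholder next state
        reward = 0.0
        terminated = False  # game over
        truncated = False   # time limit reached
        return obs, reward, terminated, truncated, {}
```

Any environment exposing this `reset`/`step` interface can be wrapped for training, so the game rules stay entirely behind it.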
To run a previously-trained model, use `ai_play.py` and specify the checkpoint file with the `--model` flag.

- Use the `--enable_stash` flag to play the game with the stash enabled. This must match the setting the model was trained with.
- Use the `--iterations` flag to specify how many games the model should play in a row.
- Use the `--deterministic` flag if you would like the model to always play the action with the highest probability, rather than sampling from the action probabilities (see the sketch after this list).
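To illustrate the difference `--deterministic` makes, here is a minimal sketch of the two selection modes, assuming the policy outputs a vector of action probabilities (the names are illustrative, not taken from `ai_play.py`):

```python
# Sketch of the two action-selection modes controlled by --deterministic.
# `action_probs` stands in for whatever probabilities the policy network outputs.
import torch

action_probs = torch.tensor([0.1, 0.7, 0.2])

# --deterministic: always take the most probable action.
deterministic_action = torch.argmax(action_probs).item()

# Default: sample an action from the probability distribution.
sampled_action = torch.distributions.Categorical(probs=action_probs).sample().item()
```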
The second method of playing with AI uses the Anthropic API to prompt Claude with the game state and ask for an action. It uses the specific model `claude-3-5-sonnet-20241022`.

Required library: `anthropic`.
All code for this method is in `claude_play.py`. By default, it calls a fake API defined in `fake_anthropic_api.py` for testing purposes. Use the `--real_api` flag to send messages to the Anthropic server. The API key is read from environment variables.
The message log, final score, and temperature parameter are saved as JSON to the `logs/` folder.

Claude currently only plays the regular version of the game. The game aborts if Claude returns an illegal or invalid action (a log file is still written). Enable chain-of-thought prompting with the `--thoughts` flag.
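The exact JSON schema is defined in `claude_play.py`; a small sketch of loading the most recent log for inspection, with hypothetical contents:

```python
# Sketch of loading a saved game log from logs/ (field names are hypothetical;
# see claude_play.py for the actual schema).
import json
from pathlib import Path

log_path = sorted(Path("logs").glob("*.json"))[-1]  # most recent log, assuming sortable filenames
with open(log_path) as f:
    log = json.load(f)

print(log.keys())  # e.g. the message log, final score, and temperature
```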