Welcome to ACM AI's Fall 2025 Competition, Blockography.AI! In this competition, you will train your models to classify Minecraft biomes from real gameplay images. Compete against your peers to win Qualcomm Rubik Pi 3's, and get a chance to network with Qualcomm engineers!
Important
How to get help
There are a few ways to get help, so you can pick the way that works best for you:
- Discord: You can reach us on Discord in the #ai channel.
- In person: You can find us in the Fishbowl (room B225 in the basement of the CSE building). We'll also be periodically walking around.
- 10 am - 11 am: Check-in
- 12 pm or 1 pm: Lunch
- 5 pm: Submissions close
- 5 pm - 6 pm: Qualcomm social time
- 6 pm: Winners announced
- Overview
- Competition Timeline
- Repo Structure
- Installation
- Instructions
- Submission and Evaluation
- Competition Rules
- FAQ
- Resources
blockography-ai/
│
├── assets/ # logo
│
├── outputs/ # Storing outputs for your models
│ └── sample_submission.csv
│
├── models/ # Model training notebooks
│ ├── rf.ipynb # Random Forest baseline
│ ├── ridge.ipynb # Ridge Regression baseline
│ └── xgboost.ipynb # XGBoost baseline
│
├── userkits/ # Utility code and analysis tools
│ ├── EDA.ipynb # Exploratory Data Analysis notebook
│ ├── features.py # Feature extraction and transformation functions
│ └── utils.py # Helper functions for preprocessing and evaluation
│
├── train_data/ # Training data (download from Google Drive)
│ ├── bamboo_jungle/ # Example biome folder containing several images
│ │ ├── <file_name>.jpg # Image files for training
│ │ └── ...
│ └── ... # Other biome folders with several images each
│
├── eval_data/ # Evaluation data (download from Google Drive)
│ ├── <file_name>.jpg # Several image files for evaluation
│ └── ...
│
├── .gitignore # Files and directories to ignore in version control
├── environment.yml # Conda environment setup file
├── requirements.txt # Pip dependency list (optional)
└── README.md # Main documentation
In this section, you'll install Git, Miniconda, and a code editor. If you already have these tools installed, you can skip to the Clone Repo and Create Environment section.
Please make sure you've installed the following tools:
- Git, which should already be installed on your system if you use macOS or Linux. If you're on Windows, you can download it from https://git-scm.com/download/win, or use Windows Subsystem for Linux (WSL).
- A code editor. We strongly recommend using Visual Studio Code, but you can also use other code editors.
- Miniconda, which is a package that will install Python and other tools that you will need for this competition.
First, install Miniconda from https://www.anaconda.com/download so you can create the environment. After logging in, choose "Miniconda Installers" and pick the installer that matches your operating system (macOS/Windows/Linux). Run the file you just downloaded and follow the setup wizard. Remember to check "Add Miniconda to my PATH environment variable." Then restart your terminal.
Note: you may need to run conda init to finish setup.
To clone this repository, open your git bash or terminal and type in the following command:
git clone https://github.com/acmucsd/blockography-ai.git
cd blockography-ai

Create the environment and install the required packages:
conda env create -f environment.yml
conda activate ai-comp-env

# use pip
pip install numpy
# or use conda
conda install numpy

If you have never used Git in Visual Studio Code, read through this documentation: https://code.visualstudio.com/docs/sourcecontrol/intro-to-git
After opening the repository in VS Code, open rf.ipynb and click "Select Kernel" in the top right. If your VS Code doesn't have the Python and Jupyter extensions, click "Install/Enable suggested extensions." After the extensions are installed, click "Select Kernel" again, choose "Python Environments" in the top command bar, and then select "ai-comp-env". After the kernel is set, run the first code block. If it runs successfully, you are good to go!
You need to download data from Google Drive here:
- train_data: https://drive.google.com/drive/folders/12ZZKnWFAvVhDVx37DvqizEtQ19LURluD?usp=sharing
- eval_data: https://drive.google.com/drive/folders/1Ehg7rtya5xwpzxG4q68F2AT6DVBjRhEo
You then need to unzip the downloaded zip files and extract them into the root of this repository (e.g., /Users/<username>/blockography-ai on macOS). Make sure train_data and eval_data are stored at the root level, or you'll run into path issues when you run the notebooks.
In this competition, we expect you to focus mainly on finding meaningful features to improve your model's performance. We have provided a list of possible features in userkits/features.py; look over these features and think about why they might help classify Minecraft biomes. You are also welcome to design your own custom features.
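As a sketch of what a custom feature might look like (the function below is ours, not one of the features shipped in userkits/features.py), simple color statistics can be computed with NumPy:

```python
import numpy as np

def color_stats(image: np.ndarray) -> np.ndarray:
    """Per-channel mean and standard deviation of an RGB image.

    `image` is an (H, W, 3) array; returns a 6-dimensional feature
    vector. Color statistics like these can separate biomes with
    distinct palettes (e.g. desert sand vs. jungle foliage).
    """
    pixels = image.reshape(-1, 3).astype(np.float64)
    means = pixels.mean(axis=0)  # average R, G, B
    stds = pixels.std(axis=0)    # spread of each channel
    return np.concatenate([means, stds])

# Example: a tiny synthetic "image" that is mostly green.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 1] = 200  # fill the green channel
features = color_stats(img)
```

Feature vectors like this one can be stacked row-by-row into the matrix you feed to the starter models.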
We have provided starter notebooks using classic machine learning models, such as a random forest classifier, where you can change the features used in your model. You can also tune your model's hyperparameters to achieve better accuracy.
While we primarily encourage you to focus on feature engineering and 'classical' machine learning models, we understand that deep learning models can also be used to solve this problem. You're welcome to use other models, including deep learning models such as convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs).
If you'd like to use deep learning models, we recommend using PyTorch, as we've provided a dataset and starter code.
A few things to note:

- We've provided a PyTorch dataset, which you can import like so:

  from userkits.torch_dataset import MinecraftTorchDataset

- To help with this, we've also provided a rough "scaffold" notebook that you can use as a starting point (see userkits/pytorch.ipynb).
- However, unlike the starter notebooks for classical machine learning, you'll need to implement the model yourself.
- We may not be able to help you with the setup of dependencies and implementations.
We split the dataset into two parts: train data and eval data. For the train data, you are given the features and the true labels (in train_data/). For the eval data (in eval_data/), you are given only the features, with no labels. You are expected to train your model on the train data and submit your predictions for the eval data.
To submit your predictions, run the final (submission) block in the starter notebooks. It will store your predictions in a CSV file at the specified path (which you can change). The file must be named submission.csv exactly for your submission to be accepted. Next, go to https://ai.acmucsd.com/portal to manually upload the CSV file. Please note that the submission portal closes at 5 pm.
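For illustration only (the real column names and label format are defined by the starter notebooks' submission block; the ones below are placeholders), writing predictions out by hand would look something like this:

```python
import csv

# Hypothetical predictions: image filename -> predicted biome.
# Check the starter notebooks for the exact expected format.
predictions = {
    "img_001.jpg": "bamboo_jungle",
    "img_002.jpg": "desert",
}

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "label"])  # placeholder header names
    for file_name, label in predictions.items():
        writer.writerow([file_name, label])
```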
On the website portal, you will see your submission's public score, which is calculated using 50% of the test data. The final ranking will be based on the other 50% (the private score).
- You are welcome to use any model for this competition as long as you can explain the algorithm or logic behind your solution.
- Winners will be interviewed to discuss their approach and solution before prizes are awarded.
- The use of LLMs is permitted, but participants are responsible for reviewing and testing all submitted code.
- Any attempt to cheat, including hacking, exploiting the evaluation server, or other dishonest behavior, will result in disqualification.
Q: Where can I see the current leaderboard?
A: You can go to https://ai.acmucsd.com/portal to see your submissions and the current leaderboard.
Q: What if I have questions related to the competition?
A: Go to our Discord server, ACM AI @ UCSD, and find the channel blockography.ai/q-and-a. You can also reach out to our staff directly, but we may not help you with your solution.
Are you a beginner to Python? Check out these tutorials: https://www.w3schools.com/python/
- Feature Engineering for Image Classification (you need to log in to Medium)
- ACM AI School #1 Slide
What is feature engineering?
Classical machine learning focuses on algorithms like logistic regression, support vector machines, and decision trees. These require manual feature selection, or extracting features from data to use in training a model. Feature selection and feature engineering are part of preprocessing (cleaning, organizing, and transforming raw data into a clean, structured format that a model can use).
Choosing the right features can have a huge impact on model accuracy. For example, let’s say you want to identify the breed of dogs. The features you are told in advance to look for are the number of legs each dog has, the color of its nose, and the sound of its bark. The last two features might be a little helpful, but even with a great classification model you’re not really going to get anywhere. Some better features could be coat color, average size/weight, or temperament. With strong features like these, recognizing certain breeds becomes much easier!
Now imagine if you had to keep track of all 6 aforementioned features, and also average max speed, eye color, type of coat, snout shape, number of teeth, number of eyes, sound of bark… at a certain point, it becomes difficult and taxing to keep track of all of these traits. Feature selection is when you select only the most important features and prune out the rest to improve performance.
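To make feature selection concrete, here's a minimal sketch (ours, not part of the starter kit) that prunes low-variance features, one of the simplest selection criteria:

```python
import numpy as np

def select_high_variance(X: np.ndarray, k: int) -> np.ndarray:
    """Keep the k feature columns with the highest variance.

    Features that barely vary across samples (like "number of legs"
    for dogs, which is almost always 4) carry little information,
    so they are natural candidates to prune.
    """
    variances = X.var(axis=0)
    keep = np.argsort(variances)[-k:]  # indices of the top-k variances
    return X[:, np.sort(keep)]         # preserve original column order

# Toy feature matrix: column 0 is constant, columns 1 and 2 vary.
X = np.array([[1.0, 0.0, 10.0],
              [1.0, 5.0, 20.0],
              [1.0, 9.0, 30.0]])
X_selected = select_high_variance(X, k=2)  # drops the constant column
```

Libraries like scikit-learn offer more principled selectors (e.g. ranking features by how well they predict the label), but the idea is the same: keep the informative columns, drop the rest.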
For image classification, we can build features from the pixels in our images. A grayscale image (black and white) can be represented as a 2D array of numbers (pixel brightness). A color image has 3 channels (Red, Green, Blue), so it’s a 3D array (height × width × 3).
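For instance, a minimal NumPy illustration of these representations:

```python
import numpy as np

# A grayscale image: a 2D array of brightness values (0-255).
gray = np.array([[0, 128],
                 [255, 64]], dtype=np.uint8)

# A color image: a 3D array of shape (height, width, 3) for R, G, B.
color = np.zeros((2, 2, 3), dtype=np.uint8)
color[0, 0] = [255, 0, 0]  # top-left pixel is pure red

# Raw pixels can become features by flattening...
flat = color.reshape(-1)             # 2 * 2 * 3 = 12 values
# ...or by summarizing, e.g. the mean of each color channel.
channel_means = color.mean(axis=(0, 1))
```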
We have provided you with some features in userkits/features.py. Get creative and figure out what combination of features work best!
