# ball-and-stick-rl
This repo uses the SAC algorithm to train a robot to balance on top of a rolling sphere while tracking a target velocity. The robot is essentially an inverted pendulum with three omni-directional wheels in contact with the sphere. The agent must control the three motor torques to keep the pendulum upright and track the target velocity.
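To make the task concrete, here is a minimal sketch of what a shaped reward for "stay upright and track a target velocity" could look like. It is not taken from this repo: the function name, the weights, and the particular reward terms are all assumptions chosen for illustration.

```python
import math


def balance_tracking_reward(tilt_rad, velocity, target_velocity, torques,
                            w_upright=1.0, w_track=1.0, w_effort=0.01):
    """Hypothetical reward; the repo's actual reward may differ.

    tilt_rad:        pendulum tilt from vertical, in radians
    velocity:        (vx, vy) current planar velocity
    target_velocity: (vx, vy) commanded planar velocity
    torques:         the three motor torques applied this step
    """
    # Upright term: 1 when vertical, smaller as the tilt grows.
    upright = math.cos(tilt_rad)

    # Tracking term: 1 at zero velocity error, decaying with the error.
    err_x = velocity[0] - target_velocity[0]
    err_y = velocity[1] - target_velocity[1]
    track = math.exp(-(err_x ** 2 + err_y ** 2))

    # Effort penalty: discourages large motor torques.
    effort = sum(t * t for t in torques)

    return w_upright * upright + w_track * track - w_effort * effort
```

With a reward of this shape, the relative size of `w_upright` and `w_track` decides how much the agent is pushed toward tracking versus simply staying balanced.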
This repo depends on a fork of MuJoCo which contains a small change to support anisotropic friction for the omni-wheels in contact with the sphere. You'll first need to clone and build the fork, including the Python bindings, and then edit the absolute path to `mujoco-3.3.5.tar.gz` in the `pyproject.toml` file.
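For intuition about why anisotropic friction matters here, the sketch below uses a simple Coulomb-style model with separate coefficients along the wheel's drive direction and along its roller direction. This is only a conceptual illustration: it is not how MuJoCo or the fork computes contact forces, and the function name and coefficient values are assumptions.

```python
import math


def anisotropic_friction(slip_drive, slip_roller, normal_force,
                         mu_drive=1.0, mu_roller=0.05):
    """Hypothetical friction force for one omni-wheel contact.

    slip_drive:  slip velocity along the driven (traction) direction
    slip_roller: slip velocity along the free-rolling roller direction
    Returns (force_drive, force_roller), each opposing its slip component.
    """
    speed = math.hypot(slip_drive, slip_roller)
    if speed < 1e-9:
        # No slip: this simplified model applies no sliding friction.
        return (0.0, 0.0)

    # Scale each component by its own coefficient, so the wheel grips
    # along the drive direction and slides easily along the rollers.
    force_drive = -mu_drive * normal_force * slip_drive / speed
    force_roller = -mu_roller * normal_force * slip_roller / speed
    return (force_drive, force_roller)
```

With a single isotropic coefficient, the two directions would resist slip equally, which would not capture the free-rolling behaviour of an omni-wheel.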
Install dependencies with poetry

    poetry install

Launch training with

    ./train_sac.sh

To visualize a trained model in MuJoCo, run

    ./test_sac.sh

To launch the MuJoCo viewer with the ball-and-stick model imported, run

    ./viewer.sh

The training metrics are logged to https://wandb.ai
For SAC they look something like this:
The PPO algorithm was also tested, but it did not work well (as implemented, anyway). The SAC algorithm does learn to balance, but it seems to plateau at a sub-optimal performance level and has difficulty learning to track the target velocity.

