This guide provides instructions on how to build and run the oobabooga text-generation-webui on macOS, specifically on Apple Silicon.
This repository is primarily for oobabooga users at the moment, but many of the Python libraries and packages used here may also be useful for data analytics, machine learning, and other purposes.
I have a new repository on the way to assist with Apple Silicon M1/M2 and GPU performance VENV builds. This will produce configurable, repeatable, consistent VENV builds for Python packages and modules in all types of layering/stacking and at some point soon, branching builds. This will allow different installation procedures to be compared and evaluated for performance and through regression tests. Getting these consistent, working builds has been a bit difficult as new packages come out all the time and there are many cross-module/package dependencies, some incompatible, and some in conflict.
Latest != Greatest, Latest + Greatest != Best, Stable == None
- Python: Install Python 3.10 using Miniconda. Create a virtual environment and install pip.
- CMake: Install CMake from source to avoid potential issues with universal binaries. This is used for building other software.
- oobabooga Base: Clone the oobabooga GitHub repository and install the Python modules listed in its requirements.txt file.
- Llama for macOS and MPS: Uninstall any existing version of llama-cpp-python, then reinstall it with specific CMake arguments to enable Metal support.
- PyTorch for macOS and MPS: Install PyTorch, torchvision, and torchaudio from the PyTorch Conda channel.
Check out oobabooga macOS Apple Silicon Quick Start for the Impatient for the short method without explanations.
Throughout the process, you're advised to create clones of your Conda environments at various stages. This allows you to easily roll back to a previous state if something goes wrong.
Please note that the guide is incomplete and is expected to be continued.
- Apple Silicon Support for oobabooga text-generation-webui
- 02 Jun 2024 - Rolled back Jinja, should be fine now
- TL;DR
- Building for macOS and Apple Silicon
- Pre-requisites
- Some initial setup
- Get Conda (Miniconda)
- CMake
- Verify we have everything set up for the rest of the build and install
- Clone my oobabooga macOS GitHub Repository
- Pip Install the PyTorch Daily Build
- Llama for macOS and MPS (Metal Performance Shaders)
- NumPy
- CTransformers
- Nearly finished
- Where We Are
- Extensions
This guide is quite comprehensive and covers everything from getting the necessary prerequisites to building and installing all the required components. It also includes a section on how to clone and install the oobabooga repository and its requirements. The guide is still a work in progress and will be updated with more information in the future.
You will likely need the pre-requisites regardless. This document is a work in progress. If you notice anything incorrect, unclear, or outdated, please let me know.
Many people might suggest using Brew, but I am old-school and have been building Open Source before package managers existed. Package managers are both a blessing and a curse. I've had bad experiences with Brew and other package managers that manage Open Source and other source-distributed installations.
- Advantages of Package Managers:
  - Package managers handle dependencies.
  - Package managers help keep your system up to date.
  - With package managers, everything is pre-configured.
  - Package managers automatically provide updates.
- Disadvantages of Package Managers:
  - Package managers handle dependencies, which can sometimes lead to unwanted changes.
  - Package managers keep your system up to date, but sometimes you might want to stick with a specific version.
  - With package managers, everything is pre-configured, which can limit customization.
  - Package managers automatically provide updates, which can sometimes break things.
These points illustrate why package managers can be both good and not so good. If you want maximum control over your environment, build it yourself, document it, write some scripts to help automate the process, and figure out something that works for you. However, building everything yourself comes at a cost: it's time-consuming, you need to keep things up to date (though version inconsistencies can still exist either way), and you need to know what you're doing to debug odd problems during a build.
Building is sometimes the best option, such as when you need special options built in or want to use a version other than what is typically distributed. There are many ways to do this, but I'm going to present one method. It may not be the best and could probably be improved upon, and there's always another way, but this is what we're going to do, and hopefully it's simple enough for anyone to follow the directions. In fact, as I am updating this file, I have completely torn down my build environment (making a backup) and am going to follow the steps here to validate them.
I did it the long way so I could ensure I had the proper versions of libraries and modules which, for many reasons, get overlaid, reverted, or uninstalled, and a different version gets installed from a different repository. Some repositories are better in sync than others, but I tried going to the source for these things. I mention using the --dry-run argument, but sometimes the output is difficult to sift through. I will also explain setting up virtual environments or VENV using Conda.
Before you begin, there are a few things you'll need.
- iTerm2
This should be the first thing you download.
Download iTerm2 here: https://iterm2.com/downloads.html
If you spend any time on the command line, this is a must-have, unless you're content with Terminal.app. There are many configuration options to explore. PROTIP: Set it up for tabbed windows.
IMPORTANT NOTE: This is a universal application. Before you run it, find the application where you installed it, "Right Click" on it, select "Get Info", and ensure that "Open using Rosetta" is not checked. If it is, iTerm will think it's running on an Intel machine, which can cause problems during software builds.
- Xcode
You'll need a compiler. While it would be ideal if GCC ran on macOS, Xcode is a sufficient alternative.
You can download Xcode from the App Store.
IMPORTANT NOTE: If you ONLY want the command line tools and not the complete Xcode IDE, you can get just the command line tools for Xcode by running the xcode-select command. Open up the iTerm2 you downloaded and installed earlier or Terminal.app in /Applications/Utilities and get the command line tools for Xcode like this:
xcode-select --install
- VSCode
Yes, two IDEs, but there are many plugins for VSCode. Unless you're developing macOS or other Apple apps, this is a great IDE. I like it because there's a Vi/Vim mode for it. If you're familiar with the keystrokes for Vi, you're good to go. An integrated terminal means you can run a command line while in the IDE, and it can even do ssh tunneling so you can develop on a remote machine as though everything were local. There are many options, settings, and plugins for this, and finding the right ones may be challenging. However, if you're working with AI or Data Science, you'll likely want this for the Jupyter Notebooks support alone. It also integrates seamlessly with GitHub.
Make sure you get the "Apple Silicon" zip file. The universal and the Intel version caused me problems when I migrated from my Intel Mac to Apple Silicon because it would run in Rosetta, and everything on the system would report that it was running on Intel when using the terminal. Universal could possibly run inside Rosetta as there is an option on some applications, like iTerm, where when you open the "Get Info" for the application, there is an option to run it using Rosetta. Make sure you don't have the universal build and this box is unchecked.
Download VSCode here: https://code.visualstudio.com/Download
Unzip the file wherever your downloads are and copy the application to your preferred location.
- GNU Coreutils (Optional, but a matter of taste)
This isn't strictly a requirement, but more of a personal preference. You might want to use something like Brew to install this. The ls command that comes with macOS displays directory listings in color, just like the GNU ls which comes with most Linux distributions. However, the macOS color configuration is not compatible with GNU ls and is extremely difficult to set up using the scant information in the man page. The bottom line is that the colors they chose give me a headache, so any screenshots of my terminal will be done using GNU ls for directory listings.
My theory is they are trying to actively discourage people from using the command line.
You will need to have your environment set up for all the following steps to work. These need to be done so your installation will go as smoothly as possible. You may wish to change some of these items to suit how you like to do things. However, they should work for pretty much any non-privileged user.
### These commands are for a bash shell. I tried switching to Zsh, but I have too much legacy with bash, and it
### wasted a lot of my time trying to get all my accumulated stuff to work with Zsh.
###
### So, let's make sure we are using a fresh bash shell.
exec bash -l
cd "${HOME}"
### Choose a target directory for everything to be put into. I'm using "${HOME}/projects/ai-projects", but you
### may use whatever you wish. These must be exported because we will exec a new login shell later. "Normal" shell variables will not be passed to the new login shell, so we are setting them up front.
export TARGET_DIR="${HOME}/projects/ai-projects"
### Run this command to add the MACOS_LLAMA_ENV variable to your .bashrc.
### We will bring it into the environment after the PATH is modified below.
echo 'export MACOS_LLAMA_ENV="macOS-llama-env"' >> ~/.bashrc
### Set a reasonable umask - this controls the default permissions for your files when they are created.
umask 0022
### Create the target directory where we will be downloading, building, and installing from.
mkdir -p "${TARGET_DIR}"
cd "${TARGET_DIR}"
### Be sure to add ${HOME}/local/bin to your path **Add to your .profile, .bashrc, etc...**
export PATH=${HOME}/local/bin:${PATH}
### The following lines add it permanently to your .bashrc if it's not already there.
### A backup copy is kept in ~/.bashrc.bak. The single quotes keep ${HOME} and ${PATH}
### unexpanded, so they are evaluated each time .bashrc is read.
cp ~/.bashrc ~/.bashrc.bak
grep -qF '${HOME}/local/bin' ~/.bashrc || echo 'export PATH=${HOME}/local/bin:${PATH}' >> ~/.bashrc
grep -qF '${HOME}/local/lib' ~/.bashrc || echo 'export DYLD_LIBRARY_PATH=${HOME}/local/lib:${DYLD_LIBRARY_PATH}' >> ~/.bashrc
source ~/.bashrc
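As a quick sanity check after editing your .bashrc, the sketch below tests whether a directory is already one of the entries in a PATH-style list. The helper name `path_contains` is hypothetical and is not part of any tool used in this guide.

```shell
# path_contains DIR [LIST]: succeed if DIR is one of the colon-separated entries
# in LIST (defaults to the current PATH). Hypothetical helper for illustration.
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "${HOME}/local/bin"; then
    echo "${HOME}/local/bin is on PATH"
else
    echo "${HOME}/local/bin is NOT on PATH yet"
fi
```

Wrapping the list in colons means only a whole entry matches, so `/opt/demo` is not mistaken for `/opt/demo/bin`.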
NOTE: If Conda is already installed on your machine, you can skip this step, although running it will also ensure your Conda setup is up to date. We're going to skip over the NumPy rebuild here because the llama-cpp-python build will bring NumPy along with it, and the Conda installation of PyTorch also brings along a different NumPy with support libraries in a "hidden" package called "numpy-base".
During this process, be cautious as some libraries require the properly compiled version rather than the version that comes with pip or conda. This is important because some extensions for oobabooga may uninstall perfectly fine versions of libraries and downgrade them due to dependencies. This can lead to performance loss and troubleshooting issues. This has happened to me with NumPy and llama.cpp. My goal here is to pay close attention to the libraries during the construction of the environment for running and managing LLMs using oobabooga. I aim to catch as many potential issues as possible.
One way to avoid conflicts, downgrades, and other issues is to use the "--dry-run" argument. This will show you what it plans to do without actually doing it. The output can be lengthy and you might miss things. As an extra precaution, I clone my virtual environments (VENV), then switch to the new one before making any potentially harmful changes.
cd
mkdir -p tmp
cd tmp
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o miniconda.sh
# Do a non-destructive Conda install which will preserve existing VENVs
sh miniconda.sh -b -u
# Activate the conda environment.
source ${HOME}/miniconda3/bin/activate
# Initialize Conda which will add initialization functions to your shell's profile.
conda init $(basename ${SHELL})
# Update your Conda environment with the latest updates to the base environment
conda update -n base -c defaults conda -y
# Grab a new login shell - this will work for any shell and you will land back in the
# tmp directory we just created.
exec bash -l
umask 022
Create a new VENV using Python 3.10. This will serve as your base virtual environment for anything you wish to use with Python 3.10. This is the version you need for running oobabooga. If you have another project, you can always return to the base and build from there. This helps avoid the issue of conflicting versions resulting from using package managers.
#### Create the base Python 3.10 and the llama-env VENV.
conda create -n ${MACOS_LLAMA_ENV} python=3.10 -y
conda activate ${MACOS_LLAMA_ENV}
This gives us a clean environment to return to as a base. I tend to clone my Conda VENVs so it's easy to roll back any change that has negatively impacted my environment; returning to a known good environment and moving forward again saves time. I recommend cloning your good VENV, activating the clone, and applying any changes to that. Many packages or updates affect multiple Python modules at once, so when the clone is working, clone that one, activate it, and do the next round of updates or changes. At any point, VENVs can be completely removed and even renamed, so you can take your final VENV, if you're happy with it, and rename it back to the base for your application. I will try to do this as I go along in this installation, taking VENV checkpoints which I can roll back to if needed.
Cloning a VENV can also help you quickly determine if a compile, or some other module, provides any performance advantage. I can explain some of these techniques at another time.
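The checkpoint routine described above can be sketched roughly as follows. The helper names `checkpoint_name` and `checkpoint_venv` and the numbering scheme are my own assumptions for illustration; only `conda create --clone` and `pip install --dry-run` are real commands, and the latter needs a reasonably recent pip.

```shell
# checkpoint_name BASE N: hypothetical naming scheme, e.g. "macOS-llama-env-01".
checkpoint_name() {
    printf '%s-%02d\n' "$1" "$2"
}

# checkpoint_venv SRC DST: clone SRC into DST and switch to the clone.
# Run this from an interactive shell where `conda init` has already been applied.
checkpoint_venv() {
    conda create --name "$2" --clone "$1" -y && conda activate "$2"
}

SRC_ENV="${MACOS_LLAMA_ENV:-macOS-llama-env}"
echo "next checkpoint would be: $(checkpoint_name "${SRC_ENV}" 1)"

# Usage, shown as comments so that nothing is created just by pasting this block:
#   checkpoint_venv "${SRC_ENV}" "$(checkpoint_name "${SRC_ENV}" 1)"
#   pip install --dry-run -r requirements.txt    # preview changes inside the clone
```

If the dry run shows an unwanted downgrade, you can simply remove the clone and reactivate the previous VENV.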
You will need a recent version of CMake, at least version 3.29.3. Make sure you have it installed and working. If you already have CMake installed, you can skip this step, but verify you are using the proper version. Many dependencies rely on CMake, which is beneficial as it builds based on the actual hardware and software configuration of your machine.
You can find it here: https://cmake.org/download/
CMake is easy to install and will be needed for later steps like llama.cpp, llama-cpp-python, and other modules.
Download the latest source version of CMake. Avoid using the packages as they are universal binaries, and you might accidentally end up building something with x86_64 architecture. This is unverified, but it's better to be safe.
A lot of issues surrounding getting all this to work stem from various machines building libraries and packages running universal binaries through Rosetta. This allows them to run on macOS, but not necessarily take advantage of the M1/M2 system on chip and unified memory. I discovered that a number of libraries are universal binaries, which could be an issue. I first noticed this when I was looking in the "Activity Monitor" and was surprised when oobabooga came up running as "Intel". This was a result of my VSCode running using Rosetta.
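One way to catch this early is to look at what `uname -m` reports in the terminal you are about to build from. The sketch below only interprets that string; the function name `classify_arch` is hypothetical.

```shell
# classify_arch MACHINE: interpret the string printed by `uname -m`.
# Hypothetical helper for illustration only.
classify_arch() {
    case "$1" in
        arm64)  echo "native Apple Silicon" ;;
        x86_64) echo "Intel, or Apple Silicon running under Rosetta" ;;
        *)      echo "unrecognized architecture: $1" ;;
    esac
}

classify_arch "$(uname -m)"
```

If this reports Intel on an Apple Silicon machine, check the "Open using Rosetta" box on the terminal or IDE you launched the shell from before building anything.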
NOTE: I am using a recent copy of GNU Make, which is a parallelizing make. Apple's make with macOS is an older version of GNU Make - 3.81, so it should be fine as well.
NOTE: These instructions install CMake under ${HOME}/local, so the binaries end up in ${HOME}/local/bin. Alternatively, you could install in /usr/local, but you will need administrator access and possibly have to disable Apple's SIP (System Integrity Protection), a process I will not go into here as it affects overall system protection. Either way, make sure that wherever you install it, it is in your PATH. I am going to assume ${HOME}/local/bin in these instructions.
The steps are pretty simple and only take about 5 minutes:
### Clone the CMake repository, build, and install CMake
git clone https://github.com/Kitware/CMake.git
cd CMake
git checkout tags/v3.29.3
mkdir build
cd build
### This will configure the installation of cmake to be in your home directory under local, rather than /usr/local
### This is just preference and will work for a non-privileged user.
../bootstrap --prefix=${HOME}/local
### Use twice your core count for the number of jobs; 24 suits a 12-core machine.
make -j 24
make -j 24 test
make install
This creates 24 compile jobs. I have 12 cores on my MBP, so I use twice the number of cores. This works rather well and builds quickly. Make should parallelize as much as it can based on dependencies.
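If you don't know your core count offhand, the job count can be derived rather than typed in. This is a small sketch that assumes the two-jobs-per-core ratio mentioned above; `jobs_for_cores` is a hypothetical helper name.

```shell
# jobs_for_cores CORES: two make jobs per core, the ratio used in this guide.
jobs_for_cores() {
    echo $(( $1 * 2 ))
}

# getconf reports the number of online processors on both macOS and Linux.
CORES="$(getconf _NPROCESSORS_ONLN)"
JOBS="$(jobs_for_cores "${CORES}")"
echo "Suggested build command: make -j ${JOBS}"
```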
Make sure we can find the CMake we installed earlier, and make sure we are in the target directory.
### Be sure to add ${HOME}/local/bin to your path **Add to your .profile, .bashrc, etc...**
export PATH=${HOME}/local/bin:${PATH}
### Verify the installation
which cmake # Should say ${HOME}/local/bin/cmake
### Verify you are running cmake v3.29.3
cmake --version
### Change to the target directory.
cd "${TARGET_DIR}"
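Rather than reading the version by eye, the check can be scripted. The following is a minimal sketch, assuming plain numeric versions such as 3.29.3; the `version_ge` helper is hypothetical and does not handle suffixes like "-rc1".

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Compares major, minor and patch as plain integers.
version_ge() {
    for field in 1 2 3; do
        x=$(printf '%s..' "$1" | cut -d. -f"${field}")
        y=$(printf '%s..' "$2" | cut -d. -f"${field}")
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
    done
    return 0
}

REQUIRED="3.29.3"   # minimum version named in this guide
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    FOUND="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "${FOUND}" "${REQUIRED}"; then
        echo "cmake ${FOUND} is new enough"
    else
        echo "cmake ${FOUND} is older than ${REQUIRED}"
    fi
else
    echo "cmake not found on PATH"
fi
```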
NOTE: THIS IS A DEVELOPMENT BUILD. IT WILL BE PROMOTED TO TEST SOON. I need feedback.
At this point, get started setting up oobabooga in your working location. We'll use it later, referring to the requirements.txt to see which Python modules we will need.
Pick a good location for your clone of the project. I have a projects directory with several sub-directories off of it to contain certain projects and source code, but you can pick any place you want. For me, I use ~/.projects/AI as the location where I place anything related to AI. So, open up iTerm, create a location, and get the oobabooga text-generation-webui repository.
This will clone my repository, which has some changes that allow it to run with GPU acceleration. This is unsupported by the oobabooga people, but I will try to keep my information as up to date as possible, along with merging code into the repository on a regular basis.
## Get my oobabooga and checkout macOS-dev branch
git clone https://github.com/unixwzrd/text-generation-webui-macos.git textgen-macOS
cd textgen-macOS
git checkout macOS-dev
pip install -r requirements.txt
The llama-cpp-python installed from PyPI with the requirements for oobabooga is not compiled for MPS (Metal Performance Shaders) at this time. It is probably best to build your own anyway.
You're going to need the llama library and the Python module for it. You should recompile it; I have validated my build using OpenBLAS. I will also add instructions later for building a stand-alone llama.cpp which can run by itself. This is handy in case you don't want the entire UI running, you want to use it for testing, or you only need the stand-alone version.
The llama.cpp application compiles with MPS support. I'm not sure if the CMake configuration takes care of it in the llama-cpp repository build, but the flag -DLLAMA_METAL=on is required here. When I compiled llama-cpp in order to compare its performance to llama-cpp-python, I didn't have to specify any flags and it just built right out of the box. This could have been due to CMake, which thoroughly probes the system for its installed software and capabilities in order to make decisions when it creates the makefile.
## llama-cpp-python
## PATH points at ${HOME}/local/bin so the CMake built earlier is the one that gets used.
CMAKE_ARGS="-DLLAMA_METAL=on" \
FORCE_CMAKE=1 \
PATH=${HOME}/local/bin:${PATH} \
pip install llama-cpp-python==0.2.90 --force-reinstall --no-cache --no-binary :all: --compile --no-deps --no-build-isolation
This will install the latest PyTorch optimized for Apple Silicon.
## Pip install from daily build
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu --force-reinstall --no-deps
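To confirm that the PyTorch you just installed can actually see the GPU, you can ask it whether the MPS backend is available. This is a minimal sketch: `check_mps` is a hypothetical wrapper, while `torch.backends.mps.is_available()` is the PyTorch call it relies on.

```shell
# check_mps: report whether the active Python environment's PyTorch can use MPS.
check_mps() {
    if ! command -v python >/dev/null 2>&1; then
        echo "python not found on PATH"
        return 1
    fi
    python -c 'import torch; print("MPS available:", torch.backends.mps.is_available())' 2>/dev/null \
        || { echo "torch could not be imported in this environment"; return 1; }
}

check_mps || true
```

Run it with your VENV activated; if torch cannot be imported, the nightly wheel most likely landed in a different environment.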
Now, at this point, we have everything we need to run the basic server with no extensions. However, we should have a look at the llama.cpp and llama-cpp-python as we may need to build them ourselves.
NumPy is finally supporting Apple Silicon. You will have to compile it on install. Many packages I've found want to install their preferred version of NumPy or other NumPy support libraries. This will likely uninstall the NumPy in your VENV. Whenever you install anything new, be on the lookout that it does not overlay your NumPy with a previous version or a different installation of the current version.
CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" pip install numpy==1.26.* --force-reinstall --no-deps --no-cache --no-binary :all: --compile -Csetup-args=-Dblas=accelerate -Csetup-args=-Dlapack=accelerate -Csetup-args=-Duse-ilp64=true
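One way to spot a package quietly replacing your NumPy is to snapshot `pip freeze` before and after an install and compare the two. The sketch below assumes that workflow; `changed_packages` and the file names are hypothetical.

```shell
# changed_packages BEFORE AFTER: list lines that differ between two `pip freeze` snapshots.
# "<" marks the version present before, ">" the version present afterwards.
changed_packages() {
    diff "$1" "$2" | grep '^[<>]'
}

# Typical use inside the active VENV (shown as comments so nothing is installed here):
#   pip freeze > /tmp/before.txt
#   pip install some-extension-requirement      # placeholder package name
#   pip freeze > /tmp/after.txt
#   changed_packages /tmp/before.txt /tmp/after.txt
```

Any line mentioning numpy, torch, or llama_cpp_python in the output is a sign that your compiled build was overlaid and needs to be reinstalled.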
I include this one, but I haven't tested it, and it's unclear if it works properly on macOS.
CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" CT_METAL=1 pip install ctransformers --no-binary :all: --no-deps --compile --force-reinstall
While there are more advanced instructions in the "QuickStart" guide, you are now basically finished. In the command window you are in, you can set the preferred VENV to use and the start options for the webui. The commands below create a short script for starting the webui from the command line, or you may open Finder to the location where you installed it and simply double-click the file "start-webui.sh", which should run in a terminal window. The options may be edited later in the start-webui.sh created here.
# Set the VENV the webui should run in; this defaults to the one created earlier.
PREFERRED_VENV="${MACOS_LLAMA_ENV}"
# Add any startup options you wish to use here:
START_OPTIONS=""
#START_OPTIONS="--verbose "
#START_OPTIONS="--verbose --listen"
# Variables written as ${...} are expanded now, while the script is being created.
# Anything escaped as \$ is left for start-webui.sh to evaluate when it runs.
cat > start-webui.sh <<_EOT_
#!/bin/bash
# >>> conda initialize >>>
__conda_setup="\$('${HOME}/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ \$? -eq 0 ]; then
    eval "\$__conda_setup"
else
    if [ -f "${HOME}/miniconda3/etc/profile.d/conda.sh" ]; then
        . "${HOME}/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="${HOME}/miniconda3/bin:\$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
cd "${TARGET_DIR}/textgen-macOS"
conda activate ${PREFERRED_VENV}
python server.py ${START_OPTIONS}
_EOT_
chmod +x start-webui.sh
This is a lot to cover, but there are more modules which get mis-installed and need to be repaired, re-installed, or built from source. This package has a lot of modules and a lot of dependencies, so expect breakage from time to time. Making checkpoints for rollback along the way will help a lot: if you get a bad module, you won't have to destroy your whole VENV or figure out which modules need to be uninstalled and re-installed.
Once you feel comfortable with your checkpoints and working VENV, you can remove some of the ones you aren't using and this will improve Conda's performance.
At this point, LLaMA models should start up just fine as long as they are GGUF-formatted models, and you should see a noticeable performance improvement. Offload as many GPU layers as you possibly can and set the threads to a reasonable number like 8.
Here are some other numbers and parameters of note which I have verified through testing. If you have others, please let me know and I'll have a look at them and add them if they work well.
Parameter | Value |
---|---|
n_gpu_layers | Set this to the number of n_layers in the output of llama.cpp when it starts |
mlock | Set this on, this will pin the memory so it doesn't get paged out or compressed |
n_batch | The number of batches for each iteration. If someone has guidance for this, please let me know. |
no_mmap | I use this because with mlock set, there should be no need to reference the model as a file. |
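These settings can also be passed on the command line through START_OPTIONS. The sketch below is an assumption-heavy example: the flag names are taken from upstream text-generation-webui and may differ in this fork (check `python server.py --help`), and the layer count of 33 is only a placeholder.

```shell
# Placeholder: replace with the layer count llama.cpp reports when it loads your model.
N_GPU_LAYERS=33
THREADS=8

# Flag names are assumed from upstream text-generation-webui; verify them with --help.
START_OPTIONS="--n-gpu-layers ${N_GPU_LAYERS} --threads ${THREADS} --mlock --no-mmap"
echo "python server.py ${START_OPTIONS}"
```

n_batch is left out on purpose, since there is no verified guidance for its value yet.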
There are a number of extensions you can use with oobabooga textgen, but some break other things with their requirements. At this point, here are the extensions I have had no problems with so far:
All the extensions should work with this version
- ElevenLabs
I've included this one because I use it, but I am working on other TTS options which will run locally, like AllTalk and other Coqui-based solutions.
- Silero
These are both for TTS (Text To Speech). One relies on ElevenLabs to generate the speech, and the other runs locally using a torch speech model. I'd be interested in discovering other TTS packages which could be hosted locally without relying on the Internet.
- Whisper
This is the speech-to-text utility (modules or libraries) from OpenAI, which I believe may be hosted locally as the model it uses is fairly small, though I haven't had a chance to check it out yet.
The issue with Whisper is that it requires some older Python packages which will cause NumPy to be downgraded and there you have a problem. Hopefully this will be sorted out soon.