Intro
v0.1.0 has been released!
Here's a tweet thread explaining the update in detail: https://twitter.com/cocktailpeanut/status/1635394517615652866
v0.1.0 Goal
- Fix as many bugs as possible so that installation succeeds as reliably as possible
- Make the Web UI more usable
Changelog
Important Fixes
- Deterministic Install: using virtualenv to make sure that everything related to python/pip installs deterministically (Thanks to @EliasVincent #19)
- ALL models work: previously, only the 7B model was working. The following cases have all been fixed, and ALL models (7B, 13B, 30B, 65B) should now work (thanks to @rizenfrmtheashes #9):
  - If you were getting gibberish when you queried the 13B model or others, this was what was happening.
  - In other cases, the models didn't even install to begin with.
- More efficient (thanks to @marcuswestin #16):
  - Clone or pull: installation now clones the llama.cpp repository, or pulls the latest changes if it already exists.
  - Only download when a file doesn't exist: previously, installing would always re-download everything from the llama-dl CDN. Now the code checks whether the files exist before attempting to download; if a file already exists, it immediately skips to the next step.
  - Only create a model when it doesn't exist: the model creation steps are skipped if the model already exists.
- Custom workspace folder: dalai now supports a custom workspace folder. Previously it always used the `$HOME` directory; now you can pass in a custom parameter to point to an existing llama.cpp workspace folder (see the sketch below).
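As a rough sketch of the custom workspace feature, assuming the workspace path is handed to the `Dalai` constructor (these notes only say a "custom parameter" can be passed, so the exact form is an assumption):

```js
// Hypothetical sketch: point dalai at an existing llama.cpp workspace folder
// instead of the default $HOME location. Passing the path as a constructor
// argument is an assumption; the notes only mention a "custom parameter".
const Dalai = require("dalai");

const dalai = new Dalai("/path/to/existing/llama.cpp-workspace");
```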
API
- ALL customization flags exposed: the API now exposes every configuration attribute supported by llama.cpp (see the example at the end of this section):
  - `top_k`
  - `top_p`
  - `n_predict`
  - `seed`
  - `temp`
  - `repeat_last_n`
  - `repeat_penalty`
- `install()` API documented: you can now install models programmatically.
- [NEW] `installed()` API: returns all the currently installed models.
- End marker: previously it was impossible to know when a streaming response had finished. Now every response session ends with `\n\n<end>` to mark the end, so you know the response has finished when you see `\n\n<end>`, and you can even write code to programmatically trigger other actions based on it (thanks to @marcuswestin #16).
  - This feature can be skipped by passing `skip_end: true` in the request payload.
- url: previously, when connecting to a remote dalai server, you specified the URL in the constructor (like `new Dalai("ws://localhost:3000")`). But the constructor is not the right place to take the URL as an input, since the `url` is only used when a client makes a request to a server. The `url` attribute has therefore been moved to the `request()` method.
  - Now, to make a request to a remote dalai server, you can simply attach a `url` parameter to your request payload (example: `url: "ws://localhost:3000"`); see the example below.
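Putting the API changes above together, here is a rough sketch of a programmatic install plus a remote request. The attribute names come from this changelog; the module import and the `(payload, callback)` shape of `request()` are assumptions rather than something these notes spell out.

```js
// Rough sketch based on the attributes listed in this changelog.
// The require() path and the (payload, callback) shape of request() are assumptions.
const Dalai = require("dalai");
const dalai = new Dalai();

async function main() {
  // programmatic install, and listing what is already installed
  await dalai.install("7B");
  const models = await dalai.installed();
  console.log("installed models:", models);

  let response = "";
  dalai.request({
    url: "ws://localhost:3000",  // target a remote dalai server (new in 0.1.0)
    model: "7B",
    prompt: "The quick brown fox",
    // every llama.cpp flag is now exposed
    top_k: 40,
    top_p: 0.9,
    n_predict: 128,
    seed: -1,
    temp: 0.8,
    repeat_last_n: 64,
    repeat_penalty: 1.3,
    // skip_end: true,           // uncomment to suppress the end marker
  }, (token) => {
    response += token;
    // the stream is finished once the end marker arrives
    if (response.endsWith("\n\n<end>")) {
      console.log("done:", response.replace("\n\n<end>", ""));
    }
  });
}

main();
```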
Web UI
- You can now customize all configurations in the dashboard (previously you could only enter the prompt):
  - `top_k`
  - `top_p`
  - `n_predict`
  - `seed`
  - `temp`
  - `repeat_last_n`
  - `repeat_penalty`
  - `model`
Dev experience
Thanks to @marcuswestin #16:
- `./dalai` shell script added: you can clone the repository and run the command locally instead of using `npx dalai`.
  - For example, after cloning the repository you can locally run `~/dalai install` as the equivalent of `npx dalai install`.
- Better log statements and exception handling
How to get this update
This version is 0.1.0
Method 1
You can upgrade by specifying the version:
`npx dalai@0.1.0 llama`
Method 2
The reason you need to specify the version is that npx caches packages. Normally you could just run `npx dalai install`, but npx seems to cache everything, so you will need to delete the npx cache:
`rm -rf ~/.npm/_npx`
and then install:
`npx dalai llama`