Add a LanguageModel class that implements the Llama2 architecture via llama2.c and Emscripten #26
base: main
Commits updated from 53bd1f6 to e18b755.
Wow, this is amazing @gohai!
First, as we work on documenting this, a wonderful reference is Let Us Show You How GPT Works by @aatishb. Hi Aatish, tagging you mostly to say hi, but would of course welcome your input!
I added a few comments, @gohai. In my experience teaching the previous charRNN models with ml5.js, the main pain points were:
- Generating character by character or token by token was very confusing for students and should probably be reserved for more advanced use cases (though great to include in ml5.js if we can?)
- What students often want to do is train their own model, but I was never able to pull this off with all the steps required to train in Python and then convert to JS. A clear and easy-to-use Colab notebook could absolutely work, though!
My other "worry" relates to treading into this territory and providing "out of the box" models. LLMs, as is well documented, have all sorts of biases and other issues. It would be wonderful to collaborate with the team members who are working on educational materials and documentation to think about how we guide users to be aware of the limitations and possible dangers of certain models and datasets. The code of conduct can also be a good reference for considering use cases and applications. This could be a great discussion for the full group! cc @sproutleaf
examples/LanguageModel/sketch.js (outdated)
createCanvas(400, 400);
background(0);

lm = ml5.languageModel(onModelLoaded);
Agreed, it would be great to support preload().
Also, I wonder if we should require a "string" that references a specific model to load. Even though this is perhaps an extra, unnecessary step, it emphasizes to the user that this isn't magic, and requires an acknowledgement of the specific model they are using. Some options:
lm = ml5.languageModel("TinyStories", onModelLoaded);
lm = ml5.languageModel("TinyLlamas", onModelLoaded);
Referencing the model name is probably more important, but it's also important to emphasize the specific dataset that was used for training.
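To make the "name a model explicitly" idea concrete, here is a minimal sketch of how the first argument could be validated. It accepts either a known model name or, as a later commit in this PR describes, an object containing the URL of a custom model. The registry, the `resolveModelSource` name, and the file path are all hypothetical and only illustrate the check; they are not the PR's actual code.

```javascript
// Hypothetical registry: names and file paths are illustrative only,
// not the list of models this PR actually ships.
const KNOWN_MODELS = {
  TinyStories: "models/stories15M.bin",
};

// Resolve the first argument of a hypothetical ml5.languageModel(model, callback).
// Accepts a known model name, or an object with a `url` pointing to a custom model.
function resolveModelSource(model) {
  if (typeof model === "string") {
    // Use hasOwnProperty so names like "toString" are not accepted by accident.
    if (!Object.prototype.hasOwnProperty.call(KNOWN_MODELS, model)) {
      const known = Object.keys(KNOWN_MODELS).join(", ");
      throw new Error(`Unknown model "${model}". Known models: ${known}`);
    }
    return { name: model, url: KNOWN_MODELS[model] };
  }
  if (model && typeof model === "object" && typeof model.url === "string") {
    return { name: model.name || "custom", url: model.url };
  }
  // No silent default: the user has to say which model they are using.
  throw new Error('Please specify a model, e.g. "TinyStories" or { url: "..." }');
}
```

Throwing instead of falling back to a default model is the design choice that carries the "this isn't magic" message: a sketch that omits the model name fails loudly and names the available options.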
examples/LanguageModel/sketch.js (outdated)
let options = {
  temperature: 0.9
};
lm.generate(prompt, options, onToken);
I love the idea of an onToken() event, but for beginners it would be incredibly useful to have an API that mirrors the Hugging Face Inference API: you pass a prompt and a desired number of tokens, and you get the entire result back. Something along the lines of:
let options = {
temperature: 0.9,
maxTokens: 100
};
lm.generate(prompt, options, gotText);
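One way to offer this "whole result" call on top of a token-by-token model is to loop until `maxTokens` tokens have been produced or the model signals the end of the text, then call back once with the joined string. A minimal sketch, assuming a hypothetical synchronous `nextToken(prompt, tokensSoFar)` step that returns a token string or `null`; the real model runs asynchronously, and the default of 100 tokens is also an assumption.

```javascript
// Sketch: turn a token-by-token generator into a "give me the whole result" call.
// `nextToken(prompt, tokensSoFar)` is a hypothetical stand-in for the model's
// per-token step; it returns the next token string, or null at end of text.
function generateAll(prompt, options, nextToken, callback) {
  const maxTokens = options.maxTokens ?? 100; // assumed default
  const tokens = [];
  while (tokens.length < maxTokens) {
    const token = nextToken(prompt, tokens);
    if (token === null) break; // the model signaled end of text
    tokens.push(token);
  }
  // A single callback with the complete text, as proposed for beginners.
  callback(tokens.join(""));
}

// Usage with a fake model that replays a fixed list of tokens.
const fakeTokens = ["Once", " upon", " a", " time", "."];
const fakeNextToken = (prompt, soFar) =>
  soFar.length < fakeTokens.length ? fakeTokens[soFar.length] : null;

generateAll("Hi", { temperature: 0.9, maxTokens: 3 }, fakeNextToken, (text) =>
  console.log(text) // prints "Once upon a"
);
```

The onToken() style stays available underneath; the wrapper only hides the loop from beginners.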
I might also consider the simpler form (prompt, maxTokens, callback), and then a JS object with more properties if you want to set temperature and other options, maybe:
lm.generate(prompt, 100, gotText);
and then:
let options = {
prompt: "How are you?",
temperature: 0.9,
maxTokens: 100
};
lm.generate(options, gotText);
Though maybe it's good to always keep the prompt separate.
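Supporting all of the call shapes proposed in this comment mostly comes down to inspecting argument types up front. A minimal sketch of that normalization step; `normalizeGenerateArgs` is a hypothetical helper name, not part of ml5.js.

```javascript
// Sketch of accepting the call shapes proposed in this comment for a
// hypothetical generate():
//   generate(prompt, maxTokens, callback)
//   generate(prompt, options, callback)
//   generate(options, callback)   // options.prompt holds the prompt
function normalizeGenerateArgs(...args) {
  let prompt;
  let options = {};
  let callback;
  if (typeof args[0] === "string") {
    prompt = args[0];
    if (typeof args[1] === "number") {
      options = { maxTokens: args[1] };
      callback = args[2];
    } else if (typeof args[1] === "function") {
      callback = args[1]; // generate(prompt, callback)
    } else {
      options = { ...args[1] }; // copy so the user's object is not mutated
      callback = args[2];
    }
  } else if (args[0] && typeof args[0] === "object") {
    options = { ...args[0] };
    prompt = options.prompt;
    delete options.prompt; // keep the prompt out of the sampling options
    callback = args[1];
  }
  if (typeof prompt !== "string") throw new Error("generate() needs a prompt string");
  if (typeof callback !== "function") throw new Error("generate() needs a callback");
  return { prompt, options, callback };
}
```

Normalizing once at the entry point keeps the rest of the implementation working with a single `{ prompt, options, callback }` shape, whichever form the user picked.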
@aatishb I so love this article, particularly how it raises the transparency/black-box issues with OpenAI for a general audience! (I believe that's also at the core of why we bother with toy models: not to suggest that similar performance is achievable in the browser, but to unpack this technology, and how it was built and trained, as widely as possible.) Thank you for taking the time to think through this, @shiffman!
examples/LanguageModel/sketch.js (outdated)
  lm.generate(prompt, options, gotText);
}

function gotText(out, lm) {
I like how simple it is to have just a single string come back, but it might better match how other models in ml5.js work if we instead wrap everything inside an object, for example:
function gotText(results) {
console.log(results.text);
console.log(results.words);
}
{
prompt: "How are you",
text: "I am doing fine.",
words: ["I", "am", "doing", "fine"]
}
I could also imagine returning a prompt property. Sometimes students want to include the prompt in what they display and sometimes they don't. Keeping track of it in a global variable can be awkward (especially if there are multiple calls to generate()), so having it available in the results can be helpful!
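A minimal sketch of assembling the proposed results object from a prompt and the generated text. Whether `words` should hold whitespace-separated words or the model's actual tokens is still open in this thread; this sketch assumes plain words with punctuation stripped, and `makeResults` is a hypothetical helper.

```javascript
// Sketch of packaging one generation into the proposed results shape.
// Splitting on whitespace and stripping punctuation is an assumption;
// the model's real tokens would usually differ from these "words".
function makeResults(prompt, text) {
  const words = text
    .split(/\s+/)
    .map((w) => w.replace(/[^\w']/g, "")) // drop punctuation such as "."
    .filter((w) => w.length > 0);
  // Returning the prompt spares users from tracking it in a global variable.
  return { prompt, text, words };
}

const results = makeResults("How are you", "I am doing fine.");
// results.words holds ["I", "am", "doing", "fine"]
console.log(results.text);
```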
@shiffman words and out are currently properties of the instance. Would passing the instance as the result make sense, or would you feel a "result" object value is nicer, which the user is able to mutate as they like, etc.? (text over out for the literal output?) I'll add prompt, although currently it's strictly the beginning of the response, more of a "prefix" really.
Thank you for your review! I'll try it out.
Commits updated from 2e5365d to 347f5b4.
… llama2.c and Emscripten
See https://github.com/gohai/llama2.c-emscripten for details.
Besides providing a model name, the user can also pass an object containing the URL to a custom model. In both cases, they're explicit about the model they're exploring. As suggested by @shiffman.
…-token
As suggested by @shiffman. This makes the most basic example easier to understand.
Previously this sometimes only passed the LanguageModel instance. Instead, always pass the best possible "value" as the first, and the instance as an (optional) second.
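The convention in this commit (best possible value first, instance second) lets beginners write a one-parameter callback while advanced users can still reach into the instance. A minimal sketch using a hypothetical `FakeLanguageModel` class; its fields and its fake "generation" only stand in for the real model.

```javascript
// Sketch of the callback convention: the most useful value comes first,
// the instance second. `FakeLanguageModel` is a hypothetical stand-in.
class FakeLanguageModel {
  constructor() {
    this.text = "";
    this.words = [];
  }
  generate(prompt, callback) {
    // Pretend generation: echo the prompt plus two fixed words.
    this.words = [prompt, "and", "more"];
    this.text = this.words.join(" ");
    // Best possible value first; the instance itself as the optional second argument.
    callback(this.text, this);
  }
}

const model = new FakeLanguageModel();
// Beginners can ignore the second parameter entirely...
model.generate("Hello", (text) => console.log(text)); // prints "Hello and more"
// ...while advanced users can inspect the instance.
model.generate("Hi", (text, lm) => console.log(lm.words.length)); // prints 3
```

Because JavaScript ignores extra arguments, adding the instance as a second parameter costs nothing for callbacks that only declare one.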
This drops all examples that use async/await.
Unsure if calling _{inc,de}crementPreload() manually is the best way to accomplish this. I first tried the p5PreloadHelper from the old repo, but this never made window._preloadCount go above zero for me. (Maybe @ziyuan-linn has some idea?)
…r top-p sampling (used by default)
From @karpathy's commit message: Quick note on sampling, the recommendation for good results is to use `-t 1.0 -p 0.9`, i.e. top-p sampling at 0.9 with temperature 1.0 (this is the default). To control the diversity of samples use either the temperature (i.e. vary `-t` between 0 and 1 and keep top-p off with `-p 0`) or the top-p value (i.e. vary `-p` between 0 and 1 and keep `-t 1`), but not both. Nice explainers on LLM sampling strategies include [this](https://peterchng.com/blog/2023/05/02/token-selection-strategies-top-k-top-p-and-temperature/), [this](https://docs.cohere.com/docs/controlling-generation-with-top-k-top-p) or [this](https://huggingface.co/blog/how-to-generate).
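To illustrate what temperature and top-p do to a model's output scores (logits), here is a minimal JavaScript sketch of the general technique. It is not the llama2.c code, which works on raw float arrays in C; the `sample` and `softmax` names and the injectable `rand` parameter are assumptions made so the result can be tested deterministically. Temperature 0 means greedy decoding, and a `topp` outside the open interval (0, 1) switches top-p off, consistent with the `-p 0` advice above.

```javascript
// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sketch of temperature + top-p (nucleus) sampling; returns a token index.
function sample(logits, { temperature = 1.0, topp = 0.9 } = {}, rand = Math.random) {
  if (temperature === 0) {
    // Greedy decoding: always pick the most likely token.
    return logits.indexOf(Math.max(...logits));
  }
  // Lower temperature sharpens the distribution, higher flattens it.
  const probs = softmax(logits.map((x) => x / temperature));
  // Candidate token indices, most probable first.
  let candidates = probs.map((p, i) => i).sort((a, b) => probs[b] - probs[a]);
  if (topp > 0 && topp < 1) {
    // Keep the smallest prefix whose cumulative probability exceeds topp.
    let cumulative = 0;
    let cut = candidates.length;
    for (let k = 0; k < candidates.length; k++) {
      cumulative += probs[candidates[k]];
      if (cumulative > topp) {
        cut = k + 1;
        break;
      }
    }
    candidates = candidates.slice(0, cut);
  }
  // Draw from the (possibly truncated) distribution, rescaled to its total mass.
  const mass = candidates.reduce((s, i) => s + probs[i], 0);
  let r = rand() * mass;
  for (const i of candidates) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return candidates[candidates.length - 1]; // guard against rounding error
}
```

For example, with logits `[2, 1, 0]` the probabilities are roughly 0.665, 0.245 and 0.090, so top-p at 0.9 keeps only the first two tokens and the least likely one can never be drawn.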
… topp
This automatically picks a reasonable default value for the other parameter, if not explicitly specified, and prints a message if both are.
This matches upstream llama2.c, and prevents a confusing message with the basic example, which specifies a temperature (thus disabling the default top-p sampling).
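The defaulting rule from these two commit messages can be written as a small pure function. This is a sketch under assumptions: the function name is hypothetical, "off" is encoded as `topp: 0` following the `-p 0` advice quoted earlier, and the 1.0 / 0.9 defaults are the upstream llama2.c values mentioned in that quote.

```javascript
// Defaults quoted from upstream llama2.c: temperature 1.0, top-p 0.9.
const DEFAULT_TEMPERATURE = 1.0;
const DEFAULT_TOPP = 0.9;

// Hypothetical helper: fill in whichever sampling parameter the user left out.
function resolveSamplingOptions(options = {}) {
  const hasTemp = options.temperature !== undefined;
  const hasTopp = options.topp !== undefined;
  if (hasTemp && hasTopp) {
    // Both given: respect them, but point out that varying one is recommended.
    console.warn("Both temperature and topp are set; consider varying only one.");
    return { temperature: options.temperature, topp: options.topp };
  }
  if (hasTemp) {
    // Temperature given: switch top-p sampling off (0 = off, as in `-p 0`).
    return { temperature: options.temperature, topp: 0 };
  }
  if (hasTopp) {
    // Top-p given: keep temperature at 1.0, as recommended upstream.
    return { temperature: 1.0, topp: options.topp };
  }
  return { temperature: DEFAULT_TEMPERATURE, topp: DEFAULT_TOPP };
}
```

With this rule, the basic example's `{ temperature: 0.9 }` quietly disables top-p instead of triggering the "both are set" message.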
This makes it possible to run inference on toy models (15M, 42M, 110M parameters) using the Llama2 architecture, as implemented in llama2.c.
The included emscripten build artifacts came from this tree: https://github.com/gohai/llama2.c-emscripten
TODO: