
Conversation

dmahurin

These changes add support for training with tinyshakespeare (adapted from llama2.py), and with simple blank-line-separated text.
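As a sketch of what "blank-line-separated text" training data might look like: each sample is a block of one or more lines, with samples separated by an empty line. The parsing below is illustrative only; the PR's actual loader may split the file differently.

```python
# Hypothetical example of a blank-line-separated training file:
# each sample is a block of lines, samples separated by one empty line.
text = (
    "First sample, line one.\n"
    "First sample, line two.\n"
    "\n"
    "Second sample, a single line.\n"
    "\n"
    "Third sample.\n"
)

# Split into samples on blank lines, mirroring how such a file
# might be chunked before tokenization (assumption, not the PR's code).
samples = [s.strip() for s in text.split("\n\n") if s.strip()]
print(len(samples))  # → 3
```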

xpww commented Jul 5, 2024

Hello! Excuse me, I wrote a tinytext.txt of a few dozen lines.
When I ran

    python tinyshakespeare.py pretokenize
    python train.py --dataset=tinyshakespeare

the following error occurred:

assert num_batches > 0, "this split is way too small? investigate." 

I just started working with LLMs. I can get llama2.c to run on my own computer, but I don't have enough background knowledge to quickly start training a model of my own.

Could you please provide me with an example of a related tinytext.txt file? Thank you very much!
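For context on the assertion: in llama2.c-style pretokenized datasets, a split is chopped into fixed-length sequences of `max_seq_len` tokens, so a file whose tokenized split is shorter than about one sequence yields zero batches and trips the assert. The sketch below is illustrative (the names and the exact off-by-one handling are assumptions; the real train.py code may differ), but it shows why a dozens-of-lines file is too small:

```python
# Illustrative sketch of the batch-count check behind the error
# "assert num_batches > 0" (assumed logic, not llama2.c's exact code).
def count_batches(num_tokens: int, max_seq_len: int) -> int:
    # Each training example consumes max_seq_len tokens (plus a shifted
    # target), so a tiny split can easily produce zero full batches.
    num_batches = num_tokens // max_seq_len
    num_batches -= 1  # reserve room for the shifted target sequence
    return num_batches

# A dozens-of-lines text file might tokenize to only a few hundred tokens:
print(count_batches(300, 256))     # → 0, assertion would fail
print(count_batches(10_000, 256))  # → 38, enough data to train
```

In practice the fix is simply to use a much larger text file (thousands of lines or more), or to reduce `max_seq_len` for toy experiments.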
