
Conversation

dmahurin

These changes add support for training with tinyshakespeare (adapted from llama2.py), and with simple blank-line-separated text.
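As a sketch of what "blank-line-separated text" training data might look like: each sample is a block of one or more lines, with samples separated by an empty line. The parsing below is illustrative only; the PR's actual loader may split the file differently.

```python
# Hypothetical example of a blank-line-separated training file:
# each sample is a block of lines, samples separated by one empty line.
text = (
    "First sample, line one.\n"
    "First sample, line two.\n"
    "\n"
    "Second sample, a single line.\n"
    "\n"
    "Third sample.\n"
)

# Split into samples on blank lines, mirroring how such a file
# might be chunked before tokenization (assumption, not the PR's code).
samples = [s.strip() for s in text.split("\n\n") if s.strip()]
print(len(samples))  # → 3
```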

xpww commented Jul 5, 2024

Hello! Excuse me, I wrote a tinytext.txt of a few dozen lines.
When I ran

    python tinyshakespeare.py pretokenize
    python train.py --dataset=tinyshakespeare

the following error occurred:

assert num_batches > 0, "this split is way too small? investigate." 

I just started working with LLMs. I can get llama2.c to run on my own computer, but I don't have enough background knowledge to quickly start training a model of my own.

Could you please provide me with an example of a related tinytext.txt file? Thank you very much!
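For context on the assertion: in llama2.c-style pretokenized datasets, a split is chopped into fixed-length sequences of `max_seq_len` tokens, so a file whose tokenized split is shorter than about one sequence yields zero batches and trips the assert. The sketch below is illustrative (the names and the exact off-by-one handling are assumptions; the real train.py code may differ), but it shows why a dozens-of-lines file is too small:

```python
# Illustrative sketch of the batch-count check behind the error
# "assert num_batches > 0" (assumed logic, not llama2.c's exact code).
def count_batches(num_tokens: int, max_seq_len: int) -> int:
    # Each training example consumes max_seq_len tokens (plus a shifted
    # target), so a tiny split can easily produce zero full batches.
    num_batches = num_tokens // max_seq_len
    num_batches -= 1  # reserve room for the shifted target sequence
    return num_batches

# A dozens-of-lines text file might tokenize to only a few hundred tokens:
print(count_batches(300, 256))     # → 0, assertion would fail
print(count_batches(10_000, 256))  # → 38, enough data to train
```

In practice the fix is simply to use a much larger text file (thousands of lines or more), or to reduce `max_seq_len` for toy experiments.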
