Hardcoded incorrect (and repeated) validation example #796

Closed
DavidGOrtega opened this issue Dec 6, 2023 · 20 comments · Fixed by #1804
Labels
enhancement New feature or request

Comments

@DavidGOrtega

DavidGOrtega commented Dec 6, 2023

👋 I have found this code to be surprisingly incorrect, as my dataset has nothing to do with this validation example. It's also repeated in every single finetuning script.

# produce an example:
instruction = "Recommend a movie for me to watch during the weekend and explain the reason."
fabric.print(instruction)
sample = {"instruction": instruction, "input": ""}
prompt = generate_prompt(sample)
encoded = tokenizer.encode(prompt, device=fabric.device)
with fabric.init_tensor():
    # do not set `max_seq_length=max_returned_tokens` because memory is not a concern here
    model.set_kv_cache(batch_size=1)
output = generate(model, encoded, max_returned_tokens=len(encoded) + eval_max_new_tokens, temperature=0.8)
model.clear_kv_cache()
output = tokenizer.decode(output)
fabric.print(output)
@DavidGOrtega changed the title from "Incorrect (and repeated) validation example" to "Hardcoded incorrect (and repeated) validation example" on Dec 6, 2023
@carmocca
Contributor

carmocca commented Dec 7, 2023

This piece of validation was useful to subjectively evaluate the output of a fine-tuned model on the alpaca dataset as training progressed. Reusing it is on purpose so that you can see how it evolves with more steps.

You are correct that it might be completely irrelevant for other datasets. Feel free to completely remove those lines from your script or change the instruction to be more relevant to your use case.

@carmocca added the "question" (Further information is requested) label on Dec 7, 2023
@DavidGOrtega
Author

Why not just take it from your test split?

@carmocca
Contributor

carmocca commented Dec 8, 2023

You could do that, but then it might be different if you shuffle or change your dataset splits

@awaelchli
Contributor

When I added that prompt to the scripts, I was legit actually looking for a movie to watch and that's why it is there.

@DavidGOrtega
Author

DavidGOrtega commented Dec 9, 2023

> You could do that, but then it might be different if you shuffle or change your dataset splits

We are indeed randomly shuffling the dataset when preparing it, but the seed is always the same in every prepare script 😉

> When I added that prompt to the scripts, I was legit actually looking for a movie to watch and that's why it is there.

Not questioning that, sir, but it took me a bit to check that I was not messing with the datasets 😁

@awaelchli
Contributor

awaelchli commented Dec 10, 2023

@DavidGOrtega What do you think we could do? The requirement is that the lit-gpt scripts stay as simple as possible. We want people to read the entire script and understand it inside and out. The scripts are more like templates that the user adapts to their needs, so we wouldn't want to add too much code here to select a sample from the dataset directly. As Carlos said, that would require instantiating the dataset separately.

Maybe we could add a comment above the code that selects the sample, something like "Fixed example prompt as a sanity check to compare across validation runs, please adapt to your data"?

I would also be fine with removing it if it leads to too much confusion. I just thought that printing some output as a sanity check could help.
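
For illustration, the suggested comment would sit directly above the hardcoded prompt in each finetuning script (only the first line below is new; the rest is the existing snippet):

# Fixed example prompt as a sanity check to compare across validation runs; please adapt it to your data.
instruction = "Recommend a movie for me to watch during the weekend and explain the reason."
fabric.print(instruction)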

@DavidGOrtega

This comment was marked as off-topic.

@DavidGOrtega
Author

Any thoughts here, @awaelchli?

@awaelchli
Contributor

awaelchli commented Dec 16, 2023

My thoughts are: feel free to make a concrete suggestion in the form of a PR regarding the sample, or remove it entirely. We appreciate and encourage active contributions as long as they are within the project's scope.

> Unfortunately, I find the scripts pretty difficult to understand by obscurity...

A human error of leaving one redundant line of code in there is nowhere near a deliberate act of obfuscation. Let's just remove this line

"input_ids_no_response": encoded_full_prompt,

and the world is in balance again. If you see dead code elsewhere, please call it out, but we need some concrete evidence first that there are real problems with the code being obscure and hard to understand (what else is there that you are hinting at?). Of course, lit-gpt is an NLP-focused template, so certain terminology is expected to be known. But by all means, if we think that terms like eos and bos are too abbreviated, let's add a comment to explain them at the appropriate places!

The scripts being versions of each other is intentional. Lit-GPT actively tries not to be a framework, and thus favors single-file templates over abstractions, even if that means certain blocks of code are repeated. Think of them as templates!

@DavidGOrtega

This comment was marked as off-topic.

@awaelchli
Contributor

What does that mean concretely in lit-gpt? I'm still missing concrete examples. I'd like to talk specifics to make progress on the issue.

@DavidGOrtega
Author

DavidGOrtega commented Dec 18, 2023

I can do a PR if you are willing to accept it.
My proposal for this issue is to take one example from the eval split.

@awaelchli
Contributor

Happy to take a look at your proposal. Since the data comes pre-tokenized, you'll have to change the code to decode it (to print it for the human).
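
A rough sketch of what that could look like, assuming val_data is the pre-tokenized list of dicts produced by the prepare scripts and that each sample still carries the encoded prompt without the response (the "input_ids_no_response" field discussed earlier):

# Sketch only: print a fixed sample taken from the pre-tokenized validation split
# instead of the hardcoded movie prompt.
sample = val_data[0]  # deterministic, since the prepare scripts shuffle with a fixed seed
encoded = sample["input_ids_no_response"].to(fabric.device)
fabric.print(tokenizer.decode(encoded))  # decode so a human-readable prompt is printed
with fabric.init_tensor():
    model.set_kv_cache(batch_size=1)
output = generate(model, encoded, max_returned_tokens=len(encoded) + eval_max_new_tokens, temperature=0.8)
model.clear_kv_cache()
fabric.print(tokenizer.decode(output))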

@DavidGOrtega
Author

Happy to hear! Let me prepare something.

@rhiever

rhiever commented Mar 31, 2024

Did anything ever come of this issue/PR? I notice that the hardcoded example is still in the code, and I also ran into it as a bug.

@carmocca
Contributor

carmocca commented Apr 2, 2024

This has come up several times already and people are often confused by it (see also #1179 (comment)). What if we simply remove this bit, @rasbt, @awaelchli? This is also a challenge for thunder because generation has prefill and next-token stages with different shapes.

@carmocca added the "enhancement" (New feature or request) label and removed the "question" (Further information is requested) label on Apr 2, 2024
@rasbt
Collaborator

rasbt commented Apr 2, 2024

How about we add a "validate_prompt: Your text" entry to the configs but leave it empty (disabled) by default? That way users can choose to use it, because it can sometimes be useful, but it will be off by default so it doesn't confuse people.
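
A rough sketch of the idea (the EvalConfig dataclass and field name below are hypothetical, not existing lit-gpt config API):

from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalConfig:
    # Hypothetical field: a fixed prompt to generate from during validation, disabled by default.
    validate_prompt: Optional[str] = None

def maybe_print_validation_sample(eval_config: EvalConfig) -> None:
    if eval_config.validate_prompt is None:
        # Off by default, so users of unrelated datasets are not confused by a movie recommendation.
        return
    sample = {"instruction": eval_config.validate_prompt, "input": ""}
    # from here, build the prompt, generate, and print as in the snippet at the top of this issue
    print(f"Validation prompt: {sample['instruction']}")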

@carmocca
Contributor

carmocca commented Apr 2, 2024

That works, but the counterargument is to keep things simple. If it won't be on by default, then I wouldn't have it at all. But if we don't want anybody having to write any code, then we should have it.

@rasbt
Collaborator

rasbt commented Apr 2, 2024

> But if we don't want anybody having to write any code, then we should have it.

For my part, I actually like seeing the text generation for a fixed prompt during training because it's useful, so I'd like to have that option in the config files to (re)enable it, for selfish reasons 😅

@rhiever

rhiever commented Apr 2, 2024

IMO the proposed solution to use a fixed query from the test split makes a lot of sense.
