llama3.2 training loss is always zero #198

Open
hessaAlawwad opened this issue Oct 26, 2024 · 1 comment

Comments

hessaAlawwad commented Oct 26, 2024

Hello,

I am trying to SFT-train the Llama 3.2 11B Vision Instruct model on a dataset in which each sample answers a question about an image using a context (which could include more than one image). My code is:

def format_data(sample):
    # Load images from the sample
    images = load_images(sample.get("image", []))

    # Extract images as needed
    q_image = images[0] 

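    # Extract the answer (the "gpt" turn); fall back to "no answer" if it is missing
    answer = next((conv["value"] for conv in sample.get("conversations", []) if conv.get("from") == "gpt"), "no answer")

    # Assumption: the question is the "human" turn and the context passage is stored under a "context" key
    question = next((conv["value"] for conv in sample.get("conversations", []) if conv.get("from") == "human"), "")
    context = sample.get("context", "")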

    # Define the instruction prompt that describes the task and gives the model the context passage
    instruction_prompt_template = '''
    You are a helpful assistant tasked with answering questions from a given multimodal context (images and texts). Please infer the answer from the context and respond.

    Context: {context}'''

    # Prepare the messages array for the model input
    messages = [{"role": "user", "content": []}]
    messages[0]['content'].append({"type": "text", "text": instruction_prompt_template.format(context=context)})
    messages[0]["content"].append({"type": "image", "image": q_image})
    messages[0]["content"].append({"type": "text", "text": question})

    sample_conversation = tokenizer.apply_chat_template(messages, tokenize=False)
    return {"text": sample_conversation, "messages": messages, "answer": answer}

I am also trying to define a collator function for the SFT trainer.

  1. My first question: when I prepare the text column, do I format it like this?

<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant tasked with answering questions from a given context. Please infer the answer from the context and respond. Context: Most fossils are preserved by one of five processes outlined below (Figure 1.1): 1. What is the traditional definition of gravity? 2. Identify factors that influence the strength of gravity between two objects. Despite these problems, there is a rich fossil record. How does an organism become fossilized? <|image|> How many actions are depicted in the diagram?<|eot_id|> <|start_header_id|>assistant<|end_header_id|>7<|eot_id|>
Do I place the <|image|> placeholder in the text like this, or should I insert the actual image, or its path?

  2. My second question: I am not sure how to define the collator function. My training loss is always zero, and I think this is due to calculating the loss over the whole response. How can I define this collator?
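
For reference, a common pattern in SFT collators is to compute the loss only on the assistant's answer by setting the label of every prompt token to -100, the value that PyTorch's cross-entropy loss ignores by default. The sketch below shows that masking step on plain lists of token ids. It is not the collator from any particular library: `mask_prompt_labels`, `find_subsequence`, and the toy token ids are hypothetical names and values chosen for illustration, and in practice `assistant_header_ids` would come from tokenizing the assistant header with the model's own tokenizer.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def find_subsequence(sequence, pattern):
    """Return the index where `pattern` last starts in `sequence`, or -1."""
    for start in range(len(sequence) - len(pattern), -1, -1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, assistant_header_ids, pad_token_id=None):
    """Copy `input_ids` into labels and hide everything that is not the answer.

    Tokens up to and including the last assistant header are set to
    IGNORE_INDEX, so only the answer tokens contribute to the loss.
    """
    labels = list(input_ids)
    start = find_subsequence(labels, assistant_header_ids)
    if start == -1:
        # Fail loudly: a sample with no answer span would have no trainable tokens.
        raise ValueError("assistant header not found in input_ids")
    answer_start = start + len(assistant_header_ids)
    for i in range(answer_start):
        labels[i] = IGNORE_INDEX
    if pad_token_id is not None:
        # Caveat: if the pad id equals the end-of-turn id, this also masks
        # the end-of-turn token, so use a distinct pad token in that case.
        for i in range(answer_start, len(labels)):
            if labels[i] == pad_token_id:
                labels[i] = IGNORE_INDEX
    return labels


# Toy example: ids 90, 91 stand in for the tokenized assistant header.
toy_ids = [1, 10, 11, 12, 90, 91, 42, 43, 2]
print(mask_prompt_labels(toy_ids, [90, 91]))
# [-100, -100, -100, -100, -100, -100, 42, 43, 2]
```

A collator built this way would apply the function to each row of the batch before converting the labels to a tensor. Note that this only works if the answer is actually part of the tokenized text, so the assistant turn has to be included when the chat template is applied.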

Thank you in advance.

varunfb (Contributor) commented Nov 11, 2024

Please check out the documentation in the prompt format guide from here. The placement of the <|image|> tag is important: the text prompt should always come after the image tag, not before it. The image itself is not part of the prompt; the tag is only a placeholder. You can learn more about how images are handled in Llama from here.
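
To illustrate the ordering described above, here is a minimal sketch that assembles a single-image training string with the <|image|> tag placed before all of the user text. `build_vision_prompt` is a hypothetical helper, not part of any library, and the exact whitespace around the header tokens is an assumption that should be checked against the prompt format guide. The image itself is not embedded in the string; it would be passed separately to the model's processor alongside this text.

```python
def build_vision_prompt(question, context="", answer=None):
    """Build a single-image prompt with the <|image|> tag before the text.

    If `answer` is given, the assistant turn is appended so the string can
    be used as a training example; otherwise the prompt ends at the
    assistant header, ready for generation.
    """
    user_text = f"Context: {context}\n{question}" if context else question
    prompt = (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        "<|image|>"      # placeholder only: the image tag comes first
        + user_text      # all text follows the image tag
        + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if answer is not None:
        prompt += answer + "<|eot_id|>"
    return prompt


example = build_vision_prompt(
    question="How many actions are depicted in the diagram?",
    context="Most fossils are preserved by one of five processes.",
    answer="7",
)
print(example)
```

Compared with the string in the question, the only structural change is that the context and question now follow the <|image|> tag instead of surrounding it.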
