llama3.2 training loss is always zero #198

hessaAlawwad · 2024-10-26T12:41:29Z

Hello,

I am trying to fine-tune (SFT) the Llama 3.2 11B Vision Instruct model on a dataset where each question about an image must be answered using a context (which may contain more than one image). My code is:

def format_data(sample):
    # Load images from the sample (load_images and tokenizer are defined elsewhere)
    images = load_images(sample.get("image", []))

    # Use the first image as the question image
    q_image = images[0]

    conversations = sample.get("conversations", [])
    # Extract the question (assumed to be the "human" turn) and the answer (the "gpt" turn)
    question = next((conv["value"] for conv in conversations if conv.get("from") == "human"), "")
    answer = next((conv["value"] for conv in conversations if conv.get("from") == "gpt"), "no answer")

    # Context passage for the question (the "context" key is an assumed field name)
    context = sample.get("context", "")

    # Initial prompt that defines the task and gives the model the context passage
    instruction_prompt_template = (
        "You are a helpful assistant tasked with answering questions from a given "
        "multimodal context (images and texts). Please infer the answer from the context and respond.\n\n"
        "Context: {context}"
    )

    # Prepare the messages array for the model input
    messages = [{"role": "user", "content": []}]
    messages[0]["content"].append({"type": "text", "text": instruction_prompt_template.format(context=context)})
    messages[0]["content"].append({"type": "image", "image": q_image})
    messages[0]["content"].append({"type": "text", "text": question})

    sample_conversation = tokenizer.apply_chat_template(messages, tokenize=False)
    return {"text": sample_conversation, "messages": messages, "answer": answer}
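In `format_data`, the `text` column is built from the user turn only, so unless the collator adds it back, the target answer never appears in the training sequence. Below is a minimal sketch that builds a full user/assistant conversation for one sample, with the image entry placed before any text. `build_sft_messages` is a hypothetical helper. Hugging Face's Llama 3.2 Vision examples typically use a bare `{"type": "image"}` entry and pass the image to the processor separately, so the `"image"` key is kept here only to mirror the code above.

```python
from typing import Any, Dict, List


def build_sft_messages(image: Any, context: str, question: str, answer: str) -> List[Dict[str, Any]]:
    """Build a user/assistant conversation for one SFT sample (hypothetical helper)."""
    instruction = (
        "You are a helpful assistant tasked with answering questions from a given "
        "multimodal context (images and texts). Please infer the answer from the "
        f"context and respond.\n\nContext: {context}"
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},  # image first, before any text
                {"type": "text", "text": instruction},
                {"type": "text", "text": question},
            ],
        },
        # The target answer is an assistant turn, so it ends up in the training text
        {"role": "assistant", "content": [{"type": "text", "text": answer}]},
    ]
```

Passing the returned list to the chat template (without `add_generation_prompt`) should then yield a sequence that contains both the prompt and the answer.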

I am also trying to define a collator function for the SFT trainer.

My first question: when I prepare the text column, do I format it like:

My second question: I am not sure how to define the collator function. The training loss is zero at every step, and I think this is due to the loss being calculated for the whole response. How can I define this collator?
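A common collator design for SFT is to concatenate prompt and answer tokens, then set the labels of prompt and padding positions to `-100` (PyTorch's default `ignore_index` for cross-entropy), so only answer tokens contribute to the loss. Below is a minimal plain-Python sketch, assuming each example has already been tokenized into hypothetical `prompt_ids` (user turn, including the image token) and `answer_ids` (assistant turn) fields. It is not confirmed to be the cause of the zero loss here, and a real collator would also need to pass the processor's image tensors through.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def collate_with_prompt_masking(
    examples: List[Dict[str, List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Pad a batch and mask prompt/padding labels (hypothetical field names)."""
    input_ids, labels = [], []
    for ex in examples:
        prompt, answer = ex["prompt_ids"], ex["answer_ids"]
        input_ids.append(list(prompt) + list(answer))
        # Prompt tokens are masked; answer tokens remain as training targets
        labels.append([IGNORE_INDEX] * len(prompt) + list(answer))

    max_len = max(len(ids) for ids in input_ids)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids, labs in zip(input_ids, labels):
        pad = max_len - len(ids)
        # Right-pad: padded positions get attention 0 and an ignored label
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        batch["labels"].append(labs + [IGNORE_INDEX] * pad)
    return batch
```

Hugging Face causal-LM models shift labels internally, which is why the labels stay aligned with `input_ids`; wrap each list with `torch.tensor` before passing the batch to the model. If every label in a batch ends up as `-100` (for example, because the answer was never appended), there is nothing left to train on, so it is worth checking that each example has a non-empty `answer_ids`.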

Thank you in advance.


varunfb · 2024-11-11T23:27:49Z

Please check out the documentation in the prompt format guide from here. The placement of the <|image|> tag is important: the text prompt should always come after the image tag, not before it. The image is not part of the prompt. You can learn more about how images are handled in Llama from here.
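To illustrate the ordering described in this reply, here is a sketch of a raw single-image prompt string with `<|image|>` placed before the question text. It assumes the special tokens from our reading of the Llama 3.2 Vision prompt format guide (`<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|image|>`, `<|eot_id|>`). `build_llama32_vision_prompt` is a hypothetical helper; in practice, the processor's chat template should produce this layout for you.

```python
from typing import Optional


def build_llama32_vision_prompt(question: str, answer: Optional[str] = None) -> str:
    """Assemble a single-image user turn, with the image tag before the text (hypothetical helper)."""
    prompt = (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<|image|>{question}<|eot_id|>"  # image tag first, then the text prompt
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if answer is not None:
        # For training, the target answer follows the assistant header
        prompt += f"{answer}<|eot_id|>"
    return prompt
```

Leaving `answer` as `None` gives an inference prompt that ends at the assistant header. Supplying it gives a full training sequence.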
