
chatgpt_synthesis.jsonl file #219

Open
austinmw opened this issue Dec 7, 2023 · 4 comments

Comments

@austinmw

austinmw commented Dec 7, 2023

Hi, do you have the example chatgpt_synthesis.jsonl file available?

@ybalbert001
Contributor

The original experiment data came from a customer; I only mocked up a single example in the notebook to show its schema, so I'm not able to share the real data. Sorry about that.

@austinmw
Author

austinmw commented Dec 29, 2023

Hi @ybalbert001 , thanks for your reply. Is the data-generation code available? For example, should I do something like this?

import pandas as pd
import json
from tqdm.auto import tqdm
from langchain.llms import Bedrock

# Replace with your actual file path and Bedrock model ID
csv_file_path = 'qa_pairs.csv'
bedrock_model_id = 'anthropic.claude-v1'

def generate_enhanced_data(csv_path, model_id):
    # Read CSV file
    df = pd.read_csv(csv_path)

    # Instantiate the Bedrock LLM
    llm = Bedrock(model_id=model_id, model_kwargs={'max_tokens_to_sample': 2000})

    # Container for the enhanced data
    enhanced_data = []

    # Loop through the DataFrame
    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Generating Data"):
        original_question = row['Question']
        original_answer = row['Answer']
        source_context = row['Source Text']

        # Generate a new question
        new_question_prompt = f"\n\nHuman: Based on this context <context>{source_context}</context>, generate a related question.\n\nAssistant: Here is a sensible question based on the context provided: "
        new_question_result = llm.generate([new_question_prompt])
        new_question = new_question_result.generations[0][0].text.strip()
        print(new_question)

        # Generate a new answer
        new_answer_prompt = f"\n\nHuman: Answer this question: {new_question}. Use this context to answer: <context>{source_context}</context>.\n\nAssistant: Here is an answer based on the context provided: "
        new_answer_result = llm.generate([new_answer_prompt])
        new_answer = new_answer_result.generations[0][0].text.strip()

        # Append to the list
        enhanced_data.append({
            'origin_question': original_question,
            'origin_answer': original_answer,
            'generate_question': new_question,
            'generate_answer': new_answer
        })


    # Save the enhanced data to a JSONL file
    with open('bedrock_synthesis.jsonl', 'w') as outfile:
        for entry in enhanced_data:
            outfile.write(json.dumps(entry) + '\n')

# Call the function
generate_enhanced_data(csv_file_path, bedrock_model_id)
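For what it's worth, the resulting JSONL can be sanity-checked by reading it back with pandas. This is just a minimal sketch: the record below uses hypothetical values, and the field names simply mirror the keys in my snippet above (the actual schema in the notebook may differ):

```python
import json
import pandas as pd

# Write a single mock record with the same keys as the snippet above (hypothetical values)
record = {
    'origin_question': 'What is X?',
    'origin_answer': 'X is ...',
    'generate_question': 'How does X relate to Y?',
    'generate_answer': 'X relates to Y by ...',
}
with open('bedrock_synthesis.jsonl', 'w') as f:
    f.write(json.dumps(record) + '\n')

# Read the JSONL back; each line becomes one DataFrame row
df = pd.read_json('bedrock_synthesis.jsonl', lines=True)
print(sorted(df.columns))
```

This makes it easy to spot missing or misnamed fields before using the file for training.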

@631068264

631068264 commented Apr 13, 2024

@ybalbert001 I want to know how you generated the generate_question and generate_answer fields in the example bge_zh_research.ipynb.

@ybalbert001
Contributor

(quoting @austinmw's comment above)

correct
