chatgpt_synthesis.jsonl file #219
The original experiment data is from a customer; I just mocked a single example in the notebook to show its schema, so the real data is not available to share. Sorry for that.
Hi @ybalbert001, thanks for your reply. Is the code for your data generation process available? For example, should I do something like this?

```python
import pandas as pd
import json
from tqdm.auto import tqdm
from langchain.llms import Bedrock

# Replace with your actual file path and Bedrock model ID
csv_file_path = 'qa_pairs.csv'
bedrock_model_id = 'anthropic.claude-v1'

def generate_enhanced_data(csv_path, model_id):
    # Read the CSV file of original question/answer pairs
    df = pd.read_csv(csv_path)

    # Instantiate the Bedrock LLM
    llm = Bedrock(model_id=model_id, model_kwargs={'max_tokens_to_sample': 2000})

    # Container for the enhanced data
    enhanced_data = []

    # Loop through the DataFrame
    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Generating Data"):
        original_question = row['Question']
        original_answer = row['Answer']
        source_context = row['Source Text']

        # Generate a new question from the source context
        new_question_prompt = (
            f"\n\nHuman: Based on this context <context>{source_context}</context>, "
            f"generate a related question."
            f"\n\nAssistant: Here is a sensible question based on the context provided: "
        )
        new_question_result = llm.generate([new_question_prompt])
        new_question = new_question_result.generations[0][0].text.strip()
        print(new_question)

        # Generate a new answer to the generated question
        new_answer_prompt = (
            f"\n\nHuman: Answer this question: {new_question}. "
            f"Use this context to answer: <context>{source_context}</context>."
            f"\n\nAssistant: Here is an answer based on the context provided: "
        )
        new_answer_result = llm.generate([new_answer_prompt])
        new_answer = new_answer_result.generations[0][0].text.strip()

        # Append the original and generated pairs to the list
        enhanced_data.append({
            'origin_question': original_question,
            'origin_answer': original_answer,
            'generate_question': new_question,
            'generate_answer': new_answer
        })

    # Save the enhanced data to a JSONL file
    with open('bedrock_synthesis.jsonl', 'w') as outfile:
        for entry in enhanced_data:
            outfile.write(json.dumps(entry) + '\n')

# Call the function
generate_enhanced_data(csv_file_path, bedrock_model_id)
```
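Each line the script above writes to `bedrock_synthesis.jsonl` is a standalone JSON object, so I guess the output schema could be sanity-checked with something like this (a minimal sketch; it assumes the script above has already been run):

```python
import json

# Read back the first synthesized record and check that its keys match
# the dict built in the generation loop above.
with open('bedrock_synthesis.jsonl') as f:
    first_entry = json.loads(f.readline())

print(sorted(first_entry.keys()))
# Expected: ['generate_answer', 'generate_question', 'origin_answer', 'origin_question']
```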
@ybalbert001 I want to know: how did you generate the `generate_question` and `generate_answer` fields in the `chatgpt_synthesis.jsonl` file?
correct
Hi, do you have the example `chatgpt_synthesis.jsonl` file available?