Slice any datasets on any semantic criteria with LLMs.
To reproduce our experiments (automated setups), please run sbatch run.sh {dataset} on a cluster, using the according datasets. Please specify your OPENAI KEY in the command line first.
To reproduce our experiements (human in the loop setups), please replace the generated prompts with our edited prompts in data/hai-prompts
%env OPENAI_API_KEY={put your key here}
from semslicer.slicer import InteractiveSlicer
import pandas as pd
data = pd.read_csv("data/data/civil_comments_sampled.csv").sample(20)
concept = "Muslim"
slicer = InteractiveSlicer(concept, data,
{
'few-shot': True,
'few-shot-size': 8,
'instruction-source': 'template',
'student-model': 'gpt-3.5-turbo',
'teacher-model': 'gpt-4-turbo-preview'
}
)
slicer.show_prompt()llm_slicing = slicer.gen_slicing_func()
m_slice = data[data['context'].map(llm_slicing)]
m_slice['context'].sample(2)