malusamayo/SemSlicer

SemSlicer

Slice any datasets on any semantic criteria with LLMs.

Reproducing the Experiments

To reproduce our experiments (automated setups), run `sbatch run.sh {dataset}` on a SLURM cluster, passing the name of the corresponding dataset. Specify your OpenAI API key in the command line before submitting.
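To run several automated setups in one go, the submission step can be scripted. Below is a minimal sketch, not part of the repository. It assumes that the key is exposed as the `OPENAI_API_KEY` environment variable, which is the name the notebook example uses. It also relies on `sbatch` forwarding the submitting shell's environment to the job, which is its default `--export=ALL` behavior. `build_sbatch_commands` and `submit_all` are hypothetical helper names, and the dataset names you pass must match whatever `run.sh` accepts.

```python
import os
import subprocess

def build_sbatch_commands(datasets):
    """Return one `sbatch run.sh <dataset>` command (as an argv list) per dataset."""
    return [["sbatch", "run.sh", name] for name in datasets]

def submit_all(datasets, dry_run=True):
    """Submit one SLURM job per dataset; with dry_run=True, only print the commands.

    Assumption: run.sh reads the key from OPENAI_API_KEY, which sbatch
    forwards to the job by default. Fail early if it is not set.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before submitting jobs.")
    commands = build_sbatch_commands(datasets)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if sbatch returns non-zero
    return commands
```

For example, `submit_all(["my_dataset"], dry_run=False)` would submit one job, where `my_dataset` is a placeholder for one of the datasets used in the experiments. Keeping `dry_run=True` as the default lets you check the commands before anything is queued.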

To reproduce our experiments (human-in-the-loop setups), replace the generated prompts with our edited prompts in `data/hai-prompts`.

Notebook Usage

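```python
%env OPENAI_API_KEY={put your key here}
from semslicer.slicer import InteractiveSlicer
import pandas as pd

# Load a random sample of 20 rows from the Civil Comments file
data = pd.read_csv("data/data/civil_comments_sampled.csv").sample(20)
concept = "Muslim"  # the semantic criterion to slice on

slicer = InteractiveSlicer(concept, data,
    {
        'few-shot': True,
        'few-shot-size': 8,
        'instruction-source': 'template',
        'student-model': 'gpt-3.5-turbo',
        'teacher-model': 'gpt-4-turbo-preview'
    }
)

slicer.show_prompt()                      # inspect the generated slicing prompt
llm_slicing = slicer.gen_slicing_func()   # per-text predicate, used as a mask below
# Keep only the rows whose 'context' text falls into the concept's slice
m_slice = data[data['context'].map(llm_slicing)]
# Show up to two examples (sample(2) would fail on a slice with fewer rows)
m_slice['context'].sample(min(2, len(m_slice)))
```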
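In the snippet above, the function returned by `gen_slicing_func()` is passed to `Series.map` and the result is used as a boolean mask, so it evidently maps one text to `True`/`False`. The sketch below reproduces that filtering pattern without any API calls. It substitutes a hypothetical keyword matcher for the LLM-backed function. `keyword_slicing_func` and `apply_slice` are illustrative names, not part of SemSlicer, and SemSlicer itself decides slice membership with LLM prompts rather than keywords.

```python
import pandas as pd

def keyword_slicing_func(keywords):
    """Hypothetical stand-in for the function returned by gen_slicing_func().

    Like the LLM-backed version, it maps one text to True/False, but here
    membership is decided by case-insensitive keyword matching.
    """
    lowered = [k.lower() for k in keywords]

    def slicing(text):
        text = str(text).lower()
        return any(k in text for k in lowered)

    return slicing

def apply_slice(df, slicing_func, column="context"):
    """Return the rows of df whose `column` value the slicing function accepts."""
    mask = df[column].map(slicing_func).astype(bool)
    return df[mask]

data = pd.DataFrame({"context": [
    "Muslim communities held a vigil downtown.",
    "The council approved the new budget.",
    "Interfaith leaders, including a Muslim imam, spoke.",
]})
m_slice = apply_slice(data, keyword_slicing_func(["muslim", "imam"]))
print(len(m_slice), "of", len(data), "rows in slice")  # prints "2 of 3 rows in slice"
```

Swapping `keyword_slicing_func(...)` for `slicer.gen_slicing_func()` gives the LLM-based slice from the notebook example. The `.astype(bool)` cast is a defensive assumption in case a predicate returns truthy non-boolean values.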
