SemSlicer

Slice any dataset on any semantic criterion with LLMs.

Experiments Reproduction

To reproduce our experiments (automated setups), run sbatch run.sh {dataset} on a cluster, using the corresponding dataset. Specify your OpenAI API key on the command line first.

To reproduce our experiments (human-in-the-loop setups), replace the generated prompts with our edited prompts in data/hai-prompts.

Notebook Usage

# Notebook magic: set the API key for this session
%env OPENAI_API_KEY={put your key here}
from semslicer.slicer import InteractiveSlicer
import pandas as pd

# Load the sampled dataset and draw 20 random rows
data = pd.read_csv("data/data/civil_comments_sampled.csv").sample(20)
concept = "Muslim"

slicer = InteractiveSlicer(concept, data,
    {
        'few-shot': True,
        'few-shot-size': 8,
        'instruction-source': 'template',
        'student-model': 'gpt-3.5-turbo',
        'teacher-model': 'gpt-4-turbo-preview'
    }
)

# Inspect the prompt before slicing
slicer.show_prompt()

# Build the slicing function and keep the rows it selects
llm_slicing = slicer.gen_slicing_func()
m_slice = data[data['context'].map(llm_slicing)]
m_slice['context'].sample(2)
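
The last two lines above filter the data with a boolean mask. The sketch below shows that pattern in isolation, without an API key. It assumes the slicing function returns True or False for each text. keyword_slicing is a hypothetical stand-in that matches a keyword instead of calling an LLM, and the three comments are made-up examples, not rows from the dataset.

```python
import pandas as pd


# Hypothetical stand-in for the function returned by slicer.gen_slicing_func().
# It uses a simple keyword match (an assumption for illustration only),
# so the example runs offline without any LLM call.
def keyword_slicing(text: str) -> bool:
    return "muslim" in text.lower()


# Made-up comments, using the same 'context' column name as the notebook code.
data = pd.DataFrame({
    "context": [
        "The Muslim community held a festival downtown.",
        "Traffic was terrible this morning.",
        "A local mosque invited Muslim and non-Muslim neighbors.",
    ]
})

# map() applies the predicate to every comment and yields a boolean Series;
# indexing the DataFrame with that Series keeps only the rows marked True.
mask = data["context"].map(keyword_slicing)
m_slice = data[mask]

print(len(m_slice), "of", len(data), "rows selected")
```

Swapping keyword_slicing for llm_slicing leaves the filtering step unchanged, which makes it easy to compare an LLM-based slice against a cheap keyword baseline on the same data.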

malusamayo/SemSlicer
