Add audio NIM model and audio probes #1163
audio_achilles_data_dir = (
    _config.transient.cache_dir / "data" / "audio_achilles"
)
Prefer loading via the data pattern:
- audio_achilles_data_dir = (
-     _config.transient.cache_dir / "data" / "audio_achilles"
- )
+ from garak.data import path as data_path
+ audio_achilles_data_dir = (
+     data_path / "audio_achilles"
+ )
This will move the expected location to $XDG_DATA_HOME/garak/data/audio_achilles.
This can serve as a good example of how a user can bring their own data files. The data_path should be treated as read-only by garak and can be provided by the project installation or overridden/extended by the user. The _config.transient.cache_dir paths should not hold user-provided files, as garak is expected to manage files in that path.
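To illustrate the "provided by the installation or overridden/extended by the user" idea, here is a minimal sketch of a layered lookup. It is not garak's actual implementation: `find_data_dir` and the root directories are hypothetical names used only for illustration, and the real behaviour of `garak.data.path` may differ.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_data_dir(name: str, search_roots: Iterable[Path]) -> Optional[Path]:
    """Return the first existing directory called `name` under the given roots.

    Hypothetical helper: roots earlier in `search_roots` take precedence,
    so a user-supplied root listed first overrides the packaged data.
    The function only reads the filesystem; it never creates or modifies files.
    """
    for root in search_roots:
        candidate = Path(root) / name
        if candidate.is_dir():
            return candidate
    return None
```

With a user root listed before the packaged root, dropping an `audio_achilles` directory into the user root would take precedence over the packaged copy, while the lookup itself stays read-only.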
def write_audio_to_file(audio_data, file_path, sampling_rate):
    """Writes audio data to a file.

    Args:
        audio_data: A 1D numpy array containing the audio data.
        file_path: The path to the output audio file.
        sampling_rate: The sampling rate of the audio data.
    """
    sf.write(file_path, audio_data, sampling_rate)

import soundfile as sf
from datasets import load_dataset

os.makedirs(audio_achilles_data_dir)
dataset = load_dataset("garak-llm/audio_achilles_heel")
for item in dataset["train"]:
    audio_data = item["audio"]["array"]
    sampling_rate = item["audio"]["sampling_rate"]
    file_path = str(audio_achilles_data_dir / item["audio"]["path"])
    write_audio_to_file(audio_data, file_path, sampling_rate)
This explains the reasoning for searching the cache_dir path; I assume this is meant to mirror the visual_jailbreak probe. In the visual_jailbreak probe, however, the cache_dir usage is due to the direct download of data into a cached location. Is this really analogous to that?
In theory, the structure of the tests sent by this probe could allow user-provided audio files in data. However, connecting the content of such a file to an expected mitigation seems to be missing a link in the chain to allow/enable inspection of the data the detector result is based on.
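For contrast, the "direct download into a cached location" pattern mentioned above can be sketched roughly as follows. This is only an assumption-laden illustration, not code from garak or from the visual_jailbreak probe: `ensure_cached` and its `fetch` callback are hypothetical names, and the payloads are plain bytes rather than decoded audio.

```python
from pathlib import Path
from typing import Callable, Dict


def ensure_cached(cache_dir: Path, fetch: Callable[[], Dict[str, bytes]]) -> Path:
    """Populate `cache_dir` from `fetch()` only if it does not exist yet.

    Hypothetical helper: `fetch` stands in for whatever downloads the data
    and returns a mapping of file name to file contents.
    """
    cache_dir = Path(cache_dir)
    if cache_dir.is_dir():
        return cache_dir  # already cached, so skip the download
    files = fetch()  # fetch first, so a failed download leaves no empty directory
    cache_dir.mkdir(parents=True)
    for filename, payload in files.items():
        (cache_dir / filename).write_bytes(payload)
    return cache_dir
```

Here the tool owns everything under the cache directory, which is the distinction from a read-only data path that users may populate themselves.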
Adds support for audio probes using [the datasets I have somewhere] and multimodal NIM.