Description
Claude Code:
1. 24-bit PCM output support via soundfile
Files:
audio_separator\separator\common_separator.py: __init__() + write_audio_pydub()
audio_separator\separator\separator.py: __init__() + config dict
audio_separator\utils\cli.py: argument parser + Separator instantiation
Date: 2025-02-12
Problem:
Models output float32 (~24 bits of precision), but write_audio_pydub() truncates all output to int16 (16-bit) before writing. The existing write_audio_soundfile() path (--use_soundfile) has a bug: float data is assigned to an int16 interleave array without first scaling by 32767, so samples in the range (-1, 1) truncate to 0 and the output is silent. The documentation claimed 24-bit output, but the library was always writing 16-bit.
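The silent-output bug described above can be reproduced in isolation. This is a minimal sketch, not the library's code: the array names and the (frames, channels) layout are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

# Hypothetical stereo stem in float32, shape (frames, channels), values in [-1, 1].
stem = np.array([[0.5, -0.25], [0.9, -0.9], [0.1, 0.0]], dtype=np.float32)

# Buggy pattern: assign floats straight into an int16 interleave buffer.
# The cast drops the fractional part, so every sample in (-1, 1) becomes 0.
interleaved_bad = np.empty(stem.size, dtype=np.int16)
interleaved_bad[0::2] = stem[:, 0]
interleaved_bad[1::2] = stem[:, 1]

# Scaled pattern: map [-1, 1] onto the int16 range before the cast.
scaled = np.clip(stem, -1.0, 1.0) * 32767
interleaved_ok = np.empty(stem.size, dtype=np.int16)
interleaved_ok[0::2] = scaled[:, 0]
interleaved_ok[1::2] = scaled[:, 1]

print("unscaled:", interleaved_bad)
print("scaled:  ", interleaved_ok)
```

The unscaled buffer contains only zeros, which matches the silent files produced by the --use_soundfile path.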
Changes:
common_separator.py
Added output_subtype from config (after line ~85):
    self.output_subtype = config.get("output_subtype", "PCM_16")
Added early-return soundfile path in write_audio_pydub() (before the int16 conversion):
    # For WAV/FLAC with PCM_24, use soundfile directly (pydub can't do 24-bit)
    file_format = stem_path.lower().split(".")[-1]
    if self.output_subtype == "PCM_24" and file_format in ("wav", "flac"):
        import soundfile as sf
        sf.write(stem_path, stem_source, self.sample_rate, subtype="PCM_24")
        self.logger.debug(f"Exported 24-bit {file_format.upper()} via soundfile to {stem_path}")
        return
separator.py
Added output_subtype="PCM_16" parameter to __init__(), stored as self.output_subtype, and included in the config dict passed to architecture-specific separators.
cli.py
Added --output_subtype argument (default "PCM_16") and passed it to the Separator constructor.
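The CLI change can be sketched roughly as follows. Only the flag name and its default come from the description above; the parser description, help text, build_parser() helper, and the separator_kwargs dict are hypothetical stand-ins for the real cli.py code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser: the real cli.py defines many more options.
    parser = argparse.ArgumentParser(description="audio separation (sketch)")
    parser.add_argument(
        "--output_subtype",
        default="PCM_16",
        help="PCM subtype for lossless output, e.g. PCM_16 or PCM_24",
    )
    return parser


args = build_parser().parse_args(["--output_subtype", "PCM_24"])

# The parsed value would be forwarded to the Separator constructor as a keyword argument.
separator_kwargs = {"output_subtype": args.output_subtype}
print(separator_kwargs)
```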
Why: Preserves far more of the model output's precision. 24-bit PCM matches the 24-bit significand of float32 and avoids the quantization noise added by converting to 16-bit (each extra bit lowers the quantization noise floor by about 6 dB). The existing pydub int16 path remains untouched for 16-bit output and lossy formats (MP3, M4A, etc.).
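The precision argument can be checked numerically. The sketch below quantizes a full-scale sine at 16 and 24 bits and compares the signal-to-noise ratios. It uses rounding rather than the library's exact conversion, and the test tone and function name are arbitrary choices made for illustration.

```python
import math


def quantization_snr_db(bits: int, num_samples: int = 4096) -> float:
    """Estimate the SNR (dB) of a full-scale sine quantized to `bits` of signed PCM."""
    full_scale = 2 ** (bits - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = math.sin(0.1234567 * n)               # arbitrary test tone in [-1, 1]
        q = round(x * full_scale) / full_scale    # quantize, then map back to float
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)


snr_16 = quantization_snr_db(16)
snr_24 = quantization_snr_db(24)
print(f"16-bit SNR: {snr_16:.1f} dB, 24-bit SNR: {snr_24:.1f} dB")
```

The gap between the two figures should land near the theoretical 6 dB per bit, so about 48 dB for the eight extra bits.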
Note: The broken write_audio_soundfile() path was not modified. Instead, the fix calls soundfile.write() directly inside the existing write_audio_pydub() method and returns early for 24-bit lossless formats. This avoids the interleaving bugs in the soundfile path while keeping the change minimal.
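The routing decision behind the early return can be isolated into a small sketch. The helper names wants_soundfile() and write_stem(), and the write_16bit callback standing in for the pydub path, are hypothetical; only the condition itself and the soundfile.write() call mirror the change described above.

```python
def wants_soundfile(stem_path: str, output_subtype: str) -> bool:
    # Same condition as the early return: 24-bit requested and a lossless container.
    file_format = stem_path.lower().split(".")[-1]
    return output_subtype == "PCM_24" and file_format in ("wav", "flac")


def write_stem(stem_path, stem_source, sample_rate, output_subtype, write_16bit):
    """Write via soundfile for 24-bit WAV/FLAC, otherwise defer to the 16-bit path."""
    if wants_soundfile(stem_path, output_subtype):
        import soundfile as sf  # imported lazily, only when the 24-bit path is taken

        sf.write(stem_path, stem_source, sample_rate, subtype="PCM_24")
        return "soundfile"
    # Stand-in for the existing pydub int16 export (16-bit output and lossy formats).
    write_16bit(stem_path, stem_source)
    return "pydub"
```

Lossy targets such as MP3 always fall through to the 16-bit path, even when PCM_24 is requested.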
Configuration: Controlled by separation_bit_depth in the [output_processing] section of UpMixery.ini. The value is passed to audio_separator_launcher.py via --bit-depth, which maps it to --output_subtype PCM_24 or PCM_16.
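The configuration chain might look roughly like the sketch below. The section name, option name, and flag names come from the description above; the accepted values (16 and 24), the fallback of 16, and both helper functions are assumptions, not the actual audio_separator_launcher.py code.

```python
import configparser

# Hypothetical excerpt of UpMixery.ini.
INI_TEXT = """
[output_processing]
separation_bit_depth = 24
"""


def launcher_args(ini_text: str) -> list:
    """Build the --bit-depth argument for the launcher from the ini contents."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    bit_depth = config.getint("output_processing", "separation_bit_depth", fallback=16)
    return ["--bit-depth", str(bit_depth)]


def separator_args(bit_depth: int) -> list:
    """Map the launcher's bit depth to the separator's --output_subtype flag."""
    subtype = "PCM_24" if bit_depth == 24 else "PCM_16"
    return ["--output_subtype", subtype]


print(launcher_args(INI_TEXT))
print(separator_args(24))
```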