Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kaldi: add an switch/option to read the durations from kaldi utt2dur … #832

Merged
merged 13 commits into from
Oct 4, 2022

Conversation

jtrmal
Copy link
Collaborator

@jtrmal jtrmal commented Sep 30, 2022

…instead of touching the audio file
should resolve #788
also, let me know if -d is the option that feels right for it, to me it does not exactly do, but didn't feel another is better

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

I tested:

 lhotse kaldi import -d   . 16000 kaldi2/

 and compared the duration with original files. They weren't equal to the durations in utt2dur but I assume it's just float representation issue

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

😱

Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some issues, could you add a unit test to cover those? The existing tests didn't catch them.

Regarding reco2dur, IIRC it only represents durations up to two decimal points so the tolerance is +/- 5ms. Default Lhotse tolerance for mismatch is +/-0.25s (which makes Lhotse happy with how most MP3 decoders are diverging in terms of the total number of samples), so it should be OK to set this option to true as a default as save everybody's time. Historically this code used reco2dur already, but Lhotse tolerated zero mismatch between manifest and audio duration at the time, so I removed it (and much later realized there is no way that approach can work with various audio codecs and added the tolerance).

lhotse/kaldi.py Outdated
] = f"sph2pipe {source.source} -f wav -c {channel+1} -p | ffmpeg -threads 1 -i pipe:0 -ar {sampling_rate} -f wav -threads 1 pipe:1 |"
audios[channel] = (
f"sph2pipe {source.source} -f wav -c {channel+1} -p | ffmpeg -threads 1 "
"-i pipe:0 -ar {sampling_rate} -f wav -threads 1 pipe:1 |"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"-i pipe:0 -ar {sampling_rate} -f wav -threads 1 pipe:1 |"
f"-i pipe:0 -ar {sampling_rate} -f wav -threads 1 pipe:1 |"

lhotse/kaldi.py Outdated
] = f"ffmpeg -threads 1 -i {source.source} -ar {sampling_rate} -map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |"
audios[channel] = (
f"ffmpeg -threads 1 -i {source.source} -ar {sampling_rate} "
"-map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"-map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |"
f"-map_channel 0.0.{channel} -f wav -threads 1 pipe:1 |"

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

Added tests for my incorrect formatting and added test for the new feature. Lets wait. But tests pass fine on my setup
y.

@pzelasko
Copy link
Collaborator

Can you also change to use reco2dur by default, if present? It should speed up things and shouldn't hurt anybody.

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

OK, I'm done with the changes (including making the option enabled by default).
Just waiting for the tests

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

I'm investigating. I assume its something with my cast/converstion to float

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

OK, should be fixed now. It was just an issue in the test, due to change of defaults (I was much stricter than I should have been)

@jtrmal
Copy link
Collaborator Author

jtrmal commented Sep 30, 2022

OK, we survived. Ready for review :)

@@ -220,6 +221,7 @@ def test_ok_on_file_singlechannel_sph_source_type(tmp_path, channel):
assert list(out.keys()) == [channel]
assert out[channel].startswith("sph2pipe")
assert "nonexistent.sph" in out[channel]
assert "{" not in out[channel]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious to know what's this assert verifying?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just detect someone has goofed up and didn't fully formatted the f-string (or forgot to add f"")

@@ -42,7 +42,7 @@ def kaldi():
@click.option(
"-d",
"--use-reco2dur",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with default=True and is_flag=True it is impossible to turn this off. The idiomatic click way would be @click.option("--use-reco2dur/--compute-durations", default=True, help="...") and then name the Python function argument use_reco2dur: bool (the same as the first option before slash, i.e. the same as right now).

lhotse/kaldi.py Outdated
num_samples=compute_num_samples(durations[recording_id], sampling_rate),
duration=durations[recording_id],
num_samples=compute_num_samples(
float(durations[recording_id]), sampling_rate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be cleaner to do sth like load_kaldi_text_mapping(path, float_vals: bool = False) and when float_vals=True, cast the second column to float in the function. Otherwise you have to cast in multiple places, looks like a nasty surprise for anybody modifying these things in the future :)

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 3, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 3, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 3, 2022

@pzelasko your comments are addressed. I don't understand the 3.7 unit tests failing reason

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 3, 2022

@pzelasko
Copy link
Collaborator

pzelasko commented Oct 3, 2022

could this be an issue? https://stackoverflow.com/questions/73929564/entrypoints-object-has-no-attribute-get-digital-ocean

🤦🏻‍♂️

do you mind adding importlib-metadata<5.0.0 to requirements and checking if it helps? Thanks...

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 3, 2022 via email

Copy link
Collaborator

@pzelasko pzelasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one minor comment. Don't worry about failing tests, it'll either sort itself out (pypi/packaging flake for some third party dependency) or I will look into it later.

setup.py Outdated
@@ -141,6 +141,7 @@ def mark_lhotse_version(version: str) -> None:


install_requires = [
"importlib-metadata<5.0.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these lines don't help, let's remove them

setup.py Outdated
@@ -180,6 +181,7 @@ def mark_lhotse_version(version: str) -> None:

docs_require = (project_root / "docs" / "requirements.txt").read_text().splitlines()
tests_require = [
"importlib-metadata<5.0.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these lines don't help, let's remove them

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 4, 2022 via email

@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 4, 2022

I added one more test, but IMO I'm done now :)

@pzelasko pzelasko merged commit dfc8e6d into lhotse-speech:master Oct 4, 2022
@jtrmal
Copy link
Collaborator Author

jtrmal commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support reading from reco2dur in lhotse.kaldi.load_kaldi_data_dir
3 participants