Wish list #1

tombackstrom · 2022-05-18T06:15:10Z

tombackstrom
May 18, 2022
Maintainer

A collection of ideas from different contributors:

A larger comprehensive intro to machine learning, as well as corresponding parts in many different chapters (will probably have to wait for a while)
Medical analysis of speech (Paavo has promised some material)
Working code demos and examples (Jupyter) in all applicable sections (everyone)
Add links to Google Colab and/or mybinder to allow running the Jupyter demos (only possible after the repository is made public)
Get a Zenodo DOI
Hearing of speech (regular Wikipedia is a bit of a mess regarding that)
Some reorganization/added content for Speech production
Acoustics of vocal tract from the partial differential equations (as a background for all-pole models)
Perhaps separate subsections for classical GMM-HMM and neural ASR, plus a more general "overview" page like the current one?
A sub-section on articulatory synthesis
A section on self-supervised learning (cf. APC, CPC, HuBERT, Wav2vec 2.0 etc.), as that's pretty much becoming a standard pre-processing step nowadays.

Old / Mostly solved:

Autocorrelation and covariance variants of LP more explicitly and perhaps (also) with the classical derivation from normal equations - (Tom: here I wanted to hear your opinions, as it's pretty much your turf).
- Added covariance method and Yule-Walker as well as more numerical examples.
Speech recognition & NLP
- Added a link to Jurafsky's book on the front page instead.

tombackstrom · 2022-05-18T13:19:32Z

tombackstrom
May 18, 2022
Maintainer Author

With regard to the extended LP text: I'm wondering to what extent LP is still relevant. I know it has been extremely important, but do you see it still being used in the future? Personally, I'm undecided. LP-type analysis leads to computational problems, such as non-linear modelling and the danger of instability, whereas operations in the STFT domain are WYSIWYG. For that reason, I've pretty much stopped using LP, but that's just me. The question is more whether there is, for example, solid science on methods combining ML and LP.

One thing which I'm planning to add is a vocoder example using LP, where I would apply a formant structure to something like a trumpet sound. That would be a nice demo which simultaneously demonstrates the effect of formants.
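To make the vocoder idea concrete, here is a minimal sketch of imposing a formant structure on a harmonic-rich source with an all-pole filter. It is not the planned demo: the formant values are illustrative assumptions rather than LP estimates from real speech, an impulse train stands in for the trumpet sound, and the function names `formant_filter_coeffs` and `all_pole_filter` are hypothetical.

```python
import numpy as np

def formant_filter_coeffs(formants, fs):
    """Denominator a (a[0] = 1) of an all-pole filter with one
    resonance per (frequency_hz, bandwidth_hz) pair."""
    a = np.array([1.0])
    for freq, bw in formants:
        r = np.exp(-np.pi * bw / fs)        # pole radius from bandwidth
        theta = 2.0 * np.pi * freq / fs     # pole angle from centre frequency
        # Multiply in one complex-conjugate pole pair.
        a = np.convolve(a, [1.0, -2.0 * r * np.cos(theta), r * r])
    return a

def all_pole_filter(a, x):
    """Filter x with 1/A(z): y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

fs = 16000
# Stand-in for a trumpet-like source: a 200 Hz impulse train (rich in harmonics).
excitation = np.zeros(2048)
excitation[:: fs // 200] = 1.0
# Assumed formant frequencies and bandwidths in Hz, roughly an open vowel.
vowel = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 150.0)]
shaped = all_pole_filter(formant_filter_coeffs(vowel, fs), excitation)
```

In an LP vocoder the coefficients would instead come from LP analysis of speech frames, and the filter would be updated frame by frame.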

0 replies

orasanen · 2022-05-18T13:48:42Z

orasanen
May 18, 2022
Maintainer

I'm also a bit ambivalent about this. However, in my current speech course, I do teach LPC. More specifically, after teaching the basics of speech production in an earlier lecture, I start from the acoustic theory of speech production (source-filter model) by giving an overview of the concept and reviewing the three components (source, tract, lip radiation). Then I proceed to modeling of vocal tract acoustics with lossless tubes in more detail, and move from the continuous physical domain to digital lossless tube model(s) of the tract. The motivation is to explain the connection between the physics/physiology and why the tract can be seen as a filter. After those, I present LPC as a practical means to estimate AR model parameters and demonstrate the equivalence of LPC with the lossless tube model. I also go through both the autocorrelation and covariance estimators for LP.

In our exercises,

E1 is about speech annotation & manual acoustic analysis with Praat (+ the mandatory manual concatenative synthesis).
E2 is about windowing & time-frequency analysis with FFT (spectrum, spectrogram, energy and ZCR features), and about self-implemented autocorrelation LP, including copy synthesis with LP using OLA.
E3 is phone recognition with GMMs on annotated LibriSpeech data, where students implement the GMMs themselves.
E4 is then statistical parametric speech synthesis with MFCCs & GMMs, using MFCC --> LP conversion and then synthesis with an LP "vocoder".

So, LPC is also used in the exercises (implemented first, later used for speech synthesis).
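For readers unfamiliar with the autocorrelation method mentioned in E2, a minimal sketch follows. It is not the course's exercise code: the function name `lp_autocorrelation` is hypothetical, and the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p is an assumption. The normal (Yule-Walker) equations are solved with the Levinson-Durbin recursion.

```python
import numpy as np

def lp_autocorrelation(frame, order):
    """Autocorrelation-method LP via Levinson-Durbin.

    Returns (a, err): a[0] = 1 and the prediction error is
    e[n] = x[n] + sum_{k=1..order} a[k] * x[n-k]; err is its energy.
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation for lags 0..order (the frame is assumed already windowed).
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Test signal: impulse response of a known, stable all-pole filter 1/A(z).
true_a = [1.0, -0.9, 0.5]
h = np.zeros(400)
for n in range(400):
    past = sum(true_a[k] * h[n - k] for k in range(1, 3) if n - k >= 0)
    h[n] = (1.0 if n == 0 else 0.0) - past
a_est, err = lp_autocorrelation(h, order=2)  # a_est is close to true_a
```

Because the test signal decays to practically zero within the frame, the estimate recovers the filter coefficients almost exactly; with windowed real speech the result is only an approximation of the vocal tract envelope.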

At least our machine-learning-oriented students liked this type of "low-level" treatment of the topic a lot. Some commented that they encounter so much "data-driven black-box" material in their machine learning studies anyway that having "old-school" content is useful. Also, many of them had already taken several other courses on statistical signal processing etc., where they had used everything from Wiener filters to HMMs and LPC, but with limited connection to any physical/domain-specific phenomena. So I got the impression they appreciated the contents, although I did not specifically ask about the LP part. Feedback on the course and exercises (which I asked about separately for each) was extremely good though (clearly above 6 on a scale of 1–7).

Whether any of the LPC material is relevant for modern speech technology is another question... However, at least the aim of my course is to get the students deeply familiar with speech as a phenomenon, and with ways to tackle it with engineering. It is not so much about teaching them state-of-the-art machine learning methods in speech.

0 replies

tombackstrom · 2022-05-25T13:42:46Z

tombackstrom
May 25, 2022
Maintainer Author

Another issue is the table of contents, which has organically grown into what it is now, but I've never put much thought into it. So any suggestions on a better organization of the content?
Among the current problems is that the division into basic representations, pre-processing and modelling tools is arbitrary. Many of the entries could appear under any one of the three chapters.

Therefore, one idea could be for example

1. Preface - about this book
2. Speech and language - description of the speech signal and linguistic structure
3. Analysis and processing modules
4. Applications and systems
5. Evaluation
Then I would make the above numbering correspond to "Parts" whereas currently, the top-level numbering is "chapters".

Edit on 17.6.2022: Added evaluation

2 replies

daniel00ramos Jun 10, 2022

With respect to the ToC, I suggest moving the chapter "forensic speaker recognition" under "speaker recognition and verification". It would make much more sense.

tombackstrom Aug 14, 2023
Maintainer Author

A problem with that is that we'd then have a third level: chapters, subchapters, and subsubchapters. I'll add a "see also" link to the speaker recognition chapter; perhaps that's sufficient?

josharian · 2023-04-06T22:38:28Z

josharian
Apr 6, 2023

Small thing, but I'd love an easy way to download a PDF of the whole book so I can read it on my Kindle. I only found a way to export to PDF chapter by chapter.

3 replies

tombackstrom Apr 7, 2023
Maintainer Author

Good idea! I'll have to check to what extent Jupyter Book supports compilation to PDF. I'll put it on my todo list unless someone beats me to it ;)

josharian May 2, 2023

I tried the steps at the link you provided. It almost "just worked"...except that the resulting document is really large, so the "print HTML to PDF" stage takes a long time, and it times out (see below). There's probably a config somewhere to increase the timeout, but I didn't see it.

However, manually opening the single-page HTML file that it generated in a browser and then printing that to PDF worked!

It generated a 35 MB PDF, which was too big to email to Kindle, but it was easy to split in half manually once I had the original PDF. In an ideal world I'd split on some chapter break, but I just cut at an arbitrary page for now.

I'm trying both US Letter and A6 page size to see which works better in practice on the Kindle.

I'll report back after I've done some actual reading, which might take a while. :)

The HTML page is in _build/html.
Finished generating HTML for book...
Converting book HTML into PDF...
Traceback (most recent call last):
  File "/Users/josh/.pyenv/versions/3.11.2/bin/jupyter-book", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/jupyter_book/cli/main.py", line 317, in build
    builder_specific_actions(
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/jupyter_book/cli/main.py", line 575, in builder_specific_actions
    html_to_pdf(output_path.joinpath("index.html"), path_pdf_output)
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/jupyter_book/pdf.py", line 31, in html_to_pdf
    asyncio.get_event_loop().run_until_complete(_html_to_pdf(html_file, pdf_file))
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/jupyter_book/pdf.py", line 50, in _html_to_pdf
    await page.goto(f"file:///{html_file}", {"waitUntil": ["networkidle2"]})
  File "/Users/josh/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pyppeteer/page.py", line 837, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.

tombackstrom Aug 14, 2023
Maintainer Author

Finally returned to this issue. I installed pyppeteer and compiled the book with jb build . --builder pdfhtml and voila, it worked out of the box. There have been some updates to the underlying packages though, so they might have solved your problem.

However, I found that a better approach is to first compile the whole material to a single HTML file using jb build . --builder singlehtml and then use, for example, pandoc to convert it to PDF. Pandoc can also convert it to LaTeX, which in principle gives nicer quality, but there are a lot of errors in transcoding characters (phonetic symbols) and .svg figures. The .svg figures should probably be converted to PDF anyway, but I'm not sure whether the problem with phonetic symbols can be fixed easily.

Another possibility is to use pandoc to convert to EPUB, which works to some extent. Links within the document seem to be broken though.

In any case, I don't plan to include PDFs in the official distribution unless there is overwhelming support. On a smaller scale, the above instructions could be copy-pasted into the user guide for future reference.
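For the user guide, the two-step route above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the location _build/singlehtml/index.html of the single-page HTML is an assumption about the build layout, and pandoc needs a working PDF engine installed to write PDF.

```python
import subprocess
from pathlib import Path

def build_commands(book_dir=".", output="book.pdf"):
    """Return the two commands: single-page HTML build, then pandoc conversion."""
    book = Path(book_dir)
    # Assumed location of the single-page HTML produced by the first step.
    single_html = book / "_build" / "singlehtml" / "index.html"
    return [
        ["jb", "build", str(book), "--builder", "singlehtml"],
        # pandoc picks the output format from the file extension (.pdf, .epub, .tex).
        ["pandoc", str(single_html), "-o", output],
    ]

def build_book(book_dir=".", output="book.pdf"):
    """Run both steps and stop at the first failure."""
    for cmd in build_commands(book_dir, output):
        subprocess.run(cmd, check=True)
```

Calling build_book(".", "book.epub") would attempt the EPUB variant instead, with the caveat about broken internal links mentioned above.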

tombackstrom · 2023-09-18T10:52:56Z

tombackstrom
Sep 18, 2023
Maintainer Author

Reviewing the content this year, I've found quite a few things that need improvement. Here's a list of the ones I'm currently aware of:

Windowing currently appears in many different places, including at least the sections "windowing", "short time analysis", "short time processing", "MDCT". Better structuring would be useful.
Vector quantization (in the transmission chapter) would really need an interactive visualization. Not difficult, just needs to be done.
The entropy coding section would also likely benefit from an interactive section.
Neural coding should be added to the speech coding section.
The coding chapter should have sound examples.
Neural enhancement should be added to noise attenuation with an example.
The source-filter model should be more prominent in the speech production chapter.
Add ViSQOL and ABC-MRT16 to objective evaluation
Paralinguistic processing, like emotion classification
Disentanglement for privacy-preserving processing (the privacy section should anyway be factored into digestible pieces)

I'll continue the list as I progress in reviewing it.

6 replies

orasanen Sep 22, 2023
Maintainer

Lots of good suggestions, some of them critical. I agree that a section on self-supervised learning from speech/audio is a necessity. I would love to do it myself, but I can't promise any delivery date for that at the moment. If someone else jumps on this, please let me know so that we don't work on the same thing. Also, good visualizations of the key approaches (especially prediction- and masking-based algorithms) that we can freely use in the book would be valuable here, as it'll take a lot of time to make them beautiful and illustrative.

tombackstrom Sep 22, 2023
Maintainer Author

I'd be very happy if you contributed self-supervised learning, because I don't have experience with that. We are currently looking into contributing modules for at least enhancement, gender classification, coding, and wake-word detection. It will take a while though, so as usual, we are happy if someone contributes faster!

orasanen Oct 3, 2023
Maintainer

I'm working on the self-supervised chapter now, as I think it needs to be in the book.

tombackstrom Oct 23, 2023
Maintainer Author

Another wish from students:

Glossary should be improved

tombackstrom Nov 16, 2023
Maintainer Author

Another topic we really should add:

Voice conversion; as background material, see e.g. this review. Perhaps we should invite someone to write it?
