An interactive visualization to organize thousands of human health sounds via t-SNE
Human health sounds — like coughing, sneezing, wheezing, and laughing — carry valuable diagnostic information. These sounds vary widely across individuals but can reveal deep insights into respiratory and overall health.
Understanding these sounds purely through their acoustic properties offers an efficient tool for healthcare. For instance, a model can compare an individual's throat-clearing sound to typical patterns of throat clearing for healthy populations to potentially diagnose an illness.
This project takes a first step toward that goal by clustering human health sounds using machine learning. We organize thousands of human health sounds from the open-source VocalSound dataset into six classes: cough, sneeze, sniff, sigh, throat clearing, and laughter. This visualization is built entirely through unsupervised learning, in this case simply t-SNE. No labels (such as sound type or speaker identity) were provided; the resulting map is based purely on acoustic features. We observe that similar sounds naturally cluster together, demonstrating that even a simple unsupervised method can uncover clear structure given high-quality embeddings from audio foundation models.
The project provides an interactive grid visualization of clustered audio clips. Users can click on images to view metadata, click and drag to play several related clips simultaneously, and filter by metadata to discover patterns.
Additionally, users can record their own audio and view the most similar clips in the grid. Such a tool lets healthcare experts compare new audio with pre-existing data, revealing underlying patterns and supporting accurate diagnoses. Note that when the Flask server is not running locally, the first recording may take about a minute to process; subsequent recordings should be processed quickly.
A demo can be viewed from the following link: https://hishambhatti.github.io/human-health-sounds/
Here we describe the basic pipeline for transforming raw audio files into an effective visualization. If you want to create a similar visualization for a different audio dataset, follow these instructions with a different audio folder.
human-health-sounds/
├── Notebooks/
│   ├── Audio_Processing.ipynb
│   ├── HeAR_embeddings.ipynb
│   ├── t-SNE_and_grid_clustering.ipynb
│   └── requirements.txt
├── ca-cough-ony/              # React frontend
│   ├── public/
│   ├── src/
│   └── package.json
├── backend/                   # Flask server for audio recording feature
│   ├── models--google--hear/
│   ├── app.py
│   ├── match_audio.py
│   └── ...
└── README.md
To set up the Python environment for the notebooks, enter the Notebooks folder by running cd Notebooks. Create a virtual environment and install the required dependencies.
Example commands to create a virtual environment and install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
After installing the dependencies, run the following notebooks in order, modifying the folder name to point at your audio data (minimal Python sketches of each stage follow the list below):
First, save your dataset locally. In our example, we have a source directory vs_release_16k/audio_16k.
- Audio_Preprocessing.ipynb: Run the cells for Pre-Processing
- Pre-processes the human health audio data to create audio suitable for the HeAR model
- Trims silence, removes short/quiet files, and caps the length of each clip
- Creates spectrograms for each audio clip
- HeAR_embeddings.ipynb
- Uses Google’s HeAR model (via Hugging Face) to generate embeddings
- Tests embeddings on the preprocessed data
- t-SNE_and_grid_clustering.ipynb
- Runs the t-SNE algorithm to cluster the HeAR embeddings, searching over various perplexities
- Runs the LAP solver to convert the t-SNE output into a 2D grid
- Saves the output as a JSON for the frontend visualization
- Audio_Preprocessing.ipynb: Run the cells for Post-Processing
- Arranges spectrograms into a single large grid for frontend visualization
- Combines individual audio clips into a single file, adding start and end times in metadata
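As a rough illustration of the pre-processing stage, here is a minimal sketch that trims silence, drops clips that are too short or too quiet, caps clip length, and renders a mel-spectrogram thumbnail per clip. The thresholds (TOP_DB, MIN_DURATION_S, MAX_DURATION_S, MIN_RMS) and output folder names are illustrative assumptions, not the notebook's actual settings.

```python
# Hedged sketch of the pre-processing stage; the pre-processing notebook is authoritative.
# All thresholds below are illustrative assumptions.
from pathlib import Path

import librosa
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf

SRC_DIR = Path("vs_release_16k/audio_16k")   # source directory from the README
OUT_DIR = Path("processed_audio")
SPEC_DIR = Path("spectrograms")
SR = 16_000            # HeAR expects 16 kHz audio
TOP_DB = 30            # silence-trimming threshold (assumed)
MIN_DURATION_S = 0.5   # drop clips shorter than this after trimming (assumed)
MAX_DURATION_S = 10.0  # cap clip length (assumed)
MIN_RMS = 1e-3         # drop clips that are essentially silent (assumed)

OUT_DIR.mkdir(exist_ok=True)
SPEC_DIR.mkdir(exist_ok=True)

for wav_path in sorted(SRC_DIR.glob("*.wav")):
    y, _ = librosa.load(wav_path, sr=SR, mono=True)

    # Trim leading/trailing silence and cap the clip length.
    y, _ = librosa.effects.trim(y, top_db=TOP_DB)
    y = y[: int(MAX_DURATION_S * SR)]

    # Skip clips that are too short or too quiet to be useful.
    if len(y) < MIN_DURATION_S * SR or np.sqrt(np.mean(y ** 2)) < MIN_RMS:
        continue

    sf.write(OUT_DIR / wav_path.name, y, SR)

    # Render a mel-spectrogram thumbnail for the grid visualization.
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    plt.imsave(SPEC_DIR / f"{wav_path.stem}.png", mel_db, origin="lower", cmap="magma")
```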
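The embedding stage can be sketched as follows, assuming the HeAR model (google/hear on Hugging Face) has already been loaded as in HeAR_embeddings.ipynb and is exposed as a callable embed_fn mapping a batch of fixed-length 16 kHz waveforms to embedding vectors. The 2-second window is HeAR's documented input length; the notebook's loading and invocation code is authoritative.

```python
# Hedged sketch of the embedding stage. `embed_fn` stands in for the HeAR model
# loaded in HeAR_embeddings.ipynb (an assumption, not the real loading code):
# it should map a float32 array of shape (batch, CLIP_SAMPLES) to (batch, dim).
from pathlib import Path

import librosa
import numpy as np

SR = 16_000
CLIP_SECONDS = 2            # HeAR operates on 2-second windows
CLIP_SAMPLES = SR * CLIP_SECONDS

def load_fixed_length(path: Path) -> np.ndarray:
    """Load a processed clip and pad/crop it to exactly CLIP_SAMPLES samples."""
    y, _ = librosa.load(path, sr=SR, mono=True)
    if len(y) < CLIP_SAMPLES:
        y = np.pad(y, (0, CLIP_SAMPLES - len(y)))
    return y[:CLIP_SAMPLES].astype(np.float32)

def embed_dataset(audio_dir: str, embed_fn, batch_size: int = 32) -> tuple[list[str], np.ndarray]:
    """Embed every clip in audio_dir with the provided HeAR callable."""
    paths = sorted(Path(audio_dir).glob("*.wav"))
    names, embeddings = [p.name for p in paths], []
    for start in range(0, len(paths), batch_size):
        batch = np.stack([load_fixed_length(p) for p in paths[start:start + batch_size]])
        embeddings.append(np.asarray(embed_fn(batch)))
    return names, np.concatenate(embeddings, axis=0)

# Example usage (hear_model is the loaded HeAR callable, a placeholder name here):
# names, embs = embed_dataset("processed_audio", embed_fn=hear_model)
# np.save("hear_embeddings.npy", embs)
```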
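The 2D layout can be reproduced in spirit with scikit-learn's TSNE followed by a linear-assignment step that snaps each point to a unique grid cell. The sketch below uses SciPy's linear_sum_assignment as the LAP solver, a single assumed perplexity, and an assumed square grid; the notebook searches over several perplexities and may use a different solver.

```python
# Hedged sketch of t-SNE + linear-assignment gridding
# (t-SNE_and_grid_clustering.ipynb is the authoritative version).
import json

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.manifold import TSNE

embs = np.load("hear_embeddings.npy")            # (n_clips, dim) HeAR embeddings
n = len(embs)
side = int(np.ceil(np.sqrt(n)))                  # assumed square grid

# 1) Project embeddings to 2-D with t-SNE (perplexity here is one illustrative value).
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embs)
xy = (xy - xy.min(axis=0)) / (xy.max(axis=0) - xy.min(axis=0))   # normalize to [0, 1]

# 2) Build the target grid positions, also normalized to [0, 1].
gx, gy = np.meshgrid(np.linspace(0, 1, side), np.linspace(0, 1, side))
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)[:n]

# 3) Solve the linear assignment problem: each point gets a unique grid cell
#    while the total point-to-cell distance is minimized.
cost = cdist(xy, grid, metric="sqeuclidean")
rows, cols = linear_sum_assignment(cost)

# 4) Save the clip -> grid-cell mapping for the frontend.
assignment = {int(r): {"col": int(cols[i] % side), "row": int(cols[i] // side)}
              for i, r in enumerate(rows)}
with open("grid_assignment.json", "w") as f:
    json.dump(assignment, f)
```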
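Finally, a rough sketch of the post-processing stage: pasting the spectrogram thumbnails into one large sheet and concatenating the clips into a single audio file while recording each clip's start and end time. Thumbnail size, grid side length, and intermediate file names here are assumptions; the notebook's Post-Processing cells produce the actual precomposed_grid_32.png and combined audio used by the frontend.

```python
# Hedged sketch of the post-processing stage; the notebook's Post-Processing
# cells define the real layout and file names.
import json
from pathlib import Path

import numpy as np
import soundfile as sf
from PIL import Image

SR = 16_000
THUMB = 32                     # spectrogram thumbnail edge length in pixels (assumed)
SIDE = 32                      # grid side length (assumed)

with open("grid_assignment.json") as f:
    assignment = json.load(f)  # clip index -> {"row": r, "col": c}

clips = sorted(Path("processed_audio").glob("*.wav"))

# 1) Paste each clip's spectrogram into its assigned cell of one big image.
sheet = Image.new("RGB", (SIDE * THUMB, SIDE * THUMB))
for idx, cell in assignment.items():
    thumb = Image.open(Path("spectrograms") / f"{clips[int(idx)].stem}.png")
    thumb = thumb.convert("RGB").resize((THUMB, THUMB))
    sheet.paste(thumb, (cell["col"] * THUMB, cell["row"] * THUMB))
sheet.save("precomposed_grid_32.png")

# 2) Concatenate all clips into one file, recording start/end times for playback.
audio, metadata, cursor = [], [], 0.0
for clip in clips:
    y, _ = sf.read(clip, dtype="float32")
    audio.append(y)
    metadata.append({"file": clip.name, "start": cursor, "end": cursor + len(y) / SR})
    cursor += len(y) / SR
sf.write("all_sounds_combined.wav", np.concatenate(audio), SR)
with open("combined_metadata.json", "w") as f:   # assumed metadata file name
    json.dump(metadata, f)
```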
To build the client-side React app, make sure you are in the ca-cough-ony folder. Then install Node.js and run npm install.
Place the generated JSON file (either vocalsound_wav.json or vocalsound_mp3.json) into the ca-cough-ony/src folder. Copy the spectrogram grid (precomposed_grid_32.png), the combined audio file (all_sounds_combined.wav or all_sounds_combined.mp3), and the processed audio folders into the public/ directory. Then run:
npm run dev
If you want to enable the audio recording feature locally (optional), enter the backend folder by running cd backend from the main directory. Then, to start the Flask server, run:
flask run
Currently, the frontend connects to a backend hosted on Google Cloud. To connect locally instead, uncomment the line const BACKEND_URL = "http://127.0.0.1:5000/get-grid-indices" in ca-cough-ony/src/components/RecordPanel.jsx.
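For orientation, below is a minimal sketch of what such a /get-grid-indices endpoint might look like: it accepts an uploaded recording, embeds it, and returns the grid indices of the most similar clips by cosine similarity. The embed_fn placeholder, the hear_embeddings.npy file, and the "audio" form-field name are assumptions for illustration; the project's actual app.py and match_audio.py are authoritative.

```python
# Hedged sketch of a local /get-grid-indices endpoint; the real backend/app.py and
# match_audio.py may differ in route details, field names, and response shape.
import io

import librosa
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
SR = 16_000
CLIP_SAMPLES = SR * 2                            # HeAR's 2-second input window
DATASET_EMBS = np.load("hear_embeddings.npy")    # embeddings in grid order (assumed file)

def embed_fn(batch: np.ndarray) -> np.ndarray:
    """Placeholder for the HeAR model that the backend loads."""
    raise NotImplementedError("Load the HeAR model here (see HeAR_embeddings.ipynb).")

@app.route("/get-grid-indices", methods=["POST"])
def get_grid_indices():
    # Decode the uploaded recording ("audio" field name is an assumption).
    raw = request.files["audio"].read()
    y, _ = librosa.load(io.BytesIO(raw), sr=SR, mono=True)

    # Pad/crop to the fixed window length and embed the recording.
    y = np.pad(y, (0, max(0, CLIP_SAMPLES - len(y))))[:CLIP_SAMPLES].astype(np.float32)
    query = np.asarray(embed_fn(y[np.newaxis, :]))[0]

    # Rank dataset clips by cosine similarity and return the closest grid indices.
    sims = DATASET_EMBS @ query / (
        np.linalg.norm(DATASET_EMBS, axis=1) * np.linalg.norm(query) + 1e-9
    )
    top = np.argsort(-sims)[:9].tolist()
    return jsonify({"indices": top})
```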
Below is a visualization of our generated t-SNE point cloud and the LAP 2D grid for the processed VocalSound audio clips.
(Left: t-SNE point cloud. Right: LAP 2D grid.)
As you can see, sound types are generally clustered together. The misgroupings often stem from mislabeled clips in VocalSound or from poor audio quality. Feel free to explore them yourself!
Developed by Hisham Bhatti, working with Zhihan Zhang, at the Ubiquitous Computing Lab at the University of Washington's Paul G. Allen School of Computer Science & Engineering. We thank Jake Garrison for discussion.
This project was based on Bird Sounds by Google Creative Lab, but designed with modern tooling and for others to test with their own datasets. In particular, below are some resources that I took inspiration from:
The core embedding model is Google’s HeAR model, available on Hugging Face
The dataset used is VocalSound, an open-source collection of human health sounds.
Backend Server (for Audio recording):
We do not store audio information provided by users. This tool is for research only.


