Skip to content

Commit 461f82b

Browse files
authored
Merge pull request #2767 from pazeshun/add-data_to_spectrum
[audio_to_spectrogram, sound_classification] Add data_to_spectrogram
2 parents f264687 + e853d4b commit 461f82b

35 files changed

+1101
-349
lines changed

audio_to_spectrogram/CMakeLists.txt

+7-1
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ find_package(catkin REQUIRED COMPONENTS
88
catkin_python_setup()
99

1010
generate_dynamic_reconfigure_options(
11-
cfg/AudioAmplitudePlot.cfg
11+
cfg/DataAmplitudePlot.cfg
1212
)
1313

1414
catkin_package()
@@ -41,4 +41,10 @@ if(CATKIN_ENABLE_TESTING)
4141
find_package(catkin REQUIRED COMPONENTS rostest roslaunch)
4242
add_rostest(test/audio_to_spectrogram.test)
4343
roslaunch_add_file_check(launch/audio_to_spectrogram.launch)
44+
if(NOT $ENV{ROS_DISTRO} STRLESS "kinetic")
45+
# Under kinetic, eval cannot be used in launch files
46+
# http://wiki.ros.org/roslaunch/XML#substitution_args
47+
add_rostest(test/wrench_to_spectrogram.test)
48+
roslaunch_add_file_check(launch/wrench_to_spectrogram.launch)
49+
endif()
4450
endif()

audio_to_spectrogram/README.md

+191-12
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# audio_to_spectrogram
22

3-
This package converts audio data to spectrum and spectrogram data.
3+
This package converts audio data (or other time-series data) to spectrum and spectrogram data.
44

55
# Usage
66
By following command, you can publish audio, spectrum and spectrogram topics. Please set correct args for your microphone configuration, such as mic\_sampling\_rate or bitdepth.
@@ -9,6 +9,13 @@ By following command, you can publish audio, spectrum and spectrogram topics. Pl
99
roslaunch audio_to_spectrogram audio_to_spectrogram.launch
1010
```
1111

12+
Its data conversion pipeline is as follows:
13+
```
14+
audio_to_spectrum.py -> spectrum
15+
-> normalized_half_spectrum
16+
-> log_spectrum -> preprocess node(s) -> preprocessed spectrum -> spectrum_to_spectrogram.py -> spectrogram
17+
```
18+
1219
Here is an example using rosbag with 300Hz audio.
1320
```bash
1421
roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
@@ -18,19 +25,48 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
1825
|---|---|---|
1926
|<img src="docs/images/audio_amplitude.jpg" width="429">|![](https://user-images.githubusercontent.com/19769486/82075694-9a7ac300-9717-11ea-899c-db6119a76d52.png)|![](https://user-images.githubusercontent.com/19769486/82075685-96e73c00-9717-11ea-9abc-e6e74104d666.png)|
2027

28+
You can also convert data other than audio to spectrum and spectrogram data using this package.
29+
Here is an example using rosbag of a force torque sensor sensing drill vibration.
30+
```bash
31+
roslaunch audio_to_spectrogram sample_wrench_to_spectrogram.launch
32+
```
33+
34+
|Z-axis Force Amplitude|Normalized Half Spectrum|Spectrogram Source Spectrum|Spectrogram|
35+
|---|---|---|---|
36+
|<img src="docs/images/wrench_amplitude.jpg">|<img src="docs/images/wrench_normalized_half_spectrum.jpg">|<img src="docs/images/wrench_spectrogram_source.jpg">|<img src="docs/images/wrench_spectrogram.jpg">|
37+
2138
# Scripts
2239

2340
## audio_to_spectrum.py
41+
2442
A script to convert audio to spectrum.
2543

2644
- ### Publishing topics
27-
2845
- `~spectrum` (`jsk_recognition_msgs/Spectrum`)
2946

30-
Spectrum data calculated from audio by FFT.
47+
Spectrum data calculated from audio by FFT.
48+
It is usual "amplitude spectrum".
49+
See https://ryo-iijima.com/fftresult/ for details.
50+
51+
- `~normalized_half_spectrum` (`jsk_recognition_msgs/Spectrum`)
52+
53+
Spectrum data which is "half" (having non-negative frequencies (0Hz-Nyquist frequency)) and is "normalized" (consistent with the amplitude of the original signal).
54+
See the following for details.
55+
- https://ryo-iijima.com/fftresult/
56+
- https://stackoverflow.com/questions/63211851/why-divide-the-output-of-numpy-fft-by-n
57+
- https://github.com/jsk-ros-pkg/jsk_recognition/issues/2761#issue-1550715400
58+
59+
- `~log_spectrum` (`jsk_recognition_msgs/Spectrum`)
60+
61+
Log-scaled spectrum data.
62+
It is calculated by applying log to the absolute value of the FFT result.
63+
Usually, log is applied to "power spectrum", but we don't use it for simplicity.
64+
See the following for details.
65+
- https://github.com/jsk-ros-pkg/jsk_recognition/issues/2761#issuecomment-1445810380
66+
- http://makotomurakami.com/blog/2020/05/23/5266/
3167

3268
- ### Subscribing topics
33-
- `audio` (`audio_common_msgs/AudioData`)
69+
- `~audio` (`audio_common_msgs/AudioData`)
3470

3571
Audio stream data from microphone. The audio format must be `wave`.
3672

@@ -55,15 +91,94 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
5591

5692
Number of bits per audio data.
5793

58-
- `~high_cut_freq` (`Int`, default: `800`)
94+
- `~fft_exec_rate` (`Double`, default: `50`)
95+
96+
Rate [Hz] to execute FFT and publish its results.
97+
98+
## data_to_spectrum.py
99+
100+
Generalized version of `audio_to_spectrum.py`.
101+
This script can convert multiple message types to spectrum.
102+
103+
- ### Publishing topics
104+
105+
Same as `audio_to_spectrum.py`.
106+
107+
- ### Subscribing topics
108+
- `~input` (`AnyMsg`)
109+
110+
Topic to which message including data you want to convert to spectrum is published.
111+
112+
- ### Parameters
113+
- `~expression_to_get_data` (`String`, default: `m.data`)
114+
115+
Python expression to get data from the input message `m`. For example, if your input is `std_msgs/Float64`, it is `m.data`.
116+
Just accessing a field of `m` is recommended.
117+
If you want to do a complex calculation (e.g., using `numpy`), use `transform` of `topic_tools` before this node.
118+
119+
- `~data_sampling_rate` (`Int`, default: `500`)
120+
121+
Sampling rate [Hz] of input data.
122+
123+
- `~fft_sampling_period` (`Double`, default: `0.3`)
124+
125+
Period [s] to sample input data for one FFT.
126+
127+
- `~fft_exec_rate` (`Double`, default: `50`)
128+
129+
Rate [Hz] to execute FFT and publish its results.
130+
131+
- `~is_integer` (`Bool`, default: `false`)
132+
133+
Whether input data is integer or not. For example, if your input is `std_msgs/Float64`, it is `false`.
134+
135+
- `~is_signed` (`Bool`, default: `true`)
136+
137+
Whether input data is signed or not. For example, if your input is `std_msgs/Float64`, it is `true`.
138+
139+
- `~bitdepth` (`Int`, default: `64`)
140+
141+
Number of bits per input data. For example, if your input is `std_msgs/Float64`, it is `64`.
142+
143+
- `~n_channel` (`Int`, default: `1`)
144+
145+
If your input is scalar, it is `1`.
146+
If your input is flattened 2D matrix, it is number of channel of original matrix.
147+
148+
- `~target_channel` (`Int`, default: `0`)
149+
150+
If your input is scalar, it is `0`.
151+
If your input is flattened 2D matrix, it is target channel.
152+
153+
## spectrum_filter.py
154+
155+
A script to filter spectrum.
156+
157+
- ### Publishing topics
158+
- `~output` (`jsk_recognition_msgs/Spectrum`)
159+
160+
Filtered spectrum data (`low_cut_freq`-`high_cut_freq`).
161+
162+
- ### Subscribing topics
163+
- `~input` (`jsk_recognition_msgs/Spectrum`)
164+
165+
Original spectrum data.
166+
167+
- ### Parameters
168+
- `~data_sampling_rate` (`Int`, default: `500`)
169+
170+
Sampling rate [Hz] of data used in generation of original spectrum data.
171+
172+
- `~high_cut_freq` (`Int`, default: `250`)
59173

60174
Threshold to limit the maximum frequency of the output spectrum.
61175

62-
- `~low_cut_freq` (`Int`, default: `1`)
176+
- `~low_cut_freq` (`Int`, default: `0`)
63177

64178
Threshold to limit the minimum frequency of the output spectrum.
65179

66180
## spectrum_to_spectrogram.py
181+
67182
A script to convert spectrum to spectrogram.
68183

69184
- ### Publishing topics
@@ -128,7 +243,7 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
128243

129244
Number of bits per audio data.
130245

131-
- `~maximum_amplitude` (`Int`, default: `10000`)
246+
- `~maximum_amplitude` (`Double`, default: `10000.0`)
132247

133248
Maximum range of amplitude to plot.
134249

@@ -140,6 +255,66 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
140255

141256
Publish rate [Hz] of audio amplitude image topic.
142257

258+
## data_amplitude_plot.py
259+
260+
Generalized version of `audio_amplitude_plot.py`.
261+
262+
- ### Publishing topics
263+
264+
- `~output/viz` (`sensor_msgs/Image`)
265+
266+
Data amplitude plot image.
267+
268+
- ### Subscribing topics
269+
- `~input` (`AnyMsg`)
270+
271+
Topic to which message including data whose amplitude you want to plot is published.
272+
273+
- ### Parameters
274+
- `~expression_to_get_data` (`String`, default: `m.data`)
275+
276+
Python expression to get data from the input message `m`. For example, if your input is `std_msgs/Float64`, it is `m.data`.
277+
Just accessing a field of `m` is recommended.
278+
If you want to do a complex calculation (e.g., using `numpy`), use `transform` of `topic_tools` before this node.
279+
280+
- `~data_sampling_rate` (`Int`, default: `500`)
281+
282+
Sampling rate [Hz] of input data.
283+
284+
- `~is_integer` (`Bool`, default: `false`)
285+
286+
Whether input data is integer or not. For example, if your input is `std_msgs/Float64`, it is `false`.
287+
288+
- `~is_signed` (`Bool`, default: `true`)
289+
290+
Whether input data is signed or not. For example, if your input is `std_msgs/Float64`, it is `true`.
291+
292+
- `~bitdepth` (`Int`, default: `64`)
293+
294+
Number of bits per input data. For example, if your input is `std_msgs/Float64`, it is `64`.
295+
296+
- `~n_channel` (`Int`, default: `1`)
297+
298+
If your input is scalar, it is `1`.
299+
If your input is flattened 2D matrix, it is number of channel of original matrix.
300+
301+
- `~target_channel` (`Int`, default: `0`)
302+
303+
If your input is scalar, it is `0`.
304+
If your input is flattened 2D matrix, it is target channel.
305+
306+
- `~maximum_amplitude` (`Double`, default: `10.0`)
307+
308+
Maximum range of amplitude to plot.
309+
310+
- `~window_size` (`Double`, default: `10.0`)
311+
312+
Window size of input data to plot.
313+
314+
- `~rate` (`Double`, default: `10.0`)
315+
316+
Publish rate [Hz] of data amplitude image topic.
317+
143318
## spectrum_plot.py
144319

145320
A script to publish frequency vs amplitude plot image.
@@ -159,14 +334,18 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
159334
Spectrum data calculated from audio by FFT.
160335

161336
- ### Parameters
162-
- `~plot_amp_min` (`Double`, default: `0.0`)
337+
- `~min_amp` (`Double`, default: `0.0`)
163338

164-
Minimum value of amplitude in plot
339+
Minimum value of amplitude in plot.
165340

166-
- `~plot_amp_max` (`Double`, default: `20.0`)
341+
- `~max_amp` (`Double`, default: `20.0`)
167342

168-
Maximum value of amplitude in plot
343+
Maximum value of amplitude in plot.
169344

170345
- `~queue_size` (`Int`, default: `1`)
171346

172-
Queue size of spectrum subscriber
347+
Queue size of spectrum subscriber.
348+
349+
- `~max_rate` (`Double`, default: `-1`)
350+
351+
Maximum publish rate [Hz] of frequency vs amplitude plot image. Setting this value low reduces CPU load. `-1` means no maximum limit.

audio_to_spectrogram/cfg/AudioAmplitudePlot.cfg

-13
This file was deleted.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
#! /usr/bin/env python
2+
3+
PACKAGE = 'audio_to_spectrogram'
4+
5+
from dynamic_reconfigure.parameter_generator_catkin import *
6+
7+
8+
gen = ParameterGenerator()
9+
10+
gen.add("maximum_amplitude", double_t, 0, "Maximum range of amplitude to plot", 10.0)
11+
gen.add("window_size", double_t, 0, "Window size of data input to plot", 10.0)
12+
13+
exit(gen.generate(PACKAGE, PACKAGE, "DataAmplitudePlot"))
Loading
Loading
Loading
Loading

0 commit comments

Comments
 (0)