@@ -1,6 +1,6 @@
  # audio_to_spectrogram

- This package converts audio data to spectrum and spectrogram data.
+ This package converts audio data (or other time-series data) to spectrum and spectrogram data.

  # Usage
  With the following command, you can publish audio, spectrum and spectrogram topics. Please set the correct args for your microphone configuration, such as mic\_sampling\_rate or bitdepth.
@@ -9,6 +9,13 @@ With the following command, you can publish audio, spectrum and spectrogram topics. Pl
  roslaunch audio_to_spectrogram audio_to_spectrogram.launch
  ```

+ Its data conversion pipeline is as follows:
+ ```
+ audio_to_spectrum.py -> spectrum
+                      -> normalized_half_spectrum
+                      -> log_spectrum -> preprocess node(s) -> preprocessed spectrum -> spectrum_to_spectrogram.py -> spectrogram
+ ```
+
  Here is an example using rosbag with 300Hz audio.
  ```bash
  roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
@@ -18,19 +25,48 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
  | --- | --- | --- |
  | <img src="docs/images/audio_amplitude.jpg" width="429"> | ![](https://user-images.githubusercontent.com/19769486/82075694-9a7ac300-9717-11ea-899c-db6119a76d52.png) | ![](https://user-images.githubusercontent.com/19769486/82075685-96e73c00-9717-11ea-9abc-e6e74104d666.png) |

+ You can also convert data other than audio to spectrum and spectrogram data using this package.
+ Here is an example using a rosbag of a force-torque sensor measuring drill vibration.
+ ```bash
+ roslaunch audio_to_spectrogram sample_wrench_to_spectrogram.launch
+ ```
+
+ | Z-axis Force Amplitude | Normalized Half Spectrum | Spectrogram Source Spectrum | Spectrogram |
+ | --- | --- | --- | --- |
+ | <img src="docs/images/wrench_amplitude.jpg"> | <img src="docs/images/wrench_normalized_half_spectrum.jpg"> | <img src="docs/images/wrench_spectrogram_source.jpg"> | <img src="docs/images/wrench_spectrogram.jpg"> |
+
  # Scripts

  ## audio_to_spectrum.py
+
  A script to convert audio to spectrum.

  - ### Publishing topics
-
  - `~spectrum` (`jsk_recognition_msgs/Spectrum`)

- Spectrum data calculated from audio by FFT.
+ Spectrum data calculated from audio by FFT.
+ This is the usual "amplitude spectrum".
+ See https://ryo-iijima.com/fftresult/ for details.
+
+ - `~normalized_half_spectrum` (`jsk_recognition_msgs/Spectrum`)
+
+ Spectrum data that is "half" (containing only non-negative frequencies, from 0 Hz to the Nyquist frequency) and "normalized" (scaled to be consistent with the amplitude of the original signal).
+ See the following for details.
+ - https://ryo-iijima.com/fftresult/
+ - https://stackoverflow.com/questions/63211851/why-divide-the-output-of-numpy-fft-by-n
+ - https://github.com/jsk-ros-pkg/jsk_recognition/issues/2761#issue-1550715400
+
+ - `~log_spectrum` (`jsk_recognition_msgs/Spectrum`)
+
+ Log-scaled spectrum data.
+ It is calculated by applying log to the absolute value of the FFT result.
+ Usually, log is applied to the "power spectrum", but for simplicity we apply it to the amplitude spectrum instead.
+ See the following for details.
+ - https://github.com/jsk-ros-pkg/jsk_recognition/issues/2761#issuecomment-1445810380
+ - http://makotomurakami.com/blog/2020/05/23/5266/

  - ### Subscribing topics
- - `audio` (`audio_common_msgs/AudioData`)
+ - `~audio` (`audio_common_msgs/AudioData`)

  Audio stream data from microphone. The audio format must be `wave`.

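As a reading aid for the three topics documented in this hunk, the relationship between `~spectrum`, `~normalized_half_spectrum` and `~log_spectrum` can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the package's implementation: it uses a naive DFT from the standard library in place of a real FFT routine, all function names are hypothetical, doubling the mirrored bins is one common way to make the half spectrum match the signal amplitude, and `eps` is an illustrative guard against `log(0)`.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def amplitude_spectrum(samples):
    """Absolute value of every DFT bin (positive and negative frequencies)."""
    return [abs(c) for c in dft(samples)]


def normalized_half_spectrum(samples):
    """Keep bins from 0 Hz to Nyquist and rescale to the signal's amplitude."""
    n = len(samples)
    amp = amplitude_spectrum(samples)
    half = [a / n for a in amp[: n // 2 + 1]]
    # Assumption: every bin except DC (and Nyquist when n is even) has a
    # mirrored negative-frequency twin, so its amplitude is doubled.
    last = len(half) - 1 if n % 2 == 0 else len(half)
    for k in range(1, last):
        half[k] *= 2.0
    return half


def log_spectrum(samples, eps=1e-12):
    """Log of the amplitude spectrum; eps (hypothetical) avoids log(0)."""
    return [math.log(a + eps) for a in amplitude_spectrum(samples)]


# A sine of amplitude 3.0 at 5 Hz, sampled at 40 Hz for one second.
rate = 40
signal = [3.0 * math.sin(2 * math.pi * 5 * t / rate) for t in range(rate)]
half = normalized_half_spectrum(signal)
peak_bin = max(range(len(half)), key=lambda k: half[k])
peak_hz = peak_bin * rate / len(signal)
```

For this test signal the largest value of `half` sits at the 5 Hz bin and is approximately 3.0, which illustrates what "consistent with the amplitude of the original signal" means.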
@@ -55,15 +91,94 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch

  Number of bits per audio data.

- - `~high_cut_freq` (`Int`, default: `800`)
+ - `~fft_exec_rate` (`Double`, default: `50`)
+
+ Rate [Hz] to execute FFT and publish its results.
+
+ ## data_to_spectrum.py
+
+ Generalized version of `audio_to_spectrum.py`.
+ This script can convert multiple message types to spectrum.
+
+ - ### Publishing topics
+
+ Same as `audio_to_spectrum.py`.
+
+ - ### Subscribing topics
+ - `~input` (`AnyMsg`)
+
+ Topic on which messages containing the data you want to convert to spectrum are published.
+
+ - ### Parameters
+ - `~expression_to_get_data` (`String`, default: `m.data`)
+
+ Python expression to get data from the input message `m`. For example, if your input is `std_msgs/Float64`, it is `m.data`.
+ Just accessing a field of `m` is recommended.
+ If you want to do a complex calculation (e.g., using `numpy`), use `transform` of `topic_tools` before this node.
+
+ - `~data_sampling_rate` (`Int`, default: `500`)
+
+ Sampling rate [Hz] of input data.
+
+ - `~fft_sampling_period` (`Double`, default: `0.3`)
+
+ Period [s] to sample input data for one FFT.
+
+ - `~fft_exec_rate` (`Double`, default: `50`)
+
+ Rate [Hz] to execute FFT and publish its results.
+
+ - `~is_integer` (`Bool`, default: `false`)
+
+ Whether input data is integer or not. For example, if your input is `std_msgs/Float64`, it is `false`.
+
+ - `~is_signed` (`Bool`, default: `true`)
+
+ Whether input data is signed or not. For example, if your input is `std_msgs/Float64`, it is `true`.
+
+ - `~bitdepth` (`Int`, default: `64`)
+
+ Number of bits per input data. For example, if your input is `std_msgs/Float64`, it is `64`.
+
+ - `~n_channel` (`Int`, default: `1`)
+
+ If your input is a scalar, it is `1`.
+ If your input is a flattened 2D matrix, it is the number of channels of the original matrix.
+
+ - `~target_channel` (`Int`, default: `0`)
+
+ If your input is a scalar, it is `0`.
+ If your input is a flattened 2D matrix, it is the target channel.
+
+ ## spectrum_filter.py
+
+ A script to filter spectrum.
+
+ - ### Publishing topics
+ - `~output` (`jsk_recognition_msgs/Spectrum`)
+
+ Filtered spectrum data (from `low_cut_freq` to `high_cut_freq`).
+
+ - ### Subscribing topics
+ - `~input` (`jsk_recognition_msgs/Spectrum`)
+
+ Original spectrum data.
+
+ - ### Parameters
+ - `~data_sampling_rate` (`Int`, default: `500`)
+
+ Sampling rate [Hz] of the data used to generate the original spectrum data.
+
+ - `~high_cut_freq` (`Int`, default: `250`)

  Threshold to limit the maximum frequency of the output spectrum.

- - `~low_cut_freq` (`Int`, default: `1`)
+ - `~low_cut_freq` (`Int`, default: `0`)

  Threshold to limit the minimum frequency of the output spectrum.

  ## spectrum_to_spectrogram.py
+
  A script to convert spectrum to spectrogram.

  - ### Publishing topics
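The parameters added in this hunk for `data_to_spectrum.py` and `spectrum_filter.py` can be illustrated with a small Python sketch. It is only a reading aid under stated assumptions: the function names are hypothetical, the flattened matrix is assumed to store the channels of one sample next to each other, the spectrum is represented as two plain lists rather than a ROS message, and the cut-off thresholds are assumed to be inclusive.

```python
def samples_per_fft(data_sampling_rate, fft_sampling_period):
    """Number of input samples that cover one FFT window."""
    return int(round(data_sampling_rate * fft_sampling_period))


def select_channel(flat, n_channel, target_channel):
    """Extract one channel from a flattened 2D matrix.

    Assumption: samples are stored row by row, so the channels of one
    sample are adjacent in the flat list.
    """
    return flat[target_channel::n_channel]


def filter_spectrum(frequencies, amplitudes, low_cut_freq, high_cut_freq):
    """Keep only the bins whose frequency lies within the two thresholds."""
    kept = [
        (f, a)
        for f, a in zip(frequencies, amplitudes)
        if low_cut_freq <= f <= high_cut_freq  # assumption: inclusive bounds
    ]
    return [f for f, _ in kept], [a for _, a in kept]


# Defaults from the README: 500 Hz sampling rate, 0.3 s per FFT.
window = samples_per_fft(500, 0.3)

# Three samples of a two-channel signal, flattened.
flat = [0.1, 9.0, 0.2, 9.1, 0.3, 9.2]
channel0 = select_channel(flat, n_channel=2, target_channel=0)

# Illustrative spectrum with bins at 0, 100, 200 and 300 Hz.
freqs, amps = filter_spectrum([0, 100, 200, 300], [1.0, 2.0, 3.0, 4.0], 50, 250)
```

With the default parameters, one FFT window covers 150 samples, and in the filter example only the 100 Hz and 200 Hz bins survive.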
@@ -128,7 +243,7 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch

  Number of bits per audio data.

- - `~maximum_amplitude` (`Int`, default: `10000`)
+ - `~maximum_amplitude` (`Double`, default: `10000.0`)

  Maximum range of amplitude to plot.

@@ -140,6 +255,66 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch

  Publish rate [Hz] of audio amplitude image topic.

+ ## data_amplitude_plot.py
+
+ Generalized version of `audio_amplitude_plot.py`.
+
+ - ### Publishing topics
+
+ - `~output/viz` (`sensor_msgs/Image`)
+
+ Data amplitude plot image.
+
+ - ### Subscribing topics
+ - `~input` (`AnyMsg`)
+
+ Topic on which messages containing the data whose amplitude you want to plot are published.
+
+ - ### Parameters
+ - `~expression_to_get_data` (`String`, default: `m.data`)
+
+ Python expression to get data from the input message `m`. For example, if your input is `std_msgs/Float64`, it is `m.data`.
+ Just accessing a field of `m` is recommended.
+ If you want to do a complex calculation (e.g., using `numpy`), use `transform` of `topic_tools` before this node.
+
+ - `~data_sampling_rate` (`Int`, default: `500`)
+
+ Sampling rate [Hz] of input data.
+
+ - `~is_integer` (`Bool`, default: `false`)
+
+ Whether input data is integer or not. For example, if your input is `std_msgs/Float64`, it is `false`.
+
+ - `~is_signed` (`Bool`, default: `true`)
+
+ Whether input data is signed or not. For example, if your input is `std_msgs/Float64`, it is `true`.
+
+ - `~bitdepth` (`Int`, default: `64`)
+
+ Number of bits per input data. For example, if your input is `std_msgs/Float64`, it is `64`.
+
+ - `~n_channel` (`Int`, default: `1`)
+
+ If your input is a scalar, it is `1`.
+ If your input is a flattened 2D matrix, it is the number of channels of the original matrix.
+
+ - `~target_channel` (`Int`, default: `0`)
+
+ If your input is a scalar, it is `0`.
+ If your input is a flattened 2D matrix, it is the target channel.
+
+ - `~maximum_amplitude` (`Double`, default: `10.0`)
+
+ Maximum range of amplitude to plot.
+
+ - `~window_size` (`Double`, default: `10.0`)
+
+ Window size of input data to plot.
+
+ - `~rate` (`Double`, default: `10.0`)
+
+ Publish rate [Hz] of data amplitude image topic.
+
  ## spectrum_plot.py

  A script to publish frequency vs amplitude plot image.
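The `~expression_to_get_data` parameter documented in this hunk describes a Python expression evaluated against the input message `m`. One plausible way such an expression could be applied is sketched below; this is an illustration only, not the script's actual code. The helper `get_data` is hypothetical, and the messages are plain `SimpleNamespace` stand-ins that merely mimic the field layout of `std_msgs/Float64` and `geometry_msgs/WrenchStamped`.

```python
from types import SimpleNamespace


def get_data(expression, m):
    """Evaluate `expression` with the input message bound to the name `m`."""
    # eval() executes arbitrary code, so the expression must come from a
    # trusted source such as your own launch file.
    return eval(expression, {"__builtins__": {}}, {"m": m})


# Hypothetical stand-ins for std_msgs/Float64 and geometry_msgs/WrenchStamped.
float_msg = SimpleNamespace(data=1.5)
wrench_msg = SimpleNamespace(
    wrench=SimpleNamespace(force=SimpleNamespace(x=0.0, y=0.0, z=-9.8))
)

value = get_data("m.data", float_msg)
force_z = get_data("m.wrench.force.z", wrench_msg)
```

Keeping the expression to a plain field access, as the README recommends, keeps the evaluated string easy to audit.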
@@ -159,14 +334,18 @@ roslaunch audio_to_spectrogram sample_audio_to_spectrogram.launch
  Spectrum data calculated from audio by FFT.

  - ### Parameters
- - `~plot_amp_min` (`Double`, default: `0.0`)
+ - `~min_amp` (`Double`, default: `0.0`)

- Minimum value of amplitude in plot
+ Minimum value of amplitude in plot.

- - `~plot_amp_max` (`Double`, default: `20.0`)
+ - `~max_amp` (`Double`, default: `20.0`)

- Maximum value of amplitude in plot
+ Maximum value of amplitude in plot.

  - `~queue_size` (`Int`, default: `1`)

- Queue size of spectrum subscriber
+ Queue size of spectrum subscriber.
+
+ - `~max_rate` (`Double`, default: `-1`)
+
+ Maximum publish rate [Hz] of frequency vs amplitude plot image. Setting this value low reduces CPU load. `-1` means no maximum limit.
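The behavior described for `~max_rate` (a cap on the publish rate, with `-1` meaning no cap) can be sketched as a simple rate limiter. This is a minimal sketch, assuming timestamps in seconds and treating any non-positive rate as unlimited; the class `RateLimiter` is hypothetical and is not taken from `spectrum_plot.py`.

```python
class RateLimiter:
    """Allow at most `max_rate` events per second; non-positive means unlimited."""

    def __init__(self, max_rate):
        self.min_interval = 1.0 / max_rate if max_rate > 0 else 0.0
        self.last_time = None

    def ready(self, now):
        """Return True if an event at time `now` (seconds) may be published."""
        if self.last_time is not None and now - self.last_time < self.min_interval:
            return False
        self.last_time = now
        return True


stamps = [0.0, 0.1, 0.5, 0.6, 1.0]

limited = RateLimiter(2.0)  # at most 2 Hz
published = [t for t in stamps if limited.ready(t)]

unlimited = RateLimiter(-1)  # -1 disables the limit
all_published = [t for t in stamps if unlimited.ready(t)]
```

Dropping frames this way skips the plotting work for the discarded messages, which is where the reduction in CPU load would come from.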