Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acoustics Module #349

Merged
merged 78 commits into from
Oct 29, 2024
Merged

Acoustics Module #349

merged 78 commits into from
Oct 29, 2024

Conversation

jmcvey3
Copy link
Contributor

@jmcvey3 jmcvey3 commented Aug 29, 2024

MHKiT Acoustics Module

This pull request introduces a new acoustics module to MHKiT, aimed at providing comprehensive tools for processing and analyzing passive acoustic data from hydrophone recordings. The module is designed to facilitate compliance with the IEC-TS 62600-40 standard for assessing the effects of marine energy devices on the marine environment. It includes functionalities for reading hydrophone data, performing spectral analyses, applying calibrations, calculating sound pressure levels, and visualizing results.

The addition of this acoustics module significantly enhances MHKiT's capabilities in passive acoustic data analysis, providing standardized, robust tools for the marine renewable energy community. It facilitates compliance with international standards and supports detailed acoustic monitoring and environmental impact assessments of marine energy devices.

Dependencies and Compatibility

  • Hydrophone Models: Initially supports SoundTrap and icListen hydrophones, with the potential to add more models.

Key Components

analysis.py

Provides core analytical functions that process sound pressure data in both the frequency and time domains:

  • Frequency Validation and Warning:

    • _fmax_warning: Ensures the specified maximum frequency does not exceed the Nyquist frequency, adjusting it if necessary to prevent aliasing issues.
  • Shallow Water Cutoff Frequency:

    • minimum_frequency: Calculates the minimum frequency cutoff based on water depth and the speed of sound in water and seabed materials, addressing limitations in shallow water environments.
  • Spectral Density Calculations:

    • sound_pressure_spectral_density: Computes the mean square sound pressure spectral density (SPSD) using FFT binning with Hanning windowing and 50% overlap, converting time-series data into the frequency domain.
  • Calibration:

    • apply_calibration: Applies calibration adjustments to the spectral density data using a sensitivity curve, ensuring accurate representation of the hydrophone's frequency response.
  • Spectral Density Level Calculation:

    • sound_pressure_spectral_density_level: Converts mean square spectral density values to sound pressure spectral density levels (SPSDL) in decibels.
  • Spectral Density Aggregation:

    • band_aggregate: Aggregates spectral density data into fractional octave bands (e.g., third-octave, decidecade) using specified statistical methods like median or mean.
    • time_aggregate: Aggregates spectral density data into specified time windows, facilitating statistical analyses over time.
  • Sound Pressure Level Calculation:

    • sound_pressure_level: Computes the overall sound pressure level (SPL) within a frequency band from SPSD data.
    • third_octave_sound_pressure_level and decidecade_sound_pressure_level: Calculate SPLs across third-octave and decidecade bands, respectively, providing detailed frequency band analysis per IEC standards.

io.py

Offers input/output functionalities specific to hydrophone data handling:

  • Data Reading:

    • read_hydrophone: Main function to read a WAV file from a hydrophone, converting raw data into voltage or pressure time series based on available sensitivity data.
    • read_soundtrap: Specialized reader for Ocean Instruments SoundTrap hydrophone files, automatically incorporating appropriate metadata.
    • read_iclisten: Specialized reader for Ocean Sonics icListen hydrophone files, including metadata processing to apply hydrophone sensitivity for direct sound pressure calculation.
  • Audio Export:

    • export_audio: Converts processed sound pressure data back into a WAV file format, with optional gain adjustment to enhance playback quality.
  • Data Extraction:

    • _read_wav_metadata: Extracts metadata from a WAV file, such as bit depth and header information.
    • _calculate_voltage_and_time: Converts raw WAV data into voltage values and generates a time index based on the sampling frequency.

graphics.py

Provides plotting functions for visualizing passive acoustics data:

  • Spectrogram Plotting:

    • plot_spectrogram: Generates a spectrogram plot from SPSDL data, using a logarithmic frequency scale by default for improved readability and analysis of acoustic data over time.
  • Spectra Plotting:

    • plot_spectra: Produces a spectral density plot with a log-transformed x-axis, allowing for clear visualization of spectral density across frequency bands.

All plotting functions accept Matplotlib keyword arguments via **kwargs, enabling users to customize plot aspects such as color, scale, and labels.

Example Notebook Workflow

An accompanying example notebook demonstrates the practical application of the module's functionalities, following procedures outlined by the IEC-TS 62600-40 standard:

  1. Data Import and Scaling:

    • Uses read_hydrophone to read hydrophone WAV files, converting raw data into calibrated voltage or pressure values, considering peak voltage, sensitivity, and gain.
  2. Spectral Density Calculation:

    • Applies sound_pressure_spectral_density to compute SPSD, essential for understanding sound pressure distribution across frequencies.
  3. Calibration Curve Application:

    • Employs apply_calibration to adjust SPSD data based on the hydrophone's sensitivity curve, ensuring accurate spectral measurements.
  4. Spectrogram Visualization:

    • Utilizes plot_spectrogram to generate spectrograms that help visualize sound events and detect potential contaminations like boat noise.
  5. Window Averaging for IEC-40 Compliance:

    • Implements time_aggregate to compute statistical summaries over specified time windows, meeting standardized requirements for characterizing marine energy devices.
  6. Sound Pressure Level Calculation:

    • Calculates SPLs using sound_pressure_level, considering factors like shallow water cutoff frequencies and potential flow noise, adjusting frequency limits as necessary.
  7. Decidecade and Third Octave SPLs:

    • Computes SPLs within specific frequency bands using decidecade_sound_pressure_level and third_octave_sound_pressure_level, enabling detailed analysis of sound distribution in line with IEC requirements.
  8. Optional Playback:

    • Demonstrates the use of export_audio to rescale and export audio files, facilitating better playback and interpretation of acoustic data.

jmcvey3 and others added 30 commits May 30, 2024 18:30
# MHKiT v0.8.1
MHKiT v0.8.1, includes bug fixes in the example notebooks and fixes the dependencies to requirements updates prior to Numpy 2.0.0.

Fixes MHKIT v0.8.0 installation issues (MHKiT-Software#334) by fixing the dependencies.
- MHKiT-Software#335

Fixes bugs in MHKiT example notebooks
- MHKiT-Software#327
…-Software#326)

* tke updates

* Fix shear velocity functions

* More detail for tke shear production

* Don't rotate heading beyond 360 degrees

* Fix some typos in notebook

* Rename deprecated function
…#340)

* Test: Determine method using input frequency index

* Feat: Use sum of sines if ifft is not computable

This change allows `surface_elevation` to return a result if the user
inputs a spectrum with a frequency index that does not have a zero
frequency.

If the non zero frequency index condition is found when the method is
`ifft` we warn the user and change the method to `sum_of_sines`

* Fix: Use previously found frequency index

S.index may not exist for some input datasets, but f[0] does and we
should use the value of f[0] here.

* Test: Warn when using ifft with a non zero frequency

* Lint
This PR adds a Github action to test the example notebooks as part of or CD pipeline. Additionally a timeout is added to which notebooks will fail if they exceed the given time.
This PR addresses MHKiT-Software#315 by:
- Changes the data called by the Wind Toolkit tests so that they run faster
- Updates the .csv files that the tests compare against
- Updates a few descriptions in the metocean example, fixes a sorting issue, reduces the data downloaded there
- Tests in hindcast match the notebooks and use the same cache.
Bug fix by removing matplotlib version check needed for < 3.8.0
@ssolson
Copy link
Contributor

ssolson commented Sep 26, 2024

@jmcvey3 could you take a look at my comments above?

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking the PR on in bites. Today I got through the Acoustics example notebook. This is really great work @jmcvey3 I only have a couple of minor comments I think could add clarity for those less familiar with the field/ example.

" gain=0, \n",
" start_time=\"2024-06-01T05:31:14\"\n",
")\n",
"print(P)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider addint the follow comments above the print or somewhere else to add a little additional context about the output

# `P` is returned as an xarray DataArray, which allows easy handling of labeled multi-dimensional arrays. 
# In this case, the array represents sound pressure in Pascals (Pa) over time because the sensitivity was passed.

"cell_type": "markdown",
"metadata": {},
"source": [
"\"Smart\" hydrophones are those where the hydrophone element, pre-amplifier board, ADC, motherboard and memory card are sold in a single package. Companies that sell these often store metadata in the .wav file header, and MHKiT has a couple of wrapper functions for these hydrophones.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this paragraph I think you could add clarity by adding a sentence to the end like: "Ocean Sonics icListen and Ocean Instruments Soundtrap are two common smart hydrophones datafiles with examples as follows". This would help add context about the following two examples

"\n",
"A calibration curve consists of the hydrophone's sensitivity (in units of dB rel $1 V^2/uPa^2$) vs frequency and should be applied to the spectral density we just calculated.\n",
"\n",
"The easiest way to do apply a sensitivity calibration curve in MHKiT is to first copy the calibration data into a CSV file, where the left column contains the calibrated frequencies and the right column contains the sensitivity values. Here we use the function in the following codeblock to read in a CSV file I created with the column headers \"Frequency\" and \"Analog Sensitivity\"."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove "do" => "The easiest way to do apply..."

remove "I" => "to read in a CSV file I created with"

}
],
"source": [
"spsdl[spsdl.dims[-1]].max().item()"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a comment like "Get max frequency if we are going to keep this"

Comment on lines 675 to 681
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete empty cell

return out


def time_average(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are multiple methods what about:
aggregate_spectral_density or bin_and_aggregate_spectral_density ?

Using aggregate here to refer to a generalized aggregation method. Or bin and aggregate to be explicit that you're binning the data in time and applying a general aggregation method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with "time_aggregate", since that's the dimension that's being operated on.

raise ValueError("'pressure' must be indexed by 'time' dimension.")

# window length of each time series
nbin = bin_length * fs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bin and fs can be floats I think this could be a float. Maybe best practice to cast as int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"nbin" doesn't have to be a float, and it won't throw an error with the binning functions used here.

psd_adj,
coords={"time": psd_adj["time"], "freq": psd_adj["freq"]},
attrs={
"units": pressure.units + "^2/Hz",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are pressure units enforced to be defined somewhere else?

Maybe we use getattr and assign a default null value otherwise?
units = getattr(pressure, 'units', '') + '^2/Hz'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually hardcode them in the io.py file, but if a user wants to use their own imported data they'll run into more issues than just this, I think.

spsd_calibrated /= sensitivity_ratio # uPa^2/Hz
spsd_calibrated /= 1e12 # Pa^2/Hz
spsd_calibrated.attrs["units"] = "Pa^2/Hz"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You give long names elsewhere. Maybe add one here?
spsd_calibrated.attrs["long_name"] = "Calibrated Sound Pressure Spectral Density"

method: Union[str, Dict[str, Union[float, int]]]
) -> Tuple[str, Optional[Union[float, int]]]:
"""
Validates the 'method' parameter and returns the method name and argument (if any).
Copy link
Contributor

@ssolson ssolson Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this the docstring could be improved by describing what this function raises and showing examples:

    """
    Validates the 'method' parameter and returns the method name and its argument (if any).

    Parameters
    ----------
    method : str or dict
        The aggregation method to validate. It can be either:
          - A string representing one of the supported methods without additional arguments,
            e.g., 'mean', 'sum'.
          - A dictionary with a single key-value pair where the key is the method name and
            the value is its argument, e.g., {'quantile': 0.25}.

        Supported methods are:
          - 'median'
          - 'mean'
          - 'min'
          - 'max'
          - 'sum'
          - 'quantile' (requires an argument between 0 and 1)
          - 'std'
          - 'var'
          - 'count'

    Returns
    -------
    method_name : str
        The validated method name in lowercase.
    method_arg : float, int, or None
        The argument associated with the method, if applicable; otherwise, None.

    Raises
    ------
    ValueError
        - If the method name is not supported.
        - If the 'quantile' method is provided without an argument or with an invalid argument.
        - If the 'method' dictionary does not contain exactly one key-value pair.
        - If 'method' is of an unsupported type.
    TypeError
        - If the key in the 'method' dictionary is not a string.

    Examples
    --------
    >>> _validate_method('mean')
    ('mean', None)

    >>> _validate_method({'quantile': 0.75})
    ('quantile', 0.75)

    >>> _validate_method('quantile')
    ValueError: The 'quantile' method must be provided as a dictionary with the quantile value, e.g., {'quantile': 0.25}.

    >>> _validate_method({'quantile': 1.5})
    ValueError: The 'quantile' method must have a float between 0 and 1 as an argument.

    >>> _validate_method({'unsupported_method': None})
    ValueError: Method 'unsupported_method' is not supported. Supported methods are: ['median', 'mean', 'min', 'max', 'sum', 'quantile', 'std', 'var', 'count']

    """
    ```

return method_name, method_arg


def band_average(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a misleading name.

What about band_aggregate to refer to a generalized aggregation method dependent on the passed method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work

Returns
-------
mspl : xarray.DataArray (time, freq_bins)
Sound pressure level [dB re 1 uPa] indexed by time and third octave bands
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decidecade bands (not third octave)

@ssolson
Copy link
Contributor

ssolson commented Oct 22, 2024

I still need to review graphics and io

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of the acoustics graphics module

Comment on lines 39 to 40
time = spsdl.dims[0]
freq = spsdl.dims[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that time is the first dimension and frequency is the last is probably not the best design.

What if we allowed the user to pass the data or specify the keys?

At a minimum we should check and raise an error if not:

if 'time' in spsdl.dims and 'freq' in spsdl.dims:
    time = 'time'
    freq = 'freq'
else:
    raise ValueError("spsdl must have 'time' and 'freq' dimensions.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well no, because you can have a "time_bins" or "freq_bins" coordinates, which are always first and second, respectively, using this codebase.

If you're smart enough to know how to a read a hydrophone file yourself you probably also have plotting functionality.

fig, ax = plt.subplots(figsize=(6, 5), subplot_kw={"yscale": "log"})
fig.subplots_adjust(left=0.1, right=0.95, top=0.97, bottom=0.11)
h = ax.pcolormesh(
spsdl[time].values, spsdl[freq].values, spsdl.T, shading="nearest", **kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible the user will have more than the 2 dimensions expected here (time, freq)?

I think it is and we should explicitly transpose on the desired dimension:
spsdl_transposed = spsdl.transpose(freq, time)

Copy link
Contributor Author

@jmcvey3 jmcvey3 Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they use our I/O functions, no. But your line there is safer.

h = ax.pcolormesh(
spsdl[time].values, spsdl[freq].values, spsdl.T, shading="nearest", **kwargs
)
fig.colorbar(h, ax=ax, label=spsdl.units)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expect the user will always have units defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they use the I/O functions they will. I can't imagine they'd import data elsewhere.


def plot_spectogram(spsdl, fmin=10, fmax=100000, fig=None, ax=None, **kwargs):
"""
Plots the spectogram of the sound pressure spectral density level.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spectrogram

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lmao I can't spell

ax: matplotlib.pyplot.axis
Figure plot axis
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add some error handling to the function:

if not isinstance(fmin, (int, float)) or not isinstance(fmax, (int, float)):
    raise TypeError("fmin and fmax must be numeric types.")
if fmin >= fmax:
    raise ValueError("fmin must be less than fmax.")

from .analysis import _fmax_warning


def plot_spectogram(spsdl, fmin=10, fmax=100000, fig=None, ax=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add typehints

"""

# Set dimension names
freq = spsdl.dims[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that the frequency dimension is the last dimension may not always hold true. We should at least check but should probably make this configurable for ease of use:

if 'freq' in spsdl.dims:
    freq = 'freq'
else:
    raise ValueError("spsdl must have a 'freq' dimension.")

Copy link
Contributor Author

@jmcvey3 jmcvey3 Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, no. Because if you do any band averaging you'll have a "freq_bins" coordinate. So I assume frequency is the second dimension because that's how I've got it set up

This comment was marked as duplicate.


Parameters
----------
spsdl: xarray DataArray (time, freq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function will work for an unaggregated spsdl. The way you use it in the example is post a time_averaging/ aggregation method.

Should this function have a default aggregation if passed an unaggregated spsdl?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I get when I pass spsdl into the function:

image

Copy link
Contributor Author

@jmcvey3 jmcvey3 Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean yeah, no it won't. Each color in the plot above is of a different timestamp, and that's why the "plot_spectogram" function is there.

We can add a warning if the input has more than one timestamp, but I try to leave the plotting functions very generic and with the most flexibility possible. A user should be able to see the output, even if you or I don't find it immediately useful.

@ssolson
Copy link
Contributor

ssolson commented Oct 28, 2024

@jmcvey3 I'm happy with the io module as I went over it in the previous round.

Can I get a quick update on where you are with this PR and if you are good to merge with one before #354?

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Oct 28, 2024

@jmcvey3 I'm happy with the io module as I went over it in the previous round.

Can I get a quick update on where you are with this PR and if you are good to merge with one before #354?

I am ready to merge when you are. Looks like the wave example notebook got messed up again though

@ssolson ssolson merged commit c5b698f into MHKiT-Software:develop Oct 29, 2024
41 checks passed
@jmcvey3 jmcvey3 deleted the acoustics_module branch October 29, 2024 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acoustics Acoustics Module enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants