
Commit c60c8e3

doc: added proper Notes for weight selection strategies of WhittakerSmooth
1 parent e6e8405 commit c60c8e3


1 file changed (+46, -0 lines)


chemotools/smooth/_whittaker_smooth.py

Lines changed: 46 additions & 0 deletions
@@ -256,13 +256,37 @@ def transform(
 sample_weight : ndarray of shape (n_features,), (n_samples, n_features), (1, n_features), or None, default=None
 Individual weights for each of the input data. If only 1 weight vector is
 provided, it is assumed to be the same for the features of all samples.
+No weights may be negative (< 0.0) and at least one weight needs to be
+positive (> 0.0).
+Providing them is mandatory when the optimum penalty weight ``lam`` is to be
+determined automatically via the log marginal likelihood (``"logml"``)
+method.
 If ``None``, all features are assumed to have the same weight.
+Please refer to the Notes section for further details on selecting the
+weights.

 Returns
 -------
 X_smoothed : ndarray of shape (n_samples, n_features)
 The transformed data.

+Notes
+-----
+If estimates of the standard deviations ``s_i`` of each data point are
+available, e.g., from theoretical considerations or repeated measurements, it is
+recommended to use the inverse of the squared standard deviations as weights,
+i.e., ``w_i = 1 / (s_i * s_i)``. This effectively down-weights noisy data
+points and thus reduces the risk of noise-induced artifacts in the smoothed
+signal, while features measured with high confidence remain well preserved
+even under strong smoothing.
+Sometimes it is infeasible to provide standard deviations because theoretical
+considerations are not appropriate and replicate measurements are not available
+or feasible. In such scenarios, the weights can still be estimated with the
+function :func:`chemotools.smooth.estimate_noise_stddev` using ``power=-2``.
+It relies on the parameter ``window_length`` to estimate the local or global
+noise standard deviation of the spectrum; please refer to the documentation of
+the function for further details.
+
 """ # noqa: E501

 # Check that the estimator is fitted
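The accepted shapes and sign rules for `sample_weight` listed in the docstring can be illustrated with a small NumPy sketch. The helper `check_sample_weight` is hypothetical and not part of chemotools; it assumes that a weight vector of shape (n_features,) or (1, n_features) is shared by all samples and, as an additional assumption the docstring does not spell out, checks the "at least one positive weight" rule per spectrum (row).

```python
import numpy as np


def check_sample_weight(sample_weight, n_samples, n_features):
    """Hypothetical helper: validate ``sample_weight`` and broadcast it to
    shape (n_samples, n_features) following the rules in the docstring."""
    if sample_weight is None:
        # None: all features get the same (unit) weight
        return np.ones((n_samples, n_features))

    w = np.asarray(sample_weight, dtype=float)
    if w.ndim == 1:
        # A single weight vector of shape (n_features,) is shared by all samples
        w = w.reshape(1, -1)
    if w.ndim != 2 or w.shape[1] != n_features or w.shape[0] not in (1, n_samples):
        raise ValueError(f"invalid sample_weight shape {np.shape(sample_weight)}")

    if np.any(w < 0.0):
        raise ValueError("No weights may be negative (< 0.0).")
    # Assumption: the "at least one positive weight" rule is checked per row
    if not np.all(np.any(w > 0.0, axis=1)):
        raise ValueError("At least one weight needs to be positive (> 0.0).")

    # Repeat a (1, n_features) row for every sample; copy() makes it writable
    return np.broadcast_to(w, (n_samples, n_features)).copy()
```

For example, `check_sample_weight([1.0, 2.0, 3.0], 4, 3)` returns a (4, 3) array whose rows all equal `[1.0, 2.0, 3.0]`.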
@@ -313,13 +337,35 @@ def fit_transform(
 provided, it is assumed to be the same for the features of all samples.
 No weights may be negative (< 0.0) and at least one weight needs to be
 positive (> 0.0).
+Providing them is mandatory when the optimum penalty weight ``lam`` is to be
+determined automatically via the log marginal likelihood (``"logml"``)
+method.
 If ``None``, all features are assumed to have the same weight.
+Please refer to the Notes section for further details on selecting the
+weights.

 Returns
 -------
 X_smoothed : ndarray of shape (n_samples, n_features)
 The transformed data.

+Notes
+-----
+If estimates of the standard deviations ``s_i`` of each data point are
+available, e.g., from theoretical considerations or repeated measurements, it is
+recommended to use the inverse of the squared standard deviations as weights,
+i.e., ``w_i = 1 / (s_i * s_i)``. This effectively down-weights noisy data
+points and thus reduces the risk of noise-induced artifacts in the smoothed
+signal, while features measured with high confidence remain well preserved
+even under strong smoothing.
+Sometimes it is infeasible to provide standard deviations because theoretical
+considerations are not appropriate and replicate measurements are not available
+or feasible. In such scenarios, the weights can still be estimated with the
+function :func:`chemotools.smooth.estimate_noise_stddev` using ``power=-2``.
+It relies on the parameter ``window_length`` to estimate the local or global
+noise standard deviation of the spectrum; please refer to the documentation of
+the function for further details.
+
 """ # noqa: E501

 return self.fit(X=X).transform(X=X, sample_weight=sample_weight)
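The inverse-variance rule from the Notes, `w_i = 1 / (s_i * s_i)`, can be made concrete with replicate measurements. The sketch below is a minimal NumPy illustration under stated assumptions: the function name `inverse_variance_weights` and its `min_stddev` floor (which avoids infinite weights where all replicates agree exactly) are hypothetical choices, not chemotools API.

```python
import numpy as np


def inverse_variance_weights(replicates, min_stddev=1e-8):
    """Sketch: weights ``w_i = 1 / (s_i * s_i)`` from replicate measurements.

    ``replicates`` has shape (n_replicates, n_features); ``s_i`` is the sample
    standard deviation of feature ``i`` across the replicates.
    """
    replicates = np.asarray(replicates, dtype=float)
    if replicates.ndim != 2 or replicates.shape[0] < 2:
        raise ValueError("need a 2D array with at least two replicates")

    # ddof=1: unbiased variance estimate from a small number of replicates
    s = replicates.std(axis=0, ddof=1)
    # Hypothetical floor: avoids infinite weights where all replicates agree exactly
    s = np.maximum(s, min_stddev)
    return 1.0 / (s * s)
```

The result has shape (n_features,), one of the shapes the docstring accepts for `sample_weight`, so features with a large replicate spread receive small weights and are smoothed more strongly.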

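When no replicates exist, the Notes point to `chemotools.smooth.estimate_noise_stddev` with `power=-2`. That function's algorithm is not part of this commit, so the sketch below is only a hypothetical stand-in, not chemotools' implementation: it estimates a local noise standard deviation from second differences over a moving window of `window_length` points and returns it raised to `power`, which is assumed here to mirror how `power=-2` turns standard deviations into weights.

```python
import numpy as np


def rough_noise_stddev(x, window_length=11, power=1.0):
    """Hypothetical local noise estimate (NOT chemotools' implementation).

    For white noise with standard deviation s, the second difference
    d_i = x[i-1] - 2 * x[i] + x[i+1] has variance (1 + 4 + 1) * s**2 = 6 * s**2,
    while a slowly varying signal contributes little to d_i.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < window_length + 2:
        raise ValueError("x must be 1D with at least window_length + 2 points")

    d = np.diff(x, n=2)  # length n - 2, d[j] is centred on x[j + 1]
    box = np.ones(window_length)
    # Local mean of d**2; dividing by the window counts corrects the edges
    local_sum = np.convolve(d * d, box, mode="same")
    counts = np.convolve(np.ones_like(d), box, mode="same")
    local_ms = local_sum / counts

    # Pad by one value on each side to recover the input length
    local_ms = np.pad(local_ms, 1, mode="edge")
    s = np.sqrt(local_ms / 6.0)
    # power=-2 turns the standard deviations into inverse-variance weights
    return s ** power
```

Because strongly curved signal regions also produce large second differences, this crude estimator overestimates the noise there and thus down-weights sharp peaks; the actual chemotools function may handle this differently.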