Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix retrieval of temperature history data (fix #494) #557

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

mcarbonne
Copy link

@mcarbonne mcarbonne commented Dec 18, 2023

This PR allow temperature history data to be retrieved correctly, using Index and Size fields.
collectorSmartData.AtaSctTemperatureHistory.Table is now managed as a circular buffer, starting from Index.

EDIT : I was wrong. The SCT temperature history table is indeed a circular buffer starting from index.
But when outputting json, smartmontools do re-order entries: first = oldest, last = most recent (see ataPrintSCTTempHist from smartmontools).

Details about spec here:
https://tc.gts3.org/cs3210/2016/spring/r/hardware/ATA8-ACS.pdf
(table 82 mostly)
And json generation from smartmontools here:
https://github.com/mirror/smartmontools/blob/master/ataprint.cpp (function ataPrintSCTTempHist)

Anyway, based on my observations, ataPrintSCTTempHist isn't reliable. I tested on 4 SSD (kingston, 2x micron, crucial) and none are fully compliant with the specification.

Here are some issues I found:

  1. logging_interval_minutes / sampling_period_minutes isn't always correct.
    Example (Micron 5200 Pro):
 286    2024-01-02 18:40    29  **********
 287    2024-01-02 18:41    30  ***********
 288    2024-01-02 18:42    29  **********
 ...    ..(  6 skipped).    ..  **********
 295    2024-01-02 18:49    29  **********
 296    2024-01-02 18:50    30  ***********
 ...    ..( 11 skipped).    ..  ***********
 308    2024-01-02 19:02    30  ***********

2 minutes later :

 284    2024-01-02 18:38    29  **********
 285    2024-01-02 18:39    30  ***********
 286    2024-01-02 18:40    29  **********
 ...    ..(  6 skipped).    ..  **********
 293    2024-01-02 18:47    29  **********
 294    2024-01-02 18:48    30  ***********
 ...    ..( 15 skipped).    ..  ***********
 310    2024-01-02 19:04    30  ***********

Most likely, the "real" logging interval is ~ 30 seconds, but this interval is stored in an integer, it cannot work.
Therefore, only the current temperature (the last element of the table from json output) is correct.

  1. When the drive starts (power on/resume), the drive is supposed to insert a special value (0x80, translated to null in json and 0 in go) but:
  • some ssd (2 out of 4 I tested) do not insert the value 0x80 when resuming from sleep.
  • none insert the magic value when powering on.

All values before this magical values must be ignored (because timestamp cannot be guessed). The current implementation only ignore "null" values, but not all the values before.

  1. Current implementation seems to parse the history the wrong way (first=most recent, last=oldest).

@AnalogJ maybe you have more data than me to confirm my observations. If you agree, my proposal is to simply remove history retrieval.

Note: This will fix #494 once for all

@mcarbonne mcarbonne force-pushed the fix_494 branch 3 times, most recently from bca8876 to 7d50795 Compare January 2, 2024 18:51
@mcarbonne mcarbonne marked this pull request as ready for review January 2, 2024 19:22
@mcarbonne
Copy link
Author

Also, to mitigate this change, the default COLLECTOR_CRON_SCHEDULE might be reduced from one day to one hour (COLLECTOR_CRON_SCHEDULE=0 * * * *).

@mcarbonne mcarbonne changed the title Fix retrieval of temperature history data (#494) Fix retrieval of temperature history data (fix #494) Jan 2, 2024
@mcarbonne
Copy link
Author

@AnalogJ I'd like to have your feedback on this PR.
My proposal was to simply remove temperature history retrieval because of its lack of robustness (cf. my arguments above) but If you don't agree, I can update my PR to add an option to disable this feature instead of removing it.

Here is a screenshot of my temperature history graph (with temperature history retrieval enabled) :
image
Obviously, there is something wrong with the temperature history of the green one.

@mcarbonne mcarbonne force-pushed the fix_494 branch 2 times, most recently from 6dd5c67 to ebc4193 Compare March 17, 2024 14:00
(not tested yet)
@AnalogJ
Copy link
Owner

AnalogJ commented Jul 19, 2024

Hey @mcarbonne appreciate your work here -- but yeah I'd rather add a switch to enable/disable it, rather than remove it completely

@mcarbonne
Copy link
Author

Hey @mcarbonne appreciate your work here -- but yeah I'd rather add a switch to enable/disable it, rather than remove it completely

Thanks for you feedback. I have updated my PR with your suggestions but I have not yet been able to test this change: I have build issues (npm related) but haven't found out if it's related to my changes but it is similar to the nightly build issue. I'll try to investigate when I have time.

@mcarbonne
Copy link
Author

@AnalogJ build is working again, feature is behaving as expected, docker image is available (ghcr.io/mcarbonne/scrutiny:master-omnibus). Feel free to give me your feedback.

Conflicts:
	webapp/backend/pkg/database/scrutiny_repository_migrations.go
@mcarbonne
Copy link
Author

@AnalogJ PR updated :)

@helgehatt
Copy link

Would be great if this could get merged anytime soon.

@AnalogJ
Copy link
Owner

AnalogJ commented Jan 4, 2025

This looks good, let me check why the CI isn't running.

@mcarbonne
Copy link
Author

mcarbonne commented Jan 11, 2025

I did some testing on my fork and indeed, workflows need a little update. I included it in my PR (commit b7d57c0).

This is mandatory according to documentation:
https://github.com/actions/upload-artifact

In matrix scenarios, be careful to not accidentally upload to the same artifact, or else you will encounter conflict errors. It would be best to name the artifact with a prefix or suffix from the matrix:

jobs:
upload:
name: Generate Build Artifacts

strategy:
  matrix:
    os: [ubuntu-latest, windows-latest]
    version: [a, b, c]

runs-on: ${{ matrix.os }}

steps:
- name: Build
  run: ./some-script --version=${{ matrix.version }} > my-binary
- name: Upload
  uses: actions/upload-artifact@v4
  with:
    name: binary-${{ matrix.os }}-${{ matrix.version }}
    path: my-binary

This will result in artifacts like: binary-ubuntu-latest-a, binary-windows-latest-b, and so on.

EDIT : this is a breaking change with v4: Unlike earlier versions of upload-artifact, uploading to the same artifact via multiple jobs is not supported with v4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Temperature history showing incorrect numbers
3 participants