Adds a script to remove carbonDB duplicates #799
Old Energy Estimation (Eco-CI Output): 📈 Energy graph, Watts over time [ASCII plot not reproduced]
My remarks:
- How often do you estimate this script has to run?
- Can you please fix the filename? It contains a typo.
- At the moment the query already takes > 20 minutes (it is still running...). It feels like a horribly inefficient operation. What is it trying to achieve by looking at all the data? Can we not narrow the acceptance window in CarbonDB by, let's say, accepting data that is at most 30 days old? Then the window in which duplicates can occur would be narrowed as well.
- Further to my previous point: the query sets a lot of locks, which will block the service. Even if it runs at night, I think this is too intense. See screenshot.
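To illustrate the narrowed-window idea from the remarks above, here is a minimal sketch using Python's built-in `sqlite3`, not the query from this PR. It assumes a hypothetical table `carbondb_raw` with hypothetical columns `id`, `created_at` (Unix seconds), `source`, `run_id`, `measured_at` and `energy_uj`; the real CarbonDB schema may differ. Only rows newer than the cutoff are scanned and deleted, which keeps both the work and the lock footprint proportional to recent data rather than to the whole table.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def remove_recent_duplicates(
    conn: sqlite3.Connection, days: int = 30, now: Optional[int] = None
) -> int:
    """Delete duplicate rows whose created_at lies within the last `days` days.

    Within each group of identical rows, the row with the lowest id is kept.
    Table and column names are hypothetical placeholders, not the real schema.
    Returns the number of deleted rows.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    cur = conn.execute(
        """
        DELETE FROM carbondb_raw
        WHERE created_at >= ?
          AND id NOT IN (
              -- keep one representative (lowest id) per duplicate group,
              -- looking only at rows inside the acceptance window
              SELECT MIN(id)
              FROM carbondb_raw
              WHERE created_at >= ?
              GROUP BY source, run_id, measured_at, energy_uj
          )
        """,
        (cutoff, cutoff),
    )
    conn.commit()
    return cur.rowcount
```

Rows older than the cutoff are never touched, so this only removes all duplicates if CarbonDB also rejects data older than the window, as suggested in the remarks.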

Update on this: the script cannot finish in its current form. It ran for longer than 12 hours, and then I stopped it. This might be due to the nature of the script or to the resource limitations on the server.
Renamed and changed the logic. Now the query takes 2 seconds, and 16 seconds with the old DB full of duplicates. So I think this is something we can run every hour or so.
Eco-CI Output: 📈 Energy graph, Watts over time [ASCII plot not reproduced]
Looks good
* main: (60 commits) Adds a script to remove carbonDB duplicates (#799) Bump uvicorn[standard] from 0.30.0 to 0.30.1 (#801) Power per container (#795) Wrong verb Removed more permissions Moved to suspend Reducing workflow permissions (#797) Moving our workflows to Ubuntu 24.04 because Docker Engine is too old in 22.04 (#794) Typo Client.py error for docker commands Adding network to SCI and clarifications (#793) Bump requests from 2.32.2 to 2.32.3 (#791) Added sorting by date and unified ci and measurement runs frontend (#769) Bump pydantic from 2.7.1 to 2.7.2 (#789) macOS test compatibility Relaxed tests more to accomodate for different startup times on slower machines Relaxed tests Uvicorn worker (#788) Bump uvicorn[standard] from 0.29.0 to 0.30.0 (#787) Add test with missing start_period ...
I went for a KISS version which needs to be called via `cron`. As we will be reworking carbonDB soonish, I didn't want to do something overly complex like copying DBs around. We can call this every n minutes. We could consider setting a flag in Redis when something has changed, but I think we should think about this more broadly. Maybe we need a general queuing system, like jobs but more flexible. That is not something I see as vital for now, though.
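The "set a flag when something has changed" idea could look roughly like the sketch below. This is not what the PR implements (the PR runs the cleanup unconditionally from `cron`). A plain `dict` stands in for Redis, and the key name `carbondb:dirty` and both function names are hypothetical.

```python
from typing import Callable, MutableMapping, Optional

# Hypothetical key name; the PR does not define one.
DIRTY_KEY = "carbondb:dirty"


def mark_dirty(store: MutableMapping[str, str]) -> None:
    """Would be called by the ingest path whenever new CarbonDB rows are written."""
    store[DIRTY_KEY] = "1"


def run_dedup_if_dirty(
    store: MutableMapping[str, str], dedup: Callable[[], int]
) -> Optional[int]:
    """Cron entry point: run `dedup` only if data changed since the last run.

    The flag is read and cleared in one step *before* deduplicating, so rows
    written while `dedup` runs set it again and are handled by the next run.
    Returns the number of removed rows, or None if the run was skipped.
    """
    if store.pop(DIRTY_KEY, None) is None:
        return None  # nothing new since last run; skip the query entirely
    return dedup()
```

With a real Redis client, an atomic read-and-delete such as `GETDEL` (available since Redis 6.2) would play the role of `store.pop`, so a flag set by a concurrent write is not lost between reading and clearing it.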