
Adds a script to remove carbonDB duplicates #799


Merged: 2 commits merged on Jun 3, 2024


ribalba
Member

ribalba commented May 31, 2024

I went for a KISS version that has to be called via cron. Since we will be reworking carbonDB soon, I didn't want to do anything overly complex like copying DBs around. We can call this every n minutes. We could consider setting a flag in Redis when something has changed, but I think we should think about this more broadly. Maybe we need a general queuing system, like jobs but more flexible. I don't see that as vital for now, though.
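A minimal sketch of what a KISS, cron-driven duplicate cleanup could look like, written in Python against SQLite so it runs standalone. The PR's actual script and the real carbonDB schema are not shown in this thread: the table name `carbondb_energy_data` and the columns `machine`, `project`, `time_stamp` and `energy_value` are hypothetical, and a row is treated as a duplicate when all of those columns match an earlier row. The oldest copy (lowest `rowid`) is kept.

```python
import sqlite3

# Hypothetical table and columns; the real carbonDB schema is not shown in this PR.
# A row is a duplicate if another row with a lower rowid has the same values.
DEDUP_SQL = """
DELETE FROM carbondb_energy_data
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM carbondb_energy_data
    GROUP BY machine, project, time_stamp, energy_value
)
"""


def remove_duplicates(conn: sqlite3.Connection) -> int:
    """Delete duplicate rows, keeping the oldest copy; return the number removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(DEDUP_SQL)
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE carbondb_energy_data ("
        "machine TEXT, project TEXT, time_stamp INTEGER, energy_value REAL)"
    )
    rows = [("m1", "p1", 100, 5.0), ("m1", "p1", 100, 5.0), ("m2", "p1", 100, 7.0)]
    conn.executemany("INSERT INTO carbondb_energy_data VALUES (?, ?, ?, ?)", rows)
    print(remove_duplicates(conn))  # prints 1
```

Called from cron every n minutes (for example with an hourly crontab entry such as `0 * * * * python3 remove_duplicates.py`, where the script name is hypothetical), each run only has to clean up duplicates that arrived since the previous run. Note that this version still groups over the whole table on every run.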

ribalba requested a review from ArneTR on May 31, 2024 13:21

github-actions bot commented May 31, 2024

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 22.9131 | 1568.94 | 3.48653 | 458 |
| Measurement #1 | 23.1277 | 1568.94 | 3.48653 | 450 |

📈 Energy graph:

 
[ASCII chart: power in Watts over time; y-axis from 1.77 W to 8.18 W]

🌳 CO2 Data:
City: Boydton, Lat: 36.677696, Lon: -78.37471
Carbon Intensity for this location: 361 gCO₂eq/kWh
SCI: 0.566387 gCO₂eq / pipeline run emitted

Member

ArneTR left a comment


My remarks:

  • How often do you estimate this script has to run?
  • Can you please fix the filename? It contains a typo.
  • The query already takes more than 20 minutes (it is still running...). It feels like a horribly inefficient operation. What is it trying to achieve by looking at all the data? Could we not narrow the acceptance window in CarbonDB, say by accepting data that is at most 30 days old? Then the window for duplicates could be narrowed as well.
  • Adding to my previous point: the query sets a lot of locks, which will block the service. Even if it runs at night, I think this is too intense. See screenshot.
[Screenshot 2024-06-01 at 10:26:39 AM]
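To illustrate the suggestion of bounding the search to recent data, here is a sketch that only looks for duplicates among rows from the last 30 days. It uses the same hypothetical schema as a plain full-table cleanup and additionally assumes `time_stamp` holds Unix seconds. In a real database this only pays off if `time_stamp` is indexed, so that the scan (and the locks it takes) covers the window rather than the whole table.

```python
import sqlite3
import time
from typing import Optional

# 30 days, as suggested in the review.
WINDOW_SECONDS = 30 * 24 * 3600

# Hypothetical table/columns; time_stamp is assumed to be Unix seconds.
WINDOWED_DEDUP_SQL = """
DELETE FROM carbondb_energy_data
WHERE time_stamp >= :cutoff
  AND rowid NOT IN (
      SELECT MIN(rowid)
      FROM carbondb_energy_data
      WHERE time_stamp >= :cutoff
      GROUP BY machine, project, time_stamp, energy_value
  )
"""


def remove_recent_duplicates(
    conn: sqlite3.Connection,
    now: Optional[float] = None,
    window: int = WINDOW_SECONDS,
) -> int:
    """Delete duplicates whose time_stamp lies within the last `window` seconds."""
    cutoff = (time.time() if now is None else now) - window
    with conn:  # one transaction, committed on success
        cur = conn.execute(WINDOWED_DEDUP_SQL, {"cutoff": cutoff})
    return cur.rowcount
```

Because `time_stamp` is part of the duplicate key, all copies of a duplicate fall either inside the window or outside it, so the cutoff never splits a group and never deletes the copy that should be kept.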

Member

ArneTR commented Jun 2, 2024

Update on this: the script cannot finish in its current form. It ran for more than 12 hours before I stopped it.

This might be due to the nature of the script or to the resource limits on the server.
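One common way to keep a long cleanup from holding locks for hours is to find the duplicates with a single read and then delete them in many small transactions, so other writers can get in between batches. This is not what the PR ended up doing (the thread only says the logic was changed); it is a hedged SQLite sketch with the same hypothetical schema, and the default of 500 rows per batch is an illustrative choice that stays below SQLite's bound-parameter limit.

```python
import sqlite3

# Hypothetical table/columns; selects every rowid except the oldest copy per group.
FIND_DUPLICATES_SQL = """
SELECT rowid
FROM carbondb_energy_data
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM carbondb_energy_data
    GROUP BY machine, project, time_stamp, energy_value
)
"""


def remove_duplicates_in_batches(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Delete duplicates in short transactions of at most `batch_size` rows."""
    # One read to collect the rowids to delete.
    dup_ids = [row[0] for row in conn.execute(FIND_DUPLICATES_SQL)]
    total = 0
    for start in range(0, len(dup_ids), batch_size):
        chunk = dup_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(chunk))
        # Each batch commits on its own, so locks are released between batches.
        with conn:
            cur = conn.execute(
                f"DELETE FROM carbondb_energy_data WHERE rowid IN ({placeholders})",
                chunk,
            )
        total += cur.rowcount
    return total
```

Rows inserted after the initial read are simply left for the next run, which fits a job that cron calls repeatedly.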

Member Author

ribalba commented Jun 3, 2024

I renamed the file and changed the logic. The query now takes 2 seconds, or 16 seconds on the old DB full of duplicates. So I think this is something we can run every hour or so.


github-actions bot commented Jun 3, 2024

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run | 22.3409 | 1671.2 | 3.46721 | 490 |
| Measurement #1 | 22.4811 | 1671.2 | 3.46721 | 483 |

📈 Energy graph:

 
[ASCII chart: power in Watts over time; y-axis from 1.77 W to 8.18 W]

🌳 CO2 Data:
City: Chicago, Lat: 41.8819, Lon: -87.6278
Carbon Intensity for this location: 384 gCO₂eq/kWh
SCI: 0.641741 gCO₂eq / pipeline run emitted

Member

ArneTR commented Jun 3, 2024

Looks good

ArneTR merged commit 4dea9e9 into main on Jun 3, 2024
4 checks passed
ArneTR deleted the remove-carbondb-duplicates branch on June 3, 2024 12:02
ArneTR added a commit that referenced this pull request Jun 3, 2024
* main: (60 commits)
  Adds a script to remove carbonDB duplicates (#799)
  Bump uvicorn[standard] from 0.30.0 to 0.30.1 (#801)
  Power per container (#795)
  Wrong verb
  Removed more permissions
  Moved to suspend
  Reducing workflow permissions (#797)
  Moving our workflows to Ubuntu 24.04 because Docker Engine is too old in 22.04 (#794)
  Typo
  Client.py error for docker commands
  Adding network to SCI and clarifications (#793)
  Bump requests from 2.32.2 to 2.32.3 (#791)
  Added sorting by date and unified ci and measurement runs frontend (#769)
  Bump pydantic from 2.7.1 to 2.7.2 (#789)
  macOS test compatibility
  Relaxed tests more to accomodate for different startup times on slower machines
  Relaxed tests
  Uvicorn worker (#788)
  Bump uvicorn[standard] from 0.29.0 to 0.30.0 (#787)
  Add test with missing start_period
  ...

2 participants