Skip to content

speed up forecast all #456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 16, 2025
Merged

speed up forecast all #456

merged 10 commits into from
May 16, 2025

Conversation

peterdudfield
Copy link
Collaborator

@peterdudfield peterdudfield commented May 16, 2025

Pull Request

Description

  • speed up forecast all, this is just for compact=True and no creation time limit. This will help the UI.

Speed ups are

route current [s] new [s]
/gsp/forecast/all/?compact=true 3.67 1.15
/gsp/forecast/all/?compact=true&start_datetime_utc=2025-05-16T10:00:18&end_datetime_utc=2025-05-16T11:00:18 10.7 1.17

Helps with #455

How Has This Been Tested?

  • Ci tests
  • ran locally

Ran UAT checks currently in openclimatefix/airflow-dags#193,
and two forecast/all checks go down from 9 seconds to ~1 second

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield peterdudfield marked this pull request as ready for review May 16, 2025 11:51
@peterdudfield peterdudfield requested a review from braddf May 16, 2025 11:59
Copy link
Contributor

@braddf braddf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments – one small type change, and the others more thoughts that I think would be worth considering for this to really supercharge this as much as possible, but otherwise yep let's see how this runs!

session: Session,
start_datetime_utc: datetime | None = None,
end_datetime_utc: datetime | None = None,
gsp_ids=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this type be Optional[str] = None rather than just None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[str] was the old(er) way to do it, but now it tends be done like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this as I was closing tabs – surely this just isn't typing gsp_ids here then, instead of the optional datetime params above that are datetime | None (should gsp_ids be gsp_ids: str | None for better type inference?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it should be infact gsp_ids: [str] | None, ill make an issue for this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many_forecast_values_by_datetime = {}

# loop over locations and gsp yields to create a dictionary of gsp generation by datetime
for forecast_value in forecast_values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, though I'd have to check, that this logic/loop could also be pushed to the DB and further decrease unneeded Python processing on each call, e.g. rounding on the DB side for each collected column value is on the whole a lot faster for larger queries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rounding we could do in the database.

Unfortunately, at some point we do have to loop through it to get it to the shape we want. There are other options like loading it straight into pandas and then doing column operations, which may be quicker, but Im not totally sure.


# 4. convert dictionary to list of OneDatetimeManyForecastValues objects
many_forecast_values = []
for datetime_utc, forecast_values in many_forecast_values_by_datetime.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can quite do a GROUP BY in Postgres for this as I'd first hoped looking at it, as we're changing the shape quite specifically for compact=true (but might be good for non-compact!) – this might be the only bit we need to do Python-side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, unfortuantely I dont think we can do this db side.

@peterdudfield peterdudfield merged commit 73469d4 into main May 16, 2025
2 checks passed
@peterdudfield peterdudfield deleted the speed-up-forecast-all branch May 16, 2025 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants