-
-
Notifications
You must be signed in to change notification settings - Fork 24
speed up forecast all #456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments – one small type change, and the others more thoughts that I think would be worth considering for this to really supercharge this as much as possible, but otherwise yep let's see how this runs!
session: Session, | ||
start_datetime_utc: datetime | None = None, | ||
end_datetime_utc: datetime | None = None, | ||
gsp_ids=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this type be Optional[str] = None
rather than just None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[str] was the old(er) way to do it, but now it tends be done like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw this as I was closing tabs – surely this just isn't typing gsp_ids
here then, instead of the optional datetime params above that are datetime | None
(should gsp_ids
be gsp_ids: str | None
for better type inference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, it should be infact gsp_ids: [str] | None
, ill make an issue for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many_forecast_values_by_datetime = {} | ||
|
||
# loop over locations and gsp yields to create a dictionary of gsp generation by datetime | ||
for forecast_value in forecast_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, though I'd have to check, that this logic/loop could also be pushed to the DB and further decrease unneeded Python processing on each call, e.g. rounding on the DB side for each collected column value is on the whole a lot faster for larger queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rounding we could do in the database.
Unfortunately, at some point we do have to loop through it to get it to the shape we want. There are other options like loading it straight into pandas and then doing column operations, which may be quicker, but Im not totally sure.
|
||
# 4. convert dictionary to list of OneDatetimeManyForecastValues objects | ||
many_forecast_values = [] | ||
for datetime_utc, forecast_values in many_forecast_values_by_datetime.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we can quite do a GROUP BY in Postgres for this as I'd first hoped looking at it, as we're changing the shape quite specifically for compact=true
(but might be good for non-compact!) – this might be the only bit we need to do Python-side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, unfortuantely I dont think we can do this db side.
Pull Request
Description
Speed ups are
/gsp/forecast/all/?compact=true
/gsp/forecast/all/?compact=true&start_datetime_utc=2025-05-16T10:00:18&end_datetime_utc=2025-05-16T11:00:18
Helps with #455
How Has This Been Tested?
Ran UAT checks currently in openclimatefix/airflow-dags#193,
and two forecast/all checks go down from 9 seconds to ~1 second
Checklist: