-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/implement daily bcsd #28
Feature/implement daily bcsd #28
Conversation
…aily and debugging functionality
…ture/implement_daily_bcsd
@jhamman - would be awesome to get a review from you on this PR when you get a chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgergel - thank you for your patience here. This is a great start and I'm generally a fan of how you've done things. I've added some inline comments below.
I'd love to see a few tests that exercise the NEX option and the PaddedDOYGrouper
. We'll also want to added the groupers classes to the API docs.
skdownscale/pointwise_models/bcsd.py
Outdated
|
||
|
||
def MONTH_GROUPER(x): | ||
return x.month | ||
|
||
|
||
def DAY_GROUPER(x): | ||
return x.day |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move the DAY_GROUPER and MONTH_GROUPER functions to the groupers
module.
def check_datetime_index(obj, timestep): | ||
""" helper function to check datetime index for compatibility | ||
""" | ||
if isinstance(obj, pd.DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what should happen when obj is not a DataFrame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually don't think we really need this function - I didn't end up using it in the NASA-NEX daily implementation and I think it's redundant with the testing that is in place now. seem reasonable @jhamman?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I don't think we need this anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - just took this out.
self.df = df | ||
self.offset = offset | ||
self.max = 365 | ||
self.days_of_year = np.arange(1, 366) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which calendars will this support? I'm thinking it will be useful to check that the calendar of df.index is valid for this grouper method.
sec_half = self.days_of_year_wrapped[self.n + self.offset : i + total_days] | ||
all_days = np.concatenate((first_half, np.array([self.n]), sec_half), axis=0) | ||
|
||
assert len(set(all_days)) == total_days, all_days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's convert this to a proper ValueError
:
if len(set(all_days)) != total_days:
raise ValueError('...say something meaningful...')
list_result = [] | ||
for key, group in self: | ||
list_result.append(group.mean().values[0]) | ||
result = pd.Series(list_result, index=self.days_of_year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be a DataFrame. In Pandas, df.groupby(lambda x: x.month).mean() -> pd.DataFrame
if df
is a DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should actually be a Series
? It seems like if I make it a DataFrame, the time index gets wonky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh nm, got this to work along with some other updates for your comment below
list_result = [] | ||
for key, group in self: | ||
list_result.append(group.mean().values[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will be slightly faster if you allocate a numpy array via np.full
and populate the values as result[key] = group.mean().values[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
@dgergel - anything I can do to help get this finished up? |
@jhamman great timing - I'm working on these edits now and hoping to tie this up soon as well. For your first comment about tests, were you thinking along the lines of unit tests for the PaddedDOYGrouper and NASA-NEX option or some figures that compare it to grouping by month..or both? |
I was just thinking unit tests at this stage. |
@jhamman just finally got around to adding some unit tests and addressed your other comments as well. Looking forward to hearing what you think. Also note that I added in support for leap days into the |
@jhamman I think this should be ready to merge unless you see anything else that you think needs additional mods. |
Thanks @dgergel! Onward! |
adds NASA-NEX modification to classic BCSD bias correction method with quantile mapping by day groups rather than by months, uses a new custom
PaddedDOYGrouper
to do this. To fit quantile maps by day group rather than month, the model is fit with the flagtime_grouper='daily_nasa-nex'
, e.g.model_nasanex = BcsdTemperature(time_grouper='daily_nasa-nex', return_anoms=False).fit(X_train, y_train)
This currently doesn't support leap days, that's a future to-do.
ref #27