I am looking into parallelizing a section of code in detect_anoms where the majority of execution time is spent:
if not one_tail:
    ares = abs(data - data.median())
elif upper_tail:
    ares = data - data.median()
else:
    ares = data.median() - data
ares = ares / data.mad()
tmp_anom_index = ares[ares.values == ares.max()].index
cand = pd.Series(data.loc[tmp_anom_index], index=tmp_anom_index)
data.drop(tmp_anom_index, inplace=True)
Is there a way to refactor this so that the ordering the for loop imposes on the data.drop calls is no longer needed? As written, each iteration recomputes the median and MAD on whatever is left after the previous drop.
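To make the dependency concrete, here is a minimal sketch of the two-tailed case only. It is not the library's code: iterative_anoms, one_shot_anoms, and the sample series are hypothetical, and a hand-written median absolute deviation stands in for data.mad(). It compares removing one point per pass against ranking every point once on the full series.

```python
import pandas as pd


def iterative_anoms(data: pd.Series, k: int) -> list:
    """Remove the most extreme point k times, recomputing the statistics each pass."""
    data = data.copy()
    removed = []
    for _ in range(k):
        med = data.median()
        # Hand-written median absolute deviation, standing in for data.mad().
        scale = (data - med).abs().median()
        ares = (data - med).abs() / scale
        idx = ares.idxmax()
        removed.append(idx)
        # The next pass sees a series without this point, hence the ordering.
        data = data.drop(idx)
    return removed


def one_shot_anoms(data: pd.Series, k: int) -> list:
    """Rank every point once against the statistics of the full series."""
    med = data.median()
    scale = (data - med).abs().median()
    ares = (data - med).abs() / scale
    return list(ares.nlargest(k).index)


# Made-up values, chosen so that the two approaches disagree.
series = pd.Series([-20.0, -10.0, 0.0, 2.0, 8.0, 9.0, 15.0])
print(iterative_anoms(series, 2))  # index labels 0 and 1
print(one_shot_anoms(series, 2))   # index labels 0 and 6
```

On this series the second pick changes because dropping -20 moves the median from 2 to 5, which pushes -10 further out and pulls 15 closer in. So a one-shot ranking is not a drop-in replacement for the loop, at least not without changing the results.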
Similar question here:
for i in range(1, data.size + 1, num_obs_in_period):
    start_date = data.index[i]
    # if there is at least 14 days left, subset it, otherwise subset last_date - 14 days
    end_date = start_date + datetime.timedelta(days=num_days_in_period)
    if end_date < data.index[-1]:
        all_data.append(
            data.loc[lambda x: (x.index >= start_date) & (x.index <= end_date)])
    else:
        all_data.append(
            data.loc[lambda x: x.index >= data.index[-1] - datetime.timedelta(days=num_days_in_period)])
return all_data
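Unlike the first loop, each window here appears to depend only on its start position and the unmodified data, so the iterations could in principle run in any order. A minimal sketch under that assumption: window_at is a hypothetical helper, the daily series is made up, and the start positions are 0-based rather than starting at 1 as in the snippet above.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def window_at(data: pd.Series, pos: int, num_days_in_period: int) -> pd.Series:
    """Hypothetical helper: build the window that starts at integer position pos."""
    span = datetime.timedelta(days=num_days_in_period)
    start_date = data.index[pos]
    end_date = start_date + span
    if end_date < data.index[-1]:
        return data.loc[(data.index >= start_date) & (data.index <= end_date)]
    # Not enough days left after start_date: take the trailing span instead.
    return data.loc[data.index >= data.index[-1] - span]


# Made-up daily series covering 60 days.
index = pd.date_range("2020-01-01", periods=60, freq="D")
data = pd.Series(range(60), index=index)
starts = range(0, data.size, 14)

# Sequential version, equivalent to appending inside a for loop.
sequential = [window_at(data, pos, 14) for pos in starts]

# Same work through a thread pool; map() returns results in input order.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(lambda pos: window_at(data, pos, 14), starts))

print(all(a.equals(b) for a, b in zip(sequential, parallel)))
```

Whether a pool is actually faster for this slicing would need measuring; the point is only that nothing in one window depends on another.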
I am a software engineer, not a data scientist, so this may be a very naive question. :)
--John