Refactor loops to enable parallelization?

I am looking into parallelizing a section of code in detect_anoms where the majority of execution time is spent:

        if not one_tail:
            ares = abs(data - data.median())
        elif upper_tail:
            ares = data - data.median()
        else:
            ares = data.median() - data

        ares = ares / data.mad()

        tmp_anom_index = ares[ares.values == ares.max()].index
        cand = pd.Series(data.loc[tmp_anom_index], index=tmp_anom_index)

        data.drop(tmp_anom_index, inplace=True)

Is there a way to refactor the code so that the ordering the for loop enforces on the data.drop invocations is no longer needed?
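
To make the dependency concrete, here is a minimal stand-alone sketch of the loop around the snippet above. It is not the library's code: the function name `remove_extremes` and the `max_outliers` parameter are hypothetical, the median absolute deviation is assumed as the scale in place of `data.mad()`, and ties are resolved by taking only the first maximum. The point it illustrates is that the median and scale are recomputed after every drop, so each iteration depends on the result of the previous one.

```python
import pandas as pd


def remove_extremes(data, max_outliers, one_tail=False, upper_tail=True):
    """Hypothetical stand-alone version of the loop around the quoted snippet."""
    data = data.copy()  # work on a copy; the quoted code drops in place
    removed = []
    for _ in range(max_outliers):
        med = data.median()
        if not one_tail:
            ares = (data - med).abs()
        elif upper_tail:
            ares = data - med
        else:
            ares = med - data

        # Assumption: median absolute deviation stands in for data.mad().
        scale = (data - med).abs().median()
        if scale == 0:
            break
        ares = ares / scale

        # Assumption: keep only the first label with the largest residual.
        idx = ares.idxmax()
        removed.append(idx)

        # Loop-carried dependency: the next median and scale are computed
        # on the series *without* this point.
        data = data.drop(idx)
    return removed


series = pd.Series([1, 2, 3, 4, 100, 5, -50])
print(remove_extremes(series, max_outliers=2))
```

Because the statistics in iteration k are computed on whatever is left after iteration k-1, the iterations cannot simply be handed to separate workers as written. Any parallel version would need either a different formulation of the test or parallelism inside a single iteration.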

Similar question here:

    for i in range(1, data.size + 1, num_obs_in_period):
        start_date = data.index[i]
        # if there is at least 14 days left, subset it, otherwise subset last_date - 14 days
        end_date = start_date + datetime.timedelta(days=num_days_in_period)
        if end_date < data.index[-1]:
            all_data.append(
                data.loc[lambda x: (x.index >= start_date) & (x.index <= end_date)])
        else:
            all_data.append(
                data.loc[lambda x: x.index >= data.index[-1] - datetime.timedelta(days=num_days_in_period)])
    return all_data
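
Unlike the first loop, the iterations here only read `data`, so each window can be computed independently once its bounds are known. The following is a sketch under stated assumptions, not a drop-in replacement: `window_bounds` and `split_windows` are hypothetical names, the loop is assumed to start at position 0 and stop before `len(index)` (the quoted range starts at 1 and runs to `data.size + 1`), and a thread pool is used only to show that the order of evaluation no longer matters. Whether threads give any real speed-up for pandas slicing is untested here.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def window_bounds(index, num_obs_in_period, num_days_in_period):
    """Compute (start, end) pairs up front; no iteration depends on another."""
    span = datetime.timedelta(days=num_days_in_period)
    last = index[-1]
    bounds = []
    # Assumption: begin at position 0 and stay inside the index.
    for i in range(0, len(index), num_obs_in_period):
        start = index[i]
        end = start + span
        if end < last:
            bounds.append((start, end))
        else:
            # Not enough data left: take the final `span` of the series.
            bounds.append((last - span, last))
    return bounds


def split_windows(data, num_obs_in_period, num_days_in_period, workers=4):
    bounds = window_bounds(data.index, num_obs_in_period, num_days_in_period)

    def take(bound):
        start, end = bound
        return data[(data.index >= start) & (data.index <= end)]

    # pool.map returns results in the same order as `bounds`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(take, bounds))


days = pd.date_range("2020-01-01", periods=10, freq="D")
windows = split_windows(pd.Series(range(10), index=days), 4, 3)
print([len(w) for w in windows])
```

Separating the bound computation from the slicing keeps the sequential part trivial and leaves the slicing step free to run in any order.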

I am a software engineer, not a data scientist, so this may be a very naive question. :)

--John

Refactor loops to enable parallelization? #32
