ValueError: cannot reindex on an axis with duplicate labels with Pandas 3.0.0 #2710

@Dillonwong12

Since the pandas 3.0.0 release, Prophet raises the following error when `model.fit()` is called on a model that has additional regressors:

```
Traceback (most recent call last):
  File "/home/user/repos/repo/libs/proj/src/scripts/forecast.py", line 45, in <module>
    m.fit(train)
    ~~~~~^^^^^^^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/prophet/forecaster.py", line 1220, in fit
    model_inputs = self.preprocess(df, **kwargs)
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/prophet/forecaster.py", line 1141, in preprocess
    self.make_all_seasonality_features(self.history))
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/prophet/forecaster.py", line 849, in make_all_seasonality_features
    component_cols, modes = self.regressor_column_matrix(
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        seasonal_features, modes
        ^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/prophet/forecaster.py", line 901, in regressor_column_matrix
    component_cols = pd.crosstab(
                     ~~~~~~~~~~~^
        components['col'], components['component'],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ).sort_index(level='col')
    ^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/reshape/pivot.py", line 1099, in crosstab
    df = DataFrame(data, index=common_idx)
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/frame.py", line 769, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/internals/construction.py", line 447, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, consolidate=copy)
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/internals/construction.py", line 117, in arrays_to_mgr
    arrays, refs = _homogenize(arrays, index, dtype)
                   ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/internals/construction.py", line 555, in _homogenize
    val = val.reindex(index)
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/series.py", line 5525, in reindex
    return super().reindex(
           ~~~~~~~~~~~~~~~^
        index=index,
        ^^^^^^^^^^^^
    ...<5 lines>...
        copy=copy,
        ^^^^^^^^^^
    )
    ^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/generic.py", line 5476, in reindex
    return self._reindex_axes(
           ~~~~~~~~~~~~~~~~~~^
        axes, level, limit, tolerance, method, fill_value
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ).__finalize__(self, method="reindex")
    ^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/generic.py", line 5498, in _reindex_axes
    new_index, indexer = ax.reindex(
                         ~~~~~~~~~~^
        labels, level=level, limit=limit, tolerance=tolerance, method=method
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/user/repos/repo/.venv/lib/python3.13/site-packages/pandas/core/indexes/base.py", line 4253, in reindex
    raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
```
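The last Prophet frame in the traceback is `regressor_column_matrix`, which calls `pd.crosstab(components['col'], components['component'])`; pandas then fails while reindexing those Series onto their shared index, which means the index of `components` contains repeated labels. The pandas-only sketch below reproduces that shape under an assumption: that the duplicates come from appending group rows with `pd.concat` without `ignore_index=True`. The frame contents and the `component_matrix` helper are hypothetical stand-ins, not Prophet's actual code. Resetting to a fresh `RangeIndex` before calling `crosstab` sidesteps the duplicate-label reindex.

```python
import pandas as pd

# Hypothetical stand-in for the frame handed to pd.crosstab: one row per
# feature column, plus "group" rows appended afterwards with pd.concat.
base = pd.DataFrame(
    {"col": [0, 1, 2], "component": ["is_weekend", "holiday_PH", "dow_0"]}
)
group = pd.DataFrame({"col": [0, 1, 2], "component": "extra_regressors_additive"})

# Without ignore_index=True, the appended rows reuse labels 0, 1, 2,
# so the combined index contains duplicates.
dup = pd.concat([base, group])
print(dup.index.is_unique)  # False


def component_matrix(components: pd.DataFrame) -> pd.DataFrame:
    """Binary col x component matrix, built from a de-duplicated index.

    reset_index(drop=True) replaces the index with a fresh RangeIndex, so
    crosstab never has to reindex on repeated labels.
    """
    components = components.reset_index(drop=True)
    return pd.crosstab(components["col"], components["component"])


matrix = component_matrix(dup)
print(matrix)
```

Per the traceback, calling `pd.crosstab` directly on `dup["col"]` and `dup["component"]` is the pattern that fails under pandas 3.0.0; the sketch only exercises the unique-index path, which does not depend on that behavior.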

Here is a minimal script to reproduce the error:

```python
import pandas as pd
from prophet import Prophet

if __name__ == "__main__":
    prophet_model = Prophet(
        weekly_seasonality=False,
        daily_seasonality=False,
        changepoint_prior_scale=0.1,
        seasonality_prior_scale=0.1,
        holidays_prior_scale=0.1,
    )
    prophet_model.add_regressor("is_weekend")
    prophet_model.add_regressor("holiday_PH")
    for index in range(7):
        prophet_model.add_regressor(f"dow_{index}")

    h = pd.DataFrame(
        {
            "ds": pd.to_datetime(
                [
                    "2024-06-27",
                    "2024-06-28",
                    "2024-06-29",
                    "2024-06-30",
                    "2024-07-01",
                ]
            ),
            "y": [42, 45, 38, 35, 47],
            "is_weekend": [0, 0, 1, 1, 0],
            "holiday_PH": [0, 0, 0, 0, 0],
            "dow_0": [0, 0, 0, 0, 1],
            "dow_1": [0, 0, 0, 0, 0],
            "dow_2": [0, 0, 0, 0, 0],
            "dow_3": [1, 0, 0, 0, 0],
            "dow_4": [0, 1, 0, 0, 0],
            "dow_5": [0, 0, 1, 0, 0],
            "dow_6": [0, 0, 0, 1, 0],
        }
    )

    future_end_date = pd.to_datetime("2024-07-07")
    train = h[
        (h.ds >= pd.to_datetime("2024-06-27"))
        & (h.ds < pd.to_datetime("2024-06-30"))
    ]
    future = h[
        (h.ds >= pd.to_datetime("2024-06-30")) & (h.ds <= future_end_date)
    ]

    m = prophet_model

    m.fit(train)  # raises the ValueError above with pandas 3.0.0

    forecasts = m.predict(future)
```
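The calendar regressors in the script are typed out by hand. To try the reproduction on a longer date range, they can be derived from `ds` instead. This sketch relies on pandas' `dayofweek` convention (Monday=0 through Sunday=6), which matches the hand-coded `dow_*` and `is_weekend` values in the script; `add_calendar_regressors` is a hypothetical helper, not part of Prophet, and `holiday_PH` is left out because it needs a holiday calendar.

```python
import pandas as pd


def add_calendar_regressors(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive is_weekend and dow_0..dow_6 from ds."""
    out = df.copy()
    dow = out["ds"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (dow >= 5).astype(int)  # Saturday or Sunday
    for k in range(7):
        out[f"dow_{k}"] = (dow == k).astype(int)  # one-hot day of week
    return out


frame = pd.DataFrame({"ds": pd.date_range("2024-06-27", "2024-07-01", freq="D")})
frame = add_calendar_regressors(frame)
print(frame)
```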

The issue only seems to arise when additional regressors are added, and the same script works fine with pandas 2.3.3. Is there a workaround that keeps the additional regressors?