TransformerProduct results in error in specific case #1310

yolking · 2023-08-09T13:56:28Z

Hello, I am trying to fit a simple model and I get an error.
First, let's create some data:

import numpy as np
import pandas as pd
from sklearn import datasets
from river.compose import Select
from river import preprocessing
from river.compat import convert_sklearn_to_river
from sklearn.linear_model import SGDRegressor
np.random.seed(1000)
X,y = datasets.make_regression(n_samples=5000, n_features=2)
X = pd.DataFrame(X)
X.columns = ['feat_1','feat_2']
X['cat'] = np.random.randint(1, 100, X.shape[0])
X['cat'] = X['cat'].astype('string')
y = pd.Series(y)

Now the model:

group1 = Select('cat') | preprocessing.OneHotEncoder()
group2 = Select('feat_2') | preprocessing.StandardScaler()
model = group1 + group1 * group2 * group2 | convert_sklearn_to_river(SGDRegressor())

model.predict_many(X)
model.learn_many(X,y)

I get the following error:

File ~\MyEnv\venv\lib\site-packages\river\compose\union.py:303, in <genexpr>(.0)
    299 def transform_many(self, X):
    300     """Passes the data through each transformer and packs the results together."""
    302     return pd.concat(
--> 303         (t.transform_many(X) for t in self.transformers.values()),
    304         copy=False,
    305         axis=1,
    306     )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:110, in TransformerProduct.transform_many(self, X)
    106     # Default
    107     return np.multiply(a, b)
    109 return pd.DataFrame(
--> 110     {
    111         "*".join(combo): functools.reduce(
    112             multiply, (outputs[i][f] for i, f in enumerate(combo))
    113         )
    114         for combo in itertools.product(*outputs)
    115     },
    116     index=X.index,
    117 )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:111, in <dictcomp>(.0)
    106     # Default
    107     return np.multiply(a, b)
    109 return pd.DataFrame(
    110     {
--> 111         "*".join(combo): functools.reduce(
    112             multiply, (outputs[i][f] for i, f in enumerate(combo))
    113         )
    114         for combo in itertools.product(*outputs)
    115     },
    116     index=X.index,
    117 )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:102, in TransformerProduct.transform_many.<locals>.multiply(a, b)
    100 # Fast-track for sparse * numeric
    101 if pd.api.types.is_sparse(a):
--> 102     return pd.arrays.SparseArray(a * b, fill_value=a.sparse.fill_value)
    103 # Fast-track for numeric * sparse
    104 if pd.api.types.is_sparse(b):
AttributeError: 'SparseArray' object has no attribute 'sparse'

The problem occurs with group1*group2*group2. If I use group1*group2, group2*group2, or even group2*group2*group1, it works fine.
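One possible reading of the traceback, not confirmed in this thread: the first sparse * numeric step wraps its result in a bare pd.arrays.SparseArray, and when that result is fed back in as `a` for the next multiplication, the `.sparse` accessor is missing because it only exists on a pandas Series. The sketch below reproduces just that behaviour with pandas alone. The column values and the `fill_value_of` helper are hypothetical, made up for illustration; they are not river code and not necessarily what #1311 changed.

```python
import pandas as pd

# Stand-ins for one one-hot encoded column (sparse) and one scaled column (dense).
sparse_col = pd.Series(pd.arrays.SparseArray([0.0, 1.0, 0.0, 1.0], fill_value=0.0))
dense_col = pd.Series([0.5, -1.2, 0.3, 2.0])

# A Series with a sparse dtype exposes the `.sparse` accessor.
fill = sparse_col.sparse.fill_value

# First multiplication, mirroring line 102 of the traceback:
# the product is wrapped in a bare SparseArray rather than a Series.
first = pd.arrays.SparseArray(sparse_col * dense_col, fill_value=fill)

# The bare array still has a sparse dtype, so a dtype-based check passes...
is_sparse_dtype = isinstance(first.dtype, pd.SparseDtype)
# ...but it has no `.sparse` accessor, so `first.sparse.fill_value` would raise.
has_accessor = hasattr(first, "sparse")


def fill_value_of(x):
    """Hypothetical helper: fill value of a sparse Series or a bare SparseArray."""
    if isinstance(x, pd.Series):
        return x.sparse.fill_value
    return x.fill_value


print(is_sparse_dtype, has_accessor, fill_value_of(first))
```

If this reading is right, it would also be consistent with group2*group2*group1 working: there the intermediate result is a plain numeric Series, and the sparse operand is still a Series when its accessor is used.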


MaxHalford · 2023-08-09T15:00:35Z

Thanks for opening this issue! I just fixed this edge-case in #1311.

MaxHalford mentioned this issue Aug 9, 2023

Fix transformer product edge case #1311

Merged

MaxHalford closed this as completed in #1311 Aug 9, 2023
