TransformerProduct results in error in specific case #1310

Closed
yolking opened this issue Aug 9, 2023 · 1 comment · Fixed by #1311

yolking commented Aug 9, 2023

Hello, I am trying to fit a simple model and I get an error.
First, let's create some data:

import numpy as np
import pandas as pd
from sklearn import datasets
from river.compose import Select
from river import preprocessing
from river.compat import convert_sklearn_to_river
from sklearn.linear_model import SGDRegressor
np.random.seed(1000)
X,y = datasets.make_regression(n_samples=5000, n_features=2)
X = pd.DataFrame(X)
X.columns = ['feat_1','feat_2']
X['cat'] = np.random.randint(1, 100, X.shape[0])
X['cat'] = X['cat'].astype('string')
y = pd.Series(y)

Now the model:

group1 = Select('cat') | preprocessing.OneHotEncoder()
group2 = Select('feat_2') | preprocessing.StandardScaler()
model = (group1 + group1 * group2 * group2) | convert_sklearn_to_river(SGDRegressor())

model.predict_many(X)
model.learn_many(X,y)

I get the following error:

File ~\MyEnv\venv\lib\site-packages\river\compose\union.py:303, in <genexpr>(.0)
    299 def transform_many(self, X):
    300     """Passes the data through each transformer and packs the results together."""
    302     return pd.concat(
--> 303         (t.transform_many(X) for t in self.transformers.values()),
    304         copy=False,
    305         axis=1,
    306     )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:110, in TransformerProduct.transform_many(self, X)
    106     # Default
    107     return np.multiply(a, b)
    109 return pd.DataFrame(
--> 110     {
    111         "*".join(combo): functools.reduce(
    112             multiply, (outputs[i][f] for i, f in enumerate(combo))
    113         )
    114         for combo in itertools.product(*outputs)
    115     },
    116     index=X.index,
    117 )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:111, in <dictcomp>(.0)
    106     # Default
    107     return np.multiply(a, b)
    109 return pd.DataFrame(
    110     {
--> 111         "*".join(combo): functools.reduce(
    112             multiply, (outputs[i][f] for i, f in enumerate(combo))
    113         )
    114         for combo in itertools.product(*outputs)
    115     },
    116     index=X.index,
    117 )

File ~\MyEnv\venv\lib\site-packages\river\compose\product.py:102, in TransformerProduct.transform_many.<locals>.multiply(a, b)
    100 # Fast-track for sparse * numeric
    101 if pd.api.types.is_sparse(a):
--> 102     return pd.arrays.SparseArray(a * b, fill_value=a.sparse.fill_value)
    103 # Fast-track for numeric * sparse
    104 if pd.api.types.is_sparse(b):
AttributeError: 'SparseArray' object has no attribute 'sparse'

The problem occurs with group1*group2*group2. If I use group1*group2, group2*group2, or even group2*group2*group1, it works fine.
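
Judging only from the traceback, a likely explanation is that functools.reduce feeds the result of the first multiply call back in as `a` for the second call. The sparse fast-track returns a bare pd.arrays.SparseArray, and the `.sparse` accessor exists on a pandas Series but not on a SparseArray. That would also fit group2*group2*group1 working, since there the intermediate result is dense. A minimal sketch of the difference, using only pandas; the helper `fill_value_of` is hypothetical, for illustration, and is not necessarily how #1311 fixes it:

```python
import pandas as pd

# A bare SparseArray, like the one the sparse fast-track in the traceback returns.
arr = pd.arrays.SparseArray([1.0, 0.0, 0.0], fill_value=0.0)
# The same data wrapped in a Series, like a one-hot encoded column.
series = pd.Series(arr)

print(hasattr(series, "sparse"))  # True: Series exposes the .sparse accessor
print(hasattr(arr, "sparse"))     # False: SparseArray has no such accessor


def fill_value_of(x):
    """Hypothetical helper: fill value of a sparse Series or a bare SparseArray."""
    if isinstance(x, pd.Series):
        return x.sparse.fill_value
    return x.fill_value


print(fill_value_of(series), fill_value_of(arr))
```

Under that assumption, any intermediate product that comes back as a SparseArray has to be read through `.fill_value` directly, or wrapped in a Series again before the next multiplication.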

@MaxHalford
Member

Thanks for opening this issue! I just fixed this edge-case in #1311.
