Handle "multicategorical" columns #227

Open
@davidfstein

Description

The pytorch_frame library natively handles "multicategorical" columns, i.e. categorical variables that can take on multiple categories simultaneously, e.g. row1 = [1, .5, ['a', 'b', 'c']], row2 = [2, .3, ['a']], ...

It would be a nice quality-of-life enhancement to add this sort of functionality to the widedeep library.

I believe, though I need to look more carefully, that they do something along these lines: 1) label encode the categories; 2) convert to tensors such that a multicategorical feature is replaced with an "embedding" of shape (n rows) x (max number of categories in a single row). Rows that take on fewer categories than that maximum get -1 in the "missing" columns. I imagine there are other options for handling this as well.
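
To make the idea concrete, here is a minimal sketch of the two steps in plain Python. It is only an illustration of the scheme as I understand it, not the actual pytorch_frame implementation and not an existing widedeep API: the function name `encode_multicategorical` and the constant `PAD_VALUE` are hypothetical, and assigning ids in order of first appearance is an assumption.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical padding value for rows with fewer than the max number of categories.
PAD_VALUE = -1


def encode_multicategorical(
    column: Sequence[Sequence[Hashable]],
) -> Tuple[List[List[int]], Dict[Hashable, int]]:
    """Label-encode a multicategorical column and pad rows to equal width."""
    # Step 1: label encode, assigning integer ids in order of first appearance
    # (an assumption; any consistent mapping would do).
    vocab: Dict[Hashable, int] = {}
    for row in column:
        for category in row:
            if category not in vocab:
                vocab[category] = len(vocab)

    # Step 2: pad every row to the largest number of categories in a single row,
    # giving a rectangular (n rows) x (max categories) structure.
    max_len = max((len(row) for row in column), default=0)
    encoded = [
        [vocab[c] for c in row] + [PAD_VALUE] * (max_len - len(row))
        for row in column
    ]
    return encoded, vocab


# The two example rows from the description; the third field is multicategorical.
rows = [[1, 0.5, ["a", "b", "c"]], [2, 0.3, ["a"]]]
encoded, vocab = encode_multicategorical([row[2] for row in rows])
print(encoded)  # [[0, 1, 2], [0, -1, -1]]
print(vocab)    # {'a': 0, 'b': 1, 'c': 2}
```

The nested list is rectangular, so it could be turned into an integer tensor directly. Note that -1 is not a valid index for a standard embedding lookup, so a model consuming this would have to mask or remap the padded slots first.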
