
[QST] Can I ignore the is_ragged property of the categorical features when exporting the Workflow? #386

Open
@Azilyss

Description

**Can I ignore the is_ragged property of the categorical features when exporting the Workflow?**

Setup:
nvtabular version: 23.6.0
merlin-systems version: 23.6.0

The NVTabular workflow is defined as follows:

input_features = ["item_id-list"]
max_len = 20
cat_features = (
    ColumnSelector(input_features)
    >> nvt.ops.Categorify()
    >> nvt.ops.AddMetadata(tags=[Tags.CATEGORICAL])
)
seq_feats_list = (
    cat_features["item_id-list"]
    >> nvt.ops.ListSlice(-max_len, pad=True, pad_value=0)
    >> nvt.ops.Rename(postfix="_seq")
    >> nvt.ops.AddMetadata(tags=[Tags.LIST])
)
features = seq_feats_list >> nvt.ops.AddMetadata(tags=[Tags.ITEM, Tags.ID])
workflow = nvt.Workflow(features)

The dataset typically contains item sequences of varying length, and the workflow slices and pads them to the specified max_len.
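
For reference, here is a plain-Python sketch of what the ListSlice step is expected to do to each row: keep the last max_len items and pad shorter rows with pad_value (padding at the end is an assumption here, not taken from the NVTabular source).

# Plain-Python sketch of the intended per-row behavior of
# ListSlice(-max_len, pad=True, pad_value=0); end-padding is an assumption.
def slice_and_pad(row, max_len=20, pad_value=0):
    kept = row[-max_len:]                              # keep at most the last max_len items
    return kept + [pad_value] * (max_len - len(kept))  # pad up to max_len

assert slice_and_pad([28, 12, 44], max_len=5) == [28, 12, 44, 0, 0]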

The workflow is exported as follows:

transform_workflow_op = workflow.input_schema.column_names >> TransformWorkflow(workflow)
ensemble = Ensemble(transform_workflow_op, workflow.input_schema)
ens_config, node_configs = ensemble.export(preprocessing_path)

When the workflow is exported through the Ensemble module, the generated Triton config for the NVTabular model declares two tensors for each ragged feature, "feature_name___values" and "feature_name___offsets", for both the inputs and the outputs.
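
For context, a small sketch (plain NumPy, not Merlin code) of what that values/offsets pair encodes for a ragged list column: the values tensor is the concatenation of all rows, and the offsets tensor marks the row boundaries.

import numpy as np

# Values/offsets encoding of a ragged list column (names follow the config above).
rows = [[28, 12, 44], [12, 28, 73], [24, 35, 6, 12]]

values = np.concatenate([np.asarray(r) for r in rows])  # [28 12 44 12 28 73 24 35  6 12]
offsets = np.cumsum([0] + [len(r) for r in rows])       # [ 0  3  6 10]

# Row i is recovered as values[offsets[i]:offsets[i + 1]].
assert list(values[offsets[2]:offsets[3]]) == rows[2]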

Is there a way to avoid creating these extra tensors and keep the inputs as they are?
Any workaround is appreciated.
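
To see the generated inputs and outputs concretely, the exported configs can be printed (a sketch; it assumes export writes a standard Triton model repository under the given path, with one config.pbtxt per model):

from pathlib import Path

# Print every config.pbtxt that the export step wrote under preprocessing_path.
for cfg in sorted(Path(preprocessing_path).glob("*/config.pbtxt")):
    print(cfg)
    print(cfg.read_text())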

Code to reproduce
  import dask.dataframe as dd
  import nvtabular as nvt
  import pandas as pd
  from merlin.schema import Tags
  from merlin.systems.dag import Ensemble
  from merlin.systems.dag.ops.workflow import TransformWorkflow
  from nvtabular import ColumnSelector

  tmp_path = "tmp"

  d = {
      "item_id-list": [
          [28, 12, 44],
          [12, 28, 73],
          [24, 35, 6, 12],
          [74, 28, 9, 12, 44],
          [101, 102, 103, 104, 105],
      ],
  }

  df = pd.DataFrame(data=d)
  ddf = dd.from_pandas(df, npartitions=1)
  train_set = nvt.Dataset(ddf)

  input_features = ["item_id-list"]
  max_len = 20
  cat_features = (
          ColumnSelector(input_features)
          >> nvt.ops.Categorify()
          >> nvt.ops.AddMetadata(tags=[Tags.CATEGORICAL])
  )
  seq_feats_list = (
          cat_features["item_id-list"]
          >> nvt.ops.ListSlice(-max_len, pad=True, pad_value=0)
          >> nvt.ops.Rename(postfix="_seq")
          >> nvt.ops.AddMetadata(tags=[Tags.LIST])
  )
  features = seq_feats_list >> nvt.ops.AddMetadata(tags=[Tags.ITEM, Tags.ID])
  workflow = nvt.Workflow(features)

  workflow.fit(train_set)

  transform_workflow_op = workflow.input_schema.column_names >> TransformWorkflow(workflow)

  ensemble = Ensemble(transform_workflow_op, workflow.input_schema)
  ens_config, node_configs = ensemble.export(tmp_path)

  print(ens_config)
