Error loading custom dataset #90

Open
@tkap243

  • OCTIS version: 1.11.0
  • Python version: 3.8
  • Operating System: Windows 10

Description

Hello,

I am having trouble loading my custom dataset. I followed the guide in the main README, but I get the errors below.

What I Did

from octis.dataset.dataset import Dataset
import pandas as pd

df = pd.read_csv("/mnt/mydata/notebooks/data.csv")

df.to_csv('corpus.tsv', sep="\t", header=False, columns=['documents'])

dataset = Dataset()
dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:330: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  final_df = df[df[1] == 'train'].append(df[df[1] == 'val'])
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py:331: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  final_df = final_df.append(df[df[1] == 'test'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
    335 
--> 336                 self.__corpus = [d.split() for d in final_df[0].tolist()]
    337                 if len(final_df.keys()) > 2:

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in <listcomp>(.0)
    335 
--> 336                 self.__corpus = [d.split() for d in final_df[0].tolist()]
    337                 if len(final_df.keys()) > 2:

AttributeError: 'int' object has no attribute 'split'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-28e6bd2fc3cd> in <module>
      1 dataset = Dataset()
----> 2 dataset.load_custom_dataset_from_folder("/mnt/mydata/notebooks")

/opt/conda/lib/python3.8/site-packages/octis/dataset/dataset.py in load_custom_dataset_from_folder(self, path, multilabel)
    356                 self._load_document_indexes(self.dataset_path + "/indexes.txt")
    357         except:
--> 358             raise Exception("error in loading the dataset:" + self.dataset_path)
    359 
    360     def fetch_dataset(self, dataset_name, data_home=None, download_if_missing=True):

Exception: error in loading the dataset:/mnt/mydata/notebooks
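
A possible cause, offered as a sketch rather than a confirmed diagnosis: `DataFrame.to_csv` writes the row index as the first column by default, so column 0 of `corpus.tsv` would hold integers instead of document text, which matches `'int' object has no attribute 'split'`. The traceback also shows the loader filtering on `df[1] == 'train'`, which suggests it expects a partition value in the second column. The example below uses a made-up three-row frame in place of `data.csv`, an in-memory buffer in place of `corpus.tsv`, and a hypothetical `partition` column that marks every document as `train`; all of these are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of data.csv.
df = pd.DataFrame(
    {
        "documents": [
            "first example document",
            "second example document",
            "third example document",
        ]
    }
)

# Reproduce the original call: the index is written as the first column.
buf = io.StringIO()
df.to_csv(buf, sep="\t", header=False, columns=["documents"])
with_index = pd.read_csv(io.StringIO(buf.getvalue()), sep="\t", header=None)
print(with_index[0].tolist())  # integers, not text

# Sketch of a fix: drop the index and add a partition column.
# Assumption: every document is placed in the 'train' partition.
df["partition"] = "train"
fixed_buf = io.StringIO()
df.to_csv(
    fixed_buf,
    sep="\t",
    header=False,
    index=False,
    columns=["documents", "partition"],
)
fixed = pd.read_csv(io.StringIO(fixed_buf.getvalue()), sep="\t", header=None)

# Column 0 now holds strings, so d.split() would work on each entry.
corpus = [d.split() for d in fixed[0].tolist()]
```

If this is the cause, the original call would become `df.to_csv('corpus.tsv', sep="\t", header=False, index=False, columns=[...])` with the document and partition columns listed in that order. Rows with missing document text would still be read back as non-string values, so dropping or filling them first may also be needed.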

