Hi, the documentation describes the following layout for creating the training dataset:
DatasetName (e.g. LF-AmazonTitles-131K)
│   trn_X.txt (text for trn documents, one text in each line)
│   tst_X.txt (text for tst documents, one text in each line)
│   Y.txt (text for labels, one text in each line)
│   trn_X_Y.txt (trn labels in spmat format)
│   tst_X_Y.txt (tst labels in spmat format)
│   filter_labels_test.txt (filter labels where label and test documents are same)
│
└───XXCondensedData (embeddings for tst, trn documents and labels, for benchmark datasets, XX=DX[Astec])
    │   trn_point_embs.npy (2D numpy matrix for trn document embeddings)
    │   tst_point_embs.npy (2D numpy matrix for tst document embeddings)
    │   label_embs.npy (2D numpy matrix for label embeddings)
I could not understand the "trn labels in spmat format" entry. Is there a script that creates these files from the input documents (trn_X.txt, tst_X.txt, and Y.txt)? This is for the case where we also want to use the label embeddings.
I want to generate it for my custom dataset.
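
To make the question concrete, here is a minimal sketch of what I think is needed. It assumes "spmat format" means the sparse text format used by the Extreme Classification Repository datasets: a header line with `num_rows num_cols`, followed by one line per document containing space-separated `label_index:value` pairs, where `label_index` is the 0-based line number of the label in Y.txt. That is my assumption and is not confirmed by the documentation quoted above. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def build_label_index(label_texts: Sequence[str]) -> Dict[str, int]:
    """Map each label text to its 0-based line number in Y.txt."""
    return {text: i for i, text in enumerate(label_texts)}


def to_spmat_lines(doc_labels: Sequence[Sequence[int]], num_labels: int) -> List[str]:
    """Render per-document label indices as sparse-matrix text lines.

    Assumed format (not confirmed by the repo docs):
      line 0:    "<num_docs> <num_labels>"
      line i+1:  "<label_idx>:1 <label_idx>:1 ..." for document i
    """
    lines = [f"{len(doc_labels)} {num_labels}"]
    for labels in doc_labels:
        cols = sorted(set(labels))  # drop duplicates, keep indices ordered
        if cols and (cols[0] < 0 or cols[-1] >= num_labels):
            raise ValueError(f"label index out of range: {cols}")
        lines.append(" ".join(f"{c}:1" for c in cols))
    return lines


# Toy data standing in for Y.txt and the relevant labels of each trn document.
label_texts = ["red shoes", "blue hat", "green scarf"]
doc_label_texts = [["blue hat"], ["red shoes", "green scarf"], []]

index = build_label_index(label_texts)
doc_labels = [[index[t] for t in labels] for labels in doc_label_texts]
lines = to_spmat_lines(doc_labels, num_labels=len(label_texts))

# Writing the result would look like this (the path is a placeholder):
# with open("trn_X_Y.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```

Row i of the output has to line up with line i of trn_X.txt, and the column indices with the lines of Y.txt. If the repo expects a different header or value format, please correct me.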