Description
Hi,
I am trying to use a demo dataset to test the training code, but the instructions are not clear enough. Before running the training code, I ran the "binarize_data" step. Which output format should I use for it, npy or jsonl? If it is jsonl, there seem to be no "input_ids" and "label" fields for the dataloader in the subsequent training step. If it is npy, I run into an error saying the uint type cannot be converted, as shown below:
self.input_ids = [torch.tensor(example["input_ids"], dtype=torch.long) for example in self.input_ids if len(example["input_ids"]) < args.model_max_length]
TypeError: can't convert np.ndarray of type numpy.uint32. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
Any clues on this issue? Or is the only fix to forcibly cast the data to a type that is not unsigned ("uint")?
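For reference, this is roughly the cast I have in mind. It is only a minimal sketch, assuming each loaded example is a dict whose "input_ids" is a uint32 NumPy array (as the traceback suggests). `to_int64_examples` is a hypothetical helper and `max_length` stands in for `args.model_max_length`. Since every uint32 value fits in int64, the cast should not change any token ID:

```python
import numpy as np


def to_int64_examples(examples, max_length):
    """Hypothetical helper: cast unsigned input_ids (e.g. uint32) to int64
    and keep only examples shorter than max_length, mirroring the filter
    in the dataset code from the traceback."""
    out = []
    for example in examples:
        ids = np.asarray(example["input_ids"])
        if len(ids) < max_length:
            # int64 is one of the dtypes torch.tensor accepts per the error message
            out.append({**example, "input_ids": ids.astype(np.int64)})
    return out


# After this, torch.tensor(ex["input_ids"], dtype=torch.long) should no longer
# hit the "can't convert np.ndarray of type numpy.uint32" error.
```

I am not sure whether casting like this is the intended fix or whether binarize_data is supposed to write a signed type in the first place.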
Thanks!