
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191 #1

Open
lthakur007 opened this issue Dec 28, 2018 · 25 comments


@lthakur007

lthakur007 commented Dec 28, 2018

I am observing the following error while running the deepctr DeepFM model on the GPU:

Traceback (most recent call last):
  File "main.py", line 31, in <module>
    model.fit(loader_train, loader_val, optimizer, epochs=5, verbose=True)
  File "/root/deepctr/DeepFM_with_PyTorch/model/DeepFM.py", line 153, in fit
    total = model(xi, xv)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/deepctr/DeepFM_with_PyTorch/model/DeepFM.py", line 98, in forward
    fm_first_order_emb_arr = [(torch.sum(emb(Xi[:, i, :]), 1).t() * Xv[:, i]).t() for i, emb in enumerate(self.fm_first_order_embeddings)]
  File "/root/deepctr/DeepFM_with_PyTorch/model/DeepFM.py", line 98, in <listcomp>
    fm_first_order_emb_arr = [(torch.sum(emb(Xi[:, i, :]), 1).t() * Xv[:, i]).t() for i, emb in enumerate(self.fm_first_order_embeddings)]
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191
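
For reference, the error is raised by the embedding lookup when it receives an index outside the table's valid range. A minimal sketch, independent of the DeepFM code, that reproduces the same RuntimeError:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)   # valid indices: 0 .. 9
ok = emb(torch.tensor([0, 5, 9]))                        # fine
bad = emb(torch.tensor([10]))                            # raises RuntimeError: index out of range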

@lthakur007 changed the title from "feature_sizes.txt is not present with the dataset downloaded from the link http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/" to "RuntimeError: index out of range at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191" on Jan 1, 2019
@cserken

cserken commented Apr 4, 2019

Hi, does anybody know how to fix this issue?

Thanks!

@leadtoit

leadtoit commented Apr 9, 2019

the same problem~

@mojirml

mojirml commented Apr 12, 2019

the same problem~

@monkeyusage

the same problem~

@ice16

ice16 commented Apr 27, 2019

same problem

@lewisbakkero

Same issue when using flair 0.4.1, PyTorch 1.1.0, and BertEmbeddings on 2 x NVIDIA Tesla P100.

@zqs01

zqs01 commented May 3, 2019

the same problem

@monkeyusage

monkeyusage commented May 6, 2019

I had the same problem. My issue was with the embeddings; I fixed it by initializing the embedding layer with the right size, which is the size of the vocabulary I am actually using.

When creating your Encoder/Model:

self.embed = nn.Embedding(vocab_size, embed_size)

I was using a number smaller than my actual vocab_size, which resulted in this error.
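
A minimal sketch of that fix (the Encoder class and names here are illustrative, not taken from this repo):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_size):
        super().__init__()
        # num_embeddings must cover every token id in the data (0 .. vocab_size - 1)
        self.embed = nn.Embedding(vocab_size, embed_size)

    def forward(self, token_ids):
        return self.embed(token_ids)

vocab_size = 1000                                  # size of the vocabulary actually used
enc = Encoder(vocab_size, embed_size=32)
out = enc(torch.randint(0, vocab_size, (4, 7)))    # works, output shape (4, 7, 32)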

@vinklibrary

the same problem

@RyanAmaudruz

same issue

@ganyuqi

ganyuqi commented Jun 27, 2019

Hi, has anybody fixed this problem?

@anksng

anksng commented Jul 9, 2019

Hi, try to inspect the size of your vocabulary. If you are using vocab_size in the Embedding layer initialization, try adding +1 to it, like -> self.embed = nn.Embedding(vocab_size + 1, embed_size)
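
One common reason the +1 helps is that the indices themselves run up to vocab_size, e.g. because the ids are 1-based or an extra id is reserved for unknown/padding tokens. A short sketch of that situation (the numbers are made up, not taken from this repo):

import torch
import torch.nn as nn

vocab_size = 100
ids = torch.tensor([1, 42, 100])          # ids run from 1 to vocab_size, so the max index is 100

emb = nn.Embedding(vocab_size, 8)         # valid indices are 0 .. 99 -> id 100 is out of range
# emb(ids)                                # would raise RuntimeError: index out of range

emb = nn.Embedding(vocab_size + 1, 8)     # valid indices are 0 .. 100
out = emb(ids)                            # works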

@loretoparisi

We have the same problem using the LASER bi-LSTM model with PyTorch 1.0 / Python 3.7:
https://github.com/facebookresearch/LASER

@shantam21

Did anyone find a solution? I'm stuck! I just wanted to confirm what vocab_size means here. Does it mean the number of distinct tokenized words?

@shira-g

shira-g commented Aug 11, 2019

It happened to me when I had out-of-vocabulary words that were assigned an index of -1, and it also happens when you set the vocab size to a value smaller than the size of the vocabulary + 1.
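
A quick way to spot both cases before the lookup (a sketch; indices and emb stand in for your own index tensor and embedding layer):

import torch
import torch.nn as nn

emb = nn.Embedding(10000, 16)
indices = torch.tensor([3, 17, -1, 12000])   # -1 and 12000 would both trigger the RuntimeError

bad = (indices < 0) | (indices >= emb.num_embeddings)
if bad.any():
    print("out-of-range indices:", indices[bad].tolist())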

@chenxijun1029
Owner

Sorry for answering so late!
I think I've fixed this bug. The main reason is that I accumulated an offset while iterating over the feature sizes during data preprocessing, so the index of a categorical feature could exceed its embedding size. Please refer to the update in dataPreprocess.py.
Also, I've found that the index of a continuous feature should be set to 0, and its value should be its original value instead of 1. Refer to the update of dataset.py for more details.
Thanks for your attention.
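
A simplified sketch of the kind of offset bug described above (illustrative only, not the actual dataPreprocess.py code). With one embedding table per categorical field, each field's indices must stay within that field's own feature size, so carrying a running offset across fields pushes them out of range:

feature_sizes = [3, 5, 4]            # number of distinct values per categorical field

def encode_buggy(raw_ids):
    # a running offset accumulated across fields -> indices exceed the per-field table sizes
    out, offset = [], 0
    for field, fid in enumerate(raw_ids):
        out.append(fid + offset)
        offset += feature_sizes[field]
    return out

def encode_fixed(raw_ids):
    # each field is indexed independently, staying within [0, feature_sizes[field])
    return list(raw_ids)

print(encode_buggy([2, 4, 3]))       # [2, 7, 11] -> 7 and 11 overflow tables of size 5 and 4
print(encode_fixed([2, 4, 3]))       # [2, 4, 3]  -> valid for per-field embeddings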

@loretoparisi

@chenxijun1029 in which version has this been fixed? Thank you.

@Guilherme26

Hey guys, I had the same problem. In my case, what happened was that I was presenting the Input (X) and the Output (Y) to the model with len(X) != len(Y) due to an error in a third-party library.

Best regards and good luck!

@maulberto3

I had the same problem. My issue was with the embeddings; I fixed it by initializing the embedding layer with the right size, which is the size of the vocabulary I am actually using.

When creating your Encoder/Model:

self.embed = nn.Embedding(vocab_size, embed_size)

I was using a number smaller than my actual vocab_size, which resulted in this error.

Hi. I too resolved my issue by fixing what @mcszn suggested.

@tinaty

tinaty commented Feb 17, 2020

Hi, try to inspect the size of your vocabulary. If you are using vocab_size in the Embedding layer initialization, try adding +1 to it, like -> self.embed = nn.Embedding(vocab_size + 1, embed_size)

Hi, this works. But would you mind providing an explanation for this?

@anksng

anksng commented Feb 17, 2020

Hi, try to inspect the size of your vocabulary. If you are using vocab_size in the Embedding layer initialization, try adding +1 to it, like -> self.embed = nn.Embedding(vocab_size + 1, embed_size)

Hi, this works. But would you mind providing an explanation for this?

I guess it was a bug, which has now been fixed by @chenxijun1029.
What I remember is that the error occurs because every index passed to the embedding must be smaller than the number of embeddings (the vocab size), so initializing the layer with vocab_size was effectively one row short for my indices, hence the +1.

@tinaty

tinaty commented Feb 17, 2020

Hi, try to inspect the size of your vocabulary. If you are using vocab_size in the Embedding layer initialization, try adding +1 to it, like -> self.embed = nn.Embedding(vocab_size + 1, embed_size)

Hi, this works. But would you mind providing an explanation for this?

I guess it was a bug, which has now been fixed by @chenxijun1029.
What I remember is that the error occurs because every index passed to the embedding must be smaller than the number of embeddings (the vocab size), so initializing the layer with vocab_size was effectively one row short for my indices, hence the +1.

Got it, thanks very much.

@lyleshaw

same issue

@lu161513

Hi, try to inspect the size of your vocabulary. If you are using vocab_size in the Embedding layer initialization, try adding +1 to it, like -> self.embed = nn.Embedding(vocab_size + 1, embed_size)

Why does +1 solve the problem? Shouldn't the embedding be initialized with vocab_size rather than vocab_size + 1?

@onlyhebo

onlyhebo commented Nov 2, 2020

I have the same issue. See DeepFM_with_PyTorch/data/dataset.py, line 7: continous_features = 13.
Change that value to match your dataset and you should get a correct result.
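
When adapting the code to your own data, a quick sanity check along these lines (a sketch; X_cat and feature_sizes are placeholders for your own categorical index matrix and the sizes passed to the model) confirms that every column stays inside its embedding table:

import numpy as np

X_cat = np.array([[0, 3, 1],
                  [2, 4, 0]])          # categorical indices, one column per field
feature_sizes = [3, 5, 2]              # num_embeddings used for each field

for col, size in enumerate(feature_sizes):
    col_max = X_cat[:, col].max()
    assert col_max < size, "column %d: max index %d >= embedding size %d" % (col, col_max, size)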
