model loading problem #5

Open
xiaocangnn opened this issue Aug 28, 2024 · 3 comments

Comments

@xiaocangnn

xiaocangnn commented Aug 28, 2024

Hello, when I tried to load the pre-trained code generation model from your GitHub repository, I hit the error shown in the screenshot below: either the tokenizer is missing or the loading path conflicts with something. I have tried several approaches but could not fix it. Have you run into this problem before? If so, how did you resolve it?
Screenshot 2024-08-28 232743

@shunzh
Owner

shunzh commented Sep 1, 2024

Hello, thanks for your question! I quickly tried it on Google Colab, and GPT2Tokenizer loads without any problem there.
Based on the error message, is there a local directory called "gpt2" in your workspace?

Screenshot 2024-09-01 at 3 47 09 PM
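
The question about a local "gpt2" directory can be checked with a few lines of Python. This is a minimal sketch, assuming that a directory whose name matches the model ID is picked up as a local checkpoint path instead of the Hub model; the helper name `find_shadowing_dir` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Tokenizer files named in this thread; a GPT-2 tokenizer directory needs both.
REQUIRED_TOKENIZER_FILES = ("vocab.json", "merges.txt")


def find_shadowing_dir(model_name="gpt2", workdir="."):
    """Check whether a local directory could shadow the model ID.

    Returns None if `workdir/model_name` is not a directory. Otherwise
    returns the list of required tokenizer files missing from it
    (an empty list means the directory looks complete).
    """
    candidate = Path(workdir) / model_name
    if not candidate.is_dir():
        return None
    return [
        name
        for name in REQUIRED_TOKENIZER_FILES
        if not (candidate / name).is_file()
    ]


if __name__ == "__main__":
    missing = find_shadowing_dir()
    if missing is None:
        print('No local "gpt2" directory in the current workspace.')
    elif missing:
        print('Local "gpt2" directory is missing:', ", ".join(missing))
    else:
        print('Local "gpt2" directory contains both tokenizer files.')
```

If the script reports missing files, renaming or removing the local directory, or adding the missing files to it, are the two obvious things to try.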

@xiaocangnn
Author

Thank you very much for your reply. I used the code from your GitHub repository unchanged and ran it on an AutoDL server, but it reported that the files vocab.json and merges.txt were missing. Downloading both files from Hugging Face and loading them solved the problem.
But I have a new question: can I take your PG-TD model as a pre-trained model and train it on my own dataset? I would like to train it into a code generation model for exploits.
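
For anyone hitting the same error, the fix described in this comment can be sketched as follows. This is only an illustration under assumptions: the repo ID `"gpt2"`, the target directory, and the helper names `fetch_gpt2_tokenizer_files` and `load_tokenizer` are mine, not taken from the repository; `hf_hub_download` comes from the `huggingface_hub` package and `GPT2Tokenizer` from `transformers`.

```python
from pathlib import Path

# The two files the error message reported as missing.
TOKENIZER_FILES = ("vocab.json", "merges.txt")


def fetch_gpt2_tokenizer_files(target_dir, repo_id="gpt2"):
    """Download the tokenizer files into `target_dir` and return their paths.

    `repo_id` is an assumption; use whichever Hub repository matches
    the tokenizer your checkpoint was trained with.
    """
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [
        hf_hub_download(repo_id=repo_id, filename=name, local_dir=target)
        for name in TOKENIZER_FILES
    ]


def load_tokenizer(target_dir):
    """Build a GPT2Tokenizer directly from the two downloaded files."""
    from transformers import GPT2Tokenizer

    target = Path(target_dir)
    return GPT2Tokenizer(
        vocab_file=str(target / "vocab.json"),
        merges_file=str(target / "merges.txt"),
    )


if __name__ == "__main__":
    # Hypothetical directory name, chosen to avoid clashing with "gpt2".
    fetch_gpt2_tokenizer_files("gpt2_tokenizer_files")
    tokenizer = load_tokenizer("gpt2_tokenizer_files")
    print(tokenizer.tokenize("def add(a, b): return a + b"))
```

Running the download step once on a machine with network access and copying the directory to the server is an option when the server itself cannot reach Hugging Face.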

@shunzh
Owner

shunzh commented Sep 6, 2024

> Thank you very much for your reply. I used the code from your GitHub repository unchanged and ran it on an AutoDL server, but it reported that the files vocab.json and merges.txt were missing. Downloading both files from Hugging Face and loading them solved the problem.

Great to know that the problem is solved!

> But I have a new question: can I take your PG-TD model as a pre-trained model and train it on my own dataset? I would like to train it into a code generation model for exploits.

We don't own the pre-trained models. We use the models from this paper: https://arxiv.org/abs/2105.09938. You may want to double-check, but I think it's okay to fine-tune their models.
However, there have been newer code models since then, like CodeGen and Code Llama.
