RobertaTokenizerFast object has no attribute 'split_special_tokens' #501

Closed
swburge opened this issue Nov 14, 2024 · 6 comments

swburge commented Nov 14, 2024

Hi there,

I'm having trouble running MedCAT for deidentification after some system upgrades. I have Python 3.11, transformers 4.46.2, tokenizers 0.20.3 and medcat 1.13.1, and I'm using a model pack that works very well on medcat 1.7.2.

I see that the deid code has changed slightly, and now using:
from medcat.utils.ner import deid
from medcat.cat import CAT
deid = DeIdModel.create("./modelpack.zip")
anon_text = deid.deid_text(foo)

results in AttributeError: 'RobertaTokenizerFast' object has no attribute 'split_special_tokens'. Did you mean: 'all_special_tokens'?

I think this is an issue with the transformers or tokenizers libraries, but I'm not sure I understand what's going on. The datasets and models work perfectly with previous versions of medcat, transformers (4.21.3) and tokenizers (0.12.1).
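
When reporting a version-dependent error like this one, it can help to print the installed versions directly from the environment rather than from memory. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the package list simply mirrors the three packages named above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    `installed_versions` is a hypothetical helper, not part of MedCAT.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the report above.
    report = installed_versions(["medcat", "transformers", "tokenizers"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same virtual environment that raises the error shows which versions are actually being imported.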

Collaborator

mart-r commented Nov 14, 2024

Hi,

The fix for this should have been released in 1.13.1 with #490, but for some reason it went out with the fix commented out.

Bear with me as I try to rectify the issue.

Collaborator

mart-r commented Nov 14, 2024

I've created a PR to fix this (#502).

If you need a fix now, you can install medcat based on the fixed PR:

pip install git+https://github.com/CogStack/MedCAT.git@CU-8696n7w95-fix-deid-comment

PS: That's not entirely the same state as the 1.13.1 release - it's got a few more things added to it since it's based on the master branch (#489, #485, #486, #469, #492, #497, #498, #479). So if that doesn't work for you, you can wait for a patch release (1.13.2).

Author

swburge commented Nov 14, 2024

Thank you, your fix works beautifully and I very much appreciate the speedy response. Do you know when PyPI will be updated with an official release? I need to deploy this in a secure environment, and I can't access code via GitHub, only via PyPI and after review.
Thanks again!

Collaborator

mart-r commented Nov 14, 2024

I hope to do a patch release later today to incorporate the fix.

But I can't fully guarantee that since it also relies on other people reviewing the aforementioned PR before I can merge it in and push a release.

Collaborator

mart-r commented Nov 15, 2024

Hi @swburge

I've now been able to push a patch release and 1.13.2 is now available on PyPI as well.

Let me know if you experience any further issues.
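
To confirm that an environment has picked up the patch release, one option is to compare the installed medcat version against 1.13.2. This is only a rough sketch: `parse_release` and `is_at_least` are hypothetical helpers, and the parsing assumes plain dotted numeric versions (anything after the first non-numeric component is ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver):
    """Turn '1.13.2' into (1, 13, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        medcat_version = version("medcat")
    except PackageNotFoundError:
        print("medcat is not installed")
    else:
        # 1.13.2 is the patch release mentioned in this thread.
        status = "ok" if is_at_least(medcat_version, "1.13.2") else "too old"
        print(f"medcat {medcat_version}: {status}")
```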

Author

swburge commented Nov 15, 2024

Thanks so much - it works perfectly. Closing this issue now.

swburge closed this as completed Nov 15, 2024