What's Changed
Major new version that removes the dependency on older machine learning approaches, including spaCy and NLTK. These libraries tend to cause version conflicts with Docassemble, and they perform worse, at a higher cost, than modern LLM technologies.
Changed
- Add information about presence of PDF tags for accessibility by @nonprofittechy in #104
- Force reportlab to use latest security patch by @BryceStevenWilley in #105
- Sorted simplified words to prep for new words by @plocket in #106
- Add simple words by @plocket in #107
- Don't parse [] terms in YAML; allow overwrite of PDF by @nonprofittechy in #111
- If the PDF has no text, OCR it by @nonprofittechy in #112
- Add has_fields() function by @nonprofittechy in #115
- Typing issue by @nonprofittechy in #116
- Correct mypy issues by @BryceStevenWilley in #121
- Remove use of pickled (joblib) files for now by @nonprofittechy in #119
- Safer checking for field annotations in a PDF by @BryceStevenWilley in #122
- Fix issue with duplicate fields by @BryceStevenWilley in #124
- Only send first 5000 characters to Spot by @ClaireSimmonds in #126
- Migrate to the stable OpenAI Python client (1.0) by @nonprofittechy in #129
- Add some DOCX modification functions by @nonprofittechy in #130
- Fix mypy typing issues by @BryceStevenWilley in #136
- 25 detect sensitive fields by @codestronger in #134
- Unpin scikit-learn because it conflicts with docassemble by @nonprofittechy in #138
- quick and dirty patch by @nonprofittechy in #141
- Switch to ubuntu-latest action runner by @BryceStevenWilley in #143
- Migrate to more standard .env file usage by @nonprofittechy in #146
- We're not really using spaCy but it was still a dependency by @nonprofittechy in #148
- Replace passivepy with a call to an LLM by @nonprofittechy in #147
- Use raw strings for regexes to prevent warning in python 3.12 / black by @nonprofittechy in #151
- Typing and unit test fixes by @nonprofittechy in #152
- Allow working without .env; pull creds from docassemble config by @nonprofittechy in #150
- Finish the migration to LLMs; removing NLTK, etc. by @nonprofittechy in #153
- Migrate from spaCy and NLTK by @nonprofittechy in #154
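For context on #151: starting with Python 3.12, an invalid escape sequence such as "\d" in an ordinary string literal triggers a SyntaxWarning. Prefixing the literal with r makes it a raw string, so the backslash is passed through to the regex engine unchanged. The sketch below shows the pattern in general terms. The regex and the find_phone_numbers helper are hypothetical illustrations and do not come from this library's source.

```python
import re

# Hypothetical pattern for illustration only; not taken from this library.
# Written as "\d{3}-\d{4}" (no r prefix), Python 3.12 would emit a
# SyntaxWarning for the invalid escape "\d". The raw string avoids that.
PHONE_PATTERN = re.compile(r"\d{3}-\d{4}")


def find_phone_numbers(text: str) -> list[str]:
    """Return every NNN-NNNN fragment found in text (hypothetical helper)."""
    return PHONE_PATTERN.findall(text)


print(find_phone_numbers("Call 555-1234 or 555-9876."))
# prints ['555-1234', '555-9876']
```

The raw string and the escaped form "\\d" produce the same pattern. The raw form is the usual choice because it keeps regexes readable.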
New Contributors
- @plocket made their first contribution in #106
- @ClaireSimmonds made their first contribution in #126
- @codestronger made their first contribution in #134
Full Changelog: v0.2.0...v1.0.0