TokenClassificationModelWithSeq2SeqEncoderAndCrf: concatenate sequence segments from same batch #159

Open: wants to merge 6 commits into main

Conversation

ArneBinder (Owner) commented Mar 4, 2025

TODO: explain approach

usage:

  • model config: wrap the seq2seq encoder in concatenate_sequences, with the usual seq2seq config as the value of module
  • taskmodule config: set inputs_key_document_indices=seq2seq_encoder_sequence_ids
  • do not shuffle the data during training: only segments that end up in the same batch can be concatenated
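
Since the approach itself is still marked TODO, the following is only a rough sketch of the general idea in plain Python, not the code from this PR. It assumes that segments carrying the same sequence id and sitting next to each other in a batch get joined, encoded in one pass, and split back to their original lengths. The names encode_concatenated and running_sum are hypothetical, and running_sum merely stands in for a stateful seq2seq encoder such as an LSTM.

```python
from itertools import groupby
from typing import Callable, List, Sequence

Vector = List[float]


def encode_concatenated(
    segments: Sequence[List[Vector]],
    sequence_ids: Sequence[int],
    encoder: Callable[[List[Vector]], List[Vector]],
) -> List[List[Vector]]:
    """Encode adjacent segments with the same id as one long sequence.

    Hypothetical helper for illustration; not part of pie_modules.
    """
    if len(segments) != len(sequence_ids):
        raise ValueError("need exactly one sequence id per segment")

    outputs: List[List[Vector]] = []
    position = 0
    # groupby merges only *adjacent* equal ids, so segments of one
    # sequence must be neighbours in the batch to be joined.
    for _, group in groupby(sequence_ids):
        count = len(list(group))
        parts = segments[position:position + count]
        position += count

        # Concatenate the token vectors of all segments of this sequence.
        joined = [token for part in parts for token in part]
        encoded = encoder(joined)
        if len(encoded) != len(joined):
            raise ValueError("encoder must return one vector per token")

        # Split the encoded sequence back into the original segment lengths.
        start = 0
        for part in parts:
            outputs.append(encoded[start:start + len(part)])
            start += len(part)
    return outputs


def running_sum(tokens: List[Vector]) -> List[Vector]:
    """Toy stateful encoder: cumulative sum over 1-d token vectors."""
    total = 0.0
    result: List[Vector] = []
    for (value,) in tokens:
        total += value
        result.append([total])
    return result


batch = [[[1.0], [2.0]], [[3.0]], [[10.0]]]
in_order = encode_concatenated(batch, [0, 0, 1], running_sum)

# Same data, but the segments of sequence 0 are no longer adjacent.
shuffled_batch = [[[1.0], [2.0]], [[10.0]], [[3.0]]]
shuffled = encode_concatenated(shuffled_batch, [0, 1, 0], running_sum)
```

In the ordered batch the second segment of sequence 0 is encoded with the state built up over the first one. In the shuffled batch it is encoded in isolation, which illustrates why shuffling defeats the concatenation.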

@ArneBinder added the enhancement label (New feature or request) on Mar 4, 2025

codecov-commenter commented Mar 4, 2025

Codecov Report

Attention: Patch coverage is 48.38710% with 16 lines in your changes missing coverage. Please review.

Project coverage is 95.17%. Comparing base (201a022) to head (a914614).

Files with missing lines Patch % Lines
...c/pie_modules/models/components/seq2seq_encoder.py 30.00% 14 Missing ⚠️
...labeled_span_extraction_by_token_classification.py 77.77% 2 Missing ⚠️

❌ Your patch status has failed because the patch coverage (48.38%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #159      +/-   ##
==========================================
- Coverage   95.44%   95.17%   -0.28%     
==========================================
  Files          61       61              
  Lines        5313     5342      +29     
==========================================
+ Hits         5071     5084      +13     
- Misses        242      258      +16     


@ArneBinder force-pushed the token_classif_model/concatenate_sequences branch from 8fc8f9e to a914614 on March 12, 2025 11:43
2 participants