-
Notifications
You must be signed in to change notification settings - Fork 3
feat: calculate MD and NM tags #229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
87136be to
62e8228
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #229 +/- ##
==========================================
- Coverage 91.19% 91.07% -0.12%
==========================================
Files 18 18
Lines 2351 2387 +36
Branches 356 365 +9
==========================================
+ Hits 2144 2174 +30
- Misses 135 138 +3
- Partials 72 75 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f035577 to
d625ecf
Compare
d625ecf to
1e52794
Compare
1e52794 to
019c788
Compare
…t for clarity, true up assertion style to be consistent
… for n_as_match, expand docstring, defensive programming against invalid CIGAR ops
6b29c10 to
c303154
Compare
c303154 to
6dc63aa
Compare
WalkthroughThe changes introduce constants Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
fgpyo/sam/__init__.py (1)
1046-1069: Function signature and docstring updates
The signature now includesn_as_matchandreference_offset, broadening functionality. The docstring is thorough, referencing SAM standards and htsjdk logic. Consider simplifying if the function grows more complex.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py(4 hunks)tests/fgpyo/sam/test_sam.py(4 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 1130-1130: fgpyo/sam/init.py#L1130
Added line #L1130 was not covered by tests
🔇 Additional comments (19)
fgpyo/sam/__init__.py (6)
204-205: New constant for missing query bases
These lines clearly define a placeholder for records without query bases. Looks good and straightforward.
1033-1044: Addedmdattribute inReadEditInfo
Documenting and implementingmdinReadEditInfoaligns well with SAM standards, ensuring we can store the MD tag. No issues spotted.
1082-1083: Return None for unmapped or missing query bases
Clean conditional check here, properly documented in the docstring. Good approach to handle incomplete data.
1085-1129: Core loop logic for alignments
This block meticulously accounts for each CIGAR operation type (N, I, S, D, M, X, EQ). The code is consistent with the docstring, handling offsets and boundaries carefully.
1130-1131: Uncovered error path
Static analysis indicates line 1130 is not tested. Consider adding a test that triggers this “Invalid CIGAR operation” condition to maintain coverage and ensure the error path is verified.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1130-1130: fgpyo/sam/init.py#L1130
Added line #L1130 was not covered by tests
1132-1142: Finalizing MD edits
Appending the final match count and constructingReadEditInfolooks correct. Themd="".join(md_edits)call aligns with standard MD tag assembly.tests/fgpyo/sam/test_sam.py (13)
8-8: Importing Optional
Straightforward addition for handling None usage. No issues noted.
655-659: Testing MD with no edits
Callingcalculate_edit_info()withn_as_match=Falseand checkinginfo.md == "30"ensures no mismatches. Good coverage for the simple case.
661-671: Test for zero offset
Verifies that the function properly handles reference offsets with no edits. Clear test name and correct assertion.
673-692: Test for non-zero offset edits
The ASCII art is helpful. Confirms mismatches are counted correctly when reference alignment is shifted. Thorough approach.
701-709: Mismatch plus insertion scenario
The MD tag ignoring insertions is correct per SAM spec. Tests confirm logic for mismatches and inserted bases. Nicely done.
718-725: Test with clipping and deletions
Ensures that soft-clips are ignored in MD calculations yet fully checks the deletion. The results align with expectations.
728-753: Ns alignment test
Parameterizing forn_as_matchcovers both ignoring or counting N→N as mismatches. This thoroughly tests ambiguous bases.
756-773: Consecutive mismatches
Test confirms consecutive fails from htsjdk data. The comparison to the original MD tag is a robust approach.
776-788: Soft clip at end
Verifies the correctness of the MD tag amid clips. Checking thatn_as_matchdoes not matter is a nice detail.
790-802: Skipped region
Validates how 2N is treated in the CIGAR. Confirming the MD tag matches the read’s tag is thorough.
804-818: Deletion out of bounds
Ensures the function doesn’t overreach when the deletion extends beyond reference. Good boundary check.
820-837: Matches out of bounds
Coverage for partial reference consumption. The MD string and NM tally look correct.
839-848: Missing query bases
Confirming thatNoneis returned if the query bases are absent guarantees consistency with the function doc.
fgpyo/sam/__init__.py
Outdated
| elif op not in (CigarOp.H, CigarOp.P): # if we come across an unexpected CIGAR operation | ||
| raise ValueError(f"Invalid CIGAR operation: {op}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: I don't think these are errors.
Both hard clipping and padding operators consume neither reference bases nor read bases (see the table on page 8 in the spec), so they can be silently ignored. This is what the htsjdk implementation does, which we're following.
Can you add a test cases with cigars that have hard clips and padding operators (separately)?
| elif op not in (CigarOp.H, CigarOp.P): # if we come across an unexpected CIGAR operation | |
| raise ValueError(f"Invalid CIGAR operation: {op}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so sorry, I read this as elif op in (CigarOp.H, CigarOp.P). Consider assert_never. Sorry again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not get assert_never() to work here. I reviewed it with Clint and it sounded like a potential rabbit hole?
elif op not in (CigarOp.H, CigarOp.P): # pragma: no cover
assert_never(op)
error: Argument 1 to "assert_never" has incompatible type "Literal[CigarOp.M, CigarOp.I, CigarOp.S, CigarOp.H, CigarOp.P, CigarOp.EQ, CigarOp.X, CigarOp.code, CigarOp.character, CigarOp.consumes_query, CigarOp.consumes_reference]"; expected "Never" [arg-type]
0c702f5 to
3aeb035
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py(4 hunks)tests/fgpyo/sam/test_sam.py(4 hunks)
🧰 Additional context used
🪛 GitHub Actions: test
fgpyo/sam/__init__.py
[error] 1051-1051: mypy error: X | Y syntax for unions requires Python 3.10
🔇 Additional comments (20)
fgpyo/sam/__init__.py (11)
204-205: LGTM: Added constant for missing query bases.Clear constant to represent missing query bases in SAM.
1043-1043: LGTM: Added MD tag field to ReadEditInfo.Simple attribute addition aligns with PR objectives.
1061-1064: LGTM: Clear explanation of NM tag calculation.Docstring clearly references SAM spec and explains ambiguity handling.
1078-1078: Default for n_as_match should be False for SAM spec compliance.The default correctly aligns with the SAM specification now.
1085-1086: LGTM: Early return for unmapped or missing query reads.Appropriate early exit for cases where MD/NM calculation isn't possible.
1093-1096: LGTM: Added MD tag calculation variables.Proper initialization of variables needed for MD tag computation.
1099-1116: LGTM: Match/mismatch handling with proper bounds checking.Logic correctly handles matches/mismatches while tracking MD tag and preventing out-of-bounds access.
1106-1108: LGTM: Special handling for '=' character.The implementation properly treats '=' as a match always, regardless of reference base.
1118-1131: LGTM: Deletion handling with bounds checking.Deletion handling with proper MD tag formatting and bounds checking.
1138-1139: LGTM: Handling for ignored CIGAR operations.Properly skips hard clips and padding operations without raising errors.
1151-1151: LGTM: Added MD tag to return value.Correctly passes the calculated MD tag string to the ReadEditInfo constructor.
tests/fgpyo/sam/test_sam.py (9)
655-658: LGTM: Updated existing test with n_as_match parameter.Test properly includes new parameter and MD tag assertion.
673-692: LGTM: Good test for nonzero offset with clear ASCII visualization.Test properly validates offsets with detailed ASCII visualization of alignment.
728-753: LGTM: Parameterized test for ambiguous N bases.Test validates both behaviors of n_as_match parameter with ambiguous bases.
756-773: LGTM: Test for consecutive mismatches.Test verifies complex case with consecutive mismatches without matches at start.
790-803: LGTM: Tests for ignored CIGAR operations.Tests verify that skipped, padded, and hard-clipped regions are handled correctly.
805-817: LGTM: Test for equals sign in query sequence.Test confirms that '=' in query is always treated as match regardless of reference.
835-849: LGTM: Test for deletion extending beyond reference.Test verifies proper handling of deletions that go out of bounds.
851-868: LGTM: Test for matches extending beyond reference.Test confirms correct behavior when matches extend beyond reference bounds.
870-878: LGTM: Test for missing query bases.Test confirms function returns None for reads with missing query bases.
3aeb035 to
09fdd38
Compare
fgpyo/sam/__init__.py
Outdated
| elif op not in (CigarOp.H, CigarOp.P): # if we come across an unexpected CIGAR operation | ||
| raise ValueError(f"Invalid CIGAR operation: {op}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not get assert_never() to work here. I reviewed it with Clint and it sounded like a potential rabbit hole?
elif op not in (CigarOp.H, CigarOp.P): # pragma: no cover
assert_never(op)
error: Argument 1 to "assert_never" has incompatible type "Literal[CigarOp.M, CigarOp.I, CigarOp.S, CigarOp.H, CigarOp.P, CigarOp.EQ, CigarOp.X, CigarOp.code, CigarOp.character, CigarOp.consumes_query, CigarOp.consumes_reference]"; expected "Never" [arg-type]
* set n_as_match default to False * expand docstring * re-order CIGAR op checks * add testing coverage for =, op.N, op.H, op.P
09fdd38 to
5b02a5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🛑 Comments failed to post (1)
fgpyo/sam/__init__.py (1)
1138-1140: 💡 Verification agent
🧩 Analysis chain
Verify error handling coverage.
The error handling for unexpected CIGAR operations isn't covered by tests.
🏁 Script executed:
#!/bin/bash # Check if there are tests that verify error handling for invalid CIGAR operations echo "Searching for tests that could trigger the ValueError for invalid CIGAR ops..." rg "test_calc_edit_info.*invalid.*cigar" tests/ echo "Checking for any CIGAR operations not in the standard set..." rg "CigarOp\.\w+" --no-filename | sort | uniqLength of output: 1868
Add tests for invalid CIGAR operations
The
ValueErrorbranch for unexpected CIGAR ops infgpyo/sam/__init__.py(lines 1138–1140) has no test coverage. Please add unit tests that:
- Instantiate or mock a CIGAR string containing a non-standard operation (neither H nor P).
- Call the function that processes CIGAR operations (e.g.
calc_edit_info) and assert that it raisesValueError(f"Invalid CIGAR operation: {op}").This will ensure error handling for invalid CIGAR codes is properly tested.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1139-1139: fgpyo/sam/init.py#L1139
Added line #L1139 was not covered by tests🤖 Prompt for AI Agents (early access)
In fgpyo/sam/__init__.py around lines 1138 to 1140, the ValueError raised for invalid CIGAR operations lacks test coverage. Add unit tests that create or mock a CIGAR string containing an invalid operation (not H or P), then call the function processing CIGAR ops (such as calc_edit_info) and assert that it raises the expected ValueError with the message "Invalid CIGAR operation: {op}". This will verify the error handling branch is properly tested.
|
@nh13 can you please confirm this is the behavior you want? I changed the results of this test and I just want to be careful that this is the expected behavior: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/sam/__init__.py(4 hunks)tests/fgpyo/sam/test_sam.py(4 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/sam/__init__.py
[warning] 1118-1118: fgpyo/sam/init.py#L1118
Added line #L1118 was not covered by tests
[warning] 1140-1140: fgpyo/sam/init.py#L1140
Added line #L1140 was not covered by tests
[warning] 1156-1156: fgpyo/sam/init.py#L1156
Added line #L1156 was not covered by tests
🔇 Additional comments (16)
fgpyo/sam/__init__.py (9)
204-206: Good addition ofNO_QUERY_BASESconstant.Clearly defines the behavior when query bases are missing in a SAM record.
210-212: Appropriate definition of valid bases.The
OK_BASESincludes "=" to properly handle SAM spec where equals is always considered a match regardless of reference base.
1049-1058: Helper function improves code clarity.Good encapsulation of match logic into a separate function. The differentiation between SAM spec and htsjdk implementation is clear.
1061-1067: Function signature properly updated.The renamed parameter
match_htsjdk(replacingn_as_match) makes the intention clearer and the return type correctly handles theNonecase.
1070-1087: Improved docstring with detailed explanation.Documentation effectively explains the difference between SAM spec and htsjdk implementations, and clarifies when
Nonewill be returned.
1102-1103: Early return handles edge cases.Correctly returns
Nonefor unmapped reads or missing query bases, preventing downstream issues.
1135-1146: Deletion handling builds MD tag correctly.The code appends the match count, adds the caret symbol, then appends the deleted reference bases. Note that line 1140-1141 (out-of-bounds handling) isn't tested.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1140-1140: fgpyo/sam/init.py#L1140
Added line #L1140 was not covered by tests
1158-1159: MD tag finalization is correct.Appending the final match count at the end of the function ensures complete MD tag generation.
1167-1169: Return value includes MD tag.The ReadEditInfo object correctly includes the new MD tag.
tests/fgpyo/sam/test_sam.py (7)
651-678: Comprehensive test for base matching logic.Thorough test case covering all important combinations of query and reference bases with both match_htsjdk settings.
785-803: Good test for consecutive mismatches.Verifies MD tag construction for the challenging case of consecutive mismatches with no intervening matches.
820-833: Tests for ignored CIGAR operations.Effectively tests handling of skipped regions, hard clips, and padding operations in CIGAR strings.
835-847: Test for '=' in query sequence.Confirms that '=' in query sequence is always treated as a match regardless of the reference base.
865-879: Tests deletion extending out of bounds.Good test for a deletion that goes beyond the reference boundaries.
881-898: Tests matches extending beyond reference boundaries.This test is important for validating edge case handling where alignments extend past available reference sequence.
900-908: Tests for missing query bases.Properly verifies both
Noneand"*"cases for missing query sequences.
| elif op == CigarOp.N: # skipped region from ref, consumes ref | ||
| target_offset += elem.length | ||
| elif op not in (CigarOp.H, CigarOp.P): # pragma: no cover | ||
| raise ValueError(f"Invalid CIGAR operation: {op}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
This error path needs test coverage.
The code path for invalid CIGAR operations isn't tested. While marked with pragma: no cover, consider adding a test to ensure proper error handling.
🏁 Script executed:
#!/bin/bash
# Check if there are tests for invalid CIGAR operations
echo "Searching for tests with invalid CIGAR operations..."
rg -A 2 "Invalid CIGAR" tests/Length of output: 147
🏁 Script executed:
#!/bin/bash
# Show the context around the raise to confirm the pragma and function name
sed -n '1140,1170p' fgpyo/sam/__init__.pyLength of output: 1161
Add test for invalid CIGAR branch
The ValueError at fgpyo/sam/init.py lines 1155–1156 (the branch marked # pragma: no cover) is untested. Please add a unit test that feeds an out-of-spec CIGAR operation and asserts that ValueError("Invalid CIGAR operation: …") is raised.
- Location: fgpyo/sam/init.py, in the loop handling
op not in (CigarOp.H, CigarOp.P) - Suggestion: in your test suite, create a dummy CIGAR element with an invalid
opand usepytest.raises(ValueError)to verify the error message.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1156-1156: fgpyo/sam/init.py#L1156
Added line #L1156 was not covered by tests
🤖 Prompt for AI Agents
In fgpyo/sam/__init__.py at line 1156, the ValueError raised for invalid CIGAR
operations is not covered by tests. Add a unit test in the test suite that
constructs a CIGAR element with an invalid operation code outside the allowed
set, then use pytest.raises(ValueError) to assert that the error with the
message "Invalid CIGAR operation: ..." is properly raised, ensuring this error
path is tested.
| break # out of bounds | ||
| query_base: str = rec.query_sequence[query_offset + in_block_offset].upper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Out-of-bounds handling could use test coverage.
The code correctly breaks when matches extend beyond reference boundaries, but this branch is not covered by tests.
🏁 Script executed:
#!/bin/bash
# Check if this code path is actually tested anywhere
echo "Searching for tests that might cover the out-of-bounds match condition..."
rg -A 2 "break # out of bounds" tests/Length of output: 198
🏁 Script executed:
#!/bin/bash
# Identify the function containing the out-of-bounds break and search tests for it
FUNCTION=$(awk '/def /{func=$0} /break # out of bounds/{print func; exit}' fgpyo/sam/__init__.py | sed -E 's/def\s+([^(]+).*/\1/')
echo "Function: $FUNCTION"
echo "Searching for tests referencing $FUNCTION..."
rg -n "$FUNCTION" -n tests/Length of output: 68238
🏁 Script executed:
#!/bin/bash
# Show context around the out-of-bounds break to identify the enclosing function
rg -n "break # out of bounds" -C10 fgpyo/sam/__init__.pyLength of output: 2306
Add tests for out-of-bounds CIGAR handling
No existing tests hit the break # out of bounds branches in fgpyo/sam/init.py. Please add unit tests that:
- Exercise a match block (e.g. CIGAR “NM” longer than reference tail) to hit line 1118
- Exercise a deletion block longer than remaining reference to hit line 1139
Place them under tests/fgpyo/sam/, verifying that your MD/NM tag logic correctly stops at the reference end.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1118-1118: fgpyo/sam/init.py#L1118
Added line #L1118 was not covered by tests
🤖 Prompt for AI Agents
In fgpyo/sam/__init__.py around lines 1118 to 1119, the code breaks out of a
loop when matches extend beyond reference boundaries, but this branch lacks test
coverage. Add unit tests under tests/fgpyo/sam/ that create scenarios with a
match block (e.g., a CIGAR "NM" longer than the reference tail) to trigger the
break at line 1118, and a deletion block longer than the remaining reference to
trigger the break at line 1139. These tests should verify that the MD/NM tag
logic correctly stops processing at the reference end.
| NO_QUERY_QUALITIES: array = qualitystring_to_array(STRING_PLACEHOLDER) | ||
| """The quality array corresponding to an unavailable query quality string ("*").""" | ||
|
|
||
| OK_BASES: set[str] = {"A", "C", "G", "T", "="} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: shall we make this private for now since it's only used here and not intended to be use by outside projects
| OK_BASES: set[str] = {"A", "C", "G", "T", "="} | |
| _OK_BASES: set[str] = {"A", "C", "G", "T", "="} |
| for in_block_offset in range(0, elem.length): | ||
| if (target_offset + in_block_offset) >= len(reference_sequence): | ||
| break # out of bounds | ||
| md_edits.append(f"{reference_sequence[target_offset:target_offset + elem.length]}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chore: (1) I think we need to upper case the reference and (2) I don't think we need an f-string since reference sequence is already a string
| md_edits.append(f"{reference_sequence[target_offset:target_offset + elem.length]}") | |
| md_edits.append(reference_sequence[target_offset:target_offset + elem.length].upper()) |
| target_offset: int = reference_offset if reference_offset is not None else rec.reference_start | ||
| cigar: Cigar = Cigar.from_cigartuples(rec.cigartuples) | ||
|
|
||
| matches, mms, insertions, ins_bases, deletions, del_bases = 0, 0, 0, 0, 0, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chore: while we're here, can we update this variable to a longer form?
| matches, mms, insertions, ins_bases, deletions, del_bases = 0, 0, 0, 0, 0, 0 | |
| matches, mismatches, insertions, ins_bases, deletions, del_bases = 0, 0, 0, 0, 0, 0 |
| if target_offset < elem.length: | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chore: I found another difference between htsjdk and samtools:
What do you think about updating deletions and del_bases if match_htsjdk is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you agree, remember to update the docstring.
| elif op not in (CigarOp.H, CigarOp.P): # pragma: no cover | ||
| raise ValueError(f"Invalid CIGAR operation: {op}") | ||
|
|
||
| md_edits.append(str(current_match_count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: f-strings are just a little faster
| md_edits.append(str(current_match_count)) | |
| md_edits.append(f"{current_match_count}") |
Closes #84. Adds an
mdattribute to theReadEditInfoclass and ann_as_matchboolean flag tocalculate_edit_info().calculate_edit_info()is now a little more complex than we might like (11 > 10). I tried not to change any pre-existing code more than was necessary.Some of the reads for testing were inherited from
htsjdk.Bonus points for adding another method that takes a iterator over Templates (and/or pysam.AlignedSegments) and a pysam.FastaFile and updates the records.