@AndreaGuarracino
Member

@AndreaGuarracino AndreaGuarracino commented Oct 20, 2025

Still buggy

sweepga y.1aln > y.sweepga.1aln
ALNtoPAF -x y.sweepga.1aln > y.sweepga.from-aln2paf.paf
ALNtoPAF: Subrange -451910,-447864 out of bounds (Get_Contig_Piece)
ALNtoPAF: Subrange -211859,-205953 out of bounds (Get_Contig_Piece)
ALNtoPAF: Subrange 637538,637878 out of bounds (Get_Contig_Piece)

y.1aln.zip

@ekg
Contributor

ekg commented Oct 20, 2025

This PR addresses the trace spacing issue but doesn't fix the critical coordinate conversion bug that causes the out-of-bounds errors.

Root Cause

The out-of-bounds errors occur because AlnWriter::write_alignment() writes scaffold coordinates directly to .1aln files, but the .1aln format expects contig coordinates. When ALNtoPAF reads these files back, it applies the scaffold offsets again, so the coordinates are transformed twice and fall outside the contig bounds.
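
To make the double transformation concrete, here is a minimal Rust sketch with made-up numbers. The `Contig` struct and the helpers `to_contig`, `to_scaffold`, and `in_contig` are hypothetical illustrations, not the sweepga or ALNtoPAF code, and the sketch assumes the reader simply adds a contig's scaffold offset to whatever value it finds in the file.

```rust
/// Hypothetical layout: one contig placed at `offset` inside its scaffold.
#[derive(Debug, Clone, Copy)]
struct Contig {
    offset: i64, // scaffold position where the contig starts (assumed)
    len: i64,    // contig length
}

/// What the writer should do before storing a position.
fn to_contig(c: Contig, scaffold_pos: i64) -> i64 {
    scaffold_pos - c.offset
}

/// What the reader is assumed to do with a stored position.
fn to_scaffold(c: Contig, stored_pos: i64) -> i64 {
    c.offset + stored_pos
}

/// Bounds check on a stored interval, interpreted as contig-relative.
fn in_contig(c: Contig, beg: i64, end: i64) -> bool {
    0 <= beg && beg <= end && end <= c.len
}
```

With a contig of length 500 at scaffold offset 1000, scaffold position 1200 should be stored as `to_contig(c, 1200)`, i.e. 200. If 1200 is stored unchanged, `in_contig(c, 1200, 1300)` is false, and `to_scaffold(c, 1200)` yields 2200 rather than the original 1200.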

Complete Fix

I've implemented a comprehensive fix locally that addresses both issues:

1. Coordinate Conversion (CRITICAL)

  • Modified AlnWriter to store contig_offsets HashMap (same as AlnReader)
  • Added coordinate conversion in write_alignment() to transform scaffold → contig coords
  • Properly handles both forward and reverse strand transformations
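
A rough sketch of what such a conversion could look like is shown here. It is not the code from the fix: `ContigInfo`, `Strand`, and `scaffold_to_contig` are hypothetical names, the map is keyed by a made-up numeric contig id, and the reverse-strand rule (mirroring the interval against the contig length) is an assumption about how reverse-complement intervals are expressed.

```rust
use std::collections::HashMap;

/// Hypothetical per-contig record: where the contig starts in its scaffold.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ContigInfo {
    scaffold_offset: i64,
    len: i64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Strand {
    Forward,
    Reverse,
}

/// Convert a scaffold interval to a contig-relative one.
/// Returns None if the contig is unknown or the result is out of bounds.
fn scaffold_to_contig(
    offsets: &HashMap<u32, ContigInfo>,
    contig_id: u32,
    beg: i64,
    end: i64,
    strand: Strand,
) -> Option<(i64, i64)> {
    let info = offsets.get(&contig_id)?;
    // Shift into the contig's own coordinate frame.
    let (b, e) = (beg - info.scaffold_offset, end - info.scaffold_offset);
    let (b, e) = match strand {
        Strand::Forward => (b, e),
        // Assumption: reverse intervals are mirrored against the contig length.
        Strand::Reverse => (info.len - e, info.len - b),
    };
    if 0 <= b && b <= e && e <= info.len {
        Some((b, e))
    } else {
        None
    }
}
```

Returning `Option` keeps an out-of-range interval from ever being written, so a bad coordinate surfaces in the writer instead of later in ALNtoPAF.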

2. Trace Spacing (matches your approach)

  • Added trace spacing line writing in create_with_gdb()
  • Reads from input file if available, falls back to default (100)
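
The fallback described above can be sketched as follows. `resolve_trace_spacing` and `DEFAULT_TRACE_SPACING` are hypothetical names, and treating a spacing of 0 as missing is an extra assumption, not something stated in this thread.

```rust
/// Default taken from the comment above ("falls back to default (100)").
const DEFAULT_TRACE_SPACING: u32 = 100;

/// Prefer the spacing read from the input file; otherwise use the default.
/// Assumption: a value of 0 is treated as absent.
fn resolve_trace_spacing(from_input: Option<u32>) -> u32 {
    match from_input {
        Some(v) if v > 0 => v,
        _ => DEFAULT_TRACE_SPACING,
    }
}
```

For example, `resolve_trace_spacing(Some(50))` keeps the input's value, while `resolve_trace_spacing(None)` falls back to 100.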

Test Results

❌ Before: ALNtoPAF failed with "Subrange -451910,-447864 out of bounds"
✅ After: 33,317 alignments converted successfully with ALNtoPAF
✅ All coordinates within valid bounds

Closing this PR in favor of the complete fix. Will push the changes shortly.

@ekg
Contributor

ekg commented Oct 20, 2025

Closing as superseded by a more complete fix that addresses both the trace spacing issue and the critical coordinate conversion bug.

@ekg ekg closed this Oct 20, 2025
ekg added a commit that referenced this pull request Oct 20, 2025
This fixes critical out-of-bounds errors when converting filtered .1aln
files back to PAF format using ALNtoPAF.

## Root Cause
AlnWriter was writing scaffold coordinates directly to .1aln files, but
.1aln format expects contig-relative coordinates. When ALNtoPAF read
these files, it applied scaffold offsets again (double transformation),
resulting in out-of-bounds access with negative or excessive coordinates.

## Changes
1. Added contig_offsets HashMap to AlnWriter (matching AlnReader)
2. Modified write_alignment() to convert scaffold → contig coordinates
3. Properly handles forward and reverse strand transformations
4. Added trace spacing line writing (required by ALNtoPAF)

## Test Results
- Before: 'Subrange -451910,-447864 out of bounds'
- After: 33,317 alignments converted successfully
- All coordinates within valid bounds

Fixes #1