Releases: noamteyssier/gia

0.2.23

19 Jul 20:12
b731be0

What's Changed

Full Changelog: 0.2...0.2.23

v0.2.20

01 May 22:33
a375cc1

Changelog

File Support

Native support for all types of interval files

  1. BED3
  2. BED4
  3. BED6
  4. BED12
  5. BedGraph
  6. Generic BED (BED3 + columns)
  7. GTF

Specialized functions for HTSlib data structures

  1. BAM
  2. VCF / BCF

Auto-determined Naming Schemes

  1. Users do not need to specify whether files are named or unnamed.
  2. File format is determined automatically and defaults to generic BED if it cannot be identified.
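
To illustrate the fallback behavior described above, here is a minimal sketch of format detection by column count. It is an assumption about how such detection could work, not gia's actual implementation: the `Format` enum and `detect_format` function are hypothetical names, and the sketch ignores cases a real reader must handle (for example, BedGraph and BED4 both have four columns).

```rust
/// Hypothetical format tags; illustrative names, not gia's own types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Format {
    Bed3,
    Bed4,
    Bed6,
    Bed12,
    Generic,
}

/// Guess a format from the first non-empty, non-comment line by counting
/// tab-separated fields. Returns None if there is no usable line or the
/// line has fewer than three fields.
fn detect_format(text: &str) -> Option<Format> {
    let line = text
        .lines()
        .find(|l| !l.trim().is_empty() && !l.starts_with('#'))?;
    let n = line.split('\t').count();
    match n {
        0..=2 => None,
        3 => Some(Format::Bed3),
        4 => Some(Format::Bed4),
        6 => Some(Format::Bed6),
        12 => Some(Format::Bed12),
        // Anything else is treated as BED3 plus extra columns.
        _ => Some(Format::Generic),
    }
}

fn main() {
    println!("{:?}", detect_format("# header\nchr1\t10\t20\tgeneA"));
}
```

Falling back to a generic variant rather than returning an error keeps unusual column counts readable as BED3 plus opaque metadata.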

BGZIP Support

  1. FASTA
  2. VCF

Utilities

Current List of Utilities for Native BED Files

  1. Closest
  2. Cluster
  3. Complement
  4. Coverage
  5. Extend
  6. Flank
  7. Get Fasta
  8. Intersect
  9. Merge
  10. Random
  11. Sample
  12. Segment
  13. Shift
  14. Sort
  15. Spacing
  16. Subtract
  17. Unionbedg
  18. Window

Specialized HTSlib Utilities

  1. BAM Convert
  2. BAM Coverage
  3. BAM Filter
  4. BCF Filter

Stranded Methods

  1. Closest
  2. Coverage
  3. Extend
  4. Flank
  5. Get Fasta
  6. Intersect
  7. Merge
  8. Subtract
  9. Window
  10. BAM Coverage
  11. BAM Filter
  12. BCF Filter

Multiple Inputs

  1. Closest
  2. Coverage
  3. Intersect
  4. Subtract
  5. Window

Commit Changelog

🚀 Features

File Support

  • Added bed6 support for get_fasta
  • Implement bed6 support for merge subcommand. added format argument to cli
  • Increment bedrs version to 0.1.10
  • Implement Coordinates for references to NumericBed6 for code generalization
  • Made intersect (inplace) compatible with bed6 file format inputs. also refactored internal function calls to have streamed match branch selection inside the function instead of within main
  • Add support for bed6 with subtract submodule.
  • Add a reorder trait which operates on different coordinate types
  • Add an extra method onto NumericBed6 to return and update the name
  • Added a named argument to random to allow for named genome inputs. using new refactored genome struct
  • Added compression threads and level as global arguments to the cli
  • Update bedrs version and added rayon feature
  • Added parallel sorting argument on sort
  • Added support for bed12 file format
  • Added 3 column output as a function within writenamediter for merging
  • Update gia version to 0.2
  • Added auto determination of string/numeric format with BedReader
  • Added flanking function to gia with an optional genome file
  • Added percentage to flank command
  • Added shifting subcommand to gia
  • Added window overlaps as subcommand
  • Added an interval depth structure for fast serde serialization
  • Added a naive implementation of coverage
  • Added direct type conversion to inputs
  • Use direct type conversion in closest
  • Added bed4 as an auto-determined input format
  • Added an ambiguous input format which reads in all 3+ columns into a tab-delim string
  • Added a split translater which keeps an internal translator for the chr and metadata separately
  • Added a split translater which contains two internal translaters. one for the chr translation and one for the meta translation. During sorting, only the chr translater is sorted which heavily reduces the amount of keys to reorder
  • Skip commented lines in input matching
  • Added in gtf set parsing
  • Added in reading functions for gtf
  • Added spacing to the cli
  • Added implementation for spacing - as well as a type for spacing interval outputs which appends a Score to the TSV to incorporate dots for Nulls
  • Added a command which wraps the segmentation algorithm
  • Added a unionbedg command to cli as well as a shared multiinput which accepts more than one filename
  • Added a bedreader over bedgraph files
  • Implementation of the unionbedg algorithm using a union over the bed sets, segmentation, then intersections
  • Added a specific writing utility for segments with variable score slices used in unionbedg without reinit csv writers and flushing
  • Added bedgraph to generic dispatch mechanisms
  • Added a cluster command which uses the depth interval struct for writing out
  • Added noodles for bam parsing
  • Added a bam subcommand with an internal convert subcommand to convert bam into bed
  • Added an unimplemented warning with bail instead of panic
  • Added bam output options
  • Added cli interface for mixed inputs bam/bed and a filter command which can be used to select bam intervals that meet overlap criteria
  • Moved bam parsing functions into a shared utility directory
  • Added new dispatch for bam and header with variable bed format
  • Added convenience tool for pulling chr idx directory without specifying a group
  • Implementation of the bam filtering algorithm given an interval file as b
  • Added invert as an output predicate to bam filter; it is a bit slower than bedtools, so it is worth comparing whether noodles or htslib is faster for writing
  • Added htslib and removed noodles
  • Added a vcf filtering method - borrowing API of rust_htslib-bam centric methods. Also renamed some overlapping namespaces to delineate bam and vcf origins
  • Added clone derive for all subcommand args
  • Add both format and compression status to single output format
  • Added stranded method to Growth to propagate stranded methods to flank, window, extend
  • Added stranded and specific stranded methods to merge. Also added a demote parameter so that merge will by default return the same output format as input format but can be demoted to bed3 if specified
  • Added stranded methods to bam filter
  • Added strandedness to closest and match bedrs-2.0 api for call
  • Added a bam coverage command which accepts a BAM/BED input and counts the number of BAM records that overlap at the BED record
  • Added thread count to bam coverage when reading bam
  • Added threaded option to bam convert
  • Refactored get-fasta into a module and write a bgzip get-fasta using rust-htslib
  • Changed b to allow multiple inputs and set up a ranking system for type demotion
  • Build dispatch with multiple rhs option
  • Added multiple b-file concatenation to all dual input commands
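
The unionbedg entry above describes three steps: a union over the bed sets, segmentation, then intersections. The following is a minimal sketch of that idea for a single chromosome, written under stated assumptions: the `Rec` struct and `union_bedgraph` function are hypothetical, intervals are half-open, records within one file do not overlap, and the linear lookup stands in for whatever indexed search gia and bedrs actually use.

```rust
/// One bedGraph-like record on a single chromosome: half-open [start, end).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Rec {
    start: u64,
    end: u64,
    score: f64,
}

/// Returns segments as (start, end, one optional score per input file).
fn union_bedgraph(files: &[Vec<Rec>]) -> Vec<(u64, u64, Vec<Option<f64>>)> {
    // 1. Union: collect every interval boundary across all inputs.
    let mut points: Vec<u64> = files
        .iter()
        .flatten()
        .flat_map(|r| [r.start, r.end])
        .collect();
    points.sort_unstable();
    points.dedup();

    let mut out = Vec::new();
    // 2. Segmentation: consecutive breakpoints bound candidate segments.
    for w in points.windows(2) {
        let (s, e) = (w[0], w[1]);
        // 3. Intersection: find the score each file carries over the segment.
        let scores: Vec<Option<f64>> = files
            .iter()
            .map(|f| {
                f.iter()
                    .find(|r| r.start <= s && e <= r.end)
                    .map(|r| r.score)
            })
            .collect();
        // Skip gaps that no input file covers.
        if scores.iter().any(|x| x.is_some()) {
            out.push((s, e, scores));
        }
    }
    out
}

fn main() {
    let a = vec![Rec { start: 0, end: 10, score: 1.0 }];
    let b = vec![Rec { start: 5, end: 15, score: 2.0 }];
    for (s, e, scores) in union_bedgraph(&[a, b]) {
        println!("{s}\t{e}\t{scores:?}");
    }
}
```

A `None` score marks a file with no coverage over that segment, which a writer could render as a placeholder value.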

🐛 Bug Fixes

  • Fix bug in tests where wrong read function was imported
  • Modify tests for new shorthand
  • Fix keyword 'about' to 'description'
  • Sort was not retranslating the name field of bed6
  • Allow named chr names in input genome file to extend
  • Update cli to remove argument bounds on inverse for windows
  • Update tests to use new cli
  • Fix bug in tests where columns were being split on newline instead of tab
  • Bug where intersect was skipping sorting file pairs
  • Force meta intervals to always be named because their metadata must always be interpreted as a string
  • Update subtract tests to follow inheritance rules of scores. remove score types from generic
  • Update segment ordering to match bedtools ordering
  • Take explicit end of vcf for structural variants
  • Rename stream in tests
  • Update formatting
  • Update dual generics on StrandedBed3 to match bedrs-2.0 development

🚜 Refactor

  • Remove write_records, write_named_records, and write_records_with by implementing WriteIterImpl for references to coordinates
  • Folded write_records and all associated versions into the WriteIter trait to avoid handling multiple versions of essentially the same code. Needed to also handle generic translaters for this.
  • Needed to specify a specific type to the None in intersect write
  • Remove dead code for format set - will implement in a future version in a different branch
  • Move internal read methods to private to limit number of public read methods
  • Since unnamed iter was already generalized it didn't make sense to include it in bed3. instead I created a new file 'iter' and reexport it from there publicly
  • Create a new struct for a genome with multiple build styles
  • Allow genome to accept an externally provided translater in cases where named bed inputs are read in first
  • Take an external compression threads and level so they are not fixed at compile time
  • Use full rust version of gzp to avoid external cmake dependency
  • Use bedrs 0.2 for lib
  • Remove all mentions of Containers and use IntervalContainer structs instead
  • Update tests to use IntervalContainer structs instead
  • Use bedrs built-in types instead of custom-spun bed6 and bed12 as well as GenomicIntervals
  • Create a BedReader struct which handles file IO and autodetermines input format
  • Include flate2 for input instead of niffler
  • Remove InputFormat impl as it is rolled into the BedReader
  • Use BedReader for sort module
  • Update sample to use new input/field format scheme and bedreader
  • Updated merge module with new format inputs and also generalized streaming iterator to all unnamed file formats
  • Update get_fasta with new formats and generalize initialization of fasta and interval reader before writing
  • Update extend to use new input format specs
  • Major refactor of intersection to handle mixed file formats and using the bedreader struct
  • Refactor closest to use mixed file formats - required rewriting the pairs struct to handle mixed interval types as well as named conversions
  • Update subtract to use mixed file formats and dispatch pattern. can fully remove overlaps module now since that is handled internally by bedrs
  • Remove all old read pairs code since it is handled better via dispatch and bedreader
  • Update extend methods to use built-in bounds
  • Use built-in methods for calculating percentages and bounding extensions in bedrs
  • Used owned find iter to avoid constant rebuffering of output
  • Move cli to separate module
  • Have closest use new argument dispatching and argument folding
  • Update complement to use new cli flattening
  • Update coverage with new flattened arg structure and introduced an overlap_predicates that can be shared
  • Update extend to use new flattened arg structure and introduced percentages
  • Update flank to use new flattened arg structure and introduced percentages
  • Update flank to set left bound at zero
  • Remove name map completely
  • Update get_fasta with new flattened arg structure
  • Update intersect and merge...