Releases: kids-first/kf-cbioportal-etl

🔧 Minor bug fixes

10 Jul 15:37
78d28ea

Full Changelog: v2.4.1...v2.4.2

↕️ Separate Refs + CNV Bug Fix

23 Jun 14:07
39ecc0d

Separated refs from code for cleaner dev cycle. Also incorporated a bug fix for errant GISTIC calculations in CNV.

Full Changelog: v2.3.0...v2.4.1

👯 ✨Copy Number and Expression Refactor

10 Jun 12:39
526e097

Copy number and RNA expression transformations have been revamped:

  • CN is now governed by one script, with config determining the order of preference based on data availability (e.g., use CNVkit for WXS if available, then ControlFreeC, etc.)
  • Expression is now normalized by library type and against a healthy reference, when each is available
  • Docs updated to reflect changes
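
The preference-order idea for copy number can be sketched as follows. This is a minimal illustration, not the project's actual script: the `CNV_PREFERENCE` mapping, its keys, the WGS ordering, and the `pick_cnv_source` helper are all hypothetical. Only the WXS ordering (CNVkit first, then ControlFreeC) comes from the note above.

```python
from typing import Optional

# Hypothetical preference order per experimental strategy.
# Only the WXS ordering is taken from the release note; the rest is assumed.
CNV_PREFERENCE = {
    "WXS": ["cnvkit", "controlfreec"],
    "WGS": ["controlfreec", "cnvkit"],
}


def pick_cnv_source(strategy: str, available: set[str]) -> Optional[str]:
    """Return the first preferred caller that has data for a sample, else None."""
    for caller in CNV_PREFERENCE.get(strategy, []):
        if caller in available:
            return caller
    return None
```

With this layout, changing which caller wins for a given strategy would be a config edit rather than a code change.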

Full Changelog: v2.2.2...v2.3.0

🔧 More Bug Fixes 🧹 More Linting

10 Apr 21:01
1c2342c

Fixed some minor solo vs etl mode arg bugs in:

  • get_study_metadata with args.study_config (also cleared out a deprecated function)
  • check_downloads with file_types to check namespace (solo) and not None (etl)
  • cli to use manifest_subset (also learned that hyphenated args get interpreted as _ automatically)
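
The hyphen-to-underscore behavior mentioned in the last item is standard `argparse`: for optional arguments, internal hyphens in the flag name become underscores in the attribute name. A minimal sketch, where the `--manifest-subset` flag and the `demo` program name are chosen for illustration and may not match the tool's real CLI:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# Hypothetical flag, named to mirror the manifest_subset attribute above.
parser.add_argument("--manifest-subset", default=None)

args = parser.parse_args(["--manifest-subset", "subset.tsv"])

# argparse stores the value under the underscore form of the flag name.
print(args.manifest_subset)
```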

Full Changelog: v2.2.1...v2.2.2

🧹 Code linting, 🔧 Fixed bug, 🔨 Improved delta output

03 Apr 19:27
a0616ef

Code was linted to improve quality and reviewability:

  • Added type hints to variables and functions
  • Applied PEP 8 standards as much as possible
  • Added docstrings throughout
  • Found a bug in the process! Info files were not being opened during CNV GISTIC output, so non-diploid samples had some incorrect values. A fix has been applied
  • Also modified delta output to merge timeline data to fix incremental upload behavior
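
To illustrate why skipping the info files matters, here is a rough sketch of a GISTIC-style call made relative to sample ploidy. It assumes the info files supply each sample's ploidy, which the note implies but does not state. The `gistic_call` function and the `high_gain` threshold are hypothetical and not taken from the project code. The output scale (-2, -1, 0, 1, 2) is the discrete copy number convention cBioPortal uses.

```python
def gistic_call(copy_number: int, ploidy: int = 2, high_gain: int = 2) -> int:
    """Map an absolute copy number to a GISTIC-style value in {-2, -1, 0, 1, 2}.

    The high_gain threshold is an assumption made for this sketch.
    """
    if copy_number == 0:
        return -2  # no copies left: deep deletion
    delta = copy_number - ploidy
    if delta < 0:
        return -1  # below ploidy: shallow loss
    if delta == 0:
        return 0  # matches ploidy: neutral
    if delta <= high_gain:
        return 1  # modest gain
    return 2  # well above ploidy: amplification
```

If ploidy is never read and silently defaults to 2, a triploid sample with three copies is called a gain (`gistic_call(3)` gives 1) instead of neutral (`gistic_call(3, ploidy=3)` gives 0).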

Full Changelog: v2.2.0...v2.2.1

🚀 Incremental Updates + Multi-Step CLI

24 Mar 19:15
ea64793

This release introduces major improvements to the cbio-etl tool:

  • Incremental Updates: Added support for incremental updates, enabling users to update studies by importing only new samples instead of reprocessing the entire dataset. This makes the ETL workflow faster and more efficient, especially for studies with frequent data updates.

  • Refactored into a Multi-Command CLI: Reworked the ETL into a modular, multi-command CLI (import and update modes) that allows for greater flexibility:

    • cbio-etl import: Executes the full ETL pipeline for a complete study import.
    • cbio-etl update: Handles incremental updates, identifying and processing only new patient samples, and automatically triggers the appropriate ETL steps.
  • Automation Enhancements:

    • Automatically detects and switches into add_data mode when new patient data is found.
    • Automatically adjusts paths and parameters when processing incremental updates.
    • Handles context-sensitive logic to manage and organize study data and config files on-the-fly.
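
The import/update split and the new-sample detection described above can be sketched with `argparse` subcommands and a set difference. This is an illustrative outline, not the `cbio-etl` implementation. The subcommand names `import` and `update` and the `add_data` mode come from the notes. The `etl-sketch` program name, the `--loaded` and `--incoming` flags, the `no_change` label, and the helper functions are hypothetical.

```python
import argparse


def find_new_samples(loaded: set[str], incoming: set[str]) -> list[str]:
    """Return incoming sample IDs that are not already loaded, sorted for stable output."""
    return sorted(incoming - loaded)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="etl-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("import", help="run the full pipeline for a complete study")
    update = sub.add_parser("update", help="process only new samples")
    # Hypothetical flags, for illustration only.
    update.add_argument("--loaded", nargs="*", default=[])
    update.add_argument("--incoming", nargs="*", default=[])
    return parser


def main(argv: list[str]) -> tuple[str, list[str]]:
    """Return the chosen mode and the sample IDs that mode would process."""
    args = build_parser().parse_args(argv)
    if args.command == "import":
        return ("import", [])
    new_samples = find_new_samples(set(args.loaded), set(args.incoming))
    # Switch to add_data mode only when something new was found.
    if new_samples:
        return ("add_data", new_samples)
    return ("no_change", [])
```

For example, `main(["update", "--loaded", "S1", "S2", "--incoming", "S2", "S3"])` selects `add_data` mode with only `S3` to process.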

Full Changelog: v2.1.0...v2.2.0

v2.1.0

10 Feb 15:52
ffa34ee

What's Changed

  • 🔧 debugged main ETL pipeline and added a config generator step (to be expanded in next PR) by @wongjessica93 in #73
  • 🔨 Add Incremental File Output by @migbro in #74

Full Changelog: v2.0.0...v2.1.0

🚀 Automated ETL Workflow with Pip Installation Support

16 Dec 21:09
b45a2d9

This release introduces a fully refactored ETL pipeline that is now installable via pip, streamlining the setup and execution of the workflow.

  • users can easily trigger the ETL process via a single command
  • supports modular approach, allowing users to run specific steps of pipeline as needed
  • improved code modularity

What's Changed

  • ✏️ Minor Config/Template Updates by @migbro in #70
  • converted etl to standalone tool and updated readme.md by @wongjessica93 in #72

Full Changelog: v1.6.0...v2.0.0

📝 OPC V12 Doc Revision

23 Oct 14:55
Pre-release

This is a special case: this branch has code and documentation relevant to the OpenPedCan v12 load that were broken by later releases. It covers the niche case in which this specific study needs further revision or is simply referenced for a repeat load.

Full Changelog: 0.8.1...v0.8.2

🗻CBTN Summit + Chordoma Updates

18 Oct 13:08
b53f371

Config and software updates made to help simplify ETL

  • Updated configs to reflect CNV changes
  • Updated download script to allow for new cbio_file_name_id.txt format that now has file_id and s3_path so that ...
    genomic file manifests eliminated and folded into cbio_file_name_id.txt files so that the command is much simpler. See line 21 in README documentation to see the difference
  • Updated maf merge to record symlink errors
  • Some QOL formatting updates

What's Changed

  • ⛰️ PBTA Summit and Chordoma Updates by @migbro in #68

Full Changelog: v1.5.0...v1.6.0