Releases: bghira/CaptionFlow

v0.4.2 - config parser + local filesystem processor fixes

12 Sep 16:58
72575ce

What's Changed

  • add warning + fallback for wrong orchestrator config layout by @bghira in #59
  • attempt to resolve condition where local filesystem processor recaptions successful images by @bghira in #60

Full Changelog: v0.4.1...v0.4.2

v0.4.1 - bugfixes for export

11 Sep 11:44
fe62684

What's Changed

  • handle KeyError on reload for captionworker by @bghira in #57
  • resolve error during export, unexpected arguments by @bghira in #58

Full Changelog: v0.4.0...v0.4.1

v0.4.0 - migration to Lance format

10 Sep 04:39
11e2c45

What's Changed

  • use webshart caching iterator helper to avoid blocking by @bghira in #41
  • worker split cache by @bghira in #42
  • fix gpu_id based subdir by @bghira in #43
  • use pylance to export instead of pandas by @bghira in #44
  • hf url dataset should use relative indexing by @bghira in #45
  • cleanup examples by @bghira in #46
  • add pytest-asyncio by @bghira in #47
  • add tests for captionworker by @bghira in #48
  • storage manager: use pylance for more optimised appends by @bghira in #50
  • use correct starting index for chunks, add regression tests by @bghira in #51
  • config reload drops auth by @bghira in #52
  • fix tests taking forever by @bghira in #53
  • test coverage improvements by @bghira in #54
  • mark bad chunks by @bghira in #55
  • add auth subcommand to cli module for managing tokens by @bghira in #56

Full Changelog: v0.3.4...v0.4.0

v0.3.4 - even more scalability

05 Sep 03:40
aaa4ee2

What's Changed

  • refactor how we handle heartbeat and worker disconnection by @bghira in #40

Full Changelog: v0.3.3...v0.3.4b

v0.3.3

04 Sep 21:31
92719b4

What's Changed

  • webdatasets: position tracking improvements by @bghira in #39

Full Changelog: v0.3.2...v0.3.3

v0.3.2 - fix for webdataset caption job resumption

04 Sep 19:35
9d5eecc

What's Changed

  • bugfix: resuming interrupted jobs does not process missing elements fully by @bghira in #38

Full Changelog: v0.3.1...v0.3.2

v0.3.1 - lightweight captionworker

04 Sep 15:22
7f55b33

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0 - massive memory and throughput improvements

04 Sep 01:56
2530bbd

  • reimplemented the huggingface processor with a focus on memory reduction and throughput saturation; it can hit 5000 captions/sec
  • reimplemented the webdatasets processor to use the webshart library, for a massive throughput boost and lower memory use thanks to the spicy Rust implementation

Overall, the orchestrator and worker will both use about 0.5 GiB of memory to run, as opposed to several GiB.

Added a mock_results mode for the dataset loaders and caption generator to assist in rapid development iteration.

What's Changed

  • hf URL-based datasets memory leak and performance fix for very-large datasets by @bghira in #36

Full Changelog: v0.2.4...v0.3.0

v0.2.4 - dataset viewer and export

27 Aug 16:13
b33396f

What's Changed

  • feature: add storage export subcommand by @bghira in #34
  • add urwid based dataset viewer that uses term-image to render by @bghira in #35

Full Changelog: v0.2.3...v0.2.4

v0.2.3 - local file support, refactored storage backend, job distribution

26 Aug 03:36
932bb1b

What's Changed

  • remove hf shardwise dataset support by @bghira in #25
  • Refactor webdataset dataloader abstraction by @bghira in #26
  • remove duplicate assignment; apply more consistent usage of dataclasses by @bghira in #27
  • add rate tracking log outputs to the storage subsystem by @bghira in #28
  • eliminate re-processing of samples that were already processed by a disconnecting worker by @bghira in #29
  • state tracking fixes for worker & workunit tracker by @bghira in #30
  • huggingface URL dataset processor v2 by @bghira in #31
  • local dataset processor by @bghira in #32
  • simplify schema management by @bghira in #33

Full Changelog: v0.2.2...v0.2.3