Releases · bghira/CaptionFlow
v0.4.2 - config parser + local filesystem processor fixes
What's Changed
- add warning + fallback for wrong orchestrator config layout by @bghira in #59
- attempt to resolve condition where local filesystem processor recaptions successful images by @bghira in #60
Full Changelog: v0.4.1...v0.4.2
v0.4.1 - bugfixes for export
What's Changed
- handle KeyError on reload for captionworker by @bghira in #57
- resolve error during export, unexpected arguments by @bghira in #58
Full Changelog: v0.4.0...v0.4.1
v0.4.0 - migration to Lance format
What's Changed
- use webshart caching iterator helper to avoid blocking by @bghira in #41
- worker split cache by @bghira in #42
- fix gpu_id based subdir by @bghira in #43
- use pylance to export instead of pandas by @bghira in #44 (see the reading sketch after this list)
- hf url dataset should use relative indexing by @bghira in #45
- cleanup examples by @bghira in #46
- add pytest-asyncio by @bghira in #47
- add tests for captionworker by @bghira in #48
- storage manager: use pylance for more optimised appends by @bghira in #50
- use correct starting index for chunks, add regression tests by @bghira in #51
- config reload drops auth by @bghira in #52
- fix tests taking forever by @bghira in #53
- test coverage improvements by @bghira in #54
- mark bad chunks by @bghira in #55
- add auth subcommand to cli module for managing tokens by @bghira in #56
Full Changelog: v0.3.4...v0.4.0
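Since v0.4.0 the exported caption data is stored in the Lance format and written with the pylance package rather than through pandas. Below is a minimal sketch of opening such an export with pylance; the dataset path and column names are illustrative assumptions, not CaptionFlow's actual export layout.

```python
# Sketch: reading a Lance-format export with the pylance package.
# "captions.lance" is an assumed path for illustration; inspect the real
# export for its actual location and schema.
import lance

ds = lance.dataset("captions.lance")  # open the Lance dataset
print(ds.schema)                      # list the stored columns

# Stream record batches instead of materialising the whole table at once,
# which is part of why Lance-based appends and exports scale better than a
# pandas round-trip.
for batch in ds.to_batches():
    df = batch.to_pandas()            # pyarrow.RecordBatch -> pandas.DataFrame
    print(df.head())
```

Reading in batches keeps memory flat even for large caption sets; `ds.to_table()` is also available when the export fits comfortably in RAM.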
v0.3.4 - even more scalability
What's Changed
Full Changelog: v0.3.3...v0.3.4b
v0.3.3
v0.3.2 - fix for webdataset caption job resumption
What's Changed
Full Changelog: v0.3.1...v0.3.2
v0.3.1 - lightweight captionworker
v0.3.0 - massive memory and throughput improvements
- Reimplemented the Hugging Face processor with a focus on memory reduction and throughput saturation; it can hit 5,000 captions/sec.
- Reimplemented the WebDataset processor to use the webshart library for a major throughput boost and reduced memory use, thanks to the spicy Rust implementation.
Overall, the orchestrator and worker each use about 0.5 GiB of memory to run, as opposed to several GiB previously.
Added a mock_results mode for the dataset loaders and caption generator to assist in rapid development iteration.
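The sketch below shows the general shape of such a mock mode; the class and option layout are hypothetical, and only the mock_results name comes from the release note.

```python
# Hypothetical sketch of a mock_results-style mode: when the flag is set, the
# caption generator returns canned captions instead of running model inference,
# so loader and storage code can be iterated on quickly without a GPU.
from dataclasses import dataclass


@dataclass
class MockableCaptionGenerator:
    mock_results: bool = False  # option name taken from the release note

    def caption(self, image_key: str) -> str:
        if self.mock_results:
            # Deterministic placeholder output for fast development iteration.
            return f"mock caption for {image_key}"
        raise NotImplementedError("real model inference would run here")


# Usage: enable the flag to exercise the pipeline end to end with fake outputs.
gen = MockableCaptionGenerator(mock_results=True)
print(gen.caption("shard-0001/000123.jpg"))
```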
What's Changed
Full Changelog: v0.2.4...v0.3.0
v0.2.4 - dataset viewer and export
What's Changed
- feature: add storage export subcommand by @bghira in #34
- add urwid based dataset viewer that uses term-image to render by @bghira in #35
Full Changelog: v0.2.3...v0.2.4
v0.2.3 - local file support, refactored storage backend, job distribution
What's Changed
- remove hf shardwise dataset support by @bghira in #25
- Refactor webdataset dataloader abstraction by @bghira in #26
- remove duplicate assignment; apply more consistent usage of dataclasses by @bghira in #27
- add rate tracking log outputs to the storage subsystem by @bghira in #28
- eliminate re-processing of samples that were already processed by a disconnecting worker by @bghira in #29
- state tracking fixes for worker & workunit tracker by @bghira in #30
- huggingface URL dataset processor v2 by @bghira in #31
- local dataset processor by @bghira in #32
- simplify schema management by @bghira in #33
Full Changelog: v0.2.2...v0.2.3