SHuang-Broad

This pull request introduces two major features and a series of build fixes:

  1. Replaces fcmm with Intel TBB: The fcmm concurrent map dependency, which was causing build failures, has been replaced with Intel's Threading Building Blocks library (tbb::concurrent_hash_map).

  2. Adds explicit RocksDB checkpointing: A new checkpointing feature has been added to allow saving and loading the state of the RocksDB database. This is useful for development and testing, as it avoids the need to re-ingest all gVCF files on every run.

    • New CLI flags: --checkpoint-in, --checkpoint-out, and --checkpoint-only.
    • A new --output flag was also added to direct the final BCF output to a specified file.

It also includes a few Docker build fixes:

  • Corrected quoting and macro expansion for the GIT_REVISION definition in CMakeLists.txt and C++ source files.
  • Adapted the custom KStringHash class to be compatible with the TBB API.
  • Implemented a stub for the CreateCheckpoint function in the in-memory database used for testing.

Thanks,
Steve

Disclaimer: code changes are co-developed with Gemini CLI

Adds a progress bar to the site genotyping process in Service::genotype_sites.
The bar is rendered to stderr and updates every 100 sites to avoid
excessive I/O overhead.
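
The throttled redraw described above can be sketched roughly as follows. This is an illustration only, not the code added to Service::genotype_sites: the helper names (`render_progress`, `should_redraw`, `report_progress`), the bar width, and the bar characters are all assumptions, and the sketch assumes a single caller, so it does no locking.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper: builds a text bar such as "[#####-----] 50%".
std::string render_progress(std::size_t done, std::size_t total, std::size_t width = 40) {
    if (total == 0) total = 1;        // avoid division by zero on an empty site list
    if (done > total) done = total;   // clamp so the bar never overflows
    const std::size_t filled = done * width / total;
    const std::size_t percent = done * 100 / total;
    return "[" + std::string(filled, '#') + std::string(width - filled, '-') + "] " +
           std::to_string(percent) + "%";
}

// Redraw only on every `stride`-th site and on the last one, to limit stderr writes.
bool should_redraw(std::size_t done, std::size_t total, std::size_t stride = 100) {
    return done == total || (stride != 0 && done % stride == 0);
}

// Hypothetical reporting hook: rewrites the same terminal line using '\r'.
void report_progress(std::size_t done, std::size_t total) {
    if (!should_redraw(done, total)) return;
    std::cerr << '\r' << render_progress(done, total) << std::flush;
    if (done == total) std::cerr << '\n';   // finish the line once complete
}
```

Calling `report_progress(i, n)` after each genotyped site would then touch stderr about once per 100 sites, plus once at the end.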

This commit introduces two new features to improve the usability and efficiency of the GLnexus command-line tool.

Checkpointing:
- A new `--checkpoint-out` option allows creating a checkpoint of the database after the initial data loading and compaction.
- A new `--checkpoint-in` option allows starting an analysis from a previously created checkpoint, skipping the time-consuming data loading step.
- A new `--checkpoint-only` flag can be used to exit the program after creating a checkpoint.
- The command-line interface has been updated to support these new options and enforce their correct usage.
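
As an illustration of what enforcing correct usage of these flags might look like, here is a minimal sketch. The commit message does not spell out the actual rules, so both checks below are assumptions, as are the `CheckpointOptions` struct and the `validate_checkpoint_options` function; the real CLI may check different combinations.

```cpp
#include <string>

// Hypothetical container for the three checkpoint flags (not the CLI's real type).
struct CheckpointOptions {
    std::string checkpoint_in;    // value of --checkpoint-in, empty if absent
    std::string checkpoint_out;   // value of --checkpoint-out, empty if absent
    bool checkpoint_only = false; // true if --checkpoint-only was given
};

// Returns an empty string when the combination is accepted, otherwise an error message.
std::string validate_checkpoint_options(const CheckpointOptions& opts) {
    // Assumed rule: exiting after checkpointing only makes sense if one is written.
    if (opts.checkpoint_only && opts.checkpoint_out.empty()) {
        return "--checkpoint-only requires --checkpoint-out";
    }
    // Assumed rule: do not read from and write to the same checkpoint directory.
    if (!opts.checkpoint_in.empty() && opts.checkpoint_in == opts.checkpoint_out) {
        return "--checkpoint-in and --checkpoint-out must name different directories";
    }
    return "";
}
```

A caller would run this right after argument parsing and print the message together with the usage text when it is non-empty.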

--output flag:
- A new `--output` flag allows specifying an output file for the BCF results directly, as an alternative to redirecting stdout.

This commit replaces the fcmm library with tbb::concurrent_hash_map from Intel's Threading Building Blocks (TBB).
The fcmm dependency was unmaintained and causing build failures with newer compilers.

This change involves:
  - Modifying CMakeLists.txt to download and build TBB as an external project, and linking it to the appropriate targets.
  - Updating the C++ source code (src/data.cc and src/BCFKeyValueData.cc) to use the TBB API.
  - Adapting the custom KStringHash hasher to conform to TBB's HashCompare requirements.
  - Replacing direct iterator-based lookups with TBB's "accessor" pattern for thread-safe map access.
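
For readers unfamiliar with TBB's HashCompare requirements, the shape is roughly the one below: a single type that provides both `hash` and `equal`. This is a sketch, not the adapted KStringHash from the PR. It uses `std::string` keys and an FNV-1a hash as stand-ins, and the name `StringHashCompare` is hypothetical. The TBB calls appear only in comments so the block compiles without TBB installed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical HashCompare type: TBB expects hash() and equal() on one object.
struct StringHashCompare {
    // FNV-1a over the key bytes (a stand-in for whatever KStringHash computes).
    std::size_t hash(const std::string& key) const {
        std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (unsigned char c : key) {
            h ^= c;
            h *= 1099511628211ULL;                  // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
    // Keys that compare equal must also hash equally.
    bool equal(const std::string& a, const std::string& b) const {
        return a == b;
    }
};

// With TBB available, the type would plug in along these lines:
//   tbb::concurrent_hash_map<std::string, int, StringHashCompare> cache;
//   decltype(cache)::accessor acc;             // holds a write lock on one element
//   if (cache.insert(acc, key)) acc->second = value;
//   decltype(cache)::const_accessor cacc;      // holds a read lock on one element
//   if (cache.find(cacc, key)) { /* read cacc->second while cacc is alive */ }
```

The accessor keeps the element locked until it goes out of scope or is released, which is what makes the lookup safe where a plain iterator would not be.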

This commit resolves a series of build failures that occurred when compiling inside the Docker container.

The specific fixes include:

  * Dockerfile:
      * The build is now run with make instead of make -j4 to avoid race conditions when building the external dependencies.
      * A new stage has been added to install the gcloud CLI in the final image.

  * CMake Build System:
      * Corrected the quoting of the -DGIT_REVISION flag to prevent shell parsing errors.
      * Added libtbb to the unit_tests linker dependencies.

  * C++ Source Code:
      * Added a stringifying macro (MACRO_TO_STRING) to safely handle the GIT_REVISION macro as a string literal in src/service.cc and
        cli/glnexus_cli.cc.
      * Updated the call to open the RocksDB database in cli/glnexus_cli.cc to use the correct RocksKeyValue::Open function and config struct,
        resolving a compilation error.
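
The stringifying macro mentioned above most likely follows the usual two-level preprocessor idiom, sketched here. The exact definition in the PR may differ: the inner helper name `MACRO_TO_STRING_IMPL`, the fallback value `unknown`, and the `git_revision` function are assumptions made for this example.

```cpp
#include <string>

// Fallback so the sketch compiles standalone; a real build would instead
// pass something like -DGIT_REVISION=<revision> on the compiler command line.
#ifndef GIT_REVISION
#define GIT_REVISION unknown
#endif

// The inner macro stringifies its argument exactly as written.
#define MACRO_TO_STRING_IMPL(x) #x
// The outer macro expands its argument first, then hands the result to the inner one.
#define MACRO_TO_STRING(x) MACRO_TO_STRING_IMPL(x)

// Hypothetical accessor returning the revision as a string.
inline std::string git_revision() {
    return MACRO_TO_STRING(GIT_REVISION);
}
```

With this indirection the revision can be passed unquoted, so the build no longer depends on quote characters surviving the shell and CMake. Using the inner macro directly would yield the literal text "GIT_REVISION" rather than its value.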

ryys1122 commented Oct 9, 2025

I compiled the code in this PR, but the run failed. The original release runs successfully on the same input.

[3495255] [2025-10-09 08:46:35.656] [GLnexus] [info] found sample set *@3
[3495255] [2025-10-09 08:46:35.656] [GLnexus] [info] discovering alleles in 2190 range(s) on 33 threads
[3495255] [2025-10-09 08:46:35.964] [GLnexus] [error] Failed to discover alleles: Failure: BCFBucketReader: input buffer isn't word-aligned
