|
3 | 3 |
|
4 | 4 | [](https://codecov.io/gh/idlab-discover/AIBoMGen-cli) |
5 | 5 |
|
6 | | -Work-in-progress Go CLI to auto-detect AI artifacts (Hugging Face model IDs in Python and common weight files) and emit CycloneDX AIBOM. Designed for consumer/embedded pipelines with near-zero config. Can be merged with already generated SBOMs (for example with Syft). |
| 6 | +Work-in-progress Go CLI that scans a repository for **basic Hugging Face model usage** and emits a **CycloneDX AI BOM (AIBOM)**. |
7 | 7 |
|
8 | | -## Current |
9 | | -- Command: `generate` (scans path, writes `dist/aibom.json`). |
10 | | -- Detects: `from_pretrained("<id>")` + weight file extensions. |
11 | | -- Test repo: `testdata/repo-basic`. |
| 8 | +## Status (WIP) |
12 | 9 |
|
13 | | -## Planned |
14 | | -- AI metadata fetch, full compliant CycloneDX BOM, SBOM merge, vulnerabilities. |
| 10 | +What works today: |
| 11 | + |
| 12 | +- Basic scanning for Hugging Face model IDs in Python-like sources via `from_pretrained("...")`. |
| 13 | +- AIBOM generation per detected model in JSON or XML. |
| 14 | +- Optional Hugging Face Hub API fetch to populate some metadata fields. |
| 15 | +- Completeness scoring and validation of existing AIBOM files. |
| 16 | + |
| 17 | +What is explicitly future work: |
| 18 | + |
| 19 | +- Improving the scanner beyond the current regex-based Hugging Face detection. |
| 20 | +- Implementing the `internal/enricher` package (interactive completion is currently stubbed). |
| 21 | + |
| 22 | +## Build |
| 23 | + |
| 24 | +```bash |
| 25 | +go test ./... |
| 26 | +go build -o aibomgen-cli . |
| 27 | +./aibomgen-cli --help |
| 28 | +``` |
| 29 | + |
| 30 | +## Commands |
| 31 | + |
| 32 | +### `generate` |
| 33 | + |
| 34 | +Scans a directory for model usage and writes one AIBOM file per detected model. |
15 | 35 |
|
16 | | -## Usage |
17 | 36 | ```bash |
18 | | -go build ./cmd/aibomgen-cli |
19 | | -./aibomgen-cli generate --path testdata/repo-basic |
| 37 | +./aibomgen-cli generate -i testdata/repo-basic |
20 | 38 | ``` |
21 | 39 |
|
22 | | -See `docs/design.md` for roadmap details. |
| 40 | +By default this writes JSON files under `dist/` with filenames derived from the model ID, e.g.: |
| 41 | + |
| 42 | +- `dist/google-bert_bert-base-uncased_aibom.json` |
| 43 | +- `dist/templates_model-card-example_aibom.json` |
| 44 | + |
| 45 | +Common options: |
| 46 | + |
| 47 | +- `--format json|xml|auto` (default: `auto`) |
| 48 | +- `--output <path>`: only the **directory portion** of the path is used as the output directory (default: `dist/aibom.json`, so files are written to `dist/`)
| 49 | +- `--hf-mode online|dummy` (default: `online`) |
| 50 | +- `--hf-token <token>` for gated/private models |
| 51 | +- `--hf-timeout <seconds>` |
| 52 | +- `--log-level quiet|standard|debug` |
| 53 | + |
| 54 | +Experimental/stubbed: |
| 55 | + |
| 56 | +- `--enrich`: attempts interactive completion, but the underlying enricher is not implemented yet. |
| 57 | + |
| 58 | +### `validate` |
| 59 | + |
| 60 | +Validates an existing AIBOM file (JSON/XML), runs completeness checks, and can fail in strict mode. |
| 61 | + |
| 62 | +```bash |
| 63 | +./aibomgen-cli validate -i dist/google-bert_bert-base-uncased_aibom.json |
| 64 | +./aibomgen-cli validate -i dist/google-bert_bert-base-uncased_aibom.json --strict --min-score 0.5 |
| 65 | +``` |
| 66 | + |
| 67 | +Useful options: |
| 68 | + |
| 69 | +- `--format json|xml|auto` |
| 70 | +- `--strict` (fail on missing required fields) |
| 71 | +- `--min-score 0.0-1.0` |
| 72 | +- `--check-model-card` (default: `true`) |
| 73 | +- `--log-level quiet|standard|debug` |
| 74 | + |
| 75 | +### `completeness` |
| 76 | + |
| 77 | +Computes and prints a completeness score for an existing AIBOM using the metadata field registry. |
| 78 | + |
| 79 | +```bash |
| 80 | +./aibomgen-cli completeness -i dist/google-bert_bert-base-uncased_aibom.json |
| 81 | +``` |
| 82 | + |
| 83 | +Options: |
| 84 | + |
| 85 | +- `--format json|xml|auto` |
| 86 | +- `--log-level quiet|standard|debug` |
| 87 | + |
| 88 | +### `enrich` |
| 89 | + |
| 90 | +The command exists but is not implemented yet.
| 91 | + |
| 92 | +```bash |
| 93 | +./aibomgen-cli enrich --help |
| 94 | +./aibomgen-cli enrich -i dist/google-bert_bert-base-uncased_aibom.json |
| 95 | +``` |
| 96 | + |
| 97 | +### Global flags |
| 98 | + |
| 99 | +- `--no-color`: disable ANSI coloring |
| 100 | +- `--config <path>`: optional config file. If not provided, the app attempts to read a Viper config from the home directory (see `cmd/root.go`). |
| 101 | + |
| 102 | +## Package overview |
| 103 | + |
| 104 | +Each folder below is a Go package. |
| 105 | + |
| 106 | +### `main` |
| 107 | + |
| 108 | +Entry point that calls the Cobra root command. |
| 109 | + |
| 110 | +### `cmd` |
| 111 | + |
| 112 | +Cobra CLI wiring: root command, subcommands, flag parsing, and orchestration into `internal/*` packages. |
| 113 | + |
| 114 | +### `internal/scanner` |
| 115 | + |
| 116 | +Repository scanning. |
| 117 | + |
| 118 | +- Current behavior: walks files and detects Hugging Face model IDs by regex matching `from_pretrained("<id>")` in `.py`, `.ipynb`, and `.txt`. |
| 119 | +- Important limitation: weight-file detection is intentionally disabled right now. |
| 120 | +- Future work: broaden detection beyond the current basic Hugging Face pattern. |
| 121 | + |
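To illustrate the regex-based detection described for `internal/scanner`, here is a minimal sketch. The pattern and the `findModelIDs` helper are assumptions made for illustration; the actual expression in the package may differ (for example in how it treats quotes or keyword arguments).

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed pattern, not copied from internal/scanner: it captures the first
// quoted string literal passed to from_pretrained(...).
var fromPretrained = regexp.MustCompile(`from_pretrained\(\s*["']([^"']+)["']`)

// findModelIDs returns the distinct model IDs found in src, in first-seen order.
func findModelIDs(src string) []string {
	seen := map[string]bool{}
	var ids []string
	for _, m := range fromPretrained.FindAllStringSubmatch(src, -1) {
		if id := m[1]; !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	src := `
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
tok = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
`
	fmt.Println(findModelIDs(src))
}
```

A pattern like this only sees string literals, so a model ID passed through a variable would go undetected, which is in line with the "basic" scope stated above.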
| 122 | +### `internal/fetcher` |
| 123 | + |
| 124 | +HTTP client for fetching model metadata from the Hugging Face Hub API (`/api/models/:id`). |
| 125 | + |
| 126 | +- Used when `generate --hf-mode online`. |
| 127 | +- Supports optional bearer token via `--hf-token`. |
| 128 | + |
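A request to the `/api/models/:id` endpoint with an optional bearer token could be built roughly as follows. This is a sketch under assumptions: `newModelRequest` and `hubBaseURL` are illustrative names, not the fetcher's actual API, and the request is only constructed, not sent, so the example runs offline.

```go
package main

import (
	"fmt"
	"net/http"
)

// hubBaseURL is the public Hugging Face Hub host; the constant name is illustrative.
const hubBaseURL = "https://huggingface.co"

// newModelRequest is a hypothetical helper that builds a GET request for
// /api/models/<id> and attaches a bearer token only when one is supplied.
func newModelRequest(baseURL, modelID, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/models/"+modelID, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newModelRequest(hubBaseURL, "google-bert/bert-base-uncased", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

To send the request, an `http.Client` with its `Timeout` field set would be the natural counterpart to `--hf-timeout`.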
| 129 | +### `internal/metadata` |
| 130 | + |
| 131 | +Central “field registry” describing which CycloneDX ML-BOM fields we care about. |
| 132 | + |
| 133 | +- Defines keys, how to populate them (`Apply`), and how to check presence (`Present`). |
| 134 | +- Used by `internal/builder` to populate the BOM and by `internal/completeness` to score it. |
| 135 | + |
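The registry idea (one entry per field, with an `Apply` function to populate it and a `Present` function to check it) can be sketched as below. All types here are simplified stand-ins invented for the example; the real package works on CycloneDX structures and its signatures are likely different.

```go
package main

import "fmt"

// Component is a simplified stand-in for the CycloneDX ML model component.
type Component struct {
	Name     string
	Licenses []string
}

// Source is a stand-in for a scan result plus an optional Hub API response.
type Source struct {
	ModelID string
	License string
}

// FieldSpec is a hypothetical registry entry: a key, a populator, and a presence check.
type FieldSpec struct {
	Key     string
	Apply   func(src Source, c *Component)
	Present func(c *Component) bool
}

var registry = []FieldSpec{
	{
		Key:     "name",
		Apply:   func(src Source, c *Component) { c.Name = src.ModelID },
		Present: func(c *Component) bool { return c.Name != "" },
	},
	{
		Key: "licenses",
		Apply: func(src Source, c *Component) {
			if src.License != "" {
				c.Licenses = append(c.Licenses, src.License)
			}
		},
		Present: func(c *Component) bool { return len(c.Licenses) > 0 },
	},
}

// build applies every registry entry once to a fresh component.
func build(src Source) *Component {
	c := &Component{}
	for _, f := range registry {
		f.Apply(src, c)
	}
	return c
}

// missing lists the keys whose Present check fails.
func missing(c *Component) []string {
	var keys []string
	for _, f := range registry {
		if !f.Present(c) {
			keys = append(keys, f.Key)
		}
	}
	return keys
}

func main() {
	c := build(Source{ModelID: "google-bert/bert-base-uncased"})
	fmt.Println(missing(c))
}
```

Keeping population and presence checks in the same entry means the builder and the scorer cannot drift apart on what counts as a filled field.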
| 136 | +### `internal/builder` |
| 137 | + |
| 138 | +Turns a scan result (and optional Hugging Face API response) into a CycloneDX BOM. |
| 139 | + |
| 140 | +- Creates a minimal ML model component skeleton. |
| 141 | +- Applies the `internal/metadata` registry once to populate fields. |
| 142 | + |
| 143 | +### `internal/generator` |
| 144 | + |
| 145 | +Orchestrates “per discovery” generation. |
| 146 | + |
| 147 | +- For each detected model: fetch metadata (online mode) and build a BOM via the builder. |
| 148 | +- Returns a list of generated BOMs back to the `generate` command. |
| 149 | + |
| 150 | +### `internal/io` |
| 151 | + |
| 152 | +Read/write helpers for CycloneDX BOMs. |
| 153 | + |
| 154 | +- Supports JSON and XML. |
| 155 | +- Supports `format=auto` based on file extension. |
| 156 | +- Supports optional CycloneDX spec version selection for output. |
| 157 | + |
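Resolving `format=auto` from the file extension might look like the sketch below. `resolveFormat` is a hypothetical helper, and the error behavior for unknown extensions is an assumption rather than something the README states.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveFormat is a hypothetical helper: explicit "json"/"xml" values pass
// through, while "auto" is resolved from the (case-insensitive) file extension.
func resolveFormat(format, path string) (string, error) {
	switch format {
	case "json", "xml":
		return format, nil
	case "auto":
		switch strings.ToLower(filepath.Ext(path)) {
		case ".json":
			return "json", nil
		case ".xml":
			return "xml", nil
		}
		return "", fmt.Errorf("cannot infer format from %q", path)
	}
	return "", fmt.Errorf("unsupported format %q", format)
}

func main() {
	f, err := resolveFormat("auto", "dist/google-bert_bert-base-uncased_aibom.json")
	fmt.Println(f, err)
}
```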
| 158 | +### `internal/completeness` |
| 159 | + |
| 160 | +Computes a completeness score in the range $[0, 1]$ for a BOM using weights defined in the metadata registry.
| 161 | + |
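One plausible reading of a weighted score in $[0, 1]$ is the sum of weights of present fields divided by the sum of all weights. The sketch below assumes that formula; the type, the function name, and the example weights are made up for illustration and do not come from the repository.

```go
package main

import "fmt"

// weightedField is a hypothetical view of one registry entry after the
// presence check has been evaluated against a BOM.
type weightedField struct {
	Key     string
	Weight  float64
	Present bool
}

// score returns (sum of weights of present fields) / (sum of all weights),
// or 0 when no weights are defined. This formula is an assumption.
func score(fields []weightedField) float64 {
	var total, got float64
	for _, f := range fields {
		total += f.Weight
		if f.Present {
			got += f.Weight
		}
	}
	if total == 0 {
		return 0
	}
	return got / total
}

func main() {
	fields := []weightedField{
		{Key: "name", Weight: 3, Present: true},
		{Key: "licenses", Weight: 1, Present: false},
	}
	fmt.Println(score(fields))
}
```

A score computed this way can be compared directly against a threshold such as `--min-score 0.5`.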
| 162 | +### `internal/validator` |
| 163 | + |
| 164 | +Validates an existing AIBOM. |
| 165 | + |
| 166 | +- Performs basic structural checks. |
| 167 | +- Validates CycloneDX spec version. |
| 168 | +- Runs completeness scoring and can enforce thresholds in strict mode. |
| 169 | + |
| 170 | +### `internal/enricher` |
| 171 | + |
| 172 | +Intended to interactively fill missing metadata fields. |
| 173 | + |
| 174 | +- Current status: stubbed / not implemented. |
| 175 | +- Future work: implement user prompting and (optionally) model card fetching. |
| 176 | + |
| 177 | +### `internal/logging` |
| 178 | + |
| 179 | +Small opt-in logger used across internal packages (writes only when a writer is configured). |
| 180 | + |
| 181 | +### `internal/ui` |
| 182 | + |
| 183 | +Very small ANSI-color helper used for banners and colored log prefixes. |
| 184 | + |
| 185 | +## Docs and examples |
| 186 | + |
| 187 | +- `testdata/repo-basic` is a small repository used in tests and examples. |
| 188 | +- `docs/` contains design notes and mapping documentation. |
23 | 189 |
|
24 | 190 |
|