Commit b39f216

feat: update README with detailed status, commands, and usage examples for AIBOM generation and validation
1 parent b4d1390 commit b39f216

1 file changed

+177
-11
lines changed

README.md

Lines changed: 177 additions & 11 deletions
@@ -3,22 +3,188 @@
33

44
[![codecov](https://codecov.io/gh/idlab-discover/AIBoMGen-cli/branch/main/graph/badge.svg)](https://codecov.io/gh/idlab-discover/AIBoMGen-cli)
55

6-
Work-in-progress Go CLI to auto-detect AI artifacts (Hugging Face model IDs in Python and common weight files) and emit CycloneDX AIBOM. Designed for consumer/embedded pipelines with near-zero config. Can be merged with already generated SBOMs (for example with Syft).
6+
Work-in-progress Go CLI that scans a repository for **basic Hugging Face model usage** and emits a **CycloneDX AI BOM (AIBOM)**.
77

8-
## Current
9-
- Command: `generate` (scans path, writes `dist/aibom.json`).
10-
- Detects: `from_pretrained("<id>")` + weight file extensions.
11-
- Test repo: `testdata/repo-basic`.
8+
## Status (WIP)
129

13-
## Planned
14-
- AI metadata fetch, full compliant CycloneDX BOM, SBOM merge, vulnerabilities.
10+
What works today:
11+
12+
- Basic scanning for Hugging Face model IDs in Python-like sources via `from_pretrained("...")`.
13+
- AIBOM generation per detected model in JSON or XML.
14+
- Optional Hugging Face Hub API fetch to populate some metadata fields.
15+
- Completeness scoring and validation of existing AIBOM files.
16+
17+
What is explicitly future work:
18+
19+
- Improving the scanner beyond the current regex-based Hugging Face detection.
20+
- Implementing the `internal/enricher` package (interactive completion is currently stubbed).
21+
22+
## Build
23+
24+
```bash
25+
go test ./...
26+
go build -o aibomgen-cli .
27+
./aibomgen-cli --help
28+
```
29+
30+
## Commands
31+
32+
### `generate`
33+
34+
Scans a directory for model usage and writes one AIBOM file per detected model.
1535

16-
## Usage
1736
```bash
18-
go build ./cmd/aibomgen-cli
19-
./aibomgen-cli generate --path testdata/repo-basic
37+
./aibomgen-cli generate -i testdata/repo-basic
2038
```
2139

22-
See `docs/design.md` for roadmap details.
40+
By default this writes JSON files under `dist/` with filenames derived from the model ID, e.g.:
41+
42+
- `dist/google-bert_bert-base-uncased_aibom.json`
43+
- `dist/templates_model-card-example_aibom.json`
44+
45+
Common options:
46+
47+
- `--format json|xml|auto` (default: `auto`)
48+
- `--output <path>`: only the **directory portion** of the path is used, as the output directory (default: `dist/aibom.json`, which writes to `dist/`)
49+
- `--hf-mode online|dummy` (default: `online`)
50+
- `--hf-token <token>` for gated/private models
51+
- `--hf-timeout <seconds>`
52+
- `--log-level quiet|standard|debug`
53+
54+
Experimental/stubbed:
55+
56+
- `--enrich`: attempts interactive completion, but the underlying enricher is not implemented yet.
57+
58+
### `validate`
59+
60+
Validates an existing AIBOM file (JSON/XML), runs completeness checks, and can fail in strict mode.
61+
62+
```bash
63+
./aibomgen-cli validate -i dist/google-bert_bert-base-uncased_aibom.json
64+
./aibomgen-cli validate -i dist/google-bert_bert-base-uncased_aibom.json --strict --min-score 0.5
65+
```
66+
67+
Useful options:
68+
69+
- `--format json|xml|auto`
70+
- `--strict` (fail on missing required fields)
71+
- `--min-score 0.0-1.0`
72+
- `--check-model-card` (default: `true`)
73+
- `--log-level quiet|standard|debug`
74+
75+
### `completeness`
76+
77+
Computes and prints a completeness score for an existing AIBOM using the metadata field registry.
78+
79+
```bash
80+
./aibomgen-cli completeness -i dist/google-bert_bert-base-uncased_aibom.json
81+
```
82+
83+
Options:
84+
85+
- `--format json|xml|auto`
86+
- `--log-level quiet|standard|debug`
87+
88+
### `enrich`
89+
90+
The command exists but is not implemented yet.
91+
92+
```bash
93+
./aibomgen-cli enrich --help
94+
./aibomgen-cli enrich -i dist/google-bert_bert-base-uncased_aibom.json
95+
```
96+
97+
### Global flags
98+
99+
- `--no-color`: disable ANSI coloring
100+
- `--config <path>`: optional config file. If not provided, the app attempts to read a Viper config from the home directory (see `cmd/root.go`).
101+
102+
## Package overview
103+
104+
Each folder below is a Go package.
105+
106+
### `main`
107+
108+
Entry point that calls the Cobra root command.
109+
110+
### `cmd`
111+
112+
Cobra CLI wiring: root command, subcommands, flag parsing, and orchestration into `internal/*` packages.
113+
114+
### `internal/scanner`
115+
116+
Repository scanning.
117+
118+
- Current behavior: walks the file tree and detects Hugging Face model IDs by regex-matching `from_pretrained("<id>")` in `.py`, `.ipynb`, and `.txt` files.
119+
- Important limitation: weight-file detection is intentionally disabled right now.
120+
- Future work: broaden detection beyond the current basic Hugging Face pattern.
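
The regex-based detection described above can be sketched roughly as follows. The pattern and the `detectModelIDs` helper are illustrative assumptions, not the project's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// fromPretrainedRe is an illustrative pattern, not the project's real regex.
// It captures the first quoted argument of from_pretrained("...") or
// from_pretrained('...').
var fromPretrainedRe = regexp.MustCompile(`from_pretrained\(\s*["']([^"']+)["']`)

// detectModelIDs returns the unique model IDs found in src, in order of
// first appearance.
func detectModelIDs(src string) []string {
	seen := make(map[string]bool)
	var ids []string
	for _, m := range fromPretrainedRe.FindAllStringSubmatch(src, -1) {
		if id := m[1]; !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	src := `
model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
tok = AutoTokenizer.from_pretrained('google-bert/bert-base-uncased')
`
	fmt.Println(detectModelIDs(src))
}
```

Note that notebook files store source as JSON, where quotes are escaped as `\"`, so a pattern like this one would miss those occurrences unless the notebook cells are decoded first.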
121+
122+
### `internal/fetcher`
123+
124+
HTTP client for fetching model metadata from the Hugging Face Hub API (`/api/models/:id`).
125+
126+
- Used when `generate --hf-mode online`.
127+
- Supports optional bearer token via `--hf-token`.
128+
129+
### `internal/metadata`
130+
131+
Central “field registry” describing which CycloneDX ML-BOM fields we care about.
132+
133+
- Defines keys, how to populate them (`Apply`), and how to check presence (`Present`).
134+
- Used by `internal/builder` to populate the BOM and by `internal/completeness` to score it.
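
To make the registry idea concrete, here is a minimal sketch of entries that each carry a key, an `Apply` function, and a `Present` check. The `Model`, `Source`, and `FieldSpec` types are simplified stand-ins invented for this example, not the repository's actual types:

```go
package main

import "fmt"

// Model is a stand-in for the CycloneDX component the real registry targets.
type Model struct {
	Name    string
	License string
}

// Source is a stand-in for data gathered by the scanner and the Hub API.
type Source struct {
	ModelID string
	License string
}

// FieldSpec is a hypothetical registry entry: one key, one way to populate
// the field, and one way to check whether it is present.
type FieldSpec struct {
	Key     string
	Apply   func(src Source, m *Model)
	Present func(m *Model) bool
}

var registry = []FieldSpec{
	{
		Key:     "name",
		Apply:   func(src Source, m *Model) { m.Name = src.ModelID },
		Present: func(m *Model) bool { return m.Name != "" },
	},
	{
		Key:     "license",
		Apply:   func(src Source, m *Model) { m.License = src.License },
		Present: func(m *Model) bool { return m.License != "" },
	},
}

func main() {
	m := &Model{}
	src := Source{ModelID: "google-bert/bert-base-uncased"} // license unknown
	for _, f := range registry {
		f.Apply(src, m)
	}
	for _, f := range registry {
		fmt.Printf("%s present: %v\n", f.Key, f.Present(m))
	}
}
```

Keeping population and presence checks side by side in one entry is what lets a builder and a scorer share the same list of fields.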
135+
136+
### `internal/builder`
137+
138+
Turns a scan result (and optional Hugging Face API response) into a CycloneDX BOM.
139+
140+
- Creates a minimal ML model component skeleton.
141+
- Applies the `internal/metadata` registry once to populate fields.
142+
143+
### `internal/generator`
144+
145+
Orchestrates “per discovery” generation.
146+
147+
- For each detected model: fetch metadata (online mode) and build a BOM via the builder.
148+
- Returns a list of generated BOMs back to the `generate` command.
149+
150+
### `internal/io`
151+
152+
Read/write helpers for CycloneDX BOMs.
153+
154+
- Supports JSON and XML.
155+
- Supports `format=auto` based on file extension.
156+
- Supports optional CycloneDX spec version selection for output.
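
Extension-based format selection could look roughly like this. The `resolveFormat` helper is hypothetical, and falling back to JSON for unknown extensions is an assumption rather than documented behavior:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveFormat is a hypothetical helper. An explicit "json" or "xml" wins;
// anything else (including "auto") falls back to the file extension.
// Defaulting unknown extensions to JSON is an assumption.
func resolveFormat(format, path string) string {
	switch f := strings.ToLower(format); f {
	case "json", "xml":
		return f
	}
	if strings.EqualFold(filepath.Ext(path), ".xml") {
		return "xml"
	}
	return "json"
}

func main() {
	fmt.Println(resolveFormat("auto", "dist/model_aibom.xml"))
	fmt.Println(resolveFormat("auto", "dist/model_aibom.json"))
	fmt.Println(resolveFormat("xml", "dist/model_aibom.json"))
}
```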
157+
158+
### `internal/completeness`
159+
160+
Computes a completeness score in the range $[0, 1]$ for a BOM, using the weights defined in the metadata registry.
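
One plausible way to compute such a score is the weight of the present fields divided by the total weight. The sketch below assumes that formula; the `weightedField` type and the example weights are invented for illustration:

```go
package main

import "fmt"

// weightedField pairs a registry weight with whether the field was found.
type weightedField struct {
	Key     string
	Weight  float64
	Present bool
}

// completenessScore returns the summed weight of present fields divided by
// the summed weight of all fields, or 0 when there is no weight at all.
// This formula is an assumption, not the project's confirmed scoring rule.
func completenessScore(fields []weightedField) float64 {
	var got, total float64
	for _, f := range fields {
		total += f.Weight
		if f.Present {
			got += f.Weight
		}
	}
	if total == 0 {
		return 0
	}
	return got / total
}

func main() {
	fields := []weightedField{
		{Key: "name", Weight: 1.0, Present: true},
		{Key: "license", Weight: 1.0, Present: false},
		{Key: "description", Weight: 2.0, Present: true},
	}
	fmt.Printf("%.2f\n", completenessScore(fields)) // 3.0 / 4.0
}
```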
161+
162+
### `internal/validator`
163+
164+
Validates an existing AIBOM.
165+
166+
- Performs basic structural checks.
167+
- Validates CycloneDX spec version.
168+
- Runs completeness scoring and can enforce thresholds in strict mode.
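
The interaction between `--strict` and `--min-score` is not spelled out here, so the following is only a sketch under assumed semantics: outside strict mode nothing fails, and in strict mode either a missing required field or a score below the minimum produces an error. The `checkResult` type and `enforce` function are hypothetical:

```go
package main

import "fmt"

// checkResult is a hypothetical summary of one validation run.
type checkResult struct {
	MissingRequired []string
	Score           float64
}

// enforce returns an error only in strict mode, when required fields are
// missing or the score is below minScore. These semantics are assumed.
func enforce(r checkResult, strict bool, minScore float64) error {
	if !strict {
		return nil
	}
	if len(r.MissingRequired) > 0 {
		return fmt.Errorf("missing required fields: %v", r.MissingRequired)
	}
	if r.Score < minScore {
		return fmt.Errorf("completeness score %.2f is below minimum %.2f", r.Score, minScore)
	}
	return nil
}

func main() {
	r := checkResult{Score: 0.4}
	fmt.Println(enforce(r, false, 0.5)) // non-strict: no error
	fmt.Println(enforce(r, true, 0.5))  // strict: score too low
}
```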
169+
170+
### `internal/enricher`
171+
172+
Intended to interactively fill missing metadata fields.
173+
174+
- Current status: stubbed / not implemented.
175+
- Future work: implement user prompting and (optionally) model card fetching.
176+
177+
### `internal/logging`
178+
179+
Small opt-in logger used across internal packages (writes only when a writer is configured).
180+
181+
### `internal/ui`
182+
183+
Very small ANSI-color helper used for banners and colored log prefixes.
184+
185+
## Docs and examples
186+
187+
- `testdata/repo-basic` is a small repository used in tests and examples.
188+
- `docs/` contains design notes and mapping documentation.
23189

24190
