Macro & Micronutrient Food Recommender (TestoAI)

What this is: A small ML project that learns nutrient embeddings from the USDA/Kaggle dataset and recommends foods that best match a testosterone‑supportive nutrient profile.

Overview (high level)

This project trains a tiny autoencoder (PyTorch) on hand‑picked macro and micro features, then compares each food to a prototype built from seed items (e.g., oysters, egg yolk, beef liver) known for key micronutrients. The recommender is protein‑forward, with light rules (activity awareness, food‑group preferences, and basic filters) to keep results practical.

Dataset: USDA National Nutrient Database (via Kaggle).
Goal: Hit macro targets and favor micronutrient‑dense foods (e.g., rich in zinc, magnesium, selenium, B‑vitamins) while avoiding ultra‑processed items.
Artifacts: data/emb_df.parquet, model/encoder.pt, model/scaler.pkl, model/meta.json.

How foods are scored (the important part)

Features used (columns, units):

Macros: Protein_g, Fat_g, Carb_g, Energy_kcal
Micros: Zinc_mg, Magnesium_mg, Selenium_mcg, VitB6_mg, VitB12_mcg, VitA_mcg, Iron_mg, Sugar_g, Fiber_g
Scaling: per‑nutrient MinMax to [0,1], so features measured in g, mg, mcg, and kcal end up on a comparable scale.
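
As a rough illustration of the scaling step, the sketch below applies per-column min-max scaling with plain NumPy. The small array and its values are made up, and `minmax_scale` is a hypothetical helper; the project may well use a library scaler instead (the saved model/scaler.pkl suggests a fitted scaler object), so treat this as the arithmetic only.

```python
import numpy as np

def minmax_scale(X):
    """Scale each column of X to [0, 1] independently (per-nutrient scaling)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against constant columns, which would otherwise divide by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Made-up rows with columns [Protein_g, Zinc_mg, Selenium_mcg]
X = np.array([
    [10.0, 1.0, 5.0],
    [20.0, 3.0, 15.0],
    [30.0, 5.0, 55.0],
])
X_scaled = minmax_scale(X)
```

After scaling, every column spans 0 to 1 regardless of its original unit, so a nutrient measured in grams no longer dominates one measured in micrograms.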

Embedding learning (train time):

Autoencoder: 13 → 64 → 16 (latent) → 64 → 13, ReLU activations, MSE reconstruction loss, Adam optimizer (learning rate 1e‑3), ~10 epochs (tunable).
The encoder output is the food embedding.
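
A minimal PyTorch sketch of an autoencoder with these layer sizes follows. It is not the code in train.py: the class name `NutrientAutoencoder`, the placement of ReLU (hidden layers only, none on the latent or output), the random stand-in data, and full-batch training are all assumptions made for illustration.

```python
import torch
from torch import nn

class NutrientAutoencoder(nn.Module):
    """13 -> 64 -> 16 -> 64 -> 13 autoencoder; the 16-d latent is the embedding."""

    def __init__(self, n_features=13, hidden=64, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

torch.manual_seed(0)
X = torch.rand(256, 13)  # random stand-in for the MinMax-scaled nutrient table

model = NutrientAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)  # reconstruction error against the input itself
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    embeddings = model.encoder(X)  # one 16-d embedding per food row
```

Only the encoder is needed at inference time, which is consistent with the saved artifact being model/encoder.pt.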

Prototype & similarity (inference time):

Build a prototype vector = mean embedding of seed foods (oysters, egg yolk, liver, etc.).
Score each food by cosine similarity to the prototype (higher = better).
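
The prototype and similarity steps can be sketched with NumPy as below. The 2-d embeddings are made up so the result is easy to check by hand, and `cosine_to_prototype` is a hypothetical helper, not a function from recommend.py.

```python
import numpy as np

def cosine_to_prototype(embeddings, seed_idx):
    """Return the cosine similarity of every row to the mean of the seed rows."""
    E = np.asarray(embeddings, dtype=float)
    prototype = E[seed_idx].mean(axis=0)
    # Small epsilon avoids division by zero for all-zero vectors.
    eps = 1e-12
    num = E @ prototype
    den = np.linalg.norm(E, axis=1) * np.linalg.norm(prototype) + eps
    return num / den

# Made-up 2-d embeddings; rows 0 and 1 play the role of seed foods.
E = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, -1.0],
])
scores = cosine_to_prototype(E, seed_idx=[0, 1])
ranking = np.argsort(-scores)  # indices sorted best match first
```

Here the prototype is [0.5, 0.5], so row 2 points in exactly the same direction and ranks first, while row 3 points the opposite way and ranks last. Cosine similarity ignores vector length, so a food is rewarded for the shape of its nutrient profile rather than its magnitude.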

Rules & weights (post‑scoring tweaks):

Protein‑forward emphasis (boost items with strong protein signal).
Activity‑aware fat handling (slightly more lenient at high activity, stricter at low).
Food‑group preferences (e.g., Beef / Dairy‑Egg → Ideal; keep variety with Fruits/Veg based on sugar/fiber/minerals).
Fruit sugar buckets (low / moderate / high sugar by distribution percentile).
Quality filters: drop ultra‑processed/combined items; cap organ meats; de‑dupe near‑duplicates.
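
As one example of these rules, the fruit sugar buckets could be computed along the following lines. The 33rd and 66th percentile cutoffs, the helper name `sugar_buckets`, and the sample values are assumptions; the actual thresholds used by the project are not stated here.

```python
import numpy as np

def sugar_buckets(sugar_g, low_pct=33, high_pct=66):
    """Label each value 'low', 'moderate', or 'high' by distribution percentile."""
    sugar = np.asarray(sugar_g, dtype=float)
    low_cut, high_cut = np.percentile(sugar, [low_pct, high_pct])
    labels = np.where(
        sugar <= low_cut, "low",
        np.where(sugar <= high_cut, "moderate", "high"),
    )
    return labels.tolist()

# Made-up Sugar_g values for six fruits
fruit_sugar = [4.0, 5.0, 9.0, 10.0, 14.0, 16.0]
buckets = sugar_buckets(fruit_sugar)
```

Because the cutoffs come from the data's own distribution, the buckets adapt to whichever subset of fruits is being ranked rather than relying on fixed gram thresholds.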

Final score (conceptual):

score = α · cosine(embedding, prototype) + β · macro‑alignment − penalties(rules)

The defaults are simple, and the weights α and β are easy to tune.
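
The conceptual formula translates directly into a small function. The default weights (0.7 and 0.3), the treatment of penalties as a list of non-negative numbers, and the example inputs are illustrative assumptions, not the project's actual defaults.

```python
def final_score(cosine, macro_alignment, penalties, alpha=0.7, beta=0.3):
    """score = alpha * cosine + beta * macro_alignment - sum(penalties)."""
    return alpha * cosine + beta * macro_alignment - sum(penalties)

# Two hypothetical foods: (cosine to prototype, macro alignment, rule penalties)
candidates = {
    "food_a": (0.90, 0.80, [0.05]),        # strong match, one small penalty
    "food_b": (0.95, 0.40, [0.10, 0.10]),  # closer embedding, poorer macros
}
scored = {name: final_score(*args) for name, args in candidates.items()}
best = max(scored, key=scored.get)
```

With these numbers food_a wins even though food_b has the higher cosine similarity, which shows how the macro term and the rule penalties can reorder a pure embedding ranking.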

Two ways to run (share the spotlight)

A) Full model path (original, with your Kaggle dataset)

# 1) Install deps
pip install -r requirements.txt

# 2) Train (downloads data via kagglehub, preprocesses, trains AE, saves artifacts)
python train.py

# 3) Try the interactive demo
python demo.py

B) Fast demo via CLI (optional)

# 1) Install package (editable for dev)
pip install -e .

# 2) (Optional) write a tiny demo dataset to data/emb_df.parquet (see snippet in README)
# 3) Run the CLI (uses saved artifacts if present)
testoai --targets "protein=194,fat=65,carbs=247" --k 5
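
For readers curious how a targets string of this shape can be turned into numbers, here is a small sketch. How cli.py actually parses the flag is not shown in this README; `parse_targets` is a hypothetical helper. The energy check uses the standard 4/9/4 kcal-per-gram factors for protein, fat, and carbohydrate.

```python
def parse_targets(spec):
    """Parse 'protein=194,fat=65,carbs=247' into {'protein': 194.0, ...}."""
    targets = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {part!r}")
        targets[name.strip().lower()] = float(value)
    return targets

targets = parse_targets("protein=194,fat=65,carbs=247")
# Energy implied by the macro targets: 4 kcal/g protein, 9 kcal/g fat, 4 kcal/g carbs.
kcal = 4 * targets["protein"] + 9 * targets["fat"] + 4 * targets["carbs"]
```

For the example targets this works out to 776 + 585 + 988 = 2349 kcal per day, a quick sanity check that the three macro targets describe a plausible daily intake.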

Example output (truncated)

               Description  Protein_g  Fat_g  Carb_g     score
0      Greek yogurt (200g)         20    4.0       8  0.967060
1  96/4 Ground beef (113g)         24    4.0       0  0.928033
2    Chicken breast (100g)         31    3.6       0  0.926318
3           Eggs (3 large)         18   15.0       1  0.917042
4      Ribeye steak (100g)         19   20.0       0  0.892692

Project structure

TestoAI/
├─ src/testoai/
│  ├─ __init__.py
│  ├─ recommend.py        # ranking/scoring logic (cosine + rules)
│  └─ cli.py              # parses flags, calls recommend(), prints a table
├─ train.py               # regenerates artifacts from full dataset
├─ demo.py                # interactive demo using saved artifacts
├─ data/                  # (generated) embeddings & tables
├─ model/                 # (generated) encoder weights, scaler, meta
├─ requirements.txt
├─ pyproject.toml         # makes package installable; creates `testoai` command
├─ README.md
└─ .gitignore

Roadmap

Near-term (polish)

Provide small pretrained AE weights and attach to a GitHub Release.
Tiny evaluation notebook comparing cosine vs macro-distance with top-K examples.
Unit tests (pytest): column mapping, protein-forward monotonicity, fruit sugar buckets.
GitHub Actions: ruff/black + pytest (Py 3.10/3.11).
Sample config files: examples/targets.yaml, examples/presets.yaml.

Next (nice-to-have)

Minimal FastAPI endpoint (/recommend) + 1-page Streamlit UI.
Dockerfile for one-command runs.
Expanded data docs (features, seed foods, assumptions/filters) linked from README.

Notes & Disclaimer

Educational project; not medical advice. Results depend on dataset quality, scaling, and the chosen seed foods and rules.

License

MIT (see LICENSE).
