Commit 2f88ff8

docs(data-pipeline): readd discussion img/notes
1 parent 8cfef29 commit 2f88ff8

3 files changed: +115 -0 lines changed

docs/data_pipeline.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
---
title: "Data Pipeline"
subtitle: ""
author: [Joern Griepenburg]
date: "2024-10-24"
lang: "en"
colorlinks: true
header-includes:
- |
  ```{=latex}
  \usepackage{awesomebox}
  \usepackage{caption}

  \newcommand{\pandocbounded}[1]{#1}
  ```
pandoc-latex-environment:
  noteblock: [note]
  tipblock: [tip]
  warningblock: [warning]
  cautionblock: [caution]
  importantblock: [important]
---

# Dataset

- split into training, test, validation set
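
The notes do not say how the split is made. A minimal sketch, assuming the split is done per recording (so frames of one recording never end up in two sets) and assuming an 80/10/10 ratio; the function name, the fractions, and the seed are illustrative only:

```python
import random


def split_recordings(recording_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle recording ids reproducibly and split them into three disjoint sets."""
    ids = sorted(recording_ids)        # stable starting order
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "validation": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "training": ids[n_val + n_test:],
    }


# Hypothetical example: 20 recordings with ids 0..19
splits = split_recordings(range(20))
```

Fixing the seed keeps the assignment stable between runs, which matters if results on the test set are to be compared over time.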

# Pipeline

![](./img/data_pipeline.jpg)

1. Raw Data
   - rosbags
   - Nao data
   - ...
2. Disk storage
   - png
   - csv
   - ...
3. Torch Dataset
   - loads the disk stored data
4. Torch Dataloader
   - loads the dataset in batches of samples

## Raw Data

Use the bitbots standard for all data.

- Images from the cameras
  - Transform to 8-bit RGB
  - Resize to square images (e.g. 480x480)
  - Camera id for NAOs
- IMU
  - filtered pitch, roll in radians
  - use bitbots coordinate systems
- Joint states (angles)
  - 20 degrees of freedom (Wolfgang-OP)
  - use bitbots naming of joints
  - for NAO split hip joints into left and right
- Simplified game state
  - Positioning, Stop, Playing
  - Role (goalie, player)
- Joint commands (angles)
  - 20 degrees of freedom (Wolfgang-OP)
  - use bitbots naming of joints
  - for NAO split hip joints into left and right
- Timestamp for each data item in seconds since the start of the recording (float)
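
The timestamp convention can be sketched as follows. This assumes the raw messages carry absolute integer nanosecond stamps, which the notes do not state; `relative_seconds` is a hypothetical helper name:

```python
def relative_seconds(stamps_ns):
    """Convert absolute nanosecond stamps to float seconds since the earliest stamp."""
    if not stamps_ns:
        return []
    start = min(stamps_ns)  # treated as the start of the recording (assumption)
    return [(stamp - start) / 1e9 for stamp in stamps_ns]


# Made-up stamps: 0 s, 0.5 s and 2 s after the start
times = relative_seconds([
    1_700_000_000_000_000_000,
    1_700_000_000_500_000_000,
    1_700_000_002_000_000_000,
])
```

Subtracting in integer nanoseconds before converting to float avoids losing precision on the large absolute values.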

## Disk storage

- Add metadata
  - when was it recorded
  - where is the data from
  - what robot was used
  - team color
- store as sqlite database
  - table for each data type
  - table for metadata
  - save images as blobs
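
One possible layout for such a database, sketched with Python's built-in `sqlite3` module. All table and column names below are assumptions made for illustration, not a schema taken from the project, and only two data-type tables are shown:

```python
import sqlite3

# Hypothetical schema: one metadata table plus one table per data type.
SCHEMA = """
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    recorded_at TEXT NOT NULL,   -- when was it recorded
    location TEXT,               -- where is the data from
    robot_type TEXT NOT NULL,    -- what robot was used
    team_color TEXT
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    stamp REAL NOT NULL,         -- seconds since start of recording
    data BLOB NOT NULL           -- image bytes
);
CREATE TABLE joint_state (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    stamp REAL NOT NULL,
    joint_name TEXT NOT NULL,
    angle REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)

# Placeholder metadata values, not real recordings
cur = conn.execute(
    "INSERT INTO recording (recorded_at, location, robot_type, team_color) "
    "VALUES (?, ?, ?, ?)",
    ("2024-10-24T12:00:00", "lab", "Wolfgang-OP", "blue"),
)
recording_id = cur.lastrowid

# Store three dummy bytes as an image blob and read them back
conn.execute(
    "INSERT INTO image (recording_id, stamp, data) VALUES (?, ?, ?)",
    (recording_id, 0.0, bytes([0, 255, 0])),
)
stored = conn.execute("SELECT data FROM image").fetchone()[0]
```

Blobs come back from `sqlite3` as `bytes`, so the dataset code would still need to decode them into arrays.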

## Torch Dataset

- maybe convert sqlite to pandas to prevent conversion to python `int`, `float`, etc. data types
- iterator of samples
  - how many items of different types per sample (e.g. 6 imgs, 3 joint states, ...)
  - normalized scaled time `0..1` of item in sample
  - fixed number of images
  - overlapping samples (all permutations of images)
- normalization of all data types
  - specific normalization/representation to be defined
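
A rough sketch of the sample construction, in plain Python without Torch. It reads "overlapping samples" as a sliding window with stride 1 over consecutive images and scales time relative to each window's first and last stamp; both are assumptions, since the notes leave the exact scheme open. `build_samples` is a hypothetical name:

```python
def build_samples(stamps, n_images):
    """Yield overlapping windows of n_images consecutive stamps, time scaled to 0..1."""
    for start in range(len(stamps) - n_images + 1):
        window = stamps[start:start + n_images]
        t0 = window[0]
        span = window[-1] - t0
        # Guard against a zero-length window (single image or identical stamps)
        scaled = [(t - t0) / span if span > 0 else 0.0 for t in window]
        yield {"stamps": window, "scaled_time": scaled}


# Made-up image stamps in seconds since start of recording
samples = list(build_samples([0.0, 0.5, 1.0, 2.0], n_images=3))
```

With four stamps and three images per sample this gives two windows that share two images. A Torch dataset could wrap the same logic behind `__len__` and `__getitem__`.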

# Ideas

- visualization in `foxglove-studio` and with `matplotlib`
- hyperparameter optimization
  - batch sizes
  - learning rates
  - item counts per sample
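
The three hyperparameters could be enumerated as a plain grid, as sketched below. The candidate values are placeholders, and an exhaustive grid is only one possible search strategy; the notes do not pick one:

```python
from itertools import product

# Placeholder candidate values for each hyperparameter
search_space = {
    "batch_size": [16, 32],
    "learning_rate": [1e-3, 1e-4],
    "images_per_sample": [3, 6],
}

# One dict per combination, in the order itertools.product generates them
configs = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]
```

Each entry in `configs` can then be passed to a training run; the grid grows multiplicatively, so the value lists should stay short.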

docs/gen-pdf.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
set -eEuo pipefail
# generate pdf with pandoc/xelatex from markdown files with eisvogel
# https://github.com/Wandmalfarbe/pandoc-latex-template

docker run --rm \
    -u "$(id -u):$(id -g)" \
    -v "$PWD:/data" \
    pandoc/extra \
    --from=markdown \
    --pdf-engine=xelatex \
    --template=eisvogel \
    --filter pandoc-latex-environment \
    --listings \
    --highlight-style kate \
    -o "${1%.*}.pdf" \
    "$1"

docs/img/data_pipeline.jpg

96.3 KB