feat: add first implementation of collector of sharedmobility #34

Open
wants to merge 3 commits into
base: main

Giacomo92
Collaborator

No description provided.

Member

@clezag clezag left a comment

The main points are to use the newer SDK (with s3-poller as a reference) and to remove the helm config files. We will add the specific helm file ourselves.

The rest are minor issues with dead/obsolete code and files.

@Giacomo92
Collaborator Author

Hi @clezag I’ve addressed all your points and pushed the transformer directory as well. However, I'm encountering two errors when I test the transformer:

app-1 | {"time":"2025-07-24T13:54:00.270630432Z","level":"INFO","msg":"Pushing provenance done","uuid":""}
app-1 | {"time":"2025-07-24T13:54:00.270667041Z","level":"DEBUG","msg":"Syncing data types..."}
app-1 | {"time":"2025-07-24T13:54:00.274764513Z","level":"ERROR","msg":"Error performing POST"}
app-1 | {"time":"2025-07-24T13:54:00.274809105Z","level":"DEBUG","msg":"Syncing data types done."}
app-1 | {"time":"2025-07-24T13:54:00.274816744Z","level":"DEBUG","msg":"SyncDataTypes for stationType","stationType":"Scooter"}
app-1 | {"time":"2025-07-24T13:54:00.27482866Z","level":"INFO","msg":"Pushing provenance..."}
app-1 | {"time":"2025-07-24T13:54:00.274831452Z","level":"INFO","msg":"prv: 0.1.0 prn: tr-sharedmobility"}
app-1 | {"time":"2025-07-24T13:54:00.279795318Z","level":"ERROR","msg":"Error performing POST"}
app-1 | {"time":"2025-07-24T13:54:00.279885374Z","level":"ERROR","msg":"error","err":"Post \"https://share.opendatahub.testingmachine.eu/json/provenance?&prn=tr-sharedmobility&prv=0.1.0\": dial tcp: lookup share.opendatahub.testingmachine.eu on 192.168.65.7:53: no such host"}
app-1 | {"time":"2025-07-24T13:54:00.279907835Z","level":"INFO","msg":"Pushing provenance done","uuid":""}
app-1 | {"time":"2025-07-24T13:54:00.279912376Z","level":"DEBUG","msg":"Syncing data types..."}
app-1 | {"time":"2025-07-24T13:54:00.283645968Z","level":"ERROR","msg":"Error performing POST"}
app-1 | {"time":"2025-07-24T13:54:00.283674093Z","level":"DEBUG","msg":"Syncing data types done."}
app-1 | {"time":"2025-07-24T13:54:00.283677628Z","level":"INFO","msg":"Data types registration complete"}
app-1 | {"time":"2025-07-24T13:54:03.309464431Z","level":"ERROR","msg":"failed to initialize Tr pub","err":"failed to dial AMQP: Exception (403) Reason: \"username or password not allowed\""}
app-1 | panic: failed to dial AMQP: Exception (403) Reason: "username or password not allowed"
app-1 |
app-1 | goroutine 1 [running]:
app-1 | github.com/noi-techpark/opendatahub-go-sdk/ingest/tr.NewTr[...]({0xea9da0?, 0x1491300}, {{{0x0, 0x0}}, {0xc00003e067, 0x27}, {0xc00003c03c, 0x6}, {0xc00004404a, 0x11}, ...})
app-1 | /go/pkg/mod/github.com/noi-techpark/opendatahub-go-sdk/[email protected]/tr/tr.go:57 +0x309
app-1 | main.main()
app-1 | /code/main.go:128 +0xf1
app-1 | exit status 2
app-1 exited with code 1

Let me know if you have any suggestions or if there's anything specific you'd like me to adjust.

Thanks!

@clezag
Member

clezag commented Jul 25, 2025

@Giacomo92 Looks good to me in general.

Errors

Based on the .env.example you committed (I assume your .env was the same), the writer API URL is wrong.

The correct testing environment URL would be https://mobility.share.opendatahub.testingmachine.eu, but you should not use that during development anyway, as you don't have any valid credentials.

Instead, spin up the timeseries-specific services locally alongside the ingestion core, as documented here.
Then point the BDP_BASE_URL env variable at that service (this should be http://localhost:8081, as in the boilerplate).

For the rabbitmq issue:
The boilerplate .env.example uses MQ_URI=amqp://guest:guest@localhost:5672 as default.
guest:guest is the default user that rabbitmq creates on first startup. Since you specified username:password, the connection probably failed because there is no user named username.
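
A minimal sketch of a pre-flight check for this, assuming MQ_URI is a standard AMQP URI with the credentials embedded. The helper names amqpCredentials and looksLikePlaceholder are hypothetical and not part of the SDK or the boilerplate; only the Go standard library is used.

```go
package main

import (
	"fmt"
	"net/url"
)

// amqpCredentials extracts the user and password embedded in an AMQP URI.
// Hypothetical helper, not part of the SDK.
func amqpCredentials(uri string) (user, pass string, err error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", "", fmt.Errorf("invalid MQ_URI: %w", err)
	}
	if u.Scheme != "amqp" && u.Scheme != "amqps" {
		return "", "", fmt.Errorf("unexpected scheme %q in MQ_URI", u.Scheme)
	}
	if u.User == nil {
		// No credentials in the URI at all.
		return "", "", nil
	}
	pass, _ = u.User.Password()
	return u.User.Username(), pass, nil
}

// looksLikePlaceholder reports whether the credentials are the literal
// template values username:password rather than a real account.
func looksLikePlaceholder(user, pass string) bool {
	return user == "username" && pass == "password"
}

func main() {
	for _, uri := range []string{
		"amqp://guest:guest@localhost:5672",       // boilerplate default
		"amqp://username:password@localhost:5672", // unedited placeholder
	} {
		user, pass, err := amqpCredentials(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("user=%s placeholder=%t\n", user, looksLikePlaceholder(user, pass))
	}
}
```

Running a check like this at startup would turn the 403 panic into a readable configuration warning, but it is optional; fixing the .env is enough.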

In general, we intend the boilerplate default setup to work with the infrastructure compose files out of the box. If they don't, please let us know and we will fix it.

If you have further issues running the local dev setup, feel free to reach out to me directly.

Collector

Looks good!

Just out of curiosity:

Did you test collecting the full dataset (all pages)?
If so, how long did it roughly take, and did the ingestion pipeline accept and relay it?

There should be a size limit of 16MB in the raw data table, and I assumed we would run into it if the raw data is merged into a single JSON.

Transformer

  1. Transformer expects a single record as input, but in the collector you publish a list. The ingestion pipeline does not transform your raw data in any way. Expect the rdb.Raw struct that you put out on the collector side to be exactly what your transformer receives.

  2. Since your RawType struct uses a lot of them, be cautious around bool and int zero values when handling JSON in Go. If a field is not present in the JSON, its value defaults to false / 0 respectively, which might not be what you intended.
    Since you already know that not all fields will always be present, it's better to use *bool / *int instead.
    The same goes for when you construct the MetaData struct later:
    As it is now, all fields are always present, even though station and free_floating should have two different metadata schemas.

  3. Your call to SyncStations uses onlyActivate=false. Since you are syncing every single station separately, this will not work.
    If onlyActivate is false, all stations of the type and origin you are syncing that are not in the list will be deactivated.
    In simple terms:
    Set onlyActivate=false when syncing the complete set of stations that exist.
    Set onlyActivate=true when syncing only a subset of stations (in your case, a single one).
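
Points 1 and 2 can be sketched together using only the standard library. The vehicle struct below is a hypothetical, trimmed-down stand-in for RawType, and the field names are assumptions for illustration. The payload is decoded as a list (matching what the collector publishes), and pointer fields make it possible to tell "absent" apart from false / 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vehicle is a hypothetical stand-in for the transformer's RawType.
// Pointer fields stay nil when the key is missing from the JSON.
type vehicle struct {
	ID         string `json:"id"`
	IsReserved *bool  `json:"is_reserved,omitempty"`
	Capacity   *int   `json:"capacity,omitempty"`
}

// decodeVehicles decodes the raw payload as a JSON array, since the
// collector publishes a list rather than a single record.
func decodeVehicles(raw string) ([]vehicle, error) {
	var vs []vehicle
	if err := json.Unmarshal([]byte(raw), &vs); err != nil {
		return nil, err
	}
	return vs, nil
}

// metadata builds a map that contains only the fields that were actually
// present, so different record kinds can end up with different schemas.
func metadata(v vehicle) map[string]any {
	m := map[string]any{}
	if v.IsReserved != nil {
		m["is_reserved"] = *v.IsReserved
	}
	if v.Capacity != nil {
		m["capacity"] = *v.Capacity
	}
	return m
}

func main() {
	raw := `[{"id":"a","is_reserved":false,"capacity":0},{"id":"b"}]`
	vs, err := decodeVehicles(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, v := range vs {
		fmt.Printf("%s: %d metadata fields\n", v.ID, len(metadata(v)))
	}
}
```

With plain bool / int fields, both records above would look identical after decoding; with pointers, record "a" carries explicit false / 0 values while record "b" carries none.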
