@taxe10 (Member) commented Aug 1, 2025

This PR introduces an alternative simulator that uses the MNIST dataset to generate digit-based test data. The simulator downloads MNIST and ingests a subset of the images into Tiled under:

f"{RESULTS_TILED_URI}/mnist"

Ten containers are created, one for each digit (0–9), each holding a predefined number of images. Container sizes vary, and a few contain only a single image, similar to the beamline 733 use case.
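
As a rough illustration of the per-digit selection described above, here is a minimal sketch. The `IMAGES_PER_DIGIT` counts and the `select_subsets` helper are hypothetical and are not taken from this PR; random labels stand in for the real MNIST labels so the example runs without a download.

```python
import numpy as np

# Hypothetical per-digit counts (assumption): sizes vary and some
# containers hold a single image. The PR's actual values may differ.
IMAGES_PER_DIGIT = {0: 5, 1: 4, 2: 6, 3: 1, 4: 3, 5: 1, 6: 2, 7: 4, 8: 3, 9: 2}


def select_subsets(labels, counts):
    """Map each digit to the dataset indices of its first counts[digit] images."""
    subsets = {}
    for digit, n in counts.items():
        matches = np.flatnonzero(labels == digit)
        if len(matches) < n:
            raise ValueError(f"only {len(matches)} images available for digit {digit}")
        subsets[digit] = matches[:n]
    return subsets


# Stand-in for the MNIST label array (assumption: real labels would be loaded instead).
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
subsets = select_subsets(labels, IMAGES_PER_DIGIT)
```

Each entry of `subsets` would then correspond to one container; a single-image container is simply an index array of length one.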

Each streamed message includes a simple feature vector of the form:

[digit, index]

where:

  • digit is the numeric label (0–9) corresponding to the container,
  • index is the position of the image within that digit’s subset.

This enables a simple and reproducible source of data for testing vector streaming and feature processing.
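
The feature vectors can be sketched as follows. This is an illustration only: the `feature_vectors` helper, the example container sizes, and the streaming order (digits ascending, then index ascending) are assumptions, not the simulator's actual code.

```python
def feature_vectors(counts):
    """Yield one [digit, index] vector per image in each digit's subset."""
    for digit in sorted(counts):
        for index in range(counts[digit]):
            yield [digit, index]


# Hypothetical container sizes: digit 3 has one image, digit 7 has two.
counts = {3: 1, 7: 2}
vectors = list(feature_vectors(counts))
print(vectors)  # [[3, 0], [7, 0], [7, 1]]
```

Because each vector depends only on the container sizes, the same configuration always produces the same stream.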

Usage

To enable the simulator, update your configuration:

arroyo_vec_sim:
  ...
  environment:
    SIMULATION_TYPE: "mnist"  # Use "mnist" for MNIST simulation
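
On the Python side, the setting could be read along these lines. This is a hedged sketch: only the `SIMULATION_TYPE` variable and the `"mnist"` value come from this PR, while the `get_simulation_type` helper and its handling of an unset variable are assumptions.

```python
import os


def get_simulation_type(env=None):
    """Return the normalized SIMULATION_TYPE value, or None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("SIMULATION_TYPE")
    return value.strip().lower() if value is not None else None


# Passing a dict instead of os.environ keeps the example self-contained.
use_mnist = get_simulation_type({"SIMULATION_TYPE": "mnist"}) == "mnist"
```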

@taxe10 requested a review from xiaoyachong on August 1, 2025, 23:38
@xiaoyachong (Contributor) commented

Hi Tanny, thanks for your PR. I noticed a callback error when selecting an image with the digit 3 or 5, where the container contains only one image.
[Attached image]

