Everything you need to prototype Apache Storm topologies quickly: a Gradle build, a ready-to-use devcontainer, and a joke-driven word count example that proves the toolchain end to end.
- RandomJokeSpout (2 executors) reads the bundled jokes.json dataset and emits random jokes (id, category, rating, body).
- SplitSentenceBolt (3 executors) tokenizes each joke body into lowercase words.
- WordCounterBolt (3 executors) maintains per-word counters and emits the running total.
- HistogramBolt (single executor) collects the counts into a global histogram and writes a timestamped snapshot to data/histogram.txt every 5 seconds.
WordCountTopology wires these components with shuffle and fields groupings and, in local mode, keeps the embedded Storm cluster alive for about one minute so you have time to inspect the output. Production mode can be toggled via the STORM_PROD environment variable or the -Dstorm.prod system property (both default to false).
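To make the data flow concrete, here is a minimal plain-Java sketch of the split, group, and count path. It deliberately uses no Storm classes. The class name WordCountSketch, the tokenization regex, and the hash-based taskFor helper are illustrative assumptions, not the repository's code, and Storm's real fields grouping hashes the grouping values differently in detail. The property it illustrates is the one fields grouping provides: the same word always reaches the same counter, so per-word totals stay consistent across parallel executors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java sketch of the split -> group -> count path; no Storm classes involved.
class WordCountSketch {

    // Assumed tokenization rule: lowercase, then split on anything that is not
    // a letter, digit, or apostrophe. The real SplitSentenceBolt may differ.
    static List<String> tokenize(String body) {
        List<String> words = new ArrayList<>();
        for (String token : body.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Hypothetical stand-in for fields grouping: the same word always maps
    // to the same task index in the range [0, numTasks).
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // One counter per task: keeps per-word counts and returns the running total.
    static final class Counter {
        private final Map<String, Long> counts = new HashMap<>();

        long increment(String word) {
            return counts.merge(word, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        int numTasks = 3; // mirrors the 3 WordCounterBolt executors, assuming one task each
        List<Counter> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new Counter());
        }
        for (String word : tokenize("Why did the chicken cross the road?")) {
            int task = taskFor(word, numTasks);
            long total = tasks.get(task).increment(word);
            System.out.println(word + " -> task " + task + ", running total " + total);
        }
    }
}
```

With shuffle grouping instead, the two occurrences of "the" could land on different counters and each would report a total of 1, which is why a counting stage is typically fed by fields grouping on the word.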
Dataset source: https://github.com/taivop/joke-dataset/blob/master/stupidstuff.json
- From the repo root run ./gradlew run.
- Watch the console logs; each spout/bolt uses SLF4J to report the tuples it processes.
- Open data/histogram.txt while the topology is running (or right after shutdown) to see the aggregated word frequencies.
Tip: remove data/histogram.txt between runs if you prefer a clean snapshot.
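The exact layout of data/histogram.txt is not documented here, so the following is only a sketch of how a timestamped snapshot could be rendered from the collected counts. The class HistogramSnapshotSketch, the "# snapshot" header line, and the tab-separated rows are all assumptions made for illustration; the real HistogramBolt may format its output differently.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical rendering of a histogram snapshot; not the repository's format.
class HistogramSnapshotSketch {

    // Sorts by count (highest first), breaks ties alphabetically, and
    // prefixes the rows with the time the snapshot was taken.
    static String render(Map<String, Long> counts, Instant takenAt) {
        String rows = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .map(e -> e.getKey() + "\t" + e.getValue())
                .collect(Collectors.joining("\n"));
        return "# snapshot " + takenAt + "\n" + rows + "\n";
    }

    public static void main(String[] args) {
        System.out.print(render(Map.of("the", 2L, "road", 1L), Instant.now()));
    }
}
```

A bolt built along these lines would call render on a timer (every 5 seconds in this project) and write the result to the file.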
Need to submit directly from the Gradle task? Use STORM_PROD=true ./gradlew run, ./gradlew run -Dstorm.prod=true, or pass an explicit flag with ./gradlew run --args='--prod' so the topology is submitted to Nimbus instead of the embedded LocalCluster.
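The three switches above can be thought of as inputs to a single boolean decision. The sketch below shows one way such a check might look. The class ProdModeSketch, the isProd helper, and the rule that any one enabled switch turns production mode on are assumptions; the actual precedence in WordCountTopology may differ.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical resolution of the production-mode switch; not the repository's code.
class ProdModeSketch {

    // Returns true if any of the three switches is enabled:
    // the --prod argument, the storm.prod system property, or STORM_PROD.
    static boolean isProd(Map<String, String> env, String sysProp, String[] args) {
        if (Arrays.asList(args).contains("--prod")) {
            return true;
        }
        if (Boolean.parseBoolean(sysProp)) { // parseBoolean(null) is false
            return true;
        }
        return Boolean.parseBoolean(env.get("STORM_PROD"));
    }

    public static void main(String[] args) {
        boolean prod = isProd(System.getenv(), System.getProperty("storm.prod"), args);
        System.out.println(prod ? "submit to Nimbus" : "run in embedded LocalCluster");
    }
}
```

Passing the environment and property in as parameters, rather than reading them inside the method, keeps the decision easy to test without touching the real process environment.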
Important: The task commands below rely on the go-task runner. If it is not installed locally, either install it (brew install go-task, scoop install task, or download a binary from the releases page) or run them from within the devcontainer where it is preinstalled.
- task devcontainer: build, start, and attach to the devcontainer (runs build → up → attach).
- task devcontainer-recreate: force a teardown and rebuild from scratch.
- task devcontainer-build: build only.
- task devcontainer-up: start or reuse the container.
- task devcontainer-attach: exec into the container and attach to the tmux session.
- task devcontainer-down: stop and remove the container plus its volumes.
- Toggle production mode at runtime by exporting STORM_PROD=true or passing -Dstorm.prod=true when invoking the JVM/Gradle task (no code changes needed).
- Build the fat jar: ./gradlew clean jar. The artifact lands in build/libs/apache-storm-starter.jar, bundles your application dependencies, and relies on the Storm runtime provided by the cluster (Storm jars stay external to avoid resource clashes).
- Enter the devcontainer (task devcontainer or task devcontainer-attach). It already ships with a Storm CLI configured via /root/storm.yaml, including Nimbus and ZooKeeper endpoints, so no extra flags are required.
- Submit the topology from inside the container (remember to enable production mode, e.g. STORM_PROD=true):

      STORM_PROD=true storm jar build/libs/apache-storm-starter.jar \
        org.apache.storm.example.WordCountTopology \
        WordCountTopology

  Replace the last argument if you want a different topology name.
- Monitor the deployment through the Storm UI (http://<nimbus-host>:8080) or the CLI (storm list). When you're done, stop it with storm kill WordCountTopology (or your chosen name).
If you need to submit from outside the devcontainer, copy both build/libs/apache-storm-starter.jar and the provided conf/storm.yaml to the target machine and adjust the hostnames to match your cluster.