<!DOCTYPE html>
<!-- saved from url=(0060)http://getbootstrap.com/2.3.2/examples/starter-template.html -->
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<title>Streaming Data Workshop</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<!-- Le styles -->
<link href="styles/bootstrap.min.css" rel="stylesheet" />
<link rel="stylesheet" href="styles/idea.css">
<style type="text/css">
body {
padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
}
section {
padding-bottom: 20px;
}
li {
margin: 10px;
}
pre {
background-color: white;
}
.btn {
margin-top: 5px;
margin-bottom: 5px;
}
</style>
<script src="js/highlight.pack.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body>
<div class="navbar navbar-inverse navbar-fixed-top">
<div class="navbar-inner">
<div class="container">
<a class="navbar-brand" href="#">
Streaming Data Workshop
</a>
<div class="nav-collapse collapse">
<ul class="nav">
</ul>
</div><!--/.nav-collapse -->
</div>
</div>
</div>
<div class="container">
<h1>Installation Steps</h1>
To do this workshop, follow these installation steps:
<h2>Install OpenShift Cluster on Google Cloud Platform</h2>
<ul>
<li>Using Chrome or Firefox, open an Incognito/Guest window in your browser.</li>
<li>Use the test credentials handed out to log into
<a href="https://console.cloud.google.com">Google Cloud Console</a>.</li>
<li>Select the pre-created project from the Projects list.</li>
<li>Open the Google Cloud Shell. We'll set up the OpenShift cluster from within the Cloud Shell:
<br/><br/>
<pre><code>> git clone https://github.com/infinispan-demos/streaming-data-workshop
> cd streaming-data-workshop/openshifter
> ./provision-openshift.sh</code></pre>
<div class="alert alert-warning">
If at some point the OpenShift connection times out, or if provisioning fails with an error while running docker,
log into the Google Cloud Shell and run:
<br/><br/>
<pre><code>> cd streaming-data-workshop/openshifter
> docker run -e -ti -v `pwd`:/root/data docker.io/osevg/openshifter create streaming-data-workshop</code></pre>
This re-provisions the OpenShift cluster, which should bring it back online.
</div>
<div class="alert alert-info" role="alert">
If you are running OpenShift on your own account and want to delete it, open Google Cloud Shell and execute:
<br/><br/>
<pre><code>> cd streaming-data-workshop/openshifter
> docker run -e -ti -v `pwd`:/root/data docker.io/osevg/openshifter destroy streaming-data-workshop</code></pre>
If any timeouts happen while destroying OpenShift, run the <code>delete-openshift.sh</code> script from the same folder.
</div>
In Cloud Shell, find out the OpenShift Compute Engine VM External IP address:
<br/><br/>
<pre>$ gcloud compute instances list
NAME                            ZONE  MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
streaming-data-workshop-master  ...   ...                        X.X.X.X      Y.Y.Y.Y      RUNNING
                                                                              ^-- This is your OpenShift Master IP!
</pre>
The OpenShift console is located at:
<br/><br/>
<code>https://console.streaming-data-workshop.${OPENSHIFT_MASTER_IP}.nip.io:8443/console</code>
<br/><br/>
<div class="alert alert-info" role="alert">
Default credentials in OpenShift are <kbd>developer/developer</kbd>
</div>
</li>
</ul>
<h2>Connect to OpenShift Cluster from Your Machine</h2>
<ul>
<li>Install
<a href="https://github.com/openshift/origin/releases/tag/v3.7.2">OpenShift client tools 3.7.2</a>
on your local machine.
Once you've installed them, try logging into the OpenShift instance in the cloud:
<br/><br/><pre><code>> export OPENSHIFT_MASTER_IP=...
> oc login -u developer -p developer https://console.streaming-data-workshop.${OPENSHIFT_MASTER_IP}.nip.io:8443
Login successful.
You have one project on this server: "myproject"
Using project "myproject".</code></pre>
</li>
</ul>
<h2>Setup Your Local Machine</h2>
<ul>
<li>
Clone or download the workshop repository to your machine:
<br/><br/>
<code>https://github.com/infinispan-demos/streaming-data-workshop</code>
<br/><br/>
</li>
<li>Install <a href="https://github.com/creationix/nvm">Node.js Version Manager</a> on your local machine.
It makes it easy to work with multiple Node.js versions.
Then install the Node.js version to be used in the workshop:
<br/><br/><pre><code>> nvm install 4.2</code></pre>
</li>
<li>Install Apache Maven on your local machine.</li>
</ul>
</div>
<div class="container">
<h1>Use Case</h1>
In this workshop you will learn how to combine real-life data streams to build robust, reactive and scalable applications.
For that, the workshop will use Red Hat technologies that are very well suited to this kind of use case: <strong>Eclipse Vert.x</strong>, <strong>Infinispan</strong> and <strong>OpenShift</strong>.
The data set will be based around Swiss transport and will feature two publicly available data streams:
<br/><br/>
<img src="img/opendata-logo.png" class="img-rounded" height="10%" width="10%">
<br/><br/>
The first data stream comes from
<a href="https://opendata.ch">OpenData.ch</a>
which is the Swiss branch of the
<a href="https://okfn.org">Open Knowledge Foundation.</a>
This workshop will use their
<a href="http://transport.opendata.ch">Transport API</a>,
which exposes public Swiss transport timetable data.
<br/><br/>
<img src="img/sbb-logo.png" class="img" height="30%" width="30%">
<br/><br/>
The second data stream comes from
<a href="https://opendata.ch">sbb.ch</a>,
the Swiss national railway company.
This workshop will use the API that exposes the positions of trains at a given time.
<br/><br/>
Since the availability of these APIs can't be guaranteed when the workshop is delivered, the workshop will use previously captured data.
The workshop will ramp up gradually: you will learn the technologies involved by building applications on top of this data.
</div>
<div class="container">
<h1>Playground</h1>
<p>We will start with a warm-up for those that are not familiar with the Red Hat technologies in use.</p>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is OpenShift?</h4>
<img src="img/openshift.png" alt="Openshift" class="img-rounded" height="30%" width="30%">
<br/><br/>
OpenShift is Red Hat's Platform-as-a-Service (PaaS) that allows developers to quickly develop, host, and scale apps in a cloud environment.
It uses Kubernetes for the orchestration of Docker containers where applications run.
It can be run in a public environment or privately within your organization.
You can run applications written in multiple languages such as Ruby, JavaScript, Python, etc.
<hr/>
<h4 class="alert-heading">What will OpenShift be used for in the workshop?</h4>
OpenShift will be used as the environment for running most of the components of this workshop.
As you work through the exercises, you will learn how to deploy applications to it.
</div>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is Vert.x?</h4>
Vert.x is a toolkit to build reactive and distributed applications on the Java Virtual Machine.
It provides various asynchronous and non-blocking APIs and it's non-opinionated, so you can combine them to build a wide range of applications:
Web applications, Microservices, Networking, embedded / IoT, you name it.
<hr/>
<h4 class="alert-heading">What will Vert.x be used for in the workshop?</h4>
Vert.x will be used to start HTTP servers and expose the REST API of some of the workshop components.
It will also be used to push delayed train positions through the EventBus bridge, a WebSocket extension of the Vert.x EventBus to the browser.
</div>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is Infinispan?</h4>
<img src="img/infinispan-logo.png" alt="Infinispan" class="img-rounded" height="20%" width="20%">
<br/>
Infinispan is a distributed in-memory key/value data grid.
It uses peer-to-peer communication between different nodes, hence there are no master/slave nodes.
Infinispan has multiple uses:
<ul>
<li>Distributed cache to boost the performance of your application.</li>
<li>Temporary, scalable in-memory data store with failover capabilities.</li>
<li>Data analysis via distributed Java Streams API or Spark/Hadoop integrations.</li>
<li>Event-driven computation thanks to live-updating query functionality.</li>
</ul>
<hr/>
<h4 class="alert-heading">What will Infinispan be used for in the workshop?</h4>
Infinispan's role will be to store data coming from the mocked real-life data streams and provide personalized real-time updates to the clients.
</div>
<h2>Ex1 :: Start OpenShift, Infinispan and Visualizer</h2>
In this first exercise you will learn how to start OpenShift and deploy a 3-node Infinispan cluster where data will be stored.
Then, you will deploy all components of the workshop, some of which will be already complete and some you will need to modify to complete the workshop.
<br/><br/>
<div class="alert alert-warning">
You can use fewer Infinispan nodes if your machine is not powerful enough.
</div>
<img src="img/diagram-openshift-infinispan.png" class="img-rounded" height="50%" width="50%"/>
<br/><br/>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">Why deploy multiple Infinispan nodes?</h4>
Infinispan nodes are in charge of keeping data in-memory.
By deploying multiple nodes and using smart data distribution techniques, Infinispan can guarantee that even if nodes fail, data will survive.
On top of that, by deploying multiple nodes, Infinispan offers more space for storing data.
<hr/>
<h4 class="alert-heading">How does Infinispan distribute data?</h4>
<div class="row">
<div class="col-xs-2">
<img src="img/consistent-hash.png" class="img-rounded" height="110%" width="110%"/>
</div>
<div class="col-xs-10">
Infinispan uses consistent hash distribution techniques to decide which node stores which data.
For this workshop, Infinispan has been set up to keep 2 copies of each entry.
<br/><br/>
Data is represented as a key/value pair and for each pair there's a node which is the owner.
The owner is decided by applying a consistent hash algorithm to the key.
The remaining copies of the data are stored in secondary owner nodes.
<br/><br/>
If a new node starts, or one of the nodes fails, Infinispan rebalances the data so that 2 copies of each entry are still maintained.
This means that, with this setup, Infinispan can handle one node failing at a time and the data will still survive.
</div>
</div>
</div>
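<p>
The owner selection described above can be sketched with a toy hash ring.
This is only an illustration of the general consistent-hashing idea, not Infinispan's actual implementation:
the class name <code>ConsistentHashSketch</code>, the CRC32 hash, the node names and the number of virtual nodes are all assumptions made for the example.
</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Toy consistent-hash ring (hypothetical; NOT Infinispan's implementation). */
final class ConsistentHashSketch {
    // Ring position -> node name. Each node appears at several positions.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashSketch(List<String> nodes, int virtualNodes) {
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    /** CRC32 is an arbitrary choice for this sketch. */
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /**
     * Walks the ring clockwise from the key's position and collects distinct nodes.
     * The first node is the primary owner, the rest are secondary owners.
     */
    List<String> owners(String key, int numOwners) {
        Set<String> owners = new LinkedHashSet<>();
        long position = hash(key);
        for (String node : ring.tailMap(position, true).values()) {
            if (owners.size() == numOwners) break;
            owners.add(node);
        }
        // Wrap around to the start of the ring if more owners are needed.
        for (String node : ring.headMap(position, false).values()) {
            if (owners.size() == numOwners) break;
            owners.add(node);
        }
        return new ArrayList<>(owners);
    }

    public static void main(String[] args) {
        ConsistentHashSketch threeNodes =
            new ConsistentHashSketch(List.of("node-a", "node-b", "node-c"), 16);
        ConsistentHashSketch twoNodes =
            new ConsistentHashSketch(List.of("node-a", "node-b"), 16);
        for (String key : List.of("Basel SBB", "Bern", "Zurich HB")) {
            System.out.println(key + " -> " + threeNodes.owners(key, 2)
                + " with 3 nodes, " + twoNodes.owners(key, 2) + " after node-c leaves");
        }
    }
}
```

<p>
With 2 owners per key, every key still has a surviving copy when any single node disappears, which is the property the workshop setup relies on.
</p>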
To set up OpenShift, you will use the scripts provided by the workshop and follow the instructions below.
Once logged into OpenShift, you will create an Infinispan cluster.
Next, follow the step-by-step instructions:
<section id="start-openshift-infinispan">
<ol>
<li>Set up the OpenShift cluster with the script provided:</li>
<pre><code>> cd streaming-data-workshop
> ./setup-gcp-openshift.sh streaming-data-workshop ${OPENSHIFT_MASTER_IP}</code></pre>
<li>Log into the OpenShift console with <code>developer/developer</code>
<br/><br/>
<div class="alert alert-warning">
The rest of this section shows how to create an Infinispan cluster using the OpenShift user interface and explores some of its functionality.
If you prefer, you can skip the rest of this exercise by creating the Infinispan cluster from the command line:
<br/><br/>
<pre><code>> cd streaming-data-workshop
> ./setup-datagrid.sh</code></pre>
</div>
</li>
<li>In the home page, click on <kbd>Select from Project</kbd>
<br/><br/>
<img src="img/openshift-catalog.png" class="img-rounded" height="50%" width="50%"/>
</li>
<li>Select <kbd>infinispan-ephemeral</kbd> and click <kbd>Next</kbd>
<br/><br/>
<img src="img/infinispan-ephemeral.png" class="img-rounded" height="50%" width="50%"/>
</li>
<li>Click <kbd>Next</kbd></li>
<li>Set <code>APPLICATION_NAME</code> to <kbd>datagrid</kbd>
<br/><br/>
<img src="img/template-application-name.png" class="img-rounded" height="50%" width="50%"/>
<br/><br/>
</li>
<li>Set <code>MANAGEMENT_PASSWORD</code> to <kbd>developer</kbd>
<br/>
Set <code>MANAGEMENT_USER</code> to <kbd>developer</kbd>
<br/>
Set <code>NUMBER_OF_INSTANCES</code> to <kbd>3</kbd>
<br/><br/>
<div class="alert alert-warning">
If running in a low powered environment, feel free to set the number of instances to <kbd>1</kbd>
</div>
<img src="img/template-mgmt-instances.png" class="img-rounded" height="50%" width="50%"/>
</li>
<li>Click <kbd>Create</kbd></li>
<li>You should now see a message saying that Infinispan datagrid was successfully created.
Click on <kbd>Close</kbd> to close the window
<br/><br/>
<img src="img/infinispan-successful.png" class="img-rounded" height="50%" width="50%"/>
</li>
<li>Go to <kbd>My project</kbd> to inspect the created Infinispan <code>datagrid</code> deployment.
If you click on the arrow next to <code>datagrid</code>, you'll get more information about the deployment.
<br/><br/>
<img src="img/datagrid-deployment.png" class="img-rounded" height="75%" width="75%"/>
<br/><br/>
Once expanded, you should see detailed information about the Infinispan deployment:
<br/><br/>
<img src="img/datagrid-deployment-expanded.png" class="img-rounded" height="75%" width="75%"/>
<br/><br/>
Assuming you set number of instances to 3, you should see the number of pods being 3 and the ring being blue.
<br/><br/>
<div class="alert alert-warning">
If part of the ring is orange, that means that some of the pods are not ready yet.
Just give it some time and it should turn blue eventually.
<br/><br/>
If part of the ring is red, it means that one or more of the pods failed to start.
To see the logs, go to <code>Applications / Pods</code>, select the pod that's failed to start and inspect the <code>Logs</code> tab.
</div>
The deployment should also show the networking services it exposes.
In the case of Infinispan, these are:
<code>datagrid-hotrod</code> service on port <code>11222</code>,
<code>datagrid-http</code> service on port <code>8080</code>,
<code>datagrid-management</code> service on port <code>9990</code>.
<br/><br/>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is the <code>datagrid-hotrod</code> service?</h4>
This is the binary endpoint exposed by Infinispan to interact with data.
Hot Rod is the name of the binary protocol in use by this service.
As well as supporting data interaction, the protocol embeds cluster topology information.
This means that clients can react to backend topology changes without the need of any manual intervention.
Moreover, clients talking the Hot Rod protocol can apply the same consistent hash techniques used by the server.
What this means is that clients looking for a particular piece of data can figure out which server will contain the data they're looking for.
<hr/>
<h4 class="alert-heading">What is the <code>datagrid-http</code> service?</h4>
This is the HTTP REST endpoint exposed by Infinispan to interact with data.
Its functionality is similar to the <code>datagrid-hotrod</code> service but over HTTP.
However, the REST protocol does not expose topology information and hence it's not as powerful as the <code>datagrid-hotrod</code> one.
Also, the REST protocol is currently text based so it's slower than <code>datagrid-hotrod</code>.
Its main advantage is that it offers a quick and easy way to store/retrieve data without the need to have a client that understands the Hot Rod protocol.
This service won't be used during the workshop.
<hr/>
<h4 class="alert-heading">What is the <code>datagrid-management</code> service?</h4>
This service exposes the Infinispan management console via the route shown.
You can access this console using the credentials you set when the Infinispan datagrid was created
(if you followed these instructions, that'd be <code>developer/developer</code>).
It allows you to monitor and manage Infinispan-specific parameters, such as the number of entries stored, read/write ratios, etc.
On top of that, it exposes some management operations, such as creating caches.
</div>
</li>
</ol>
</section>
<section id="deploy-all-visualizer">
Next, let's deploy all the components of the workshop. Once they're ready, you'll explore the Infinispan visualizer:
<ol>
<li>Deploy all the components for the workshop with <code>./deploy-all.sh</code></li>
<li>Check the <strong>Infinispan Visualizer</strong> in:
<br/><br/>
<code>
<a href="http://datagrid-visualizer-myproject.apps.streaming-data-workshop.{OPENSHIFT_MASTER_IP}.nip.io/infinispan-visualizer/">
http://datagrid-visualizer-myproject.apps.streaming-data-workshop.{OPENSHIFT_MASTER_IP}.nip.io/infinispan-visualizer
</a>
</code>
<br/><br/>
A link to the datagrid visualizer host can also be found via the OpenShift console.
Once you have the link, you simply have to add <code>infinispan-visualizer</code> at the end.
<br/><br/>
<img src="img/visualizer-address.png" class="img-rounded" height="100%" width="100%"/>
<br/><br/>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is the Infinispan datagrid visualizer?</h4>
This is just a tool that gives Infinispan users a visual representation of the data grid.
Each big coloured ball represents an Infinispan node in the cluster.
Below each ball you can see the IP and port information for each of the nodes.
On top of that, when data gets stored in the data grid, you will see more little balls appearing around each big ball.
These little balls represent data items stored in each node.
As more data gets loaded, you should see more and more little balls appearing.
</div>
<p>The image here corresponds to the state of 3 instances without data.</p>
<img src="img/infinispan-visu.png" alt="visualizer" class="img-rounded" height="75%" width="75%"/>
</li>
</ol>
</section>
<h2>Ex2 :: Collect data</h2>
Now that OpenShift is running and the Infinispan data grid has been created, it's time to start loading data into it.
<br/><br/>
This workshop works with two streams of data: train timetable data and train positions.
In this exercise, you will need to complete the work so that the train timetable information can be read from a file and injected into the Infinispan data grid.
The code you write will run within a partially developed Vert.x Verticle, which will run as a separate deployment on OpenShift.
Reading from the file will be done in a reactive way using Vert.x and RxJava2.
As well as reading the data, you will need to transform it slightly so that it can be stored remotely in the Infinispan data grid.
<br/><br/>
<div class="alert alert-info" role="alert">
Remember that in a real-life scenario, you would consume the data streams directly from their sources.
However, since availability of the data streams cannot be guaranteed and to avoid potential overuse of these APIs, data from these streams has been captured in files that are available as part of the workshop.
</div>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is a Vert.x Verticle?</h4>
A verticle is the unit of deployment for Vert.x applications.
When you deploy a verticle, Vert.x assigns an event loop which will handle all of this verticle's events.
From a code perspective, a verticle is just a class which extends <code>io.vertx.core.AbstractVerticle</code>.
<hr/>
<h4 class="alert-heading">Why should I write reactive code with Vert.x?</h4>
Vert.x is a great platform to build reactive applications on the JVM because:
<ul>
<li>it lets you scale easily across all the CPU cores by deploying multiple instances of your verticle</li>
<li>it provides facilities to re-use all the existing libraries from the JVM ecosystem, even the synchronous and blocking ones.</li>
</ul>
<hr/>
<h4 class="alert-heading">What are Reactive Streams and RxJava2? How do they relate to Vert.x?</h4>
<p style="font-style: italic">
Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.
This encompasses efforts aimed at runtime environments (JVM and JavaScript) as well as network protocols.
<span style="font-weight: bold; display: block; text-align: right;">http://www.reactive-streams.org/</span>
</p>
<p>
So Reactive Streams on its own is just about exchanging data. RxJava, however, is about transforming, composing and coordinating flows of data or events.
For example, you can transform a stream of <code>Integer</code> to their <code>String</code> form with the
<code>map</code> operator:
<pre><code class="java">Flowable<String> stringFlowable = integerFlowable.map(integer -> integer.toString());</code></pre>
RxJava has been Reactive Streams compliant since RxJava2.
</p>
<p>
Vert.x provides a so-called <span style="font-style: italic">rxified</span> API.
This means that you can work with Vert.x modules using RxJava constructs (<code>Flowable</code>,
<code>Single</code>, ...etc) instead of callbacks.
Let's take an example. Here is how you would send an HTTP request with the callback-based API:
<pre><code class="java">
WebClient client = WebClient.create(vertx);
client.get(8080, host, uri)
.as(BodyCodec.string())
.send(ar -> {
// Handle the result
});
</code></pre>
And the equivalent with the rxified API:
<pre><code class="java">
WebClient client = WebClient.create(vertx);
Single<HttpResponse<String>> responseSingle = client.get(8080, host, uri)
.as(BodyCodec.string())
.rxSend();
</code></pre>
Note that in the first example, the request is sent as soon as you invoke <code>send</code>.
However, in the second example, nothing happens when you invoke <code>rxSend</code>.
The request is sent only when you <span style="font-style: italic">subscribe</span> to the <code>Single</code>.
</p>
<hr/>
<h4 class="alert-heading">How do I store data remotely into Infinispan in a reactive way?</h4>
Infinispan stores data in caches and, when using the Hot Rod protocol from Java, these are represented as instances of <code>RemoteCache</code>.
These caches store data as key/value pairs.
Once you have a <code>RemoteCache</code> instance, you can call the <code>putAsync(key, value)</code> method to store data.
This method returns a <code>CompletableFuture</code> which provides an asynchronous handle for working with the result of the operation.
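<br/><br/>
To get a feel for working with the returned <code>CompletableFuture</code>, here is a minimal sketch that uses only the JDK.
The <code>InMemoryCache</code> class is a hypothetical stand-in invented for this example, not Infinispan's <code>RemoteCache</code>; it only mimics the shape of <code>putAsync</code> so that the future composition can be tried without a running data grid.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class PutAsyncSketch {
    /** Hypothetical stand-in for a remote cache; NOT Infinispan's RemoteCache. */
    static final class InMemoryCache<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();

        /** Completes with the previous value for the key, or null if there was none. */
        CompletableFuture<V> putAsync(K key, V value) {
            return CompletableFuture.supplyAsync(() -> data.put(key, value));
        }

        int size() {
            return data.size();
        }
    }

    /** Starts one asynchronous put per entry and completes when all of them are done. */
    static CompletableFuture<Void> storeAll(InMemoryCache<String, String> cache,
                                            Map<String, String> entries) {
        CompletableFuture<?>[] puts = entries.entrySet().stream()
            .map(e -> cache.putAsync(e.getKey(), e.getValue()))
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(puts);
    }

    public static void main(String[] args) {
        InMemoryCache<String, String> stations = new InMemoryCache<>();
        // Keys and values are made-up sample data.
        storeAll(stations, Map.of("stop-1", "Basel SBB", "stop-2", "Bern"))
            .thenRun(() -> System.out.println("stored " + stations.size() + " entries"))
            .join(); // Blocking here is only for the demo; avoid it on an event loop.
    }
}
```

The caller never blocks while the puts are in flight: follow-up work is attached with <code>thenRun</code>, and failures could be handled the same way with <code>exceptionally</code> or <code>whenComplete</code>.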
</div>
<h3>Architecture Diagram</h3>
In this exercise, your work will focus on the <code>stations-injector</code> module.
This module is in charge of taking the Transport OpenData information on train timetables from station boards around the country and feeding it into the Infinispan data grid.
<br/><br/>
There is also a module called <code>positions-injector</code> which injects train positions, but this module has already been completed.
<br/><br/>
Finally, there is a <code>workshop-main</code> module which acts as gateway for the injector modules.
Its primary role is to set up everything for the injectors to do their job.
For example, this module creates the Infinispan caches where the injectors will store data.
<br/><br/>
<img src="img/collectionLayer.png" class="img-rounded" height="75%" width="75%">
<h3>Exercise</h3>
<section id="inject">
<ul style="list-style: decimal;">
<li>Open <kbd>workshop.stations.StationsInjector</kbd> class in <code>stations-injector</code> module</li>
<li>Implement the TODOs found in the <code>inject</code> method:
<ul>
<li><kbd>TODO 1</kbd> Map each entry into a tuple of String/Stop with <code>StationsInjector::toEntry</code>
<br/><a class="btn btn-info" role="button" onclick="$('#array-sol-13').toggle();">Show/Hide Solution</a>
<div id="array-sol-13" style="display:none;">
<pre><code class="java">Flowable<Map.Entry<String, Stop>> pairFlowable = fileFlowable.map(StationsInjector::toEntry);</code></pre></div>
</li>
<li><kbd>TODO 2</kbd> For each entry, store it in the stations cache calling <code>putAsync</code>
<br/><a class="btn btn-info" role="button" onclick="$('#array-sol-14').toggle();">Show/Hide Solution</a>
<div id="array-sol-14" style="display:none;">
<pre><code class="java">CompletableFuture<Stop> putCompletableFuture = stations.putAsync(e.getKey(), e.getValue());</code></pre></div>
</li>
</ul>
<a class="btn btn-danger" role="button" onclick="$('#array-sol-3').toggle();">Show/Hide Final Solution</a>
<div id="array-sol-3" style="display:none;">
<div class="alert alert-info" role="alert">
Solution for this exercise can be found in:
<kbd>stations-injector/src/main/solution/workshop/stations/StationsInjector.java</kbd>
<br/><br/>
To run it, call: <code>mvn fabric8:deploy -Psolution</code>
</div>
</div>
</li>
<li>Redeploy after making the code changes:
<br/><br/>
<pre><code>> cd stations-injector
> mvn fabric8:deploy</code></pre>
</li>
<li>Trigger the injection by visiting the URL below or executing:
<br/><br/>
<pre><code>> curl http://workshop-main-myproject.apps.streaming-data-workshop.{OPENSHIFT_MASTER_IP}.nip.io/inject</code></pre>
</li>
<li>Go to the visualizer and see data being loaded.
<br/><br/>
<div class="alert alert-warning">
If you don't see data being loaded, make sure the <code>station-boards</code> cache is selected in the drop-down of the visualizer.
It might take a few seconds for data to appear in the visualizer, so be patient :)
</div>
<p>The image here corresponds to the state of 3 instances having data.</p>
<img src="img/infinispan-visu-loading.png" class="img-rounded" height="75%" width="75%"/>
</li>
</ul>
<br/>
</section>
<h2>Ex3 :: Transport data</h2>
With the data loaded into Infinispan data grid, it's time to extract meaningful data out of it.
In this exercise, you will develop a JavaFX-based dashboard for tracking delayed trains around the country.
To do that, you will use the data from the train timetables that you've fed into the <code>station-boards</code> cache, and you will create an Infinispan Continuous Query to detect delayed trains.
Each delayed train that you find will need to be pushed to the Vert.x Event Bus.
The event bus will then push each delayed train over SockJS and WebSockets to the JavaFX dashboard.
The JavaFX dashboard will run outside OpenShift and can be started from your favourite IDE.
<br/><br/>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is Infinispan Continuous Query?</h4>
Data stored into Infinispan can be queried remotely via a query builder API.
Queries are sent to Infinispan returning a result set of key/value pairs that match that query.
Moreover, Infinispan offers the possibility to attach a live-updating listener to the query.
What this means is that instead of getting a result set and iterating over it, you get a callback in the listener for each matching entry.
Also, if after executing the query new data is inserted that matches the query, the listener will be invoked with each entry that matches the query.
On the other hand, if data that was part of the result set stops being stored in Infinispan, the listener can also get invoked.
This combination of query and listener is what Infinispan calls Continuous Query.
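<br/><br/>
The join/leave behaviour described above can be sketched with a toy, in-memory model that uses only the JDK.
Everything in it (<code>ToyStore</code>, the <code>Listener</code> interface, the train names and delays) is hypothetical and made up for illustration; it is not the Infinispan API, it only mirrors the idea of a query with a live-updating listener.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ContinuousQuerySketch {
    /** Hypothetical listener; callback names echo the concept, not a real API. */
    interface Listener<K, V> {
        void resultJoining(K key, V value);
        void resultLeaving(K key);
    }

    /** Toy in-memory store with a single live query attached. */
    static final class ToyStore<K, V> {
        private final Map<K, V> data = new LinkedHashMap<>();
        private Predicate<V> query = v -> false;
        private Listener<K, V> listener;

        void addContinuousQuery(Predicate<V> query, Listener<K, V> listener) {
            this.query = query;
            this.listener = listener;
            // Entries that already match are reported right away.
            data.forEach((k, v) -> {
                if (query.test(v)) listener.resultJoining(k, v);
            });
        }

        void put(K key, V value) {
            V previous = data.put(key, value);
            if (listener == null) return;
            boolean matchedBefore = previous != null && query.test(previous);
            boolean matchesNow = query.test(value);
            if (matchesNow && !matchedBefore) listener.resultJoining(key, value);
            if (!matchesNow && matchedBefore) listener.resultLeaving(key);
        }

        void remove(K key) {
            V previous = data.remove(key);
            if (listener != null && previous != null && query.test(previous)) {
                listener.resultLeaving(key);
            }
        }
    }

    /** Runs a small scenario and returns the events seen by the listener, in order. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        ToyStore<String, Integer> delays = new ToyStore<>(); // train -> delay in minutes
        delays.put("IC 1", 0);
        delays.put("IR 15", 7);
        delays.addContinuousQuery(delay -> delay > 0, new Listener<String, Integer>() {
            @Override public void resultJoining(String key, Integer value) {
                events.add("joining " + key);
            }
            @Override public void resultLeaving(String key) {
                events.add("leaving " + key);
            }
        });
        delays.put("S 3", 4);     // new matching entry: joins
        delays.put("IC 1", 0);    // still on time: no event
        delays.remove("IR 15");   // matching entry removed: leaves
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The listener is told about the entry that matched before it was registered, about the match inserted later, and about the removal, without the caller ever re-running the query.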
<hr/>
<h4 class="alert-heading">How do I create an Infinispan Continuous Query?</h4>
To be able to execute an Infinispan Continuous Query, you first need to get Infinispan to index the data.
For this workshop, all the steps required for the data to be indexed have been done for you.
This means that as data was being loaded into Infinispan data grid, it was already being indexed.
If you want to find out more on how to get data indexed, check
<a href="http://infinispan.org/docs/stable/user_guide/user_guide.html#query.remote">Infinispan's user guide</a>
and the workshop code that is part of the <code>data-model</code> module.
<br/><br/>
With the data indexed, the next step is to create the query.
To create a query, you need to get a reference to <code>RemoteCache</code> and then create a <code>QueryFactory</code> for it.
Once you have it, use the query builder API to create the query, here's an example:
<br/><br/>
<pre><code class="java">Query query = queryFactory.from(Book.class)
.having("title").like("%Infinispan%")
.build();</code></pre>
Next, you need to create an instance of <code>ContinuousQueryListener</code> and decide which of the callbacks you'll implement:
<code>resultJoining</code>, <code>resultUpdated</code> and/or <code>resultLeaving</code>.
Once you have the query and listener, you need to bind them together.
To do that, retrieve an instance of <code>ContinuousQuery</code> and bind the query and listener together:
<br/><br/>
<pre><code class="java">continuousQuery.addContinuousQueryListener(query, listener);</code></pre>
<hr/>
<h4 class="alert-heading">What is the Vert.x Event Bus?</h4>
<p>
The event bus allows different parts of your application to communicate with each other.
It can be clustered, so it doesn't matter whether your services are in the same or different Vert.x instances.
It can even be bridged to allow client side JavaScript running in a browser to communicate on the same event bus.
</p>
<p>
So the event bus forms a distributed peer-to-peer messaging system spanning multiple server nodes and multiple browsers.
</p>
<p>
The event bus supports publish/subscribe, point to point, and request-response messaging.
The event bus API is very simple. It basically involves registering handlers, unregistering handlers and sending and publishing messages.
</p>
<hr/>
<h4 class="alert-heading">How do I push events into the Vert.x Event Bus?</h4>
There is a single event bus instance for every Vert.x instance, and it is obtained using the
<code>eventBus</code> method:
<pre><code class="java">EventBus eventBus = vertx.eventBus();</code></pre>
Publishing a message is simple. Just use <code>publish</code> specifying the address to publish it to:
<pre><code class="java">eventBus.publish("news.uk.sport", "Yay! Someone kicked a ball");</code></pre>
That message will then be delivered to all handlers registered against the <code>news.uk.sport</code> address.
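<br/><br/>
To make the delivery rule concrete, here is a minimal plain-Java sketch of publish/subscribe by address.
It is a simplified model, not the Vert.x <code>EventBus</code> class: <code>PublishSketch</code>, its methods and the <code>news.uk.weather</code> address are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java stand-in for publish/subscribe delivery; NOT the Vert.x EventBus class.
class PublishSketch {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    // Registers a handler against an address.
    void consumer(String address, Consumer<String> handler) {
        handlers.computeIfAbsent(address, a -> new ArrayList<>()).add(handler);
    }

    // Delivers the message to every handler registered against the address.
    void publish(String address, String message) {
        List<Consumer<String>> registered =
            handlers.getOrDefault(address, Collections.emptyList());
        for (Consumer<String> handler : registered) {
            handler.accept(message);
        }
    }

    // Two handlers listen on the published address, one on an unrelated address.
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        PublishSketch bus = new PublishSketch();
        bus.consumer("news.uk.sport", msg -> received.add("first: " + msg));
        bus.consumer("news.uk.sport", msg -> received.add("second: " + msg));
        bus.consumer("news.uk.weather", msg -> received.add("weather: " + msg)); // hypothetical address
        bus.publish("news.uk.sport", "Yay! Someone kicked a ball");
        return received;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Both handlers registered against <code>news.uk.sport</code> receive the message, while the handler on the other address is not invoked.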
</div>
<h3>Architecture Diagram</h3>
The focus of this exercise will be to implement the missing code fragments in the <code>delayed-listener</code> module.
This module is in charge of registering the Infinispan Continuous Query and pushing the changes over the Vert.x Event Bus so that the JavaFX dashboard running outside gets updated.
<br/><br/>
The JavaFX dashboard, which runs outside of OpenShift, connects to the <code>delayed-listener</code> via HTTP and registers a WebSocket listener.
Upon connection, the <code>delayed-listener</code> module kickstarts data injection by calling the <code>workshop-main</code> module, and registers the continuous query.
As data gets injected, data that matches the continuous query gets pushed to the <code>delayed-listener</code> module which in turn ships it to the JavaFX dashboard.
<br/><br/>
<img src="img/transportLayer.png" class="img-rounded" height="75%" width="75%">
<h3>Exercise</h3>
<section id="transport">
<ul style="list-style: decimal;">
<li>Open the <kbd>workshop.delayed.DelayedListener</kbd> class in the <kbd>delayed-listener</kbd> module</li>
<li>Implement the TODOs found in the <code>addContinuousQuery</code> method:
<ul>
<li><kbd>TODO 1</kbd> Create a query for <code>Stop</code> where <strong>delayMin</strong> is greater than 0 using the <code>QueryFactory</code>
<br/><a class="btn btn-info" role="button" onclick="$('#array-sol-17').toggle();">Show/Hide Solution</a>
<div id="array-sol-17" style="display:none;">
<pre><code class="java"> Query query = queryFactory.from(Stop.class).having("delayMin").gt(0L).build();</code></pre></div>
</li>
<li><kbd>TODO 2</kbd> Implement the <code>resultJoining</code> method so that it publishes each new Stop as JSON to the <strong>"delayed-trains"</strong> address via the Vert.x event bus
<br/><a class="btn btn-info" role="button" onclick="$('#array-sol-18').toggle();">Show/Hide Solution</a>
<div id="array-sol-18" style="display:none;">
<pre><code class="java"> vertx.eventBus().publish("delayed-trains", stopAsJson);</code></pre></div>
</li>
<li><kbd>TODO 3</kbd> Finish the <code>addContinuousQuery</code> method by binding the query to the continuous query listener
<br/><a class="btn btn-info" role="button" onclick="$('#array-sol-19').toggle();">Show/Hide Solution</a>
<div id="array-sol-19" style="display:none;">
<pre><code class="java"> continuousQuery.addContinuousQueryListener(query, listener);</code></pre></div>
</li>
</ul>
<a class="btn btn-danger" role="button" onclick="$('#array-sol-4').toggle();">Show/Hide Final Solution</a>
<div id="array-sol-4" style="display:none;">
<div class="alert alert-info" role="alert">
Solution for this exercise can be found in:
<kbd>delayed-listener/src/main/solution/workshop/delayed/DelayedListener.java</kbd>
<br/><br/>
To run it, call: <code>mvn fabric8:deploy -Psolution</code>
</div>
</div>
</li>
<li>Redeploy after making the code changes:
<br/><br/>
<pre><code>> cd delayed-listener
> mvn fabric8:deploy</code></pre>
</li>
<li>Go to the <kbd>delayed-dashboard</kbd> module and locate <code>dashboard.DelayedDashboard</code>.
<br/><br/>
Launch this class, passing in <code>-Dhttp.host=</code> pointing to the route for the <kbd>delayed-listener</kbd> endpoint:
<br/><br/>
<pre><code class="java">-Dhttp.host=delayed-listener-myproject.apps.streaming-data-workshop.${OPENSHIFT_MASTER_IP}.nip.io</code></pre>
</li>
<li>Monitor the delayed trains!
<br/><br/>
<img src="img/delayed-trains-dashboard.png" class="img-rounded" height="50%" width="50%">
</li>
</ul>
</section>
<h2>Ex4 :: Delayed trains positions</h2>
The aim of this exercise is to combine delayed train information with train positions information.
In this exercise, you will complete a Google Maps based application that displays locations of delayed trains.
As a hypothetical use case, such an application could be used to locate problematic areas in the country, for example areas affected by train congestion.
<br/><br/>
In the first part of this exercise, you will have to make sure you can regularly push delayed train position updates to the Google Maps based application.
The second part of the exercise will consist of matching the delayed train information that was displayed in the dashboard with the train position information.
This matching will be done using the Infinispan Ickle query language.
<br/><br/>
<div class="alert alert-success" role="alert">
<h4 class="alert-heading">What is Ickle?</h4>
Ickle is Infinispan's new query language, introduced in Infinispan 9.
It's a light, small subset of JP-QL, which makes it familiar to users.
<hr/>
<h4 class="alert-heading">What is the difference between query builder API and Ickle?</h4>
The query builder API was originally introduced in Infinispan 6, at a time when the main focus was Java users.
So, the builder API was very much tailored for Java users.
As Infinispan has moved on to support other languages, this builder API was difficult to replicate in other languages.
So, it was decided to create a text-based query language that would be more easily portable.
Ickle queries support all of the query builder features with the addition of full-text matching.
If you want to learn more about the motivations of Ickle, make sure you check out
<a href="http://blog.infinispan.org/2016/12/meet-ickle.html">this blog post</a>.
<hr/>
<h4 class="alert-heading">What does an Ickle query look like?</h4>
Here is an example Ickle query.
It retrieves transaction id, amount and description for transactions with amount bigger than 10 and a matching description.
<br/><br/>
<pre><code>select tx.transactionId, tx.amount, tx.description from com.acme.Transaction tx
where amount > 10 and description = :desc</code></pre>
The parameter <code>desc</code> is replaced using the <code>setParameter</code> method of the <code>Query</code> class.<br/>
</div>
<h3>Architecture Diagram</h3>
The focus of this exercise will be to implement the missing code fragments in the <code>delayed-trains</code> module.
<br/><br/>
<img src="img/InMemoryLayer.png" class="img-rounded" height="75%" width="75%">
<br/><br/>
This exercise assumes that all the data injectors are complete.
The delayed train dashboard is not strictly necessary for this exercise, but starting it up is an easy way to get data injection started.
On top of that, it also guarantees that delayed train information is flowing.
So, it is recommended that you use it again here.
<br/><br/>
In the previous exercise, you might have noticed that after sending delayed train info via the Vert.x event bus to the dashboard, the train info was being stored in an Infinispan cache.
For this exercise, the <code>delayed-trains</code> module contains a Vert.x Verticle which locally caches any delayed trains added to this Infinispan cache.
The Google Maps front-end, which is already developed for you, connects to this Vert.x verticle to receive delayed train positions.
<br/><br/>
In the first part of this exercise, you will have to make sure you can regularly push delayed train position updates using the Vert.x Event Bus.
The path through which the Vert.x Event Bus will push changes to the Google Maps application will be <code>/eventbus</code>.
The second part of the exercise will consist of matching the delayed train information with the train positions using Infinispan Ickle queries.
<br/><br/>
<div class="alert alert-info" role="alert">
<h4 class="alert-heading">Matching delayed trains and their positions</h4>
As it often happens with real life data streams, the two data streams don't easily match up.
When you look at the train timetable information, which comes from individual stations, you know that for example, a train on route <code>S 9</code> is delayed.
However, on a given train route, there might be multiple trains doing the route at any given point in time.
So, when the train timetable information says that a train on route <code>S 9</code> is delayed, you don't know exactly which of the trains doing the route it is.
The train position data stream does give you the individual train identifier, but since that identifier is not provided by the train timetable data stream, there's no direct link.
For this workshop, the code provided for you simply takes the first matching train with a given route name from the train positions data stream.
This is of course not accurate but it's good enough for the workshop.
In real life, given that each train station's geolocation is known and you know the geolocation of each train, you could potentially figure out the link using geolocation approximation.
</div>
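<p>
The "first matching train" simplification described above can be sketched in plain Java as follows.
This is an illustration only, not the workshop's code: the <code>Position</code> class, the <code>firstTrainIdFor</code> helper and the train identifiers are hypothetical.
</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FirstMatchSketch {

    // Hypothetical, trimmed-down view of an entry in the train positions stream.
    static final class Position {
        final String trainId;
        final String routeName;

        Position(String trainId, String routeName) {
            this.trainId = trainId;
            this.routeName = routeName;
        }
    }

    // Returns the id of the first position whose route name matches, if any.
    static Optional<String> firstTrainIdFor(String routeName, List<Position> positions) {
        return positions.stream()
            .filter(p -> p.routeName.equals(routeName))
            .map(p -> p.trainId)
            .findFirst();
    }

    // Made-up sample data: two trains share route "S 9".
    static List<Position> samplePositions() {
        return Arrays.asList(
            new Position("train-1", "S 9"),
            new Position("train-2", "S 9"),
            new Position("train-3", "R 7377"));
    }

    public static void main(String[] args) {
        // Either train on route "S 9" could be the delayed one; the first is picked.
        System.out.println(firstTrainIdFor("S 9", samplePositions()));
    }
}
```

<p>
Because two trains share the route name, the result may point at the wrong train, which is exactly the inaccuracy the workshop accepts.
</p>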
<div class="alert alert-danger" role="alert">
If running this workshop completely offline, Google Maps won't be available.
So, to be able to complete the exercise without Google Maps, there is a URL at path <code>/positions</code> which can be accessed to retrieve the delayed train positions.
The output should resemble something like this:
<br/><br/>
<pre><code>train_id train_category train_name train_lastStopName position_lat position_lng position_bearing
84/26411/18/24/95 R R 7377 Sonceboz-Sombeval 47.194977 7.166263 200.0
84/26701/18/20/95 R R 11065 Coppet 46.316335 6.187086 50.0
...
</code></pre>
</div>
<section id="inmemory">
<ol>
<li>
Open <code>web-viewer/config/default.json</code> and change the <code>eventbus</code> hostname to match the one on OpenShift.
<br/><br/>
<pre><code>"eventbus": "http://delayed-trains-myproject.apps.streaming-data-workshop.${OPENSHIFT_MASTER_IP}.nip.io/eventbus"</code></pre>
<div class="alert alert-warning">
Remember to replace <code>${OPENSHIFT_MASTER_IP}</code> with the correct value!
</div>
</li>
<li>Start the train positions visualizer:
<br/><br/>
<pre><code>> cd web-viewer
> ./start.sh</code></pre>
</li>
<li>Open the <kbd>workshop.trains.DelayedTrains</kbd> class in the <kbd>delayed-trains</kbd> module</li>
<li>Implement the TODOs found in the code:
<ul>
<li><kbd>TODO 1</kbd> Modify the EventBus bridge <code>options</code> to allow sending messages to the browser.
Use <code>addOutboundPermitted</code> method to add a new instance of <code>PermittedOptions</code> for
<code>DELAYED_TRAINS_POSITIONS_ADDRESS</code>.<br/>
<a class="btn btn-info" role="button" onclick="$('#array-sol-26').toggle();">Show/Hide Solution</a>
<div id="array-sol-26" style="display:none;">
<pre><code class="java">options.addOutboundPermitted(new PermittedOptions().setAddress(DELAYED_TRAINS_POSITIONS_ADDRESS));</code></pre>
</div>
</li>
<li><kbd>TODO 2</kbd> Use <code>eventBus.publish()</code> method to publish the asynchronous result to the
<code>DELAYED_TRAINS_POSITIONS_ADDRESS</code> address<br/>
<a class="btn btn-info" role="button" onclick="$('#array-sol-25').toggle();">Show/Hide Solution</a>
<div id="array-sol-25" style="display:none;">
<pre><code class="java">vertx.eventBus().publish(DELAYED_TRAINS_POSITIONS_ADDRESS, ar.result());</code></pre>
</div>
</li>
<li><kbd>TODO 3</kbd> Create an Ickle query to get the train IDs for all train positions with a given train name.<br/>
<a class="btn btn-info" role="button" onclick="$('#array-sol-23').toggle();">Show/Hide Solution</a>
<div id="array-sol-23" style="display:none;">
<pre><code class="java">Query query = queryFactory.create("select tp.trainId from workshop.model.TrainPosition tp where name = :trainName");
query.setParameter("trainName", trainName);</code></pre>
</div>
</li>
<li><kbd>TODO 4</kbd> Call the query and get the result<br/>
<a class="btn btn-info" role="button" onclick="$('#array-sol-24').toggle();">Show/Hide Solution</a>
<div id="array-sol-24" style="display:none;">
<pre><code class="java">List&lt;Object[]&gt; trains = query.list();</code></pre>
</div>
</li>
</ul>
<div>
<a class="btn btn-danger" role="button" onclick="$('#array-sol-5').toggle();">Show/Hide Solution</a>
<div id="array-sol-5" style="display:none;">
<div class="alert alert-info" role="alert">
Solution for this exercise can be found in:
<kbd>delayed-trains/src/main/solution/workshop/trains/DelayedTrains.java</kbd>
<br/><br/>
To run it, call: <code>mvn fabric8:deploy -Psolution</code>
</div>
</div>
</div>
</li>
<li>Redeploy after making the code changes:
<br/><br/>
<pre><code>> cd delayed-trains
> mvn fabric8:deploy</code></pre>
</li>
<li>Go to the <kbd>delayed-dashboard</kbd> module and re-launch the <code>dashboard.DelayedDashboard</code> class from your IDE.
This will restart train data injectors.
Verify that you still see delayed trains appearing in the dashboard!
</li>
<li>Go to the <a href="http://localhost:3000/" target="_target">front-end app</a>. You should see small dots representing trains moving around.
<br/><br/>
<img src="img/delayed-trains-map.png" class="img-rounded" height="75%" width="75%">
</li>
<li>
Finally, please delete the OpenShift cluster from the Google Cloud Shell:
<br/><br/>
<pre><code>> cd streaming-data-workshop/openshifter
> docker run -e -ti -v `pwd`:/root/data docker.io/osevg/openshifter destroy streaming-data-workshop</code></pre>
If any timeouts happen while destroying OpenShift, call the <code>delete-openshift.sh</code> script from the same folder.
</li>
</ol>
</section>
</div> <!-- /container -->
<br/><br/>
<script type="text/javascript" src="js/jquery-3.2.1.min.js"></script>
</body></html>