Commit a24d7fe

Merge pull request #3 from life-research/add-transactor-metrics
Add transactor metrics
2 parents 16ab9e3 + 97eff59 commit a24d7fe

7 files changed (+944 / -4 lines)

README.md

Lines changed: 345 additions & 1 deletion
@@ -1,2 +1,346 @@
# datomic-tx-metrics

Collects Datomic Transactor and JVM metrics and exposes them on a web endpoint for consumption by [Prometheus](https://prometheus.io/).

Previous description (removed in this commit): "Collecting Datomic Transactor + JVM metrics for consumption using a web endpoint (e.g. by Prometheus)."

## How does it work?

### Registering the metrics collector at the transactor

Add the following line to your `transactor.properties` file:

```
metrics-callback=datomic-tx-metrics.core/tx-metrics-callback-handler
```

Next, make sure the JAR (available for download in the release section) is present in Datomic's `/lib` directory so that it is loaded at runtime. An example using a Docker container can be found in the examples section of this repository.
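
If `transactor.properties` is generated or templated (for example during an image build), the property can be added idempotently. The following Python sketch is illustrative only: `with_metrics_callback` is a hypothetical helper, not part of this project, and it assumes plain `key=value` lines (the Java properties format also allows `:` and whitespace separators, which the sketch does not handle).

```python
# Property line documented in this README.
CALLBACK_LINE = "metrics-callback=datomic-tx-metrics.core/tx-metrics-callback-handler"


def with_metrics_callback(properties: str) -> str:
    """Return the properties text with the metrics-callback line appended.

    Hypothetical helper: if an uncommented metrics-callback entry already
    exists, the text is returned unchanged. Only `key=value` lines are
    recognised; other separators are not handled.
    """
    for line in properties.splitlines():
        # A commented line such as "#metrics-callback=..." yields the key
        # "#metrics-callback" and is therefore not treated as configured.
        key = line.split("=", 1)[0].strip()
        if key == "metrics-callback":
            return properties
    if properties and not properties.endswith("\n"):
        properties += "\n"
    return properties + CALLBACK_LINE + "\n"
```

For example, with `path = pathlib.Path("transactor.properties")`, running `path.write_text(with_metrics_callback(path.read_text()))` a second time leaves the file unchanged.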

### Configuring the metrics collector

* Specify the port of the web server started by the metrics collector using the environment variable `METRICS_PORT` when starting the transactor (defaults to _11509_).
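
The documented port behaviour (read `METRICS_PORT`, otherwise fall back to 11509) can be illustrated with a short Python sketch. It mirrors only the described defaulting; the real collector runs inside the transactor JVM, and treating an empty value as unset as well as rejecting out-of-range ports are assumptions of this sketch.

```python
import os
from typing import Mapping, Optional

DEFAULT_METRICS_PORT = 11509  # default documented in this README


def metrics_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Resolve the metrics web server port from METRICS_PORT.

    Falls back to 11509 when the variable is unset or blank (assumption),
    and raises ValueError for non-numeric or out-of-range values.
    """
    env = os.environ if env is None else env
    raw = env.get("METRICS_PORT", "").strip()
    if not raw:
        return DEFAULT_METRICS_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"METRICS_PORT out of range: {port}")
    return port
```

For example, `metrics_port({"METRICS_PORT": "9100"})` returns 9100, while an environment without the variable yields 11509.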

### Scraping Metrics

* The metrics collector starts a web server when it is loaded by the transactor.
* The transactor typically sends metrics to the callback function every `10s`. Keep this in mind, since values might not change within that interval.
* Scrape the collected metrics by requesting the `/metrics` endpoint of that web server.
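
To check the endpoint by hand, outside Prometheus, it can be requested directly. The sketch below uses only Python's standard library and assumes the transactor runs on the same host with the default port; host, port and timeout are placeholders.

```python
from urllib.request import urlopen


def metrics_url(host: str = "localhost", port: int = 11509) -> str:
    """Build the scrape URL; /metrics is the endpoint named in this README."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str = "localhost", port: int = 11509, timeout: float = 5.0) -> str:
    """Fetch the plain-text metrics served by the collector's web server."""
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Running `print(fetch_metrics())` on the transactor host should print text in the format shown under Example Metrics. Because the transactor only pushes new values periodically, two requests in quick succession may return identical values.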

The following is an example Prometheus configuration file for scraping the metrics endpoint:

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
alerting:
  alertmanagers:
  - static_configs:
    - targets: []
    scheme: http
    timeout: 10s
    api_version: v1
scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: datomic-tx-metrics
  scrape_interval: 10s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:11509
```
_**Note:** Adjust the targets to match your own deployment._

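Prometheus refuses to load a scrape job whose `scrape_timeout` exceeds its `scrape_interval`, and since the transactor reports roughly every 10 seconds, scraping much more often mostly returns repeated values. The Python sketch below checks both conditions for a job such as `datomic-tx-metrics` above. Its duration parser is deliberately simplified (one integer plus one unit, e.g. `10s` or `2m`), and the 10-second report interval is taken from this README, not queried from Datomic.

```python
import re
from typing import List

# Simplified: one integer followed by a single Prometheus duration unit.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> float:
    """Convert a duration such as '10s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_scrape_job(interval: str, timeout: str, report_every: float = 10.0) -> List[str]:
    """Return human-readable problems with a scrape job's timing (empty if none)."""
    interval_s = parse_duration(interval)
    timeout_s = parse_duration(timeout)
    problems = []
    if timeout_s > interval_s:
        # Prometheus rejects such a configuration at load time.
        problems.append("scrape_timeout must not exceed scrape_interval")
    if interval_s < report_every:
        # Not an error, but most scrapes would see unchanged values.
        problems.append(
            f"scrape_interval {interval_s:g}s is shorter than the ~{report_every:g}s report interval"
        )
    return problems
```

With the values from the example configuration, `check_scrape_job("10s", "5s")` reports no problems.
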
## What JVM metrics are covered?

The following JVM metrics are covered, as defined by [Prometheus Hotspot](https://github.com/prometheus/client_java/tree/parent-0.5.0/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot):

* StandardExports
* MemoryPoolsExports
* GarbageCollectorExports
* ThreadExports
* ClassLoadingExports
* VersionInfoExports

## What Datomic transactor metrics are covered?

The following transactor metrics (the same ones the transactor can report to CloudWatch) are supported:

- [x] Alarm
- [x] AlarmIndexingJobFailed
- [x] AlarmBackPressure
- [x] AlarmUnhandledException
- [x] Alarm{AnythingElse}
- [x] AvailableMB
- [ ] ClusterCreateFS
- [x] CreateEntireIndexMsec
- [x] CreateFulltextIndexMsec
- [x] Datoms
- [x] DBAddFulltextMsec
- [ ] FulltextSegments
- [x] GarbageSegments
- [x] HeartbeatMsec
- [x] HeartbeatMsec (samples)
- [ ] HeartMonitorMsec
- [x] IndexDatoms
- [x] IndexSegments
- [x] IndexWrites
- [x] IndexWriteMsec
- [ ] LogIngestBytes
- [ ] LogIngestMsec
- [x] LogWriteMsec
- [ ] Memcache
- [x] MemoryIndexMB
- [ ] MetricReport
- [ ] ObjectCache
- [ ] MemcachedPutMsec
- [ ] MemcachedPutFailedMsec
- [x] RemotePeers
- [x] StorageBackoff (total time per transactor metric report)
- [x] StorageBackoff (total number of retries)
- [x] Storage{Get,Put}Bytes (throughput per transactor metric report)
- [x] Storage{Get,Put}Bytes (operations count per transactor metric report)
- [x] Storage{Get,Put}Msec
- [x] TransactionBatch
- [x] TransactionBytes (total volume of transaction data to log, peers)
- [x] TransactionDatoms (total datoms transacted)
- [x] TransactionDatoms (total transactions)
- [x] TransactionMsec (total time spent on transactions)
- [ ] Valcache
- [ ] ValcachePutMsec
- [ ] ValcachePutFailedMsec

The following additional metrics are derived from the metrics listed above:

- [x] Object Cache Hit Ratio

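The hit ratio is computed by the collector rather than reported directly by the transactor. As a rough illustration, the Python sketch below assumes that the transactor's `ObjectCache` metric arrives as a statistic with `sum` and `count` fields, where each cache access is sampled as 1 (hit) or 0 (miss), so that `sum / count` is the hit ratio. Both the field names and that interpretation are assumptions of this sketch, not details taken from this project's code.

```python
from typing import Mapping, Optional


def object_cache_hit_ratio(stats: Mapping[str, float]) -> Optional[float]:
    """Derive the object cache hit ratio from an ObjectCache statistic.

    Assumption: each access is sampled as 1 (hit) or 0 (miss), so the
    average sum/count is the hit ratio. Returns None if nothing was sampled.
    """
    count = stats.get("count", 0)
    if not count:
        return None
    return stats["sum"] / count
```

A value close to 1.0, such as the 0.9966 in the example output, means that nearly all lookups were served from the object cache.
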
## Example Metrics

```
# HELP datomic_remote_peers Number of remote peers connected.
# TYPE datomic_remote_peers gauge
datomic_remote_peers 1.0
# HELP datomic_index_segments Number of segments in the index.
# TYPE datomic_index_segments gauge
datomic_index_segments 30732.0
# HELP datomic_index_creation_msec Time to create index in msec, reported at end of indexing job.
# TYPE datomic_index_creation_msec gauge
datomic_index_creation_msec 29960.0
# HELP datomic_storage_backoff_retries_total Total number of retried storage operations.
# TYPE datomic_storage_backoff_retries_total counter
datomic_storage_backoff_retries_total 0.0
# HELP datomic_heartbeats Number of heartbeats.
# TYPE datomic_heartbeats gauge
datomic_heartbeats 12.0
# HELP datomic_storage_read_msec Time spent reading from storage.
# TYPE datomic_storage_read_msec gauge
datomic_storage_read_msec 4.0
# HELP datomic_transactions_total Total number of transactions.
# TYPE datomic_transactions_total counter
datomic_transactions_total 1789.0
# HELP datomic_index_fulltext_creation_msec Time to create fulltext portion of index in msec.
# TYPE datomic_index_fulltext_creation_msec gauge
datomic_index_fulltext_creation_msec 0.0
# HELP datomic_storage_write_bytes_total Total number of bytes written to storage.
# TYPE datomic_storage_write_bytes_total counter
datomic_storage_write_bytes_total 2.14121767E8
# HELP datomic_transactions_add_fulltext_msec_total Total time of transactions spent to add fulltext.
# TYPE datomic_transactions_add_fulltext_msec_total counter
datomic_transactions_add_fulltext_msec_total 161.0
# HELP datomic_object_cache_hits_ratio Datomic object cache hit ratio.
# TYPE datomic_object_cache_hits_ratio gauge
datomic_object_cache_hits_ratio 0.9966473960079482
# HELP datomic_transactions_write_log_msec_total Total time of transactions spent writing to log per transaction batch.
# TYPE datomic_transactions_write_log_msec_total counter
datomic_transactions_write_log_msec_total 12408.0
# HELP datomic_alarms_indexing_job_failed Number of alarms related to the indexing job.
# TYPE datomic_alarms_indexing_job_failed gauge
datomic_alarms_indexing_job_failed 0.0
# HELP datomic_transacted_bytes_total Total volume of transaction data to log, peers in bytes.
# TYPE datomic_transacted_bytes_total counter
datomic_transacted_bytes_total 8.1213863E7
# HELP datomic_alarms_backpressure Number of alarms related to the transactor using back pressure.
# TYPE datomic_alarms_backpressure gauge
datomic_alarms_backpressure 0.0
# HELP datomic_transactions_batch Number of transactions batched into a single write to the log.
# TYPE datomic_transactions_batch gauge
datomic_transactions_batch 356.0
# HELP datomic_heartbeats_msec Time spent writing to storage as part of the heartbeat (transactor writes location).
# TYPE datomic_heartbeats_msec gauge
datomic_heartbeats_msec 60004.0
# HELP datomic_transacted_datoms_total Number of transacted datoms.
# TYPE datomic_transacted_datoms_total counter
datomic_transacted_datoms_total 4690819.0
# HELP datomic_index_datoms Number of datoms stored by the index, all sorts.
# TYPE datomic_index_datoms gauge
datomic_index_datoms 1.58230545E8
# HELP datomic_object_cache_size Number of segments in the Datomic object cache.
# TYPE datomic_object_cache_size gauge
datomic_object_cache_size 4431.0
# HELP datomic_storage_write_operations_total Total number of storage write operations.
# TYPE datomic_storage_write_operations_total counter
datomic_storage_write_operations_total 13143.0
# HELP datomic_transactions_msec_total Total time of transactions in msec.
# TYPE datomic_transactions_msec_total counter
datomic_transactions_msec_total 1113417.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_222-b10",vendor="Oracle Corporation",runtime="OpenJDK Runtime Environment",} 1.0
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 70.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 33.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 73.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 93.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP datomic_index_writes Number of segments written by indexing job, reported at end.
# TYPE datomic_index_writes gauge
datomic_index_writes 2310.0
# HELP datomic_available_ram_megabytes Unused RAM on transactor in MB.
# TYPE datomic_available_ram_megabytes gauge
datomic_available_ram_megabytes 1480.0
# HELP datomic_storage_backoff_msec Time spent in backoff/retry around calls to storage.
# TYPE datomic_storage_backoff_msec gauge
datomic_storage_backoff_msec 0.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 2.806176568E9
jvm_memory_bytes_used{area="nonheap",} 1.57299624E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 4.132962304E9
jvm_memory_bytes_committed{area="nonheap",} 1.79191808E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.132962304E9
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 4.294967296E9
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 3.6131072E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 1.00296496E8
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 2.0872056E7
jvm_memory_pool_bytes_used{pool="PS Eden Space",} 8.73379536E8
jvm_memory_pool_bytes_used{pool="PS Survivor Space",} 1.47745544E8
jvm_memory_pool_bytes_used{pool="PS Old Gen",} 1.785055616E9
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.6503552E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.16301824E8
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 2.6386432E7
jvm_memory_pool_bytes_committed{pool="PS Eden Space",} 1.106771968E9
jvm_memory_pool_bytes_committed{pool="PS Survivor Space",} 1.6252928E8
jvm_memory_pool_bytes_committed{pool="PS Old Gen",} 2.863661056E9
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="PS Eden Space",} 1.106771968E9
jvm_memory_pool_bytes_max{pool="PS Survivor Space",} 1.6252928E8
jvm_memory_pool_bytes_max{pool="PS Old Gen",} 2.863661056E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="PS Eden Space",} 1.073741824E9
jvm_memory_pool_bytes_init{pool="PS Survivor Space",} 1.78782208E8
jvm_memory_pool_bytes_init{pool="PS Old Gen",} 2.863661056E9
# HELP datomic_successful_metric_reports Number of successful metric reports over a 1 min period.
# TYPE datomic_successful_metric_reports gauge
datomic_successful_metric_reports 1.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 19802.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 19802.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 0.0
# HELP datomic_index_writes_msec Time per index segment write.
# TYPE datomic_index_writes_msec gauge
datomic_index_writes_msec 1246.0
# HELP datomic_alarms_other Number of alarms that are not related to any other specific alarm metric.
# TYPE datomic_alarms_other gauge
datomic_alarms_other 0.0
# HELP datomic_storage_write_msec Time spent writing to storage.
# TYPE datomic_storage_write_msec gauge
datomic_storage_write_msec 315.0
# HELP datomic_storage_read_bytes_total Total number of bytes read from storage.
# TYPE datomic_storage_read_bytes_total counter
datomic_storage_read_bytes_total 7814735.0
# HELP datomic_garbage_segments Number of garbage segments created.
# TYPE datomic_garbage_segments gauge
datomic_garbage_segments 2095.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 196.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 6.501
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 4.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 0.34
# HELP datomic_memory_index_consumed_megabytes RAM consumed by memory index in MB.
# TYPE datomic_memory_index_consumed_megabytes gauge
datomic_memory_index_consumed_megabytes 12.0
# HELP datomic_datoms Number of unique datoms in the index.
# TYPE datomic_datoms gauge
datomic_datoms 5.8511225E7
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 608.24
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.571158208795E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 532.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.0522513408E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.559723008E9
# HELP datomic_alarms Number of alarms/problems that have occurred.
# TYPE datomic_alarms gauge
datomic_alarms 0.0
# HELP datomic_alarms_unhandled_exception Number of alarms related to unhandled exceptions.
# TYPE datomic_alarms_unhandled_exception gauge
datomic_alarms_unhandled_exception 0.0
# HELP datomic_storage_read_operations_total Total number of storage read operations.
# TYPE datomic_storage_read_operations_total counter
datomic_storage_read_operations_total 2027.0
```
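
To inspect a scrape like the one above without running Prometheus, the text can be parsed with a few lines of Python. This is a deliberately simplified parser, not a full implementation of the exposition format: it handles `name{labels} value` lines as shown above, skips `#` comment lines, and ignores lines with timestamps or escaped quotes in label values. For more than ad-hoc checks, the official `prometheus_client` Python package ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re
from typing import Dict, Tuple

# name, optional {labels}, single value; timestamps and escaped quotes unsupported.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Key = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_samples(text: str) -> Dict[Key, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[Key, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, labels, value = match.groups()
        pairs = tuple(sorted(_LABEL.findall(labels or "")))
        samples[(name, pairs)] = float(value)
    return samples


def heap_utilization(samples: Dict[Key, float]) -> float:
    """Fraction of the maximum JVM heap currently in use."""
    heap = (("area", "heap"),)
    return samples[("jvm_memory_bytes_used", heap)] / samples[("jvm_memory_bytes_max", heap)]
```

With the sample output above, `heap_utilization` comes to roughly 0.68 (about 2.8 GB used of a 4.1 GB maximum heap). Text fetched from the live `/metrics` endpoint can be passed in the same way.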

## Troubleshooting

__Problem:__ The transactor refuses to start because of an error related to netty (some methods cannot be found).

__Solution:__ Depending on the Datomic version in use, the bundled netty version may be too old and lack methods required by this project. Resolve this by replacing the `netty-all-*.jar` in Datomic's `/lib` directory with a newer one. _This is also covered by the Docker example in the __examples__ section of the repository._
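
Before swapping jars, it can help to see which netty version a Datomic distribution actually bundles. The Python sketch below lists `netty-all-*.jar` files in a `lib` directory and compares their versions against a reference. Using 4.1.42 (the version the Docker example installs) as that reference is an assumption: the minimum version this project needs is not stated here.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches names like netty-all-4.1.42.Final.jar; captures major, minor, patch.
_NETTY_JAR = re.compile(r"netty-all-(\d+)\.(\d+)\.(\d+)(?:\.\w+)?\.jar")

# Assumption: the version installed by the Docker example, used as a reference.
REFERENCE_VERSION = (4, 1, 42)


def netty_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from a netty-all jar filename, if any."""
    match = _NETTY_JAR.fullmatch(filename)
    return tuple(int(part) for part in match.groups()) if match else None


def outdated_netty_jars(lib_dir: str, reference: Tuple[int, ...] = REFERENCE_VERSION) -> List[str]:
    """List netty-all jars in lib_dir whose version is below the reference."""
    outdated = []
    for jar in sorted(Path(lib_dir).glob("netty-all-*.jar")):
        version = netty_version(jar.name)
        if version is not None and version < reference:
            outdated.append(jar.name)
    return outdated
```

For example, `outdated_netty_jars("/datomic/lib")` inside the container from the Docker example returns an empty list once the bundled jar has been replaced.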

examples/docker/Dockerfile

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
FROM openjdk:8u222-jre

ENV DATOMIC_VERSION 0.9.5966

ARG DATOMIC_ACC_USER
ARG DATOMIC_ACC_PASS

RUN wget -q --http-user=${DATOMIC_ACC_USER} --http-password=${DATOMIC_ACC_PASS} https://my.datomic.com/repo/com/datomic/datomic-pro/${DATOMIC_VERSION}/datomic-pro-${DATOMIC_VERSION}.zip -O datomic-pro-${DATOMIC_VERSION}.zip
RUN unzip -q /datomic-pro-${DATOMIC_VERSION}.zip
RUN rm /datomic-pro-${DATOMIC_VERSION}.zip
RUN mv /datomic-pro-${DATOMIC_VERSION} /datomic

ENV DATOMIC_TX_METRICS_VERSION 0.1.0-alpha

ADD https://github.com/life-research/datomic-tx-metrics/releases/download/v${DATOMIC_TX_METRICS_VERSION}/datomic-tx-metrics-${DATOMIC_TX_METRICS_VERSION}-standalone.jar /datomic/lib/
ADD /transactor.properties /datomic/
ADD /logback.xml /datomic/bin/logback.xml
ADD /start.sh /datomic/start
RUN chmod +x /datomic/start

VOLUME /datomic/log

# Replacing the netty version that comes with Datomic may be necessary if it's
# too old. Otherwise the metrics collector's web server won't start because of
# missing methods, which keeps the transactor from finishing startup.
# (`find -name` takes a shell glob, so the pattern must be 'netty-all-*.jar'.)
RUN find /datomic/lib -name 'netty-all-*.jar' -delete
ADD https://repo1.maven.org/maven2/io/netty/netty-all/4.1.42.Final/netty-all-4.1.42.Final.jar /datomic/lib/

EXPOSE 4334
EXPOSE 8080

ADD /datomic-tx-metrics.jar /datomic/lib/

WORKDIR /datomic
CMD ["./start"]
