CLIENT-3439 Provide user-level metrics for applications using Aerospike Database (Metrics v2) #477

Open. Wants to merge 6 commits into base: stage

@mirzakaracic mirzakaracic commented Mar 15, 2025

Adding detailed metrics and per-namespace result codes to the Go client.

  • Added latency for node/namespace/command and cluster
  • Added bytes-sent bytes-received for node/namespace/command and cluster
  • Added connection acquired for node/namespace/command and cluster
  • Added parsing latency for node/namespace/command and cluster
  • Added all (0 - 255) server error counts for node/namespace/command and cluster
  • Added labels for the cluster
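
The node/namespace/command breakdown listed above can be pictured as nested counters. The following is only a minimal sketch of that shape, not the client's actual data structure: the type `resultCodeCounts` and its `inc` method are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// resultCodeCounts is a hypothetical layout for illustration:
// namespace -> command -> result code -> count.
type resultCodeCounts map[string]map[string]map[string]int

// inc bumps the counter when it already exists and inserts the
// intermediate maps when it does not (an "update or insert").
func (c resultCodeCounts) inc(ns, cmd, code string) {
	byCmd, ok := c[ns]
	if !ok {
		byCmd = make(map[string]map[string]int)
		c[ns] = byCmd
	}
	byCode, ok := byCmd[cmd]
	if !ok {
		byCode = make(map[string]int)
		byCmd[cmd] = byCode
	}
	byCode[code]++
}

func main() {
	counts := resultCodeCounts{}
	for i := 0; i < 8; i++ {
		counts.inc("test", "Put", "OK")
	}
	counts.inc("test", "BatchWrite", "OK")
	fmt.Println(counts["test"]["Put"]["OK"], counts["test"]["BatchWrite"]["OK"])
}
```

Keeping one such structure per node and summing them would give the cluster-level view, assuming the cluster figures are plain aggregates of the node figures.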

Validated and optimized the updateOrInsert calls. Below are the benchmarks that resulted from those optimizations.

	_, err = func() (int, Error) {
		if metricsEnabled {
			var dataSent = 0
			start := time.Now()
			dataSent, err = cmd.conn.Write(cmd.dataBuffer[:cmd.dataOffset])
			cmd.applyDetailedMetricsBytesSentAndTransmission(ifc, dataSent, start)
			return dataSent, err
		}
		return cmd.conn.Write(cmd.dataBuffer[:cmd.dataOffset])
	}()

=== RUN BenchmarkApplyDetailedMetricsBytesSentAndTransmission
BenchmarkApplyDetailedMetricsBytesSentAndTransmission
BenchmarkApplyDetailedMetricsBytesSentAndTransmission-14 8375888 131.0 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkApplyDetailedConnectionAq
BenchmarkApplyDetailedConnectionAq
BenchmarkApplyDetailedConnectionAq-14 11469990 103.1 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkApplyDetailedParsing
BenchmarkApplyDetailedParsing
BenchmarkApplyDetailedParsing-14 11457337 104.4 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkCommandMergeCommandResultCodeMetrics
BenchmarkCommandMergeCommandResultCodeMetrics
BenchmarkCommandMergeCommandResultCodeMetrics-14 3045164 401.4 ns/op 64 B/op 5 allocs/op
=== RUN BenchmarkCommandMergeDetailMetrics
BenchmarkCommandMergeDetailMetrics
BenchmarkCommandMergeDetailMetrics-14 5773930 202.5 ns/op 24 B/op 2 allocs/op
=== RUN BenchmarkConnectionAcWithMetrics
BenchmarkConnectionAcWithMetrics
BenchmarkConnectionAcWithMetrics-14 10457097 115.9 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkParseAcWithMetrics
BenchmarkParseAcWithMetrics
BenchmarkParseAcWithMetrics-14 10541271 114.7 ns/op 24 B/op 1 allocs/op
PASS

Benchmark results without the inline optimizations, i.e. with metrics collection done directly in the command.executeAt method. For example:

	if metricsEnabled {
		start := time.Now()
		dataSent, err = cmd.conn.Write(cmd.dataBuffer[:cmd.dataOffset])
		cmd.applyDetailedMetricsBytesSentAndTransmission(ifc, dataSent, start)
	} else {
		_, err = cmd.conn.Write(cmd.dataBuffer[:cmd.dataOffset])
	}

=== RUN BenchmarkApplyDetailedMetricsBytesSentAndTransmission
BenchmarkApplyDetailedMetricsBytesSentAndTransmission
BenchmarkApplyDetailedMetricsBytesSentAndTransmission-14 8488948 132.0 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkApplyDetailedConnectionAq
BenchmarkApplyDetailedConnectionAq
BenchmarkApplyDetailedConnectionAq-14 11312936 107.7 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkApplyDetailedParsing
BenchmarkApplyDetailedParsing
BenchmarkApplyDetailedParsing-14 10988397 108.8 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkCommandMergeCommandResultCodeMetrics
BenchmarkCommandMergeCommandResultCodeMetrics
BenchmarkCommandMergeCommandResultCodeMetrics-14 2924882 412.2 ns/op 64 B/op 5 allocs/op
=== RUN BenchmarkCommandMergeDetailMetrics
BenchmarkCommandMergeDetailMetrics
BenchmarkCommandMergeDetailMetrics-14 5800849 207.9 ns/op 24 B/op 2 allocs/op

------------> benchmarks without inline optimizations

=== RUN BenchmarkConnectionAcWithMetrics
BenchmarkConnectionAcWithMetrics
BenchmarkConnectionAcWithMetrics-14 9772182 120.3 ns/op 24 B/op 1 allocs/op
=== RUN BenchmarkParseAcWithMetrics
BenchmarkParseAcWithMetrics
BenchmarkParseAcWithMetrics-14 10514532 115.9 ns/op 24 B/op 1 allocs/op
PASS
ok github.com/aerospike/aerospike-client-go/v8 9.834s
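
For readers who want to reproduce this kind of comparison outside the client, here is a rough, self-contained sketch. It does not use the client's code: `fakeConn`, `recorder`, `writeWithClosure` and `writeInline` are hypothetical stand-ins that only mirror the two shapes shown in the snippets above.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// fakeConn stands in for the client's connection; Write only reports the length.
type fakeConn struct{}

func (fakeConn) Write(b []byte) (int, error) { return len(b), nil }

// recorder is a hypothetical metrics sink.
type recorder struct {
	bytesSent int
	elapsed   time.Duration
}

func (r *recorder) record(n int, start time.Time) {
	r.bytesSent += n
	r.elapsed += time.Since(start)
}

// writeWithClosure wraps the metrics branch in an immediately invoked closure.
func writeWithClosure(c fakeConn, buf []byte, r *recorder, enabled bool) (int, error) {
	return func() (int, error) {
		if enabled {
			start := time.Now()
			n, err := c.Write(buf)
			r.record(n, start)
			return n, err
		}
		return c.Write(buf)
	}()
}

// writeInline keeps the same branch directly in the caller.
func writeInline(c fakeConn, buf []byte, r *recorder, enabled bool) (int, error) {
	if enabled {
		start := time.Now()
		n, err := c.Write(buf)
		r.record(n, start)
		return n, err
	}
	return c.Write(buf)
}

func main() {
	buf := make([]byte, 100)
	variants := []struct {
		name string
		fn   func(fakeConn, []byte, *recorder, bool) (int, error)
	}{
		{"closure", writeWithClosure},
		{"inline", writeInline},
	}
	for _, v := range variants {
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			r := &recorder{}
			for i := 0; i < b.N; i++ {
				v.fn(fakeConn{}, buf, r, true)
			}
		})
		fmt.Println(v.name, res.String(), res.MemString())
	}
}
```

The numbers this prints depend on the machine and Go version, and because the stand-ins do no real I/O they will not match the figures reported above.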

Filtered snippet of a Stats() snapshot:

{
  "127.0.0.1:3100": {
    "batch-read-metrics": {
      "buckets": [
        0,
        0,
        0,
        3
      ],
      "count": 3,
      "max": 538,
      "min": 451,
      "sum": 1494
    },
    "batch-write-metrics": {
      "buckets": [
        0,
        0,
        0,
        1
      ],
      "count": 1,
      "max": 502,
      "min": 502,
      "sum": 502
    },
    "circuit-breaker-hits": 0,
    "closed-connections": 0,
    "connections-attempts": 100,
    "connections-error-other": 0,
    "connections-error-timeout": 0,
    "connections-failed": 0,
    "connections-idle-dropped": 0,
    "connections-pool-empty": 0,
    "connections-pool-overflow": 0,
    "connections-successful": 100,
    "delete-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "detailed-metrics": {
      "test": {
        "BatchRead": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 430,
            "min": 214,
            "sum": 858
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 285,
            "min": 269,
            "sum": 823
          },
          "connection-aq": {
            "buckets": [
              2,
              0,
              0,
              1
            ],
            "count": 3,
            "max": 9,
            "min": 0,
            "sum": 9
          },
          "latency": {
            "buckets": [
              0,
              0,
              1,
              2
            ],
            "count": 3,
            "max": 10,
            "min": 5,
            "sum": 23
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 511,
            "min": 438,
            "sum": 1444
          }
        },
        "BatchWrite": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 214,
            "min": 214,
            "sum": 214
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 271,
            "min": 271,
            "sum": 271
          },
          "connection-aq": {
            "buckets": [
              1,
              0,
              0,
              0
            ],
            "count": 1,
            "max": 0,
            "min": 0,
            "sum": 0
          },
          "latency": {
            "buckets": [
              0,
              0,
              1,
              0
            ],
            "count": 1,
            "max": 6,
            "min": 6,
            "sum": 6
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 491,
            "min": 491,
            "sum": 491
          }
        },
        "Put": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 30,
            "min": 30,
            "sum": 240
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 100,
            "min": 100,
            "sum": 800
          },
          "connection-aq": {
            "buckets": [
              8,
              0,
              0,
              0
            ],
            "count": 8,
            "max": 1,
            "min": 0,
            "sum": 1
          },
          "latency": {
            "buckets": [
              0,
              0,
              3,
              5
            ],
            "count": 8,
            "max": 15,
            "min": 5,
            "sum": 64
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 1263,
            "min": 291,
            "sum": 3575
          }
        }
      }
    },
    "detailed-resultcode-counts": {
      "test": {
        "BatchRead": {
          "OK": 24
        },
        "BatchWrite": {
          "OK": 8
        },
        "Put": {
          "OK": 8
        }
      }
    },
    "exists-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "get-header-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "get-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "node-added-count": 1,
    "node-removed-count": 0,
    "open-connections": 100,
    "operate-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "partition-map-updates": 1,
    "put-metrics": {
      "buckets": [
        0,
        0,
        0,
        8
      ],
      "count": 8,
      "max": 1358,
      "min": 302,
      "sum": 3743
    },
    "query-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "scan-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "tends-failed": 0,
    "tends-successful": 2,
    "tends-total": 2,
    "transaction-error-count": 0,
    "transaction-retry-count": 0,
    "udf-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    }
  },
  "cluster-aggregated-stats": {
    "batch-read-metrics": {
      "buckets": [
        0,
        0,
        0,
        3
      ],
      "count": 3,
      "max": 538,
      "min": 451,
      "sum": 1494
    },
    "batch-write-metrics": {
      "buckets": [
        0,
        0,
        0,
        1
      ],
      "count": 1,
      "max": 502,
      "min": 502,
      "sum": 502
    },
    "circuit-breaker-hits": 0,
    "closed-connections": 0,
    "connections-attempts": 100,
    "connections-error-other": 0,
    "connections-error-timeout": 0,
    "connections-failed": 0,
    "connections-idle-dropped": 0,
    "connections-pool-empty": 0,
    "connections-pool-overflow": 0,
    "connections-successful": 100,
    "delete-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "detailed-metrics": {
      "test": {
        "BatchRead": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 430,
            "min": 214,
            "sum": 858
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 285,
            "min": 269,
            "sum": 823
          },
          "connection-aq": {
            "buckets": [
              2,
              0,
              0,
              1
            ],
            "count": 3,
            "max": 9,
            "min": 0,
            "sum": 9
          },
          "latency": {
            "buckets": [
              0,
              0,
              1,
              2
            ],
            "count": 3,
            "max": 10,
            "min": 5,
            "sum": 23
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              3
            ],
            "count": 3,
            "max": 511,
            "min": 438,
            "sum": 1444
          }
        },
        "BatchWrite": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 214,
            "min": 214,
            "sum": 214
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 271,
            "min": 271,
            "sum": 271
          },
          "connection-aq": {
            "buckets": [
              1,
              0,
              0,
              0
            ],
            "count": 1,
            "max": 0,
            "min": 0,
            "sum": 0
          },
          "latency": {
            "buckets": [
              0,
              0,
              1,
              0
            ],
            "count": 1,
            "max": 6,
            "min": 6,
            "sum": 6
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              1
            ],
            "count": 1,
            "max": 491,
            "min": 491,
            "sum": 491
          }
        },
        "Put": {
          "bytes-received": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 30,
            "min": 30,
            "sum": 240
          },
          "bytes-sent": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 100,
            "min": 100,
            "sum": 800
          },
          "connection-aq": {
            "buckets": [
              8,
              0,
              0,
              0
            ],
            "count": 8,
            "max": 1,
            "min": 0,
            "sum": 1
          },
          "latency": {
            "buckets": [
              0,
              0,
              3,
              5
            ],
            "count": 8,
            "max": 15,
            "min": 5,
            "sum": 64
          },
          "parsing": {
            "buckets": [
              0,
              0,
              0,
              8
            ],
            "count": 8,
            "max": 1263,
            "min": 291,
            "sum": 3575
          }
        }
      }
    },
    "detailed-resultcode-counts": {
      "test": {
        "BatchRead": {
          "OK": 24
        },
        "BatchWrite": {
          "OK": 8
        },
        "Put": {
          "OK": 8
        }
      }
    },
    "exceeded-max-retries": 0,
    "exceeded-total-timeout": 0,
    "exists-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "get-header-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "get-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "labels": [
      {
        "app-id": "",
        "cluster": "",
        "host": "127.0.0.1:3100",
        "node": "BB979E256BB9B1A"
      }
    ],
    "node-added-count": 1,
    "node-removed-count": 0,
    "open-connections": 100,
    "operate-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "partition-map-updates": 1,
    "put-metrics": {
      "buckets": [
        0,
        0,
        0,
        8
      ],
      "count": 8,
      "max": 1358,
      "min": 302,
      "sum": 3743
    },
    "query-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "scan-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    },
    "tends-failed": 0,
    "tends-successful": 2,
    "tends-total": 2,
    "transaction-error-count": 0,
    "transaction-retry-count": 0,
    "udf-metrics": {
      "buckets": [
        0,
        0,
        0,
        0
      ],
      "count": 0,
      "max": 0,
      "min": 0,
      "sum": 0
    }
  },
  "open-connections": 100,
  "total-nodes": 1
}
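
A snapshot like this can be consumed with ordinary JSON decoding. The sketch below is an assumption about how a caller might read it, not part of the PR: the `histogram` and `nodeStats` types are hypothetical, and the embedded excerpt is the "Put" latency entry copied from the snapshot above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// histogram mirrors the per-metric objects in the snapshot (hypothetical type).
type histogram struct {
	Buckets []uint64 `json:"buckets"`
	Count   uint64   `json:"count"`
	Max     uint64   `json:"max"`
	Min     uint64   `json:"min"`
	Sum     uint64   `json:"sum"`
}

// mean derives the average from sum and count; an empty histogram yields 0.
func (h histogram) mean() float64 {
	if h.Count == 0 {
		return 0
	}
	return float64(h.Sum) / float64(h.Count)
}

// nodeStats keeps only the detailed metrics: namespace -> command -> metric.
type nodeStats struct {
	Detailed map[string]map[string]map[string]histogram `json:"detailed-metrics"`
}

// excerpt is the "Put" latency entry taken from the snapshot.
const excerpt = `{"detailed-metrics":{"test":{"Put":{"latency":
{"buckets":[0,0,3,5],"count":8,"max":15,"min":5,"sum":64}}}}}`

func parseNodeStats(raw []byte) (nodeStats, error) {
	var s nodeStats
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	s, err := parseNodeStats([]byte(excerpt))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	h := s.Detailed["test"]["Put"]["latency"]
	fmt.Println("count:", h.Count, "mean:", h.mean())
}
```

The snapshot does not state units, so the mean is reported as a bare number here.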

@mirzakaracic mirzakaracic requested a review from khaf March 15, 2025 06:58
@mirzakaracic mirzakaracic self-assigned this Mar 15, 2025
@mirzakaracic mirzakaracic marked this pull request as ready for review March 21, 2025 06:29
@mirzakaracic mirzakaracic changed the title Metrics impl CLIENT-3365 Metrics changes Mar 30, 2025
@mirzakaracic mirzakaracic force-pushed the metrics-impl branch 2 times, most recently from 3cd8f5a to ae20651 Compare April 8, 2025 05:55
@mirzakaracic mirzakaracic force-pushed the metrics-impl branch 3 times, most recently from d6806f2 to 05eff99 Compare April 22, 2025 22:02

@khaf khaf left a comment

Looks mostly good. There are a few issues that need addressing:

  • Some closures are unnecessary and need removal.
  • Some closures in the hot path look to be costly both in terms of GC pressure and performance and need evaluation.
  • The amount of data received from the server is not recorded in the stats.
  • Histogram merge seems to have been broken.
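
To make the histogram-merge point concrete, here is a minimal sketch of what a correct merge has to preserve. It is not the client's implementation: the `histogram` type, the power-of-two bucket boundaries, and the method names are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// histogram is a hypothetical fixed-size histogram with min/max/sum tracking.
type histogram struct {
	buckets  []uint64
	count    uint64
	sum      uint64
	min, max uint64
}

// newHistogram expects n >= 1 buckets.
func newHistogram(n int) *histogram {
	return &histogram{buckets: make([]uint64, n)}
}

// add records v. Bucket boundaries are assumed to be powers of two:
// bucket 0 holds 0-1, bucket 1 holds 2-3, bucket 2 holds 4-7, and the
// last bucket catches everything larger.
func (h *histogram) add(v uint64) {
	idx := bits.Len64(v) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(h.buckets) {
		idx = len(h.buckets) - 1
	}
	h.buckets[idx]++
	if h.count == 0 || v < h.min {
		h.min = v
	}
	if v > h.max {
		h.max = v
	}
	h.count++
	h.sum += v
}

// merge folds o into h. An empty o must not disturb h's min, and an
// empty h must adopt o's min instead of keeping its zero value.
func (h *histogram) merge(o *histogram) error {
	if len(h.buckets) != len(o.buckets) {
		return errors.New("bucket layouts differ")
	}
	if o.count == 0 {
		return nil
	}
	for i, c := range o.buckets {
		h.buckets[i] += c
	}
	if h.count == 0 || o.min < h.min {
		h.min = o.min
	}
	if o.max > h.max {
		h.max = o.max
	}
	h.count += o.count
	h.sum += o.sum
	return nil
}

func main() {
	nodeA, nodeB := newHistogram(4), newHistogram(4)
	nodeA.add(5)
	nodeA.add(10)
	nodeB.add(15)
	cluster := newHistogram(4)
	_ = cluster.merge(nodeA)
	_ = cluster.merge(nodeB)
	fmt.Println(cluster.buckets, cluster.count, cluster.min, cluster.max, cluster.sum)
}
```

The easy mistakes in a merge are the `min` edge cases, which is why both empty-side checks are spelled out.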

@mirzakaracic mirzakaracic requested a review from khaf April 30, 2025 20:11
@mirzakaracic mirzakaracic changed the title CLIENT-3365 Metrics changes CLIENT-3439 Provide user-level metrics for applications using Aerospike Database (Metrics v2) Apr 30, 2025

khaf commented May 2, 2025

We had a good discussion with Mirza, and made a few improvements, the most important of which is that now the client can reliably return the amount of data received from the server. We also reduced the number of allocations where possible.

@mirzakaracic

I think we are good on the total bytes received. I have performed the following operations:

size := 8

util.WriteRecords(cli, keyPrefix, binName, valuePrefix, size)
util.BatchExists(cli, keyPrefix, size)
util.BatchReads(cli, keyPrefix, binName, size)
util.BatchReadHeaders(cli, keyPrefix, size)
util.BatchDelete(cli, keyPrefix, size)

The output of the commands run above, in the same sequence, is:

2025/05/02 12:30:57 Starting the cluster tend goroutine...
Put: ns=test set=demo key=batchkey0 bin=batchbin value=batchvalue0
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey1 bin=batchbin value=batchvalue1
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey2 bin=batchbin value=batchvalue2
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey3 bin=batchbin value=batchvalue3
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey4 bin=batchbin value=batchvalue4
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey5 bin=batchbin value=batchvalue5
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey6 bin=batchbin value=batchvalue6
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
Put: ns=test set=demo key=batchkey7 bin=batchbin value=batchvalue7
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 30 bytes, command type: Put
BatchExists creating 8 keys
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 214 bytes, command type: BatchExists
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey0 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey1 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey2 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey3 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey4 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey5 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey6 exists=true
2025/05/02 12:30:57 Record: ns=test set=demo key=batchkey7 exists=true
BatchRead creating 8 keys
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 430 bytes, command type: BatchRead
Record: ns=test set=demo key=batchkey0 bin=batchbin value=batchvalue0
Record: ns=test set=demo key=batchkey1 bin=batchbin value=batchvalue1
Record: ns=test set=demo key=batchkey2 bin=batchbin value=batchvalue2
Record: ns=test set=demo key=batchkey3 bin=batchbin value=batchvalue3
Record: ns=test set=demo key=batchkey4 bin=batchbin value=batchvalue4
Record: ns=test set=demo key=batchkey5 bin=batchbin value=batchvalue5
Record: ns=test set=demo key=batchkey6 bin=batchbin value=batchvalue6
Record: ns=test set=demo key=batchkey7 bin=batchbin value=batchvalue7
BatchReadHeaders creating 8 keys
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 214 bytes, command type: BatchReadHeader 
Record: ns=test set=demo key=batchkey0 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey1 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey2 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey3 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey4 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey5 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey6 generation=1 expiration=4294967295
Record: ns=test set=demo key=batchkey7 generation=1 expiration=4294967295
BatchReadHeaders creating 8 keys
2025/05/02 12:30:57 Node BB979E256BB9B1A [::1]:3100: Received 214 bytes, command type: BatchDelete
Record: ns=test set=demo key=batchkey0 resultCode=0
Record: ns=test set=demo key=batchkey1 resultCode=0
Record: ns=test set=demo key=batchkey2 resultCode=0
Record: ns=test set=demo key=batchkey3 resultCode=0
Record: ns=test set=demo key=batchkey4 resultCode=0
Record: ns=test set=demo key=batchkey5 resultCode=0
Record: ns=test set=demo key=batchkey6 resultCode=0
Record: ns=test set=demo key=batchkey7 resultCode=0

The map will forever require heap allocations. The Iterator on the other hand can be optimized away to NOT require allocations. At the moment Go's compiler is not that sophisticated and still allocates memory for the iterator, but I expect that to be resolved in the future. Having said that, Iterator still performs significantly better and allocates less memory even today.
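
The trade-off can be sketched with two ways of exposing per-namespace values: building a fresh map for every caller, or handing each entry to a callback. This is an illustration under assumptions, not the code under review: `nsCounters`, `asMap` and `each` are hypothetical, and `each` is written as a plain push-style function (the same shape Go 1.23 uses for `iter.Seq2`) so that it runs on older toolchains too.

```go
package main

import (
	"fmt"
	"testing"
)

// nsCounters is a hypothetical store of per-namespace counters.
type nsCounters struct {
	names  []string
	counts []uint64
}

// asMap builds a fresh map on every call, so each caller pays for a copy.
func (c *nsCounters) asMap() map[string]uint64 {
	m := make(map[string]uint64, len(c.names))
	for i, n := range c.names {
		m[n] = c.counts[i]
	}
	return m
}

// each is a push-style iterator: it hands every entry to yield and stops
// early when yield returns false. It reads the live data without copying.
func (c *nsCounters) each(yield func(name string, count uint64) bool) {
	for i, n := range c.names {
		if !yield(n, c.counts[i]) {
			return
		}
	}
}

func totalViaMap(c *nsCounters) uint64 {
	var total uint64
	for _, v := range c.asMap() {
		total += v
	}
	return total
}

func totalViaEach(c *nsCounters) uint64 {
	var total uint64
	c.each(func(_ string, v uint64) bool {
		total += v
		return true
	})
	return total
}

func main() {
	c := &nsCounters{names: []string{"test", "bar"}, counts: []uint64{8, 24}}
	mapAllocs := testing.AllocsPerRun(1000, func() { _ = totalViaMap(c) })
	eachAllocs := testing.AllocsPerRun(1000, func() { _ = totalViaEach(c) })
	fmt.Println("allocs per call, map:", mapAllocs, "iterator:", eachAllocs)
}
```

The allocation counts printed depend on the compiler's escape analysis and inlining decisions, so they should be measured rather than assumed. Note also that `each` reads live state, so it needs its own synchronization if the data can change concurrently.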

mirzakaracic commented May 21, 2025

So the reason we don't have the data corruption issue with maps is that we are creating a copy, which is also where the overhead comes from. We could most likely fix the issue with the iterator by using locks; I have not really thought this through. But since the number of namespaces is not unbounded, I don't think it makes sense to use a generator here.

Swiss maps are coming to Go in 1.24 as the default map implementation.
