Description
When sending certain get or set Redis commands to Skyhook, we are getting a 500 error back from Skyhook. When we debug the exception that gets thrown, we get something similar to the below:
INFO c.a.skyhook.listener.BaseListener - com.aerospike.client.AerospikeException$InvalidNode: Error -3,1,0,30000,1000,0: Node not found for partition wayfair:1194
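To check whether the failures cluster on particular partitions (which might point at one node missing from the client's partition map), the log lines can be tallied by namespace and partition id. This is only a rough sketch: the regular expression is an assumption based solely on the single log line quoted above, and `partition_failures` is a hypothetical helper, not part of Skyhook or the Aerospike client.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the log line quoted in this issue:
#   "... Node not found for partition <namespace>:<partition id>"
PARTITION_RE = re.compile(
    r"Node not found for partition (?P<namespace>[^:\s]+):(?P<partition>\d+)"
)


def partition_failures(lines):
    """Count 'Node not found' errors per (namespace, partition id)."""
    counts = Counter()
    for line in lines:
        match = PARTITION_RE.search(line)
        if match:
            key = (match["namespace"], int(match["partition"]))
            counts[key] += 1
    return counts


# The one log line from this report, used as sample input.
sample = [
    "INFO c.a.skyhook.listener.BaseListener - "
    "com.aerospike.client.AerospikeException$InvalidNode: "
    "Error -3,1,0,30000,1000,0: Node not found for partition wayfair:1194",
]

counts = partition_failures(sample)
for (namespace, partition), n in counts.most_common():
    print(f"{namespace}:{partition} failed {n} time(s)")
```

If the same few partition ids keep recurring across a full log file, that would suggest a specific node is not being reached, rather than random failures.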
This does not happen on every request and sometimes we are able to get or set keys correctly in the Aerospike cluster. There doesn't seem to be a pattern to the keys or values we try to get/set.
In chatting with Aerospike support, they reviewed our logs and said the following:
After taking a look through your logs, we can see that the cluster size looks stable throughout the time you experienced the client errors.
We also don’t see any heartbeat or fabric connection churn, which indicates that the cluster state seems healthy and stable.
It’s possible that the client is throwing Node not found for partition errors because Skyhook is still in beta. It could be related to connectivity issues that are preventing it from properly tending to all nodes in the cluster.
On a separate note, I noticed several warnings in the server logs regarding login - internal user credential mismatch. This warning indicates that a login failed using a valid internal user that is found in the Access Control List with an incorrect password.
I don't see any login issues on our end. We are wondering if there are any other avenues we could try to get it working consistently.
Note: We did update the Aerospike client version in Skyhook to 6.1.6 while working on this.