(As posted on the Riak Slack with some modifications.)
I think I have found an interesting issue where hinted handoffs do not happen if you have several bitcask backends (assuming default configuration for the settings discussed below).
Hinted handoffs are triggered when the riak_core_vnode (the gen_fsm implementation) / riak_kv_node (the callback module we're interested in here) has been inactive for a period of 60 to 120 seconds [1]. It then sends an inactive event to the riak_core_vnode_manager, which triggers the handoffs [2].
Thus, if the riak_core_vnode/riak_kv_node receives an event at least once per minute, the hinted handoffs are never triggered.
At first I thought this was related to AAE activity (we're not using Tictac AAE though) or our Telegraf plugin that extracts Riak stats every 10 seconds (after reading this issue: basho/riak_core#715). However, I inspected a process that I expected to do a hinted handoff with sys:log/2 and found something interesting.
With an interval of about ten seconds it received a riak_vnode_req_v1 message with {backend_callback, reference(), merge_check} as the request [3]. That merge_check event is scheduled by riak_kv_bitcask_backend with an interval of 180 seconds +/- 30% [4]. And we currently have 15 backends (different backends for different TTLs etc.). That means an event like this is handled every 12 seconds on average, and the riak_core_vnode/riak_kv_node never reaches the inactivity timeout needed to trigger the handoffs.
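To put numbers on this, here is a small back-of-the-envelope simulation (plain Python, not Riak code; the 15 backends and the 180 s +/- 30% interval come from the text). With 15 independent merge_check schedules, the mean gap between events is about 180/15 = 12 seconds, far below the 60 second lower bound of the inactivity window:

```python
import random

random.seed(1)

N_BACKENDS = 15      # bitcask backends in this setup
BASE = 180.0         # merge_check interval per backend (seconds)
JITTER = 0.30        # +/- 30 %
HORIZON = 24 * 3600.0  # simulate one day

def next_delay():
    """One backend's next merge_check delay: 180 s +/- 30 %."""
    return BASE * random.uniform(1 - JITTER, 1 + JITTER)

# Next scheduled merge_check time for each backend.
pending = [next_delay() for _ in range(N_BACKENDS)]
events = []
while min(pending) <= HORIZON:
    # Pop the earliest merge_check and reschedule that backend.
    i = min(range(N_BACKENDS), key=pending.__getitem__)
    events.append(pending[i])
    pending[i] += next_delay()

# Gaps between consecutive events across all backends: every such
# event resets the vnode's inactivity timer.
gaps = [b - a for a, b in zip([0.0] + events, events)]
mean_gap = sum(gaps) / len(gaps)
print(f"merge_check events in 24 h: {len(events)}")
print(f"mean gap between events: {mean_gap:.1f} s")
print(f"gaps >= 60 s (inactivity lower bound): {sum(g >= 60 for g in gaps)}")
```

Even when the occasional long gap appears, the timer only fires if the vnode stays quiet for the whole randomized 60 to 120 second window, so handoff becomes unreliable at best.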
This has been brought up before by @bipthelin, but we did not know the reason back then (https://postriak.slack.com/archives/C6R0LPH4N/p1570609054000900). @martinsumner linked issue #1706 as a good explanation of how handoffs are triggered, but that issue itself was related to Tictac AAE (https://postriak.slack.com/archives/C6R0LPH4N/p1570618826012100).
We're currently running Riak 2.9.3, but it seems my findings are relevant for the latest version as well.
Is there something that I'm missing here? As I see it, this is maybe not technically a bug but still an issue for what I guess is not an uncommon setup.
An easy workaround for us is to set vnode_inactivity_timeout to 5000 ms (giving a 5 to 10 second inactivity window), but that sidesteps the idea of only doing handoffs when the vnode is not busy. Perhaps the background manager (riak_core_bg_manager) could be used for handoffs to achieve that kind of safety, but I have not looked further into that. In any case, the background manager is not used for handoffs by default.
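For completeness, that workaround might be expressed like this (a sketch; vnode_inactivity_timeout is a riak_core application environment variable in milliseconds, typically set via advanced.config):

```erlang
%% advanced.config -- hedged sketch, value in milliseconds.
%% 5000 ms yields a randomized 5-10 s inactivity window.
[
 {riak_core, [
   {vnode_inactivity_timeout, 5000}
 ]}
].
```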
I received a reply from @martinsumner containing the following:
I suspect your analysis is correct. I don't think the consequences of having this many backends on things like the inactivity timeout have been considered. I would be worried about reducing the inactivity timeout; I think it would be better to change the config of the bitcask_merge_check_interval. This should be changeable now in riak.conf (although it is currently hidden, so that isn't obvious: https://github.com/basho/riak_kv/blob/develop-3.0/priv/riak_kv.schema#L749-L762)
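In riak.conf that suggestion might look like the following (the setting names are my assumption based on the linked riak_kv.schema, since the options are hidden by default). Note that with N backends the mean gap between merge_check events is roughly interval/N, so the interval has to comfortably exceed N times the 120 second upper bound of the inactivity window; with 15 backends, one hour gives a mean gap of about 240 seconds:

```
## riak.conf -- hedged sketch; names assumed from the linked
## riak_kv.schema, duration/percent syntax per cuttlefish.
bitcask.merge_check_interval = 1h
bitcask.merge_check_jitter = 30%
```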
He also asked about the reasons for having so many backends and whether it's just for the TTLs. Here's my reply to that:
This is an old setup that stretches back a few years and I can't justify all the reasons, but I can see that we have been using separate backends for specific buckets for full flexibility. Some examples of TTLs are 1 h, 3 h, 24 h and 4 w. We don't even use all of them today. Later we added generic ones like expiry_1h and expiry_24h. Our default backend is LevelDB (we only have one of those).
If there were an easy way to have either bucket- or object-specific TTLs with a single backend, nothing would stop us from using that except some time and effort to migrate. I can understand the complexity of testing these multi-backend combinations, so what you're writing there sounds like good news.
References
[1] riak_core_vnode inactivity timeout
riak_core_vnode will send an inactive event to the riak_core_vnode_manager if it has not received any event for a period of 60 to 120 seconds. Snippets from riak_core_vnode.erl:
[2] riak_core_vnode_manager handling of the inactive event
Snippets from riak_core_vnode_manager.erl:
[3] Using sys:log/2 to see messages received by a vnode process
I expected riak_kv_node 1278813932664540053428224228626747642198940975104 to do its handoff and stop.
[4] Scheduling of the merge_check event
Snippets from riak_kv_bitcask_backend.erl:
Snippet from riak_kv_backend.erl:

```erlang
%% Queue a callback for the backend after Time ms.
-spec callback_after(integer(), reference(), term()) -> reference().
callback_after(Time, Ref, Msg) when is_integer(Time), is_reference(Ref) ->
    riak_core_vnode:send_command_after(Time, {backend_callback, Ref, Msg}).
```
- It is assumed that vnodes that are ready to hand off will naturally go inactive at some point (as requests have migrated away from them).
- However, there may be regularly scheduled messages to a vnode, e.g. polling for Tictac AAE or backend callbacks, and these disturb and reset the timeout, potentially meaning the timeout will never occur even when application traffic has been diverted from the vnode.
- Vnodes can then be stuck awaiting handoff.
- As there are no limits on backends, or on the operator reconfiguring timeouts, Riak has no awareness that it has been configured and set up in a state where handoffs will not occur.
An interim fix for this particular case would be to reconfigure the bitcask_merge_check_interval. To make this more obvious, perhaps in the next release this should be made a visible setting (it is currently hidden by default) with an additional comment warning about its impact on the vnode_inactivity_timeout.
The only thing is, perhaps there should be an infrequent poll which always prompts a maybe_trigger_handoff request. For most clusters without timeout issues, the 60 s timeout will be sufficient to trigger handoff, whereas if the timeout is never being reached, the infrequent poll will eventually trigger it.
But this opens the possibility of triggering handoff earlier in some cases, before the vnode is inactive. Could that lead to strange bugs?
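The suggested infrequent poll can be sketched as a toy model (illustrative Python, not Riak code; only maybe_trigger_handoff echoes the issue's terminology, everything else is hypothetical). The point is that the poll checks unconditionally, while the inactivity path only fires after a full quiet window:

```python
import time

class Vnode:
    """Toy model: a resettable inactivity timer plus an unconditional,
    low-frequency poll that also prompts a handoff check."""

    def __init__(self, inactivity_timeout=60.0):
        self.inactivity_timeout = inactivity_timeout
        self.last_event = time.monotonic()
        self.handoff_checks = 0

    def handle_event(self, _msg):
        # Every incoming event (including merge_check) resets the timer.
        self.last_event = time.monotonic()

    def on_inactivity_tick(self):
        # Fires only if the vnode stayed quiet for the whole timeout,
        # which a steady stream of merge_check events prevents.
        quiet = time.monotonic() - self.last_event
        if quiet >= self.inactivity_timeout:
            self.maybe_trigger_handoff()

    def on_poll_tick(self):
        # The suggested fix: an infrequent poll that always checks,
        # regardless of how busy the vnode looks.
        self.maybe_trigger_handoff()

    def maybe_trigger_handoff(self):
        self.handoff_checks += 1

v = Vnode()
v.handle_event("merge_check")   # resets the inactivity timer
v.on_inactivity_tick()          # quiet time ~0 s, so no handoff check
v.on_poll_tick()                # unconditional: one handoff check
print(f"handoff checks: {v.handoff_checks}")
```

The trade-off Martin raises is visible here: on_poll_tick may call maybe_trigger_handoff while the vnode is still busy, which the inactivity path was designed to avoid.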