In persist.py:MultiWorkerQueue:get_worker(), only the oidset_name and device_name are used to determine which persist worker receives a task. This results in hotspots if an oidset has an especially large number of instances.
In this specific case, a circuit-testing configuration on a router uses a large number of epipes, creating a hotspot for one particular persist worker, which cannot keep up. Since the work unit is essentially oidset:device, there isn't an obvious way to break up an especially large work unit to spread the load more evenly.
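One possible way to break up an oversized oidset:device unit would be to include the instance name (e.g. the epipe) in the hash key, so a hot oidset:device fans out across workers while each individual instance still maps to exactly one worker, preserving per-instance write ordering. The function signature and names below are hypothetical, not the current get_worker() code:

```python
import zlib

def get_worker(oidset_name, device_name, instance, num_workers):
    # Hypothetical sketch: hash oidset:device:instance rather than just
    # oidset:device. crc32 is stable across processes, and every datum
    # for a given instance still lands on the same worker, so writes for
    # that instance remain ordered.
    key = "%s:%s:%s" % (oidset_name, device_name, instance)
    return zlib.crc32(key.encode("utf-8")) % num_workers
```

With a hundred epipe instances on one device, this spreads the load across the worker pool instead of pinning it all to a single queue.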
There may also be a potential bug lying in wait, since the current code divides the work amongst queues in a relatively non-deterministic way. If espolld is restarted without entirely draining the queues in memcache (or without restarting memcache), individual work units will almost certainly end up in different queues after the restart, resulting in possible out-of-order writes.
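The restart hazard could be avoided by assigning work units to queues with an explicit, stable hash rather than anything process-dependent (Python's built-in hash(), for instance, varies between processes when hash randomization is enabled). A minimal sketch, assuming a hypothetical stable_worker() helper rather than the existing code:

```python
import hashlib

def stable_worker(key, num_workers):
    # Hypothetical sketch: md5 of the work-unit key gives the same
    # worker index for the same key in every process, so a work unit
    # returns to the same queue after an espolld restart even if
    # memcache was not drained.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers
```

As long as the worker count is unchanged across restarts, each oidset:device unit lands back on the queue it occupied before, eliminating that source of out-of-order writes.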