Is there a theoretical limit to how much data can be saved in one Riak node? #1145
-
My question is not about hardware limitations, but whether there are any restrictions in Riak itself (file descriptor limits, things like that?).
-
That is a fascinating question, and the answer will vary enormously with use case. In a general sense, the only real limitations you will find are caused by hardware, primarily memory and disk space. Riak KV itself doesn't have any specific limits, but system resource use will grow as the amount of data stored grows. I am not aware of any maximum limits that people have hit that cannot be tuned on either the OS or the Riak side. The biggest Riak install I have encountered since I started providing Riak support in 2016 was 14PB, averaging around 50TB per node.

Some backends have resource usage patterns that can cause separate issues at scale, though. For example, Bitcask loads all keys into memory, which matters if you have a lot of small objects rather than a few large objects. Using another backend such as leveled, or even the multi-backend, can help avoid this.

Optimisation comes down to use case: if you plan to store lots of large files that are rarely accessed, you would optimise differently than if you had lots of small files that are regularly updated but discarded after 60 days.

Another consideration is what happens when a node fails: when Riak automatically spins up new vnodes to compensate for physical node failure, it will temporarily duplicate your data for the failed node(s) until they become available again. If you have GBs on each node this is fairly fast, but if you have PBs it will take a very long time even with SSDs. This is one reason to use a very large ring size if you are going to store a vast amount of data.
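For illustration, here is a minimal riak.conf sketch along the lines of the above. The specific values (ring size, backend choice, transfer limit) are assumptions for the example, not recommendations; they should come out of your own sizing tests.

```ini
## riak.conf sketch (values are illustrative assumptions, not recommendations)

## A larger ring size spreads data across more, smaller vnodes, which keeps
## individual handoffs smaller when a node fails or is replaced.
## Must be a power of 2 and should be set before the cluster is built.
ring_size = 256

## Bitcask keeps all keys in memory; for very large key counts, leveled
## (or leveldb) avoids that per-key memory cost.
storage_backend = leveled

## Number of concurrent node-to-node transfers allowed per node.
transfer_limit = 2
```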
-
One key practical limit is handoffs, especially with ordered backends (leveldb/leveled), where PUT speed slows and stalls become more frequent as the vnode grows. If vnodes get too big, transfers may start to time out. A transfer that times out is very expensive, as it may require a lot of work to get back to where it was. In the worst case, handoffs perpetually time out, restart, and time out again.

There have been recent PRs to try to improve the handoff timeout situation, with some additional configuration tweaks added. But as part of any non-functional testing to determine sizing, I would recommend testing how long a node replace takes, and whether a node replace will complete reliably (see the sketch below). If the node replace takes too long for your environment (e.g. you couldn't get it to complete reliably outside core hours), or will not complete reliably due to transfer timeouts, then there is too much data in the node.

We've just gone through a big exercise picking node sizes as part of a major Riak migration to a cloud environment, and ultimately it was the transfer time that was the primary factor in determining when we needed to scale horizontally rather than vertically. We set a target for how fast we wanted node replaces (and also node repairs) to complete, and used that to set the limit on data per node.
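As a rough sketch of that kind of rehearsal, assuming the standard riak-admin tooling and hypothetical node names (on Riak 3.x the command is `riak admin ...` rather than `riak-admin ...`):

```sh
# Rehearse a node replace and time it end to end (node names are hypothetical).

# On the new node: stage a join to the existing cluster.
riak-admin cluster join riak@existing-node.example.com

# On any cluster node: stage the replacement, review the plan, then commit.
riak-admin cluster replace riak@old-node.example.com riak@new-node.example.com
riak-admin cluster plan
riak-admin cluster commit

# Monitor ownership handoffs; note the total elapsed time and whether any
# transfers restart after timing out.
riak-admin transfers

# Concurrent transfers per node can be adjusted at runtime if needed.
riak-admin transfer-limit riak@new-node.example.com 2
```

If the rehearsal cannot finish inside your maintenance window, that is the signal to add nodes rather than keep adding data per node.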