But I still don't understand why I should care?
As mentioned earlier, persistent data structures perform a copy whenever they are modified, so there is never any chance that two threads could be modifying the same instance at the same time. And because the copies are very efficient, you don't need to worry about using up gobs of memory in the process.
Even if threading isn't a concern, because they're immutable,
you can pass them around between objects, methods, and
functions in the same thread and never worry about data
corruption; no more defensive calls to Object#dup!
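To make the "defensive dup" point concrete, here is a minimal sketch using only plain Ruby Hashes (Hamster is not required to run it). The `normalise!` method is a hypothetical helper invented for this illustration; it stands in for any code that mutates the hash it is handed.

```ruby
# Plain-Ruby illustration; no Hamster required.
# `normalise!` is a hypothetical helper that mutates its argument.
def normalise!(settings)
  settings[:name] = settings[:name].strip.downcase
  settings
end

# Without a defensive copy, the caller's hash is changed behind its back.
original = { name: "  Alice " }
normalise!(original)
puts original[:name].inspect

# The usual workaround: copy before handing the hash over.
original = { name: "  Alice " }
normalise!(original.dup)
puts original[:name].inspect

# Freezing turns silent corruption into an immediate error, which is
# the guarantee an immutable structure gives you by construction.
frozen = { name: "  Alice " }.freeze
begin
  normalise!(frozen)
rescue RuntimeError => e # FrozenError on Ruby 2.5+, a RuntimeError subclass
  puts "refused to mutate: #{e.class}"
end
```

With an immutable structure, neither the `dup` nor the `freeze` is needed, because a "modification" always hands back a new instance and leaves the one you passed in alone.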
What's the downside--there's always a downside?
There's a potential performance hit when compared with MRI's built-in, native, hand-crafted C-code implementation of Hash.
For example:
hash = Hamster::Hash.empty
(1..10000).each { |i| hash = hash.put(i, i) }
# => 0.05s
(1..10000).each { |i| hash.get(i) }
# => 0.008s
versus
hash = {}
(1..10000).each { |i| hash[i] = i }
# => 0.004s
(1..10000).each { |i| hash[i] }
# => 0.001s
The previous comparison wasn't really fair. Sure, if all you
want to do is replace your existing uses of Hash in
single-threaded environments, then don't even bother. However,
if you need something that can be used efficiently in concurrent
environments where multiple threads are accessing--reading AND
writing--the contents, things get much better.
A more realistic comparison might look like:
hash = Hamster::Hash.empty
(1..10000).each { |i| hash = hash.put(i, i) }
# => 0.05s
(1..10000).each { |i| hash.get(i) }
# => 0.008s
versus
hash = {}
(1..10000).each { |i| hash = hash.dup; hash[i] = i }
# => 19.8s
(1..10000).each { |i| hash[i] }
# => 0.001s
What's even better -- or worse depending on your perspective
-- is that after all that, the native Hash version still
isn't thread-safe and still requires some synchronization
around it, slowing it down even further.
The Hamster version, on the other hand, was unchanged from the original whilst remaining inherently thread-safe, and its writes were roughly 400 times faster (0.05s versus 19.8s).
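The native-Hash side of this comparison can be reproduced approximately with the standard library's `Benchmark` module. The sketch below is an assumption about how such timings might be taken, not the script that produced the numbers above; the method names `build_in_place` and `build_with_dup` are made up for illustration, and the Hamster side is left out so the script runs without the gem. Absolute timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Builds a Hash by mutating a single instance (not safe to share while writing).
def build_in_place(n)
  hash = {}
  (1..n).each { |i| hash[i] = i }
  hash
end

# Builds a Hash by copying before every write, so earlier versions stay untouched.
def build_with_dup(n)
  hash = {}
  (1..n).each do |i|
    hash = hash.dup
    hash[i] = i
  end
  hash
end

n = 2_000 # smaller than the 10,000 used above so the script finishes quickly
in_place = Benchmark.realtime { build_in_place(n) }
with_dup = Benchmark.realtime { build_with_dup(n) }
puts format("in place: %.4fs, dup per write: %.4fs", in_place, with_dup)
```

Each `dup` copies every existing entry, so the total work grows with the square of the number of writes. That is why the gap widens so sharply as `n` increases.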
You still need synchronisation so why bother with the copying?
Well, I could show you a version that synchronises instead of
copying, but I'd have to re-write/wrap most Hash methods to make
them generic, or at the very least write some application-specific
code that synchronized using a Mutex and ... well ... it's hard,
I always make mistakes, I always end up with weird edge cases
and race conditions, so I'll leave that as an exercise for you :)
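As a possible starting point for that exercise, here is a deliberately tiny sketch of a Mutex-guarded wrapper around a plain Hash. The `SynchronisedHash` class and its method names are hypothetical and are not part of Hamster or of Ruby's standard library; it covers only a handful of operations.

```ruby
# Hypothetical wrapper: a plain Hash guarded by a single Mutex.
# Standard library only; not part of Hamster.
class SynchronisedHash
  def initialize
    @hash = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @hash[key] = value }
    self
  end

  def get(key)
    @lock.synchronize { @hash[key] }
  end

  # A compound read-modify-write must happen under one lock acquisition;
  # calling get and then put separately would leave a race window.
  def update(key, default = nil)
    @lock.synchronize { @hash[key] = yield(@hash.fetch(key, default)) }
  end

  def size
    @lock.synchronize { @hash.size }
  end
end

counts = SynchronisedHash.new
threads = 4.times.map do
  Thread.new { 1_000.times { counts.update(:hits, 0) { |n| n + 1 } } }
end
threads.each(&:join)
puts counts.get(:hits) # => 4000
```

Even this small version shows where the pain comes from: every additional Hash method (iteration, merging, deletion while iterating) needs its own decision about what to lock and for how long, and every reader pays for the lock too.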
And don't forget that even if threading isn't a concern for you, the safety provided by immutability alone is worth it, not to mention the lazy implementations.