Add option to insert list in ets:insert, ets:lookup refactor #1405
src/libAtomVM/ets.c
Outdated
    }
    EtsErrorCode result = ets_table_insert(ets_table, tuple, ctx);
    if (result != EtsOk) {
        AVM_ABORT(); // Abort because operation might not be atomic.
We usually don't abort the VM: calling AVM_ABORT() means that something unrecoverable happened, such as memory corruption, a bad internal bug, or any other kind of situation that requires an entire VM crash and reboot.
Are we in this specific situation?
Right now we don't have any other tool to ensure atomicity here. If the insert fails at the Nth element, elements 0 to N-1 will already have been inserted, which could result in hard-to-debug behavior. It is unlikely to happen.
To expand: without this abort, if we're short on memory, we'd leave the list partially inserted. If someone tries to persist inserts someday, we'd leave the system in an inconsistent state.
To avoid that we need to either abort or allocate a list of the previous values and roll back in case of error (ensuring that nothing allocates in the rollback path, since we're most likely dealing with OOM). Abort is easier to do here.
This check needs to have UNLIKELY.
Ok, fair point, there is another feasible approach:
table nodes can be pre-allocated before making any change to the list, so if an allocation fails, the nodes allocated so far can simply be freed before any actual change is made.
ets_hashtable_insert will need an additional node parameter, and a dedicated allocation function might be created (e.g. ets_hashtable_new_node). Furthermore, the key and entry parameters can be moved to the ets_hashtable_new_node function if it helps.
This change will have a very small impact since ets_hashtable_insert is used in just one or two places.
I suggest doing this with an additional commit inside this PR, so we can make the review easier and split this activity into 2 tasks.
This change will remove any implicit allocation and make the abort unnecessary.
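To illustrate the pre-allocation idea, here is a minimal, self-contained sketch. It is not AtomVM code: ToyTable, ToyNode, toy_new_node, toy_link_node, toy_insert_many and the toy_allocs_left test hook are all hypothetical names, and keys and values are plain ints instead of terms. The point is only the two-phase shape: every allocation happens first, and the linking step cannot fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BUCKETS 8

struct ToyNode
{
    struct ToyNode *next;
    int key;
    int value;
};

struct ToyTable
{
    struct ToyNode *buckets[TOY_BUCKETS];
};

/* Hypothetical test hook: -1 means unlimited, otherwise the number of
 * node allocations that may still succeed. */
static long toy_allocs_left = -1;

/* Phase 1: the only step that allocates, so the only step that can fail. */
static struct ToyNode *toy_new_node(int key, int value)
{
    if (toy_allocs_left == 0) {
        return NULL; /* simulated out-of-memory */
    }
    if (toy_allocs_left > 0) {
        toy_allocs_left--;
    }
    struct ToyNode *node = malloc(sizeof(struct ToyNode));
    if (node != NULL) {
        node->next = NULL;
        node->key = key;
        node->value = value;
    }
    return node;
}

/* Phase 2: links a ready-made node into its bucket. It never allocates,
 * so it cannot fail. An existing node with the same key is replaced. */
static void toy_link_node(struct ToyTable *table, struct ToyNode *fresh)
{
    struct ToyNode **link = &table->buckets[(unsigned int) fresh->key % TOY_BUCKETS];
    while (*link != NULL) {
        if ((*link)->key == fresh->key) {
            struct ToyNode *old = *link;
            fresh->next = old->next;
            *link = fresh;
            free(old);
            return;
        }
        link = &(*link)->next;
    }
    *link = fresh;
}

static const struct ToyNode *toy_find(const struct ToyTable *table, int key)
{
    const struct ToyNode *node = table->buckets[(unsigned int) key % TOY_BUCKETS];
    while (node != NULL && node->key != key) {
        node = node->next;
    }
    return node;
}

/* Inserts all pairs or none of them. */
static bool toy_insert_many(struct ToyTable *table, const int *keys, const int *values, size_t count)
{
    if (count == 0) {
        return true;
    }
    struct ToyNode **nodes = malloc(count * sizeof(struct ToyNode *));
    if (nodes == NULL) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        nodes[i] = toy_new_node(keys[i], values[i]);
        if (nodes[i] == NULL) {
            /* Nothing has touched the table yet: free and bail out. */
            while (i > 0) {
                free(nodes[--i]);
            }
            free(nodes);
            return false;
        }
    }
    for (size_t i = 0; i < count; i++) {
        toy_link_node(table, nodes[i]);
    }
    free(nodes);
    return true;
}
```

With this shape a failed allocation leaves the table exactly as it was, so the caller can return an error instead of aborting.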
76774f0 to 6ac7831
        return EtsTableNotFound;
    }

    EtsErrorCode result = ets_table_lookup(ets_table, key, ret, ctx);
While working on this code recently, I noticed that the hashtable lookup takes a keypos argument which isn't needed (we have node->key, and keypos can't change after table creation). It may be worth removing it in this PR or in a follow-up.
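A small sketch of why the lookup does not need keypos, using hypothetical toy types (ToyTable, ToyNode, toy_insert, toy_lookup) with an int array standing in for a tuple term. The key is extracted once at insert time using the table's fixed keypos and cached in the node, so the lookup only ever compares against node->key. For brevity the insert simply prepends and does not replace duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BUCKETS 8

struct ToyNode
{
    struct ToyNode *next;
    int key;          /* cached copy of entry[keypos], set once at insert time */
    const int *entry; /* stand-in for a tuple term */
};

struct ToyTable
{
    size_t keypos; /* fixed when the table is created */
    struct ToyNode *buckets[TOY_BUCKETS];
};

static size_t toy_bucket(int key)
{
    return (unsigned int) key % TOY_BUCKETS;
}

/* Insert is the only place that reads keypos.
 * Returns 0 on success, -1 if the node cannot be allocated. */
static int toy_insert(struct ToyTable *table, const int *entry)
{
    struct ToyNode *node = malloc(sizeof(struct ToyNode));
    if (node == NULL) {
        return -1;
    }
    node->key = entry[table->keypos];
    node->entry = entry;
    size_t index = toy_bucket(node->key);
    node->next = table->buckets[index];
    table->buckets[index] = node;
    return 0;
}

/* Lookup compares against the cached node->key: no keypos parameter. */
static const int *toy_lookup(const struct ToyTable *table, int key)
{
    const struct ToyNode *node = table->buckets[toy_bucket(key)];
    while (node != NULL) {
        if (node->key == key) {
            return node->entry;
        }
        node = node->next;
    }
    return NULL;
}
```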
6ac7831 to c2bc9d2
Signed-off-by: Tomasz Sobkiewicz <[email protected]>
f54ef62 to 45eccdc
        return NULL;
    }
    size_t size = (size_t) memory_estimate_usage(entry);
    if (memory_init_heap(heap, size) != MEMORY_GC_OK) {
I was wondering: why do we create a new heap instead of piggybacking on the owner process's heap?
Short answer: this code is just a refactoring of an existing code block that was doing the same thing.
Long answer: the GC is completely decoupled from ETS at the moment and ETS tables are not tied to a Context at all, so the GC will not use their items as roots. If we used the process heap, we would screw them up.
If I understood correctly: they could be on the owner's heap if we could guarantee that they won't be GC'd.
right, that would require some bigger changes and there might be some extra complexity.
45eccdc to e5908a2
A few minor changes are still required. Thanks for the work so far.
CHANGELOG.md
Outdated
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added the ability to run beams from the CLI for Generic Unix platform (it was already possible with nodejs and emscripten).
- Added support for 'erlang:--/2'.
- Added preliminary support for ESP32P4 (no networking support yet).
- Added support for list insertion in 'ets:insert/2'.
I think this ended up in the wrong section of the changelog file by mistake.
Changes aimed at the main branch should go under ## Unreleased (not in the ## [0.6.6] - Unreleased section).
The right place should be just below the Added supervisor:terminate_child/2, supervisor:restart_child/2 entry.
src/libAtomVM/ets_hashtable.c
Outdated
@@ -94,38 +131,34 @@ EtsHashtableErrorCode ets_hashtable_insert(struct EtsHashTable *hash_table, term
#endif

    struct HNode *node = hash_table->buckets[index];
    struct HNode *last_node = NULL;
    if (node) {
This is not needed anymore since we are doing while (node).
Consider adding a node with a completely new key. We're using linked lists for collisions. If you don't have last_node, then node will always be null: you'd create a new node and then, instead of attaching it to the end of the linked list, you'd add it directly to the bucket, overwriting the existing nodes.
I probably didn't explain it well enough, or maybe I'm overlooking something.
I mean that in the following snippet:
if (cond) {
    while (cond) {
    }
}
the if (cond) is redundant: if cond is false, the while body will be executed 0 times anyway.
Nope, you're absolutely right. GitHub didn't show enough context in the inline diff to see that.
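Both points from this thread can be seen in one small sketch: last_node is still needed to append to the collision chain, while the enclosing if (node) is not. ToyNode and toy_chain_insert are hypothetical names, and nodes are caller-owned here to keep the example short; this is not the actual ets_hashtable_insert.

```c
#include <assert.h>
#include <stddef.h>

struct ToyNode
{
    struct ToyNode *next;
    int key;
    int value;
};

/* Links `fresh` into the chain starting at *bucket.
 * Returns the node it replaced (same key), or NULL if it was appended. */
static struct ToyNode *toy_chain_insert(struct ToyNode **bucket, struct ToyNode *fresh)
{
    struct ToyNode *node = *bucket;
    struct ToyNode *last_node = NULL;

    /* No enclosing `if (node)`: an empty bucket just means zero iterations. */
    while (node) {
        if (node->key == fresh->key) {
            /* Same key: splice `fresh` in place of the old node. */
            fresh->next = node->next;
            if (last_node) {
                last_node->next = fresh;
            } else {
                *bucket = fresh;
            }
            node->next = NULL;
            return node;
        }
        last_node = node;
        node = node->next;
    }

    /* New key: attach at the tail, or directly to an empty bucket. */
    fresh->next = NULL;
    if (last_node) {
        last_node->next = fresh;
    } else {
        *bucket = fresh;
    }
    return NULL;
}
```

Only the final if (last_node) decides between "append to the chain" and "first node in the bucket", so dropping last_node would overwrite existing nodes, whereas dropping the outer if changes nothing.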
387dc1c to 54951ff
{
    Heap *heap = malloc(sizeof(Heap));
Small trick and optimization:
the Heap struct can be embedded inside HNode this way:
struct HNode
{
    struct HNode *next;
    term key;
    term entry;
    Heap heap;
};
This avoids the malloc overhead for the Heap struct, simplifies the code and also reduces memory fragmentation.
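A minimal sketch of what embedding buys, assuming a stand-in ToyHeap type (the real Heap struct has different fields) and hypothetical toy_heap_init, toy_node_new and toy_node_free helpers. One malloc covers both the node and the heap bookkeeping; only the heap storage itself is allocated separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for AtomVM's Heap; the real struct has different fields. */
typedef struct
{
    int *words;
    size_t size;
} ToyHeap;

/* Allocates the heap storage. Expects size > 0. Returns 0 on success. */
static int toy_heap_init(ToyHeap *heap, size_t size)
{
    heap->words = calloc(size, sizeof(int));
    if (heap->words == NULL) {
        return -1;
    }
    heap->size = size;
    return 0;
}

static void toy_heap_destroy(ToyHeap *heap)
{
    free(heap->words);
    heap->words = NULL;
    heap->size = 0;
}

struct ToyNode
{
    struct ToyNode *next;
    int key;
    int entry;
    ToyHeap heap; /* embedded by value, not a pointer */
};

static struct ToyNode *toy_node_new(int key, int entry, size_t heap_size)
{
    /* A single malloc: no separate malloc(sizeof(ToyHeap)) to check and free. */
    struct ToyNode *node = malloc(sizeof(struct ToyNode));
    if (node == NULL) {
        return NULL;
    }
    if (toy_heap_init(&node->heap, heap_size) != 0) {
        free(node);
        return NULL;
    }
    node->next = NULL;
    node->key = key;
    node->entry = entry;
    return node;
}

static void toy_node_free(struct ToyNode *node)
{
    toy_heap_destroy(&node->heap);
    free(node);
}
```

Compared with a Heap pointer field, the error path in toy_node_new has one fewer allocation to unwind.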
CHANGELOG.md
Outdated
@@ -25,7 +26,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added the ability to run beams from the CLI for Generic Unix platform (it was already possible with nodejs and emscripten).
- Added support for 'erlang:--/2'.
- Added support for list insertion in 'ets:insert/2'.
Pure nitpicking moment here: I always try to keep "changes local".
Let me explain better:
Suppose that one day we find that the commit for pre-allocating nodes introduces a regression. The easiest solution would be to just run git revert 54951ffa8176cbec5ce757313abe60c9f38c8395, but this would also revert the changelog fix, and the entry would jump back into the wrong section.
The ideal approach is to use git commit --fixup=01456a3e7544a144acbcf57e58b2adb02e270d50 and then git rebase --autosquash, so the fixup is applied to the right commit. That way each change belongs to the right commit, and reverting one doesn't revert unrelated changes.
src/libAtomVM/ets_hashtable.c
Outdated
    memory_destroy_heap(node->heap, global);
    node->heap = heap;
    free(node);
I still think that grouping memory_destroy_heap(node->heap, global) and free(node) together in a small helper function would make it harder to forget one of the two.
Good idea, will refactor every occurrence of this.
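One possible shape for such a helper, sketched with hypothetical toy types rather than the real HNode and memory_destroy_heap. Every path that drops a node (here toy_chain_remove and toy_chain_clear) goes through toy_node_destroy, so the heap and the node are always released together. The live counters exist only so the example can check that nothing is left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
    int unused;
} ToyHeap;

struct ToyNode
{
    struct ToyNode *next;
    int key;
    ToyHeap *heap;
};

/* Bookkeeping for the example only. */
static int toy_live_heaps = 0;
static int toy_live_nodes = 0;

/* Creates a node that points at `next`. Returns NULL on allocation failure. */
static struct ToyNode *toy_node_new(int key, struct ToyNode *next)
{
    struct ToyNode *node = malloc(sizeof(struct ToyNode));
    ToyHeap *heap = malloc(sizeof(ToyHeap));
    if (node == NULL || heap == NULL) {
        free(node);
        free(heap);
        return NULL;
    }
    node->next = next;
    node->key = key;
    node->heap = heap;
    toy_live_nodes++;
    toy_live_heaps++;
    return node;
}

static void toy_heap_destroy(ToyHeap *heap)
{
    free(heap);
    toy_live_heaps--;
}

/* The helper: the only place where a node and its heap are released. */
static void toy_node_destroy(struct ToyNode *node)
{
    toy_heap_destroy(node->heap);
    free(node);
    toy_live_nodes--;
}

/* Call site 1: remove a single key from a chain. */
static void toy_chain_remove(struct ToyNode **bucket, int key)
{
    struct ToyNode **link = bucket;
    while (*link != NULL) {
        if ((*link)->key == key) {
            struct ToyNode *victim = *link;
            *link = victim->next;
            toy_node_destroy(victim);
            return;
        }
        link = &(*link)->next;
    }
}

/* Call site 2: drop a whole chain. */
static void toy_chain_clear(struct ToyNode **bucket)
{
    while (*bucket != NULL) {
        struct ToyNode *victim = *bucket;
        *bucket = victim->next;
        toy_node_destroy(victim);
    }
}
```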
I'm sorry about this additional round of comments; I hope they are somewhat interesting to you and not just boring nitpicking.
Don't worry, thank you for your patience when reviewing it ❤️
4bc9bb5 to 6e249eb
Signed-off-by: Tomasz Sobkiewicz <[email protected]>
6e249eb to 0f71f40
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
Changes:
Use Cases for the Helper Functions:
The new helper functions can be utilized in the following ETS operations to reduce code duplication:
Every function mentioned will be implemented after this PR is merged.