
Add option to insert list in ets:insert, ets:lookup refactor #1405

Open: wants to merge 2 commits into main from thesobkiewicz/nifs/ets/refactor_insert

TheSobkiewicz (Contributor)

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

Changes:

  • Enabled ets:insert/2 to accept lists for bulk insertion.
  • Extracted helper functions for ets:lookup/2 and ets:insert/2 that do not acquire table locks.

Use Cases for the Helper Functions:

The new helper functions can be utilized in the following ETS operations to reduce code duplication:

  • ets:update_element/3
  • ets:insert_new/2
  • ets:update_counter/3
  • ets:update_counter/4
  • ets:take/2
  • ets:delete_object/2

All of the functions listed above will be implemented after this PR is merged.
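
A rough sketch of the idea in C: lookup and insert helpers that assume the caller already holds the table lock, composed into an insert_new-style operation under a single lock acquisition. Everything here (toy_table, lookup_unlocked, insert_unlocked, table_insert_new, and the boolean standing in for a real lock) is hypothetical and is not the AtomVM API; the semantics only loosely follow ets:insert_new/2 (insert only if the key is absent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_CAPACITY 16

/* Hypothetical toy table; the bool stands in for a real table lock. */
struct toy_table {
    bool locked;
    size_t count;
    int keys[TABLE_CAPACITY];
    int values[TABLE_CAPACITY];
};

static void table_lock(struct toy_table *t)
{
    assert(!t->locked);
    t->locked = true;
}

static void table_unlock(struct toy_table *t)
{
    assert(t->locked);
    t->locked = false;
}

/* Lock-free helper: the caller must already hold the table lock. */
static int *lookup_unlocked(struct toy_table *t, int key)
{
    assert(t->locked);
    for (size_t i = 0; i < t->count; i++) {
        if (t->keys[i] == key) {
            return &t->values[i];
        }
    }
    return NULL;
}

/* Lock-free helper: replaces an existing key or appends a new one. */
static bool insert_unlocked(struct toy_table *t, int key, int value)
{
    assert(t->locked);
    int *existing = lookup_unlocked(t, key);
    if (existing != NULL) {
        *existing = value;
        return true;
    }
    if (t->count == TABLE_CAPACITY) {
        return false;
    }
    t->keys[t->count] = key;
    t->values[t->count] = value;
    t->count++;
    return true;
}

/* insert_new-style operation built from both helpers under one lock. */
static bool table_insert_new(struct toy_table *t, int key, int value)
{
    table_lock(t);
    bool inserted = false;
    if (lookup_unlocked(t, key) == NULL) {
        inserted = insert_unlocked(t, key, value);
    }
    table_unlock(t);
    return inserted;
}
```

Because both helpers run inside the same critical section, no other writer can insert the same key between the lookup and the insert, which is what makes the composed operation safe without extra locking.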

    }
    EtsErrorCode result = ets_table_insert(ets_table, tuple, ctx);
    if (result != EtsOk) {
        AVM_ABORT(); // Abort because operation might not be atomic.

Collaborator

We usually don't do VM aborts: calling AVM_ABORT() means that an unrecoverable error happened, such as memory corruption, a bad internal bug, or any other situation that requires crashing and rebooting the entire VM.
Are we in this specific situation?

TheSobkiewicz (Contributor Author), Jan 7, 2025

Right now we don't have any other tool to ensure atomicity here. If the insert fails at the Nth element, elements 0 to N-1 of the list will already have been inserted, which could result in hard-to-debug behavior. It is unlikely to happen, though.

jakub-gonet (Contributor), Jan 12, 2025

To expand: without this abort, if we're short on memory, we'd leave the list partially inserted. If someone tries to persist inserts someday, we'd leave the system in an inconsistent state.

To avoid that, we need to either abort, or allocate a list of the previous values and roll back in case of error (ensuring that nothing allocates on the rollback path, since we're most likely dealing with OOM). Abort is easier to do here.

This check should be wrapped in UNLIKELY.
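
A minimal sketch of the bulk insert loop with the failure check marked UNLIKELY. It assumes UNLIKELY is a branch-prediction hint built on __builtin_expect (GCC and Clang); the SketchErrorCode type and the sketch_* functions are hypothetical stand-ins for EtsErrorCode and ets_table_insert, and abort() stands in for AVM_ABORT().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed definition of the hint, in the usual GCC/Clang style. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-ins for EtsErrorCode values. */
typedef enum {
    SketchOk,
    SketchAllocationFailure
} SketchErrorCode;

/* Hypothetical single insert: a full array plays the role of OOM. */
static SketchErrorCode sketch_insert_one(int *table, size_t *len, size_t cap, int value)
{
    if (*len == cap) {
        return SketchAllocationFailure;
    }
    table[(*len)++] = value;
    return SketchOk;
}

/* Bulk insert: the failure branch is cold, so it is marked UNLIKELY. */
static void sketch_insert_all(int *table, size_t *len, size_t cap, const int *items, size_t n)
{
    assert(table != NULL && len != NULL);
    for (size_t i = 0; i < n; i++) {
        SketchErrorCode result = sketch_insert_one(table, len, cap, items[i]);
        if (UNLIKELY(result != SketchOk)) {
            /* Items 0..i-1 are already in the table; stop everything
               instead of leaving a partial bulk insert behind. */
            abort();
        }
    }
}
```

The hint only affects code layout and branch prediction; the abort still fires on any failure, and at that point items 0 to i-1 remain inserted, which is exactly the atomicity concern raised in this thread.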

Collaborator

OK, fair point. There is another feasible approach:
table nodes can be pre-allocated before making any change, so in case of allocation failure the already allocated nodes can easily be freed before anything in the table has actually changed.

ets_hashtable_insert will need an additional node parameter, and a dedicated allocation function might be created (e.g. ets_hashtable_new_node). Furthermore, the key and entry parameters can be moved to the ets_hashtable_new_node function if that helps.
This change will have a very small impact, since ets_hashtable_insert is used in just one or two places.
I suggest doing this in an additional commit inside this PR, to make the review easier and split this work into two tasks.

This change will remove any implicit allocation and make the abort unnecessary.
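
A minimal sketch of the pre-allocation approach, assuming a simplified singly linked node with no per-node heap and no duplicate-key handling: every node is allocated first, and the list is only modified once all allocations have succeeded. new_node, insert_all, MAX_BATCH and the allocations_before_failure hook (used to simulate OOM) are hypothetical names, not the proposed ets_hashtable_new_node signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BATCH 64

struct node {
    struct node *next;
    int key;
    int value;
};

/* Hypothetical test hook: when it reaches 0, the next allocation fails. */
static int allocations_before_failure = -1;

static struct node *new_node(int key, int value)
{
    if (allocations_before_failure == 0) {
        return NULL;
    }
    if (allocations_before_failure > 0) {
        allocations_before_failure--;
    }
    struct node *n = malloc(sizeof(struct node));
    if (n != NULL) {
        n->next = NULL;
        n->key = key;
        n->value = value;
    }
    return n;
}

static bool insert_all(struct node **list, const int *keys, const int *values, size_t n)
{
    assert(n <= MAX_BATCH);
    struct node *fresh[MAX_BATCH];

    /* Phase 1: allocate everything up front. On failure, free only the
       fresh nodes; *list has not been touched yet. */
    for (size_t i = 0; i < n; i++) {
        fresh[i] = new_node(keys[i], values[i]);
        if (fresh[i] == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(fresh[j]);
            }
            return false;
        }
    }

    /* Phase 2: only pointer updates, nothing here can fail. */
    for (size_t i = 0; i < n; i++) {
        fresh[i]->next = *list;
        *list = fresh[i];
    }
    return true;
}

static size_t list_length(const struct node *list)
{
    size_t len = 0;
    for (; list != NULL; list = list->next) {
        len++;
    }
    return len;
}
```

Since phase 2 performs no allocation, an out-of-memory condition can only surface in phase 1, where nothing has been linked yet, so neither a rollback of the table nor an abort is needed.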

src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (resolved)
src/libAtomVM/ets.c (outdated, resolved)
tests/erlang_tests/test_ets.erl (resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch 4 times, most recently from 76774f0 to 6ac7831 on January 9, 2025 15:44
src/libAtomVM/ets.c (resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
        return EtsTableNotFound;
    }

    EtsErrorCode result = ets_table_lookup(ets_table, key, ret, ctx);

Contributor

While working on this code recently, I noticed that the hashtable lookup takes a keypos argument, which isn't needed (we have node->key, and keypos can't change after table creation). It may be worth removing it in this PR or in a follow-up.
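
A sketch of what dropping the keypos argument could look like, assuming the key is extracted once at insert time and cached in the node. The tuple and hnode types and both functions are hypothetical simplifications (integer terms and a 0-based keypos, whereas ETS keypos is 1-based), not the actual ets_hashtable code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuple of integer "terms". */
struct tuple {
    size_t arity;
    const int *elements;
};

struct hnode {
    struct hnode *next;
    int key;                   /* cached once, at insert time */
    const struct tuple *entry;
};

/* keypos is only needed here: it cannot change after table creation. */
static void hnode_init(struct hnode *node, const struct tuple *entry, size_t keypos)
{
    assert(keypos < entry->arity);
    node->next = NULL;
    node->key = entry->elements[keypos];
    node->entry = entry;
}

/* Lookup compares against node->key, so it takes no keypos argument. */
static const struct tuple *bucket_lookup(const struct hnode *bucket, int key)
{
    for (const struct hnode *n = bucket; n != NULL; n = n->next) {
        if (n->key == key) {
            return n->entry;
        }
    }
    return NULL;
}
```

Since keypos is fixed at table creation, reading it once in hnode_init is enough; every later lookup only needs the key itself.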

TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from 6ac7831 to c2bc9d2 on January 12, 2025 03:18
TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch 2 times, most recently from f54ef62 to 45eccdc on January 15, 2025 16:13
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (outdated, resolved)
src/libAtomVM/ets.c (resolved)
        return NULL;
    }
    size_t size = (size_t) memory_estimate_usage(entry);
    if (memory_init_heap(heap, size) != MEMORY_GC_OK) {

Contributor

I was wondering: why do we create a new heap instead of piggybacking on the owner process' heap?

Collaborator

Short answer: this code is just a refactored existing code block that was doing the same.

Long answer: the GC is completely decoupled from ETS at the moment, and ETS tables are not tied to a Context at all, so the GC will not use their items as roots. If we used the process heap, the GC would corrupt them.

Contributor

If I understood correctly: they could be on the owner's heap if we could guarantee that they won't be GC'd.

Collaborator

Right, that would require some bigger changes and might add some extra complexity.

src/libAtomVM/ets_hashtable.c (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from 45eccdc to e5908a2 on January 16, 2025 12:56

bettio (Collaborator) left a comment

A few minor changes are still required. Thanks for this work so far.

CHANGELOG.md Outdated
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added the ability to run beams from the CLI for Generic Unix platform (it was already possible with nodejs and emscripten).
- Added support for 'erlang:--/2'.
- Added preliminary support for ESP32P4 (no networking support yet).
- Added support for list insertion in 'ets:insert/2'.

Collaborator

I think it ended up in the wrong section of the changelog file by mistake.
Changes aimed at the main branch should go under ## Unreleased (not in the ## [0.6.6] - Unreleased section).
The right place is just below `Added supervisor:terminate_child/2, supervisor:restart_child/2`.

src/libAtomVM/ets_hashtable.c (resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
@@ -94,38 +131,34 @@ EtsHashtableErrorCode ets_hashtable_insert(struct EtsHashTable *hash_table, term
#endif

    struct HNode *node = hash_table->buckets[index];
    struct HNode *last_node = NULL;
    if (node) {

Collaborator

This isn't needed anymore, since we are doing while (node).

Contributor

Consider adding a node with a completely new key. We're using linked lists for collisions. Without last_node, node will always be NULL once the loop ends: you'd create a new node and, instead of attaching it to the end of the linked list, you'd add it directly to the bucket, overwriting the existing nodes.

Collaborator

Probably I didn't explain it well enough, or maybe I'm overlooking something.
What I mean is that in the following snippet:

if (cond) {
   while (cond) {
   }
}

the outer if (cond) is redundant: if cond is false, the while body is executed 0 times anyway.

Contributor

Nope, you're absolutely right. GitHub didn't show enough context in the inline diff to see that.
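
For reference, a sketch of the bucket insertion being discussed, assuming a simplified HNode with integer keys and a pre-allocated node: the single while (node) loop also covers the empty-bucket case, and last_node decides whether the new node is appended to the collision chain or becomes the bucket head. The names and the replace-on-same-key behavior are illustrative assumptions, not the actual ets_hashtable_insert code.

```c
#include <assert.h>
#include <stddef.h>

struct hnode {
    struct hnode *next;
    int key;
    int value;
};

/* Inserts new_node into a bucket chain and returns the node it replaced
   (same key), or NULL if the key was not present. No outer if (node) is
   needed: for an empty bucket the loop simply runs zero times. */
static struct hnode *bucket_insert(struct hnode **bucket, struct hnode *new_node)
{
    assert(new_node != NULL);
    struct hnode *node = *bucket;
    struct hnode *last_node = NULL;
    while (node) {
        if (node->key == new_node->key) {
            /* Splice new_node in place of the old node. */
            new_node->next = node->next;
            if (last_node) {
                last_node->next = new_node;
            } else {
                *bucket = new_node;
            }
            return node;
        }
        last_node = node;
        node = node->next;
    }
    /* New key: node is NULL here, so last_node is the only way to reach
       the tail of the chain. */
    new_node->next = NULL;
    if (last_node) {
        last_node->next = new_node;
    } else {
        *bucket = new_node;
    }
    return NULL;
}
```

Returning the replaced node leaves its release to the caller, which keeps all freeing in one place.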

src/libAtomVM/ets_hashtable.h (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
src/libAtomVM/ets_hashtable.c (outdated, resolved)
TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch 3 times, most recently from 387dc1c to 54951ff on January 20, 2025 14:25
{
    Heap *heap = malloc(sizeof(Heap));

Collaborator

Small trick and optimization:
the Heap struct can be embedded inside HNode this way:

struct HNode
{
    struct HNode *next;
    term key;
    term entry;
    Heap heap;
};

This also avoids the overhead of an extra malloc, simplifies the code, and reduces memory fragmentation.
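
A rough sketch of the embedded layout, with a hypothetical ToyHeap standing in for AtomVM's Heap (the real memory_init_heap/memory_destroy_heap behavior is not modeled here): the node and its Heap struct come from one malloc, and only the heap's backing storage is allocated separately. hnode_new and hnode_destroy are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for Heap: only the backing storage is malloc'd. */
typedef struct {
    int *data;
    size_t size;
} ToyHeap;

struct hnode {
    struct hnode *next;
    int key;
    int entry;
    ToyHeap heap;   /* embedded: no separate malloc(sizeof(ToyHeap)) */
};

static struct hnode *hnode_new(int key, int entry, size_t heap_words)
{
    struct hnode *node = malloc(sizeof(struct hnode));
    if (node == NULL) {
        return NULL;
    }
    node->heap.data = malloc(heap_words * sizeof(int));
    if (node->heap.data == NULL && heap_words > 0) {
        free(node);   /* the Heap struct goes away with the node */
        return NULL;
    }
    node->heap.size = heap_words;
    node->next = NULL;
    node->key = key;
    node->entry = entry;
    return node;
}

static void hnode_destroy(struct hnode *node)
{
    assert(node != NULL);
    free(node->heap.data);   /* backing storage first */
    free(node);              /* one free releases node and Heap struct */
}
```

Compared with a separately allocated Heap, this saves one malloc/free pair per entry and removes one failure path from node creation.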

src/libAtomVM/ets_hashtable.c (outdated, resolved)
CHANGELOG.md Outdated
@@ -25,7 +26,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added the ability to run beams from the CLI for Generic Unix platform (it was already possible with nodejs and emscripten).
- Added support for 'erlang:--/2'.
- Added support for list insertion in 'ets:insert/2'.

Collaborator

Pure nitpicking moment here: I always try to keep changes "local".
Let me explain better:
suppose one day we find that the commit for pre-allocating nodes introduces a regression. The easiest solution would be just running git revert 54951ffa8176cbec5ce757313abe60c9f38c8395, but this would also revert the changelog fix, and the entry would jump back into the wrong section.
The ideal approach is using git commit --fixup=01456a3e7544a144acbcf57e58b2adb02e270d50 and then git rebase --autosquash, so that the fixup is squashed into the commit it fixes: each change belongs to the right commit, and reverting one commit doesn't revert unrelated changes.

    memory_destroy_heap(node->heap, global);
    node->heap = heap;
    free(node);

Collaborator

I still think that grouping memory_destroy_heap(node->heap, global) and free(node) together in a small helper function would make it harder to forget one of the two.

TheSobkiewicz (Contributor Author), Jan 22, 2025

Good idea, will refactor every occurrence of this.
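
A small sketch of such a helper, assuming the current layout where node->heap is a separately allocated pointer. toy_destroy_heap stands in for memory_destroy_heap(node->heap, global), and hnode_new, hnode_free and bucket_delete are hypothetical names; the heaps_destroyed counter exists only so the pairing can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap handle plus a counter so the pairing is observable. */
typedef struct {
    void *mem;
} ToyHeap;

static size_t heaps_destroyed = 0;

/* Stand-in for memory_destroy_heap(node->heap, global). */
static void toy_destroy_heap(ToyHeap *heap)
{
    free(heap->mem);
    free(heap);
    heaps_destroyed++;
}

struct hnode {
    struct hnode *next;
    int key;
    ToyHeap *heap;   /* separately allocated, as in the reviewed code */
};

static struct hnode *hnode_new(int key)
{
    struct hnode *node = malloc(sizeof(struct hnode));
    ToyHeap *heap = malloc(sizeof(ToyHeap));
    void *mem = malloc(16);
    if (node == NULL || heap == NULL || mem == NULL) {
        free(mem);
        free(heap);
        free(node);
        return NULL;
    }
    heap->mem = mem;
    node->next = NULL;
    node->key = key;
    node->heap = heap;
    return node;
}

/* The only place that releases a node: heap and node always go together. */
static void hnode_free(struct hnode *node)
{
    assert(node != NULL);
    toy_destroy_heap(node->heap);
    free(node);
}

/* Example caller: unlink a key from a bucket chain and release it. */
static int bucket_delete(struct hnode **bucket, int key)
{
    for (struct hnode **link = bucket; *link != NULL; link = &(*link)->next) {
        if ((*link)->key == key) {
            struct hnode *victim = *link;
            *link = victim->next;
            hnode_free(victim);
            return 1;
        }
    }
    return 0;
}
```

With every release path going through hnode_free, the heap cannot be leaked, and the node can never be freed while its heap is still referenced.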

bettio (Collaborator) commented Jan 22, 2025

I'm sorry about this additional round of comments, hope they can be somehow interesting to you and not just boring nitpicking.
Thanks for the contribution so far ❤️.

TheSobkiewicz (Contributor Author)

I'm sorry about this additional round of comments, hope they can be somehow interesting to you and not just boring nitpicking. Thanks for the contribution so far ❤️.

Don't worry, thank you for your patience when reviewing it ❤️

TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch 2 times, most recently from 4bc9bb5 to 6e249eb on January 22, 2025 17:35
TheSobkiewicz force-pushed the thesobkiewicz/nifs/ets/refactor_insert branch from 6e249eb to 0f71f40 on January 22, 2025 17:39

3 participants