-
@adriangb there are many reasons, but I'll try to give a couple of examples:
-
I'm trying to understand where the time goes when writing: a write takes ~500 ms for me, even with small files and ~50 commits.
My test code looks as follows:
I'm running a proxy on localhost:8000 so that I can see every request made. For only the last write (the one that's timed) I see:
That's a surprising number of requests. I don't understand why it's necessary to re-read the entire commit history: it should already be up to date, and at most a single list operation should be needed to confirm that. I also don't understand why it's reading data from Parquet files before writing.
All in all, I would have thought that only 4 or 5 requests would be needed:
_delta_log to get the last commit id.
Am I misunderstanding the Delta Lake protocol? Should I be using something lower-level?
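To make the "single list operation" idea concrete, here is a minimal sketch of how the last commit id could be derived from one listing of _delta_log. It relies only on the Delta protocol's naming convention (commit files are zero-padded 20-digit versions ending in .json). The function names, the my-table prefix, and the listing itself are made up for illustration; this is not how the deltalake writer actually works, just the expectation described above.

```python
import re

# Commit files in the Delta log are named with a zero-padded 20-digit
# version, e.g. _delta_log/00000000000000000049.json
COMMIT_RE = re.compile(r"(?:^|/)_delta_log/(\d{20})\.json$")


def latest_version(keys):
    """Return the highest commit version in a listing, or -1 if there is none."""
    versions = [int(m.group(1)) for k in keys if (m := COMMIT_RE.search(k))]
    return max(versions, default=-1)


def next_commit_key(table_root, keys):
    """Key a writer would try to create (put-if-absent) for its own commit."""
    version = latest_version(keys) + 1
    return f"{table_root.rstrip('/')}/_delta_log/{version:020d}.json"


# Hypothetical result of one list request on a table with 50 commits (0..49).
listing = [f"my-table/_delta_log/{v:020d}.json" for v in range(50)]
listing.append("my-table/_delta_log/_last_checkpoint")  # ignored by the regex

print(latest_version(listing))
print(next_commit_key("my-table", listing))
```

Under these assumptions, a writer that already holds the table state would only need the listing to confirm nothing new was committed, plus the uploads for the data file and the new commit file.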