-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stash objcore references until the end of the task to avoid copies #4269
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past, an argument had been made (IIRC by @mbgrydeland) that keeping object references until the end of the task would increase their lifetime by too much, but restarts in VCL really should be done within milliseconds in most cases - and if keeping references is an actual problem in specific situations, those can be avoided by either not restarting or rolling back also.
A restart occurring after vcl_miss
can take orders of magnitude longer than milliseconds.
I'm not convinced this is a good idea. I think objcores are resources we should always try to hold onto sparingly, and workspace allocations are an acceptable trade off.
a055c50
to
15c1510
Compare
For this scenario to be relevant, it would need to be a
What would be the convincing argument for that? I brought my case, so if you disagree, what is yours? |
a3881c1
to
ed965b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yes, I am aware and I was specifically referring to this argument in the last paragraph of the initial comment.
And I quoted the last paragraph, except the last sentence I don't understand.
This proposal gives object references the same lifetime as
PRIV_TASK
and removes the then unnecessary workspace copies. As a side effect, it also solves the case for #3768, because this also avoids most copies of static strings to workspace.
It was a quick glance, but I don't see how ensuring an at-least-task lifetime to objcores solves anything for static strings. In #3768 I added a new function to the runtime that sets a header with a string guaranteed to outlive the task (user's responsibility) and taught libvcc to better keep track of constant expressions to favor the new function when possible.
I would argue that with objcores guaranteed to outlive VCL execution, we could teach libvcc to recognize obj.stuff
as a constant expression:
set req.http.foo = "foo"; # 3768 would avoid a workspace copy
set req.http.foo = obj.http.foo; # 3768 could avoid a workspace copy
set req.http.foo = "bar, " + obj.http.foo; # cannot avoid copy
Your change is only affecting core code and it would easily compose with #3768 to extend obj.*
"constness" to VCL code (though not very common). But it certainly doesn't solve workspace overflows on constant string assignments, I have seen cases where a significant bunch of headers are set in vcl_deliver
and vcl_synth
(CORS and other policies).
I brought my case, so if you disagree, what is yours?
My main concern is that object storage is a more critical resource than workspaces. If we need a lot of workspace, we can lower task concurrency. When storage is saturated and in constant churn (when you eventually reach full capacity) it becomes crucial that dying objects actually go away. Increasing latency here means that your churn throughput will increase, reducing your cache/storage efficiency.
Forget what I said about restarts. Unlike retries, I think they should generally be avoided and be the exception instead of the norm, so I don't have that much of a problem adding the objcore retention caveat to this feature.
I had a closer look and in the normal case we stash nothing and drop the objcore reference in cnt_finish()
, right? In that case I'm more open to it, and it would be a good occasion to revisit #3768 with more "task-constant" expressions. But I really don't like how the stash is put together. Please also note that dropping the ref in cnt_finish()
makes the normal case "not quite task-scoped" but is it good enough? If the answer is yes, then the stash should be cleared alongside.
bin/varnishd/cache/cache_req_fsm.c
Outdated
stash = WS_Alloc(ws, (unsigned)stash_sz(l)); | ||
if (stash == NULL) | ||
stash = malloc(stash_sz(l)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be allocated from the workspace once and for all similarly to the req->top
allocation. Especially since we fix max_restarts
when req
is initialized. We should stop performing heap allocations for task-scoped allocations (we could submit a pull request to always allocate dynamic privs from workspace). What is the point of giving a fixed budget if we double dip in the heap?
In fact, shouldn't the stash belong to struct reqtop
?
edit: never mind my last question, I crossed two streams.
bin/varnishd/cache/cache_req_fsm.c
Outdated
@@ -684,8 +751,7 @@ cnt_lookup(struct worker *wrk, struct req *req) | |||
WRONG("Illegal return from vcl_hit{}"); | |||
} | |||
|
|||
/* Drop our object, we won't need it */ | |||
(void)HSH_DerefObjCore(wrk, &req->objcore, HSH_RUSH_POLICY); | |||
stash_oc(&req->ocstash, &req->objcore, req->ws, req->max_restarts + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean req->restarts + 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The stash is only allocated once for the maximum capacity the task might need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't resolve review comments on my behalf.
I crossed two streams again. By the time I left my comment I had convinced myself that the stash should be allocated upfront, and misinterpreted the meaning of "restarts" here.
Anyway, what I'm seeing here is duplication and spread of logic. Both stash_oc()
calls are identical, both are forcing the call site to know about internal logic, they should instead look like this:
Req_StashObjcore(req);
Req_StashObjcore(req, &req->objcore); // if we ever grow the need to stash another oc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can change the signature. The current generic arguments come from the idea that we might want to reuse the facility somewhere else, but we do not have that use case at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you like a5e8ebc better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this all belongs in cache_req.c
, and I'm thinking that clearing the stash could fit well in Req_Cleanup()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the code is declared static
in cache_req_fsm.c
because that's where it is used.
I think otherwise it would live in cache_hash.c
, because HSH_DerefObjCore()
lives there.
ocstash_fini()
/ ReqFiniObjcoreStash()
is called where VCL_TaskLeave()
is called, because the the two are very much related. Req_Cleanup()
is concerned with the lifetime of struct req
, which is longer than PRIV_TASK
.
side note: #3994 will probably also need this facility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Req_Cleanup()
barely outlives the task's lifetime, and at least calling it from there guarantees that privs were actually freed *before* clearing the stash. And a cleanup doesn't prevent req
from being reused for keep-alive, you are confusing it with Req_Release()
.
side note: #3994 will probably also need this facility
You are just agreeing with me that cache_req_fsm.c
is not a good place for this. At least with what I'm suggesting we can move the stash logic from cache_req.c
to cache_obj.c
and keep the call site in Req_Cleanup()
.
We can cross that bridge when obj_stale
makes its appearance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can cross that bridge when obj_stale makes its appearance.
That's were we agree.
With the slight interface modifications in #4271 and in particular calling the cleanup from Req_*.c
, I do fully agree with the move of the code. The two important questions, however, are if the new location to call fini/cleanup is better and if we really should have the upfront allocation.
The sentence is:
My point is: This patch makes the common case cheaper.
Without this patch, static strings need to be copied always. Without this patch, the
I think I understand what you are doing, and I think we should first have a more general improvement. We should first make sure that every reference in VCL at least has the same lifetime as
Now you seem confused to me. Also with #3768,
It is not just affecting core code, because vmods suffer from the same copying1. And I have said all the time that it would compose with the (Oh man, I am repeating myself for the fifth time now or so.)
I understand what an object reference implies. Also we should note that for most objects on a typical Varnish-Cache installation, the bulk of object references will be held because of body delivery. here the main point is that all the scenarios where the reference would be kept longer than before the patch are special cases, which can be avoided by not restarting, or rolling back in addition to the restart. None of these cases is typical, and the argument that for some special case some object would be held for somewhat longer than before the patch really seems not significant.
So was the first half of your comment from "before you looked closer" and the last paragraph from after?
Yes
The stash is only allocated if needed. The call to Footnotes
|
I can parse it now, got it. I generally agree with tradeoffs in favor of the common cases.
Agreed, with the understanding that "this patch" refers to #3768. But I think you are reading too much into my patch series, see below.
I don't think I can take credit for the
Not confused, right? The idea was assuming "objcores guaranteed to outlive VCL execution", but your
We actually agree, this is poor wording on my end. I think both changes would complement each other well and open the door to a new optimization.
Hours before the beginning and the end of my review...
Then considering the workspace gains we can expect, we should avoid the complications of a just-in-time allocation, and certainly not fall back to a heap allocation. I think we should give it the reqtop treatment and simply always make room for the stash. |
As the rest of the discussion looks like it might be resolved, I will only respond to the last paragraph:
Again, I wanted to keep impact on the common case minimal.
This is unlikely and simplifies the calling code. |
81391a5
to
a5e8ebc
Compare
The impact is already a negative workspace footprint.
So does a small systematic allocation (56B by default) during the |
It is not like I had not considered this option. My worry is that it will impact users with high On the malloc fallback, I think it is generally a good idea to not fail in our supporting facilities when we could be in an exception code path. |
And I disputed this in a previous comment (#4269 (comment)). If we have dedicated allocators in the form of workspaces for tasks, then task allocations should be performed there and respect the configured limits. It's doable, I already offered to submit such a change. And then there are the cases where a workspace allocation may not be appropriate (for example |
Sure is the workspace allocation failure handling doable. The point here is that if we run into the ws overflow at this point, a subsequent request will already fail, and we will induce massive overhead basically everywhere, from handling the error, possibly running into a restart etc. etc. But at any rate, doing a heap allocation for this exception path is, I think, offset by the gain in simplicity: Because we might already be on an exception path when we stash an oc, we save ourselves from complicating it further. Do not get me wrong: Yes, we should use the workspace whenever feasible. FTR on the other topic: The allocation of the |
The simplest approach is a preemptive allocation of the stash that completely removes the need for just-in-time allocations and gets rid of all allocation-related branches. Ignoring a failed workspace allocation here is just delaying the actual workspace failure somewhere in vcl_synth for the synth/fail cases (reminder, the built-in This is actually increasing the distance between the root cause and the symptoms. Again, reserving a stash upfront:
I do agree in the dynamic priv case that it brings error handling to the caller. In fact
That's exactly the distinction I was making. The struct should be allocated from the relevant workspace, the member is up to the VMOD author. Quoting myself:
Replace my "data" with your more accurate " 👎 for not noticing the pun :] I think the simplest actually is:
Pros and cons:
And if the stash turns out to have a prohibitive cost1 then we can complicate the picture and refactor the allocation to make it just-in-time. The extra care you added is premature optimization at this point. Footnotes
|
I disagree that it is the better option, because it charges all users with a workspace allocation which will not be needed in most cases. The |
It is not fixed. It depends on
No. The straight path does not need it. The common cases where it is needed are |
From a discussion here: varnishcache#4269 (comment)
From a discussion here: varnishcache#4269 (comment)
Sorry, I meant fixed for the duration of the task.
I meant that your change will pay dividends with the workspace savings outweighing the upfront allocation even for the common cases.
I submitted #4271 to illustrate what I meant about better encapsulation. What leaks out of |
bin/varnishd/cache/cache_req_fsm.c
Outdated
if (nxt == REQ_FSM_DONE) { | ||
ReqFiniObjcoreStash(req); | ||
INIT_OBJ(ctx, VRT_CTX_MAGIC); | ||
VCL_Req2Ctx(ctx, req); | ||
if (IS_TOPREQ(req)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe to clear the stash before the privs free callbacks run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. The free callbacks should not do anything with references which the priv might still hold, but still I agree that changing the order is the safer option. Thank you for your valid point and good suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I have in mind is collecting something during the task, coming from a header for example, and logging it when the task ends. We replaced the free()
callback with a fini()
one that takes a VRT_CTX
to allow privs to do things.
bin/varnishd/cache/cache_req_fsm.c
Outdated
ReqFiniObjcoreStash(req); | ||
|
||
INIT_OBJ(ctx, VRT_CTX_MAGIC); | ||
VCL_Req2Ctx(ctx, req); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe to clear the stash before the privs free callbacks run.
@dridi On the I agree that calling it in the middle of |
I'm not following you here, isn't That's why I said that |
FWIW I'm fine assuming longer objcore retention for very specific use cases:
Failing from deliver stashes the objcore for a very short time in #4271 because of the rollback happening before entering I don't remember ever seeing more than one restart. By far the most common case I have seen is the purge+restart combo, and it's not affected by this change. The second most common case I have seen is a request changed to perform a (usually cacheable) request to check authorizations before delivering privileged cache hits, and since I have seen cases where the first authorization request grabs a token so a rollback becomes impossible. I have never seen a real-world case for a pass-from-hit (yes, intended). So I'm fine with these changes:
|
@dridi thank you for your excellent summary. |
"almost 64k restarts ought to be enough for everyone"
a5e8ebc
to
8c8e19c
Compare
Context: Only while we hold a reference to an object are we allowed to access any attributes from it. With restarts, we might want to keep references to attributes of objects from previous restart iterations. Before this patch, this could lead to use-after-dereference, which was "fixed" by always copying object variables into the request's workspace (though this was problematic if VMODs did not copy always, as will be shown by a follow-up vtc addition). But that copyp is wasteful, so we should rather make sure that we keep references until we are done with the task and not copy (which is the second next commit). An argument had been made in the past that keeping object references until the end of the task would increase their lifetime by too much. It is true that, to support all possible use cases, we need to keep any oc references until the end of the task, which might include a final body delivery. But keeping the additional references can be avoided by either not restarting, or rolling back also. In general, until now we have charged all Varnish-Cache users with the cost for specific use cases, but we should rather only charge the specific use cases instead. Implementation: struct ocstash basically is a Variable Length Array (VLA) on the workspace, allocated when the request is created. For each invocation, ocstash_push() copies the passed objcore to one of max_restarts + 1 slots and clears the original location as HSH_DerefObjCore() would. Alternatives considered: * Dynamically allocating space for each objcore pointer was dismissed due to the substantial overhead. * Using a TASK_PRIV was tried and dismissed because this would require setting up a VRT_CTX or pulling the VRT_CTX out one level to the core of the FSM, which was intrusive and increased complexity substantially. * The original iteration of this patch would dynamically allocate the workspace for struct ocstash only when neeed, but after some involved discussion between Dridi and Nils this idea was given up in favor of the simpler upfront allocation. Co-authored-by: Dridi Boukelmoune <[email protected]>
8c8e19c
to
191be48
Compare
@dridi I have now taken your version of the patch in slightly modified form and added you as Co-Auther by the de-facto standard form (I should have looked this up for some other commits added recently and done the same...). Regarding the jit vs. upfront allocation, one argument came to my mind which, I think, had not been mentioned, but which made me change my opinion: For a scenario with a substantial amount of restarts which only happen under some circumstances, users might be taken by surprise if their code "suddenly" needs more workspace. Because, to make their use case work, they need to increase the workspace size anyway, they rather notice sooner than by surprise. So, simply put, I now agree that this is the better option by the argument of "predictability before maximum efficiency". A diff to your version can be found in 75e88ca. I have done the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change LGTM, and apologies if the review got on your nerves. You can otherwise tell me I'm wrong as much as you want. As long as you are saying it in good faith I can't take offense.
In the end, I'm happy with the result of our discussion.
/* name */ max_restarts, | ||
/* type */ uint, | ||
/* min */ "0", | ||
/* max */ NULL, | ||
/* max */ "65534", // (1<<16)-2 #4269 | ||
/* def */ "4", | ||
/* units */ "restarts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was up to me, the limit would be 20. I already have a very hard time justifying more than twoone restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but if there is no technical reason, I guess we should not restrict it. Who knows what people might be doing with it...
I at least remember having had a previous iteration of r02618.vtc using a ridiculously high max_restarts
before getting the idea to use the predictable vtc xids.
/* name */ max_retries, | ||
/* type */ uint, | ||
/* min */ "0", | ||
/* max */ NULL, | ||
/* max */ "65534", // (1<<16)-2 #4269 | ||
/* def */ "4", | ||
/* units */ "retries", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was up to me, the limit would be 20.
/*-------------------------------------------------------------------- | ||
* Facility to keep obcore references until the end of the task across restarts | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion:
Facility to keep references of discarded objcores exposed to VCL code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"discarded objcores" to me sounds like they would be removed from cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to reduce my workspace saving expectations here. I somehow convinced myself that this change was preventing copies from objcore to workspace once the objcore is guaranteed to outlive VCL processing.
It turns out we already skip copies, when most notably during resp
setup. HTTP_Decode()
already dips directly in the objcore without a detour from workspace.
So I still think this is a good idea, but we need to deal with set HEADER = HEADER
to actually reap benefits.
if (reason && !WS_Allocated(ctx->ws, reason, -1)) { | ||
reason = WS_Copy(ctx->ws, reason, -1); | ||
if (!reason) { | ||
VRT_fail(ctx, "Workspace overflow"); | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be safe actually:
return (synth(200, some_vmod.some_string()));
In that case we can't assume a lifetime beyond the VRT_synth()
call for the reason
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #4269 (comment) regarding this and the next four review comments.
if (q == NULL) { | ||
if (h == NULL) | ||
return (""); | ||
if (WS_Allocated(ws, h, -1)) | ||
return (h); | ||
} else if (h == NULL && WS_Allocated(ws, q, -1)) { | ||
if (q == NULL && h == NULL) | ||
return (""); | ||
if (q == NULL) | ||
return (h); | ||
if (h == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, we can't assume a task lifetime of the arguments.
r = VRT_StrandsWS(ctx->ws, NULL, s); | ||
if (r != NULL && *r != '\0') | ||
AN(WS_Allocated(ctx->ws, r, strlen(r) + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our concern is coverage for VRT_StrandsWS()
then we need to keep a check that the result effectively belongs to the workspace.
if (r != NULL && *r != '\0') | ||
AN(WS_Allocated(ctx->ws, r, strlen(r) + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous.
if (*b != '\0') | ||
AN(WS_Allocated(hp->ws, b, strlen(b) + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous.
b00092.vtc shows how copying from an object is skipped when restarting:
Yes, this is no copy to But the main benefit is that we do not need to copy for all the other cases, which you questioned again. After this PR, the contract would officially become that all pointers passed around for VCL have to have at least Regarding your code comments: I do not want to add 4 more comments to this, but what kind of a pointer do you think for |
From a discussion here: varnishcache#4269 (comment)
This proposal was motivated by #3768, which is about avoiding to make copies of constant strings by special casing. This PR does not yet include one additional detail from #3768 1, but it solves the underlying root cause.
Context
Any reference handled in VCL needs to have at least
PRIV_TASK
lifetime. We notoriously shied away from formalizing this definition, but it is a factual consequence from how workspaces work. Rollbacks reset the respective task's workspace and thus finalize allPRIV_TASK
s.One way of ensuring
PRIV_TASK
lifetime is by copying the referenced value (usually a string) to the task's workspace, and we do this today.Yet this is wasteful, because static strings from VCL and pointers to memory on the heap already outlive the task lifetime.
The only objects which did not already have
PRIV_TASK
lifetime were attributes from objects, because object references got returned before restarts. b92.vtc illustrates this case.For this reason and this reason only do we currently copy all strings to the respective workspace.
Avoid copies by giving object references task lifetime
This proposal gives object references the same lifetime as
PRIV_TASK
and removes the then unnecessary workspace copies. As a side effect, it also solves the case for #3768 1, because this also avoids most copies of static strings to workspace.In the past, an argument had been made (IIRC by @mbgrydeland) that keeping object references until the end of the task would increase their lifetime by too much, but restarts in VCL really should be done within milliseconds in most cases - and if keeping references is an actual problem in specific situations, those can be avoided by either not restarting or rolling back also. In general, until now we have charged all Varnish-Cache users with the cost for specific use cases, but we should rather only charge the specific use cases instead.
Footnotes
When we set an existing header to a new header with the same name as in
set req.http.name = resp.http.name
, we currently create a newHEADER
on the workspace. This could be avoided by allowing aHEADER
argument to some newSetHdr()
variant similarly to vcc: Teach HEADER symbols to accept constant strings #3768. ↩ ↩2