You know how async methods that await something incomplete end up creating a few objects, right? There's
the boxed state machine, an Action that moves it forward, a Task[<T>], etc - right?
Well... what if there just weren't?
And what if all you had to do was change your async ValueTask<int> method to async PooledValueTask<int>?
And I hear you; you're saying "but I can't change the public API!". But what if a PooledValueTask<int> really was
a ValueTask<int>? So you can just cheat:
public ValueTask<int> DoTheThing() // the outer method is not async
{
	return ReallyDoTheThing(this);
	static async PooledValueTask<int> ReallyDoTheThing(SomeType obj)
	{
		... await ...
		// (use obj.* instead of this.*)
		... return ...
	}
}

(Using a static local function here avoids the <>c__DisplayClass wrapper that the compiler generates to implement the local-function capture context.)
And what if, maybe, just maybe, in the future it could be (if this happens) just:
[SomeKindOfAttribute] // <=== this is the only change
public async ValueTask<int> DoTheThing()
{
	// no changes here at all
}

(although note that in some cases it can work better with the static trick, as above)
Would that be awesome? Because that's what this is!
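For comparison, here is a sketch of what a one-attribute opt-in looks like with the pieces that ship in the box on newer platforms. It assumes C# 10 and .NET 6 or later, where `[AsyncMethodBuilder]` can be applied to a single method and `PoolingAsyncValueTaskMethodBuilder<>` is available in `System.Runtime.CompilerServices`. It uses the runtime's pooling builder rather than this library's, and it is not necessarily what the `[SomeKindOfAttribute]` placeholder above turns into; the `Example` class is purely illustrative.

```csharp
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical example type; not part of this library.
public static class Example
{
    // Method-level builder override (C# 10+): the signature stays ValueTask<int>,
    // only the builder used for this one method changes.
    [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]
    public static async ValueTask<int> DoTheThing()
    {
        await Task.Yield(); // force the incomplete (asynchronous) path
        return 42;
    }
}
```

The same await-once rule applies to the result of a method like this: treat the returned ValueTask<int> as single-use.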
The PooledValueTask[<T>] etc. types exist mostly to define a custom builder. The builder in this library aggressively pools the classes
that replace the boxed state machine used by default; we recycle them when the state machine completes.
It also makes use of the IValueTaskSource[<T>] API to allow incomplete operations to be represented without a Task[<T>], using a custom backing object instead.
And we pool that too, recycling it when the task is awaited. The only downside: you can't await the same result twice now, because
once you've awaited it the first time, it is gone. A cycling token is used to make sure you can't accidentally read incorrect
values after the result has been awaited.
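To make the token idea concrete, here is a minimal sketch of a reusable IValueTaskSource<int> built on the BCL's ManualResetValueTaskSourceCore<T> (available on .NET Core 3.0 and later). This is not this library's implementation: `ReusableSource` and `AsValueTask` are hypothetical names, and the sketch only resets on the success path. It just shows how a version number handed out with each ValueTask lets the source reject a second read after it has been recycled.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical illustration type; not part of this library.
public sealed class ReusableSource : IValueTaskSource<int>
{
    // Mutable struct holding the result, continuation and version; must not be readonly.
    private ManualResetValueTaskSourceCore<int> _core;

    // Hands out a ValueTask stamped with the current version as its token.
    public ValueTask<int> AsValueTask() => new ValueTask<int>(this, _core.Version);

    public void SetResult(int value) => _core.SetResult(value);

    public int GetResult(short token)
    {
        // Throws InvalidOperationException if the token no longer matches the version.
        int result = _core.GetResult(token);
        // Recycle: Reset() increments the version, so the old token is now stale.
        _core.Reset();
        return result;
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

With this shape, awaiting the ValueTask<int> from AsValueTask() once returns the value and resets the source; awaiting that same ValueTask<int> a second time throws instead of silently returning whatever the recycled source holds next.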
We can even do this for Task[<T>], except here we can only avoid the boxed state machine; hence PooledTask[<T>] exists too. No custom backing in this case, though, since a Task[<T>] will
need to be allocated (except for Task.CompletedTask, which we special-case).
These results are based on an operation that uses Task.Yield() to ensure that the operations are incomplete; ".NET" means the built-in, out-of-the-box implementation; "Pooled" means the implementation from this library.
In particular, notice:
- zero allocations for PooledValueTask[<T>] vs ValueTask[<T>] (on .NET Core; significantly reduced on .NET Framework)
- reduced allocations for PooledTask[<T>] vs Task[<T>]
- no performance degradation; just lower allocations
| Method |  Job | Runtime |   Categories |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 |  Gen 2 | Allocated |
|------- |----- |-------- |------------- |---------:|----------:|----------:|-------:|-------:|-------:|----------:|
|   .NET |  Clr |     Clr |      Task<T> | 2.159 us | 0.0427 us | 0.0474 us | 0.0508 | 0.0039 |      - |     344 B |
| Pooled |  Clr |     Clr |      Task<T> | 2.037 us | 0.0246 us | 0.0230 us | 0.0273 | 0.0039 |      - |     182 B |
|   .NET | Core |    Core |      Task<T> | 1.397 us | 0.0024 us | 0.0022 us | 0.0176 |      - |      - |     120 B |
| Pooled | Core |    Core |      Task<T> | 1.349 us | 0.0058 us | 0.0054 us | 0.0098 |      - |      - |      72 B |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr |         Task | 2.065 us | 0.0200 us | 0.0167 us | 0.0508 | 0.0039 |      - |     336 B |
| Pooled |  Clr |     Clr |         Task | 1.979 us | 0.0179 us | 0.0167 us | 0.0273 | 0.0039 |      - |     182 B |
|   .NET | Core |    Core |         Task | 1.390 us | 0.0159 us | 0.0149 us | 0.0176 |      - |      - |     112 B |
| Pooled | Core |    Core |         Task | 1.361 us | 0.0055 us | 0.0051 us | 0.0098 |      - |      - |      72 B |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr | ValueTask<T> | 2.087 us | 0.0403 us | 0.0431 us | 0.0547 | 0.0078 | 0.0039 |     352 B |
| Pooled |  Clr |     Clr | ValueTask<T> | 1.924 us | 0.0248 us | 0.0220 us | 0.0137 | 0.0020 |      - |     100 B |
|   .NET | Core |    Core | ValueTask<T> | 1.405 us | 0.0078 us | 0.0073 us | 0.0195 |      - |      - |     128 B |
| Pooled | Core |    Core | ValueTask<T> | 1.374 us | 0.0116 us | 0.0109 us |      - |      - |      - |         - |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr |    ValueTask | 2.056 us | 0.0206 us | 0.0183 us | 0.0508 | 0.0039 |      - |     344 B |
| Pooled |  Clr |     Clr |    ValueTask | 1.948 us | 0.0388 us | 0.0416 us | 0.0137 | 0.0020 |      - |     100 B |
|   .NET | Core |    Core |    ValueTask | 1.408 us | 0.0140 us | 0.0117 us | 0.0176 |      - |      - |     120 B |
| Pooled | Core |    Core |    ValueTask | 1.366 us | 0.0039 us | 0.0034 us |      - |      - |      - |         - |

Note that most of the remaining allocations are actually the work-queue internals of Task.Yield() (i.e. how
ThreadPool.QueueUserWorkItem works) - we've removed virtually all of the unnecessary overheads that came from the
async machinery. Most real-world scenarios aren't using Task.Yield() - they are waiting on external data, etc - so
they won't see these. Plus they are effectively zero on .NET Core 3.
The tests do the exact same thing; the only thing that changes is the return type, i.e. whether it is
async Task<int>, async ValueTask<int>, async PooledTask<int> or async PooledValueTask<int>.
All of them have the same threading/execution-context/sync-context semantics; there's no cheating going on.
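If you want a rough feel for the baseline numbers without a benchmark harness, something like the following sketch works on .NET Core 3.0 or later. It only uses the built-in Task<int> and ValueTask<int>, so it compiles without this library; `AllocationProbe` and its members are hypothetical names, and this is not how the table above was produced. GC.GetTotalAllocatedBytes counts the whole process, so expect noise.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for a rough allocation check; not part of this library.
public static class AllocationProbe
{
    // Same shape as the operation described above: Task.Yield() forces the incomplete path.
    public static async Task<int> ViaTask()
    {
        await Task.Yield();
        return 42;
    }

    // Identical body; only the return type differs.
    public static async ValueTask<int> ViaValueTask()
    {
        await Task.Yield();
        return 42;
    }

    // Returns the approximate number of bytes allocated per call, process-wide.
    public static async Task<long> BytesPerCallAsync(Func<ValueTask<int>> operation, int iterations)
    {
        // Warm up so JIT and one-off caches don't land in the measurement.
        for (int i = 0; i < 100; i++) await operation();

        long before = GC.GetTotalAllocatedBytes(precise: true);
        for (int i = 0; i < iterations; i++) await operation();
        long after = GC.GetTotalAllocatedBytes(precise: true);

        return (after - before) / iterations;
    }
}
```

Call it as `await AllocationProbe.BytesPerCallAsync(AllocationProbe.ViaValueTask, 10_000)`, or wrap the Task<int> version as `() => new ValueTask<int>(AllocationProbe.ViaTask())`. To compare against the pooled types, add a method with the same body and a pooled return type.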