Skip to content

Conversation

lukego
Copy link
Member

@lukego lukego commented Nov 17, 2017

This release will follow quickly on the heels of the much-delayed v2017.09 release.

alexandergall and others added 30 commits July 28, 2017 08:32
Initialisation of the shm frame needs to be done in the constructor
method to avoid being overwritten by the frame created in core.app.

Add a copy of the mtu to the device-specific stats table, add a
counter to record the speed of the interface.
Avoid using index sets and just query the registers since it's
hard to share this state among the apps. Also do the MAC registration
in the right place.
RSS is always enabled, so there shouldn't be a case in which
this is a no-op anyway. Also the check is complicated by the fact
that RSS can be enabled in many ways (with DCB, VMDq, etc.).
Also add code to unset MAC on stop()
This code is mostly copied from the intel10g.lua driver.
Also add a new VLAN test and adjust old tests as needed.
The previous iteration was unreliable for unknown reasons. This
refactoring should be clearer and run more reliably.
Also add new test for VLAN stripping
 * PFVFTE to enable Tx from appropriate VFs
 * RTTD1TC for bandwidth allocation algorithm for Tx
The former wasn't updated for an API change and the latter
was missing a module declaration.
The tag insertion part of this test didn't work because the
MAC didn't match when the app tried to resend the packet. It's
not necessary since the transmit test covers this anyway.
@lukego
Copy link
Member Author

lukego commented Nov 21, 2017

Huzzah! The new intel_mp driver is now adopted as the default and the intel10g driver (which represents some of the first lines of code ever written in Snabb) becomes a legacy backup option.

Great hacking to all the many intel_mp hackers !!! 👍

@lukego
Copy link
Member Author

lukego commented Dec 5, 2017

Sorry about the wait. I wanted to recommend this for release now but I see a performance regression that needs to be resolved. I will first check if this is related to overuse of trace barriers in #1242.

@lukego
Copy link
Member Author

lukego commented Dec 6, 2017

Confirmed that the performance regression is due to the JIT barrier. The lukego-optimize performance with that change reverted matched master again. I will test using fewer barriers (on entry or exit to an app, but not both) and see if that is better. Otherwise the JIT barrier might not be suitable (too expensive) for app/engine transitions.

@lukego
Copy link
Member Author

lukego commented Dec 8, 2017

I have reverted the calls to jit.tracebarrier() around app callbacks. These seem to be a bit of a performance drag overall and their benefit is not really established.

So the jit.tracebarrier() primitive still exists but the engine doesn't currently use it.

Let's wait for the standard Hydra tests to complete now and then we should be 👍 for the release.

@lukego
Copy link
Member Author

lukego commented Dec 11, 2017

@eugeneia There is a performance regression on the iperf benchmark but I propose that we ship this now anyway.

I have been combing through recent CI results with @wingo on Slack and it seems like the problem is caused by voodoo. I have another branch with almost identical contents that does not show the issue. The only difference is whether the jit.tracebarrier() primitive exists in the C code. Having that code present seems to provoke the problem even though it is never called.

I am reluctant to make a "nonsense" change to "solve" this problem. I would prefer to accept it for now and focus on finding the root cause of why we see variance in the iperf benchmark. I see the new RaptorJIT tooling as the way to do this and so I want to spend my time now on integrating that with Snabb. Hence my willingness to accept this symptom of the root problem (wider variance on the iperf benchmark) for the moment.

Hypothetically if the problem is something obscure, like whether two Lua loop bytecode addresses hash into the same JIT hot counter, then there are probably very many different ways that it could be provoked (e.g. choice of C compiler version) and so I am not really confident that nailing down one such issue in the test environment would translate into a real world benefit. This would need to be solved more thoroughly in the JIT after seeing exactly what is really going on.

WDYT?

@eugeneia
Copy link
Member

Totally agree. I think #1244 is not included in this release, or is it? Maybe, as a last resort I would test to see if it resolves this regression like it did mine. This doesn’t block the release from my point of view though. Ready when you are!

@lukego
Copy link
Member Author

lukego commented Dec 11, 2017

@eugeneia Glad to hear that this change has helped you. I pushed it here and let's see what Hydra says tomorrow.

@lukego
Copy link
Member Author

lukego commented Dec 12, 2017

@eugeneia This was worth a try but the new benchmark results show the same issue: https://hydra.snabb.co/build/2822290/download/2/report.html#iperf.

@wingo
Copy link
Contributor

wingo commented Dec 12, 2017

In a way that's good to know that the results are the same. I was unsure whether to blame a voodoo change to the software itself or some problem with the statistics!

@eugeneia eugeneia merged commit 6c0f065 into master Dec 12, 2017
eugeneia added a commit that referenced this pull request Dec 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants