Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: slow-path in async packet processing #4383

Merged
merged 9 commits into from
Sep 28, 2023

Conversation

rohrerj
Copy link
Contributor

@rohrerj rohrerj commented Aug 23, 2023

This pullrequest is the third out of 3 with the goal to restructure the border router as described in the Design Document.

This pullrequest contain the implementation of the slow-path.


This change is Reviewable

@rohrerj rohrerj requested a review from matzf as a code owner August 23, 2023 14:22
Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1.
Reviewable status: 1 of 5 files reviewed, 6 unresolved discussions (waiting on @rohrerj)


router/dataplane.go line 572 at r1 (raw file):

	packet        packet
	slowPathState slowPathState
}
  • camelCase, not snake_case
  • embed packet to make the access somewhat shorter?

router/dataplane.go line 807 at r1 (raw file):

	// DRKey key derivation for SCMP authentication
	drkeyProvider drkeyProvider
}
  • unused mac and cachedMac
  • the macBuffers whcih is shared with scionPacketProcessor is a bit silly now -- the slow processor only uses drkeyInput, the normal processor uses the other two buffers. I think we can simplify this be getting rid of the struct and just using a macInputBuffer []byte, initializing it to the max size of the different buffers. This seems safe.

router/dataplane.go line 1203 at r1 (raw file):

	bfdLayer layers.BFD

	slowPathState slowPathState

Don't make this a member here. This can be part of the return value from processPkt. Either in the processingResult (and do not return an error) or in the error type. Conceptually, I think both ways are ok, but I believe if we pack it in the error, we cannot avoid the heap allocation.


router/dataplane.go line 1211 at r1 (raw file):

	code       slayers.SCMPCode
	parameters []interface{}
	cause      error
  • slowPathState: I don't think "state" is a good name for this. Something like "request", or "metadata", or something along these lines would be more fitting.
  • mode:
    • "mode" seems an odd name, perhaps "type"?
    • this should be an "enum" (type int bla; const bla ... = iota; ....)
    • maybe the name for traceroute should rather be more generic "router alert"
  • parameters: this will always allocate and needs ugly casts. Instead, just add the union of all parameters that we'll need. It's not our fault this looks stupid, it's go's fault for not having ergonomic sum types.
  • cause: ok to pass a reference to a predefined global variable, but we should be clean up the places that allocate new errors

router/dataplane.go line 2155 at r1 (raw file):

	}

	return p.buffer.Bytes(), scmpError{TypeCode: scmpH.TypeCode, Cause: cause}

IMO we should remove the scmpError now. In the slowPathProcessor, an scmp error is the expected result, I don't think this should be treated as an error anymore.
It's also not used for anything useful, only a debug log statement uses this information. This, we can just move somewhere else
We can just as well move the Debug log statement somewhere else.


router/export_test.go line 83 at r1 (raw file):

	}
	result, err := p.processPkt(m.Buffers[0], srcAddr, ifID)
	if errors.Is(err, slowPathRequired) {

IMO squeezing the slow path processing into the same tests is the wrong approach here. Instead, I'd rather adapt the existing test cases so that they only check that the slow path processing is triggered when necessary. Add separate test cases for the actual slow path processing.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 7 unresolved discussions (waiting on @rohrerj)

a discussion (no related file):
The overall structure looks good to me 👍

As we move some methods from one type to another, the resulting "physical" organization of the definitions etc. in the dataplane.go file is rather messy now. Perhaps it is good to keep this as is for this PR to keep the diff small, and do a separate cleanup pass later.


Copy link
Contributor Author

@rohrerj rohrerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 6 unresolved discussions (waiting on @matzf)


router/dataplane.go line 572 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…
  • camelCase, not snake_case
  • embed packet to make the access somewhat shorter?

Done.


router/dataplane.go line 807 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…
  • unused mac and cachedMac
  • the macBuffers whcih is shared with scionPacketProcessor is a bit silly now -- the slow processor only uses drkeyInput, the normal processor uses the other two buffers. I think we can simplify this be getting rid of the struct and just using a macInputBuffer []byte, initializing it to the max size of the different buffers. This seems safe.

Done.


router/dataplane.go line 1203 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…

Don't make this a member here. This can be part of the return value from processPkt. Either in the processingResult (and do not return an error) or in the error type. Conceptually, I think both ways are ok, but I believe if we pack it in the error, we cannot avoid the heap allocation.

I included the slowPathRequest in the processingResult but I still had to return an error because the entire logic of processPkt works like "continue if no error". To make it work without returning an error it would require a lot of changes in the packet processing logic.


router/dataplane.go line 1211 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…
  • slowPathState: I don't think "state" is a good name for this. Something like "request", or "metadata", or something along these lines would be more fitting.
  • mode:
    • "mode" seems an odd name, perhaps "type"?
    • this should be an "enum" (type int bla; const bla ... = iota; ....)
    • maybe the name for traceroute should rather be more generic "router alert"
  • parameters: this will always allocate and needs ugly casts. Instead, just add the union of all parameters that we'll need. It's not our fault this looks stupid, it's go's fault for not having ergonomic sum types.
  • cause: ok to pass a reference to a predefined global variable, but we should be clean up the places that allocate new errors

The problem with the "cause" field is that if we have a parametrized error, we either have to create a new error here or pass all parameters such that we can reconstruct the error. But the union of all those parameters would get rather large and we would need a lot of additional logic to reconstruct the correct errors. So what would you suggest here?


router/dataplane.go line 2155 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…

IMO we should remove the scmpError now. In the slowPathProcessor, an scmp error is the expected result, I don't think this should be treated as an error anymore.
It's also not used for anything useful, only a debug log statement uses this information. This, we can just move somewhere else
We can just as well move the Debug log statement somewhere else.

Done.


router/export_test.go line 83 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…

IMO squeezing the slow path processing into the same tests is the wrong approach here. Instead, I'd rather adapt the existing test cases so that they only check that the slow path processing is triggered when necessary. Add separate test cases for the actual slow path processing.

I created a separate function in export_test for slow path processing. Some already existing tests require slow path and because of that I moved them to a new test function for slow path.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only occurred to me now that "path" is a heavily used term in SCION, so calling this "slow-path" might be puzzling. If someone has a good idea for a better term, it might still be worth quickly renaming this.

Reviewed 1 of 3 files at r2, all commit messages.
Reviewable status: 2 of 5 files reviewed, 5 unresolved discussions (waiting on @rohrerj)


router/dataplane.go line 1203 at r1 (raw file):

Previously, rohrerj wrote…

I included the slowPathRequest in the processingResult but I still had to return an error because the entire logic of processPkt works like "continue if no error". To make it work without returning an error it would require a lot of changes in the packet processing logic.

Sounds good.


router/dataplane.go line 1211 at r1 (raw file):

Previously, rohrerj wrote…

The problem with the "cause" field is that if we have a parametrized error, we either have to create a new error here or pass all parameters such that we can reconstruct the error. But the union of all those parameters would get rather large and we would need a lot of additional logic to reconstruct the correct errors. So what would you suggest here?

Right, ignore the point on the "cause". It's not really related, we can clean this up separately. The idea would be to just drop the additional information in the error. This is only used for debug logging, so we can instead just do the debug logging in place (if it is enabled), and return simple error objects.


router/dataplane.go line 676 at r2 (raw file):

				packet:          p,
				slowPathRequest: result.SlowPathRequest,
			}:

Nit: bad formatting by gofmt here, closing brace visually matches up with the one from select. It's weird...
This can fit on a single line though, I think that makes it much more readable.

Suggestion:

			case slowQ <- slowPacket{p, result.SlowPathRequest}:

router/dataplane.go line 849 at r2 (raw file):

		return p.handleSCMPTraceRouteRequest(pkt.slowPathRequest.interfaceId)
	default:
		return processResult{}, serrors.New("Unsupported slow-path type", "type",

panic instead?


router/dataplane.go line 1179 at r2 (raw file):

const slowPathSCMP slowPathType = 0
const slowPathRouterAlert slowPathType = 1

Suggestion:

const (
	slowPathSCMP slowPathType = iota
	slowPathRouterAlert
)

router/dataplane_test.go line 653 at r2 (raw file):

			result, err := dp.ProcessPkt(tc.srcInterface, input)
			assert.ErrorIs(t, err, router.SlowPathRequired)
			_, err = dp.ProcessSlowPath(tc.srcInterface, input, result.SlowPathRequest)

This test currently doesn't check very much other than running the code a bit. It's not worse than before, but it seems like this would be an opportunity to to improve this.
I'd split the test; one part checks that the ProcessPkt creates the expected slowPathRequest -- like the test here, but instead of calling ProcessSlowPath, compare the parameters of the result.SlowPathRequest with some expected values.
A separate part defines slowPathRequest as inputs and checks that the resulting packet is the expected SCMP message.

Copy link
Contributor Author

@rohrerj rohrerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 4 unresolved discussions (waiting on @matzf)


router/dataplane.go line 1211 at r1 (raw file):

Previously, matzf (Matthias Frei) wrote…

Right, ignore the point on the "cause". It's not really related, we can clean this up separately. The idea would be to just drop the additional information in the error. This is only used for debug logging, so we can instead just do the debug logging in place (if it is enabled), and return simple error objects.

I changed the code to only use errors without parameters for the cause and added a debug log statement for all those scmp cases.


router/dataplane.go line 849 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

panic instead?

Done.


router/dataplane.go line 1179 at r2 (raw file):

const slowPathSCMP slowPathType = 0
const slowPathRouterAlert slowPathType = 1

Done.


router/dataplane_test.go line 653 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

This test currently doesn't check very much other than running the code a bit. It's not worse than before, but it seems like this would be an opportunity to to improve this.
I'd split the test; one part checks that the ProcessPkt creates the expected slowPathRequest -- like the test here, but instead of calling ProcessSlowPath, compare the parameters of the result.SlowPathRequest with some expected values.
A separate part defines slowPathRequest as inputs and checks that the resulting packet is the expected SCMP message.

I modified the test to check whether the returned slowPathRequest is as expected. But it is hard to check whether the out packet corresponds to a expected packet because we don't have nice functions to create / parse SCMP packets. (or at least I was not able to find one). And hardcode a byte array for comparison is hard to read / understand.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1, 2 of 3 files at r2, 1 of 3 files at r3.
Reviewable status: 3 of 5 files reviewed, 3 unresolved discussions (waiting on @rohrerj)


router/dataplane.go line 1211 at r1 (raw file):

Previously, rohrerj wrote…

I changed the code to only use errors without parameters for the cause and added a debug log statement for all those scmp cases.

💯


router/dataplane.go line 138 at r3 (raw file):

	macVerificationFailed         = serrors.New("MAC verification failed")
	badPacketSize                 = serrors.New("bad packet size")
	slowPathRequired              = serrors.New("slow-path required")

Somewhat aside (non-blocking): these serrors errors all include a rather pointless stacktrace which just clutters the log. We can easily improve this by creating the values with errors.New instead.


router/dataplane_test.go line 653 at r2 (raw file):

Previously, rohrerj wrote…

I modified the test to check whether the returned slowPathRequest is as expected. But it is hard to check whether the out packet corresponds to a expected packet because we don't have nice functions to create / parse SCMP packets. (or at least I was not able to find one). And hardcode a byte array for comparison is hard to read / understand.

Ok. Well there is slayers.SCMP and the corresponding messages like slayers.SCMPParameterProblem if you want to try. There are a load of examples on how to use these, e.g. in the braccept/cases.


router/export_test.go line 105 at r3 (raw file):

}

type SlowPathRequest struct {

Just move it to the "internal" tests instead of jumping through hoops to pretend exposing this?

Copy link
Contributor Author

@rohrerj rohrerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 1 unresolved discussion (waiting on @matzf)


router/dataplane.go line 138 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Somewhat aside (non-blocking): these serrors errors all include a rather pointless stacktrace which just clutters the log. We can easily improve this by creating the values with errors.New instead.

Done


router/dataplane_test.go line 653 at r2 (raw file):

Previously, matzf (Matthias Frei) wrote…

Ok. Well there is slayers.SCMP and the corresponding messages like slayers.SCMPParameterProblem if you want to try. There are a load of examples on how to use these, e.g. in the braccept/cases.

I added checks that the serialized scmp type code matches the expectation and that the correct scmp error layer type is contained in the layers (e.g. the slayers.LayerTypeSCMPParameterProblem)


router/export_test.go line 105 at r3 (raw file):

Previously, matzf (Matthias Frei) wrote…

Just move it to the "internal" tests instead of jumping through hoops to pretend exposing this?

Done.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r4, all commit messages.
Reviewable status: 4 of 5 files reviewed, 7 unresolved discussions (waiting on @rohrerj)


router/dataplane_internal_test.go line 46 at r4 (raw file):

)

var key = []byte("testkey_xxxxxxxx")

testKey, or macKey or hopfieldKey or so.
Btw, if you make this a string it can be defined as const. Just pass a string to computeMAC and cast to []byte internally.


router/dataplane_internal_test.go line 424 at r4 (raw file):

	ctrl := gomock.NewController(t)
	defer ctrl.Finish()
	key := []byte("testkey_xxxxxxxx")

Use global key instead.


router/dataplane_internal_test.go line 505 at r4 (raw file):

			dp := tc.prepareDP(ctrl)
			input := tc.mockMsg()
			result, err := dp.ProcessPkt(tc.srcInterface, input)

If we invoke the processor / slow processor directly, we can avoid the dance with the ipv4 message and the test-exported dp.ProcessSlowPath.
I'm not entirely sure if this really ends up as more readable & maintanable. Please take another look and decide.


router/dataplane_internal_test.go line 513 at r4 (raw file):

				cause:    result.SlowPathRequest.cause,
			}
			assert.Equal(t, tc.expectedSlowPathRequest, selectedResult)

Just result.SlowPathRequest instead of building selectedResult.


router/dataplane_internal_test.go line 521 at r4 (raw file):

			packet := gopacket.NewPacket(result.OutPkt, slayers.LayerTypeSCION, gopacket.Default)
			scmpLayer := packet.Layer(slayers.LayerTypeSCMP)
			scmp := scmpLayer.(*slayers.SCMP)

Condense

Suggestion:

			scmp := packet.Layer(slayers.LayerTypeSCMP).(*slayers.SCMP)

router/dataplane_internal_test.go line 532 at r4 (raw file):

				}
			}
			assert.True(t, layerFound)

Use packet.Layer() to find the expected layer. Returns nil if not found.

Suggestion:

			assert.NotNil(t, packet.Layer(tc.expectedLayerType))

router/dataplane_internal_test.go line 540 at r4 (raw file):

	t.Helper()
	ret := &ipv4.Message{}
	spkt.Path = dpath

Remove this effectively unused dpath parameter.

Copy link
Contributor Author

@rohrerj rohrerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 5 files reviewed, 7 unresolved discussions (waiting on @matzf)


router/dataplane_internal_test.go line 46 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

testKey, or macKey or hopfieldKey or so.
Btw, if you make this a string it can be defined as const. Just pass a string to computeMAC and cast to []byte internally.

I renamed it to testKey. The problem with making this a string is that it is not only used for computeMAC but also for the NewDP function and since we use it at multiple places and everywhere in form of a byte array, I think it makes sense to leave it like that.


router/dataplane_internal_test.go line 424 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Use global key instead.

Done.


router/dataplane_internal_test.go line 505 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

If we invoke the processor / slow processor directly, we can avoid the dance with the ipv4 message and the test-exported dp.ProcessSlowPath.
I'm not entirely sure if this really ends up as more readable & maintanable. Please take another look and decide.

Since we moved it to the internal test I think it makes sense to change that.


router/dataplane_internal_test.go line 513 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Just result.SlowPathRequest instead of building selectedResult.

Done.


router/dataplane_internal_test.go line 521 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Condense

Done.


router/dataplane_internal_test.go line 532 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Use packet.Layer() to find the expected layer. Returns nil if not found.

Done.


router/dataplane_internal_test.go line 540 at r4 (raw file):

Previously, matzf (Matthias Frei) wrote…

Remove this effectively unused dpath parameter.

Done.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @rohrerj)

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @rohrerj)


router/dataplane.go line 139 at r6 (raw file):

	errPeeringNonemptySeg2        = serrors.New("non-zero-length segment[2] in peering path")
	errShortPacket                = serrors.New("Packet is too short")
	errBFDSessionDown             = serrors.New("bfd session down")

Bad merge, you wanted to change this all to errors.New.


router/dataplane.go line 820 at r6 (raw file):

	drkeyProvider drkeyProvider

	peering bool

I see this is just copied over from the "scionPacketProcessor", but I think here this structure is (even more) sub-optimal. There is only a single place that uses this variable. I think this would be much clearer if we'd just move the determinePeer call into prepareSCMP, where this is used.
(non-blocking -- it's okay-ish for now)


router/dataplane.go line 2119 at r6 (raw file):

		return nil, serrors.Wrap(cannotRoute, err, "details", "reversing path for SCMP")
	}
	revPath := revPathTmp.(*scion.Decoded)

See comment on slowPathPacketProcessor.peering; add peering, err := determinePeer(revPath) e.g. here.

Copy link
Contributor Author

@rohrerj rohrerj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 5 files reviewed, 2 unresolved discussions (waiting on @matzf)


router/dataplane.go line 139 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

Bad merge, you wanted to change this all to errors.New.

Done.


router/dataplane.go line 820 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

I see this is just copied over from the "scionPacketProcessor", but I think here this structure is (even more) sub-optimal. There is only a single place that uses this variable. I think this would be much clearer if we'd just move the determinePeer call into prepareSCMP, where this is used.
(non-blocking -- it's okay-ish for now)

Done.


router/dataplane.go line 2119 at r6 (raw file):

Previously, matzf (Matthias Frei) wrote…

See comment on slowPathPacketProcessor.peering; add peering, err := determinePeer(revPath) e.g. here.

I had to make some changes to determinePeer() because revPath was not of type scion.Raw. But now it should work.

Copy link
Contributor

@matzf matzf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @rohrerj)


router/dataplane.go line 820 at r6 (raw file):

Previously, rohrerj wrote…

Done.

💯

@matzf matzf merged commit b4a2011 into scionproto:master Sep 28, 2023
1 check passed
jiceatscion pushed a commit to jiceatscion/scion that referenced this pull request Sep 28, 2023
Add a low-priority slow path for special case packet handling, in
particular SCMP errors and traceroutes, for the the asynchronous packet
processing pipeline in the router added in scionproto#4351.

This is implementation part three of three described in the design
document (scionproto#4339, doc/dev/design/BorderRouter.rst).

Closes scionproto#4334
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants