Skip to content

Conversation

@DanielG
Copy link
Contributor

@DanielG DanielG commented Mar 25, 2020

Currently e131 has the ip option which also allows specifying an interface. This has very fuzzy behaviour though, if the interface is not up e131 will pick, basically, a random interface (the first configured, non-loopback interface).

I'm integrating OLA into an appliance for a customer where we do not want this behaviour, so I introduce a new interface option which binds to exactly the specified interface regardless of it being up or down. We do this by using the interface-index as the identifier for the interface rather than it's IP address. I also introduce a flag for the InterfacePickler not to filter interfaces based on their up/down status.

This code has only been tested on Linux so far, testing on other platforms would be appreciated. I'm especially unsure about whether platforms that don't support interface indexes exist and if we have to keep a fallback to IP addresses for them. From the ifdefs in PosixInterfacePicker.cpp it does look like this would be possible in theory but I don't know if such platforms actually exist.

FYI: I have a backport to the 0.10.x branch ready to go too.


New docs copied below:

ip = [a.b.c.d|<interface_name>]
The IP address or interface name to bind to. If not specified or no
matching interface is found the first "up" or configured non-loopback
interface is used. Prefer interface if this is not desired.

interface = <interface_name>
The name of the interface to bind to. Overrides the "ip" option if given.
Unlike the "ip" option this will use the specified interface whenever
possible at all, especially even if it is down. If the interface does not
exist an error is thrown rather than picking just any interface.

DanielG added 3 commits March 25, 2020 08:58
This works for unconfigured interfaces without an IP.

TODO: Some platforms don't have iifidx, at least the ifdefing in the
interface picker allows for it. Fall back to IP in that case.
@peternewman
Copy link
Member

Hi Daniel,

Firstly thanks for this and for starting what is hopefully a steady flow of OLA contributions. 😄

Some general conceptual comments and questions to start off with, as opposed to a traditional code review.

Currently e131 has the ip option which also allows specifying an interface. This has very fuzzy behaviour though, if the interface is not up e131 will pick, basically, a random interface (the first configured, non-loopback interface).

I'm integrating OLA into an appliance for a customer where we do not want this behaviour, so I introduce a new interface option which binds to exactly the specified interface regardless of it being up or down. We do this by using the interface-index as the identifier for the interface rather than it's IP address. I also introduce a flag for the InterfacePickler not to filter interfaces based on their up/down status.

What's the reason for using interface index rather than continuing with the IP/interface name combo? It seems rather user-unfriendly in comparison?

To look at this from an alternative perspective, rather than having two competing ways of specifying an interface (and based on the readme you're not setting the interface via ID in the config file), could this not be achieved by instead adding an option to the plugin which sets the specific_only option from InterfacePicker? Possibly making it a three way option for fallback, fail or allow down?

This code has only been tested on Linux so far, testing on other platforms would be appreciated. I'm especially unsure about whether platforms that don't support interface indexes exist and if we have to keep a fallback to IP addresses for them. From the ifdefs in PosixInterfacePicker.cpp it does look like this would be possible in theory but I don't know if such platforms actually exist.

The code for the NetworkResponder has been in place for some time now, and uses interface indexes, so we're hopefully safe, but as above I'm not sure this particular change is necessary.

FYI: I have a backport to the 0.10.x branch ready to go too.

I've been trying to keep 0.10.x as bug-fixes, I think this is more feature than fix.

As a side question, if the interface is down, have you tested what happens if OLAd tries to send to it?

See also this issue:
#1546

I suspect the long term fix is to dynamically see interfaces appearing and connect to them.

@DanielG
Copy link
Contributor Author

DanielG commented Mar 25, 2020 via email

@peternewman
Copy link
Member

Currently e131 has the ip option which also allows specifying an interface. This has very fuzzy behaviour though, if the interface is not up e131 will pick, basically, a random interface (the first configured, non-loopback interface).

What's the reason for using interface index rather than continuing with the IP/interface name combo? It seems rather user-unfriendly in comparison?

Oh, I'm sorry, I should have been more clear. I was just talking about the implementation. The interface option of course takes an interface name :)

Ah okay.

To look at this from an alternative perspective, rather than having two competing ways of specifying an interface (and based on the readme you're not setting the interface via ID in the config file), could this not be achieved by instead adding an option to the plugin which sets the specific_only option from InterfacePicker?

I mean I guess, but I think it's much more intuitive to bind to an interface with an option called "interface", not "ip. You have to admit that's pretty obscure :) I honestly don't think any of this legacy behaviour is really useful.

My main concern is if we have two different ways of specifying which IP/interface to use, it gets very confusing, which is the preferred one if they're both set etc. I appreciate that the current name of "ip" for the setting is a bit confusing, but I think the solution is to potentially migrate to a different name and continue the overloading, rather than the two options thing. Much in the same way as ideally you'd hope you might be able to use a hostname/FQDN or IP in a field in a program and get the same result.

Possibly making it a three way option for fallback, fail or allow down?

I think that'd just be unnecessarily complicated. What use-cases do you actually see where silently doing something random with your network configuration is a good thing? Just sounds like a security problem waiting to happen to me.

It does log it, so it's not entirely silent.

It's a different use case though, if you take a standard Raspberry Pi and move the SD card from one machine to another, the persistent network stuff will mean the second Pi has a different network interface name. Likewise if you do it by DHCP obviously the IP will change. I fully appreciate in a permanent install or an appliance you probably don't want it spontaneously changing, but for a lot of people just experimenting, you just want Art-Net or E1.31 to work on the connected interfaces.

This code has only been tested on Linux so far, testing on other platforms would be appreciated. I'm especially unsure about whether platforms that don't support interface indexes exist and if we have to keep a fallback to IP addresses for them. From the ifdefs in PosixInterfacePicker.cpp it does look like this would be possible in theory but I don't know if such platforms actually exist.

The code for the NetworkResponder has been in place for some time now, and uses interface indexes, so we're hopefully safe, but as above I'm not sure this particular change is necessary.

Interface indecies are absolutely necessary. There is no real other way to reference an interface in the face of changing or missing IP address otherwise.

Surely an interface name will achieve the same thing as an index? Admittedly I haven't looked into the finer detail of how interface aliases with alternative IPs behave on Linux for example compared to indices.

FYI: I have a backport to the 0.10.x branch ready to go too. I've been trying to keep 0.10.x as bug-fixes, I think this is more feature than fix.

I would actually consider ip = iface not using iface in some circumstances a bug.

It's the documented behaviour and the log mirrors it. I can see your point, but it feels like a big behavioural change and to the core of the code which currently hasn't been widely tested on the huge array of platforms and OSes we support to just drop in as a bug fix.

Depending on how long you're expecting to keep master unreleased though I would seriously consider thinking of this as more of a bug.

As soon as we've got 0.10.8 out with Python 3 fixed, we'll likely move onto master as 0.11.0.

As a side question, if the interface is down, have you tested what happens if OLAd tries to send to it?

Yeah some function further down in the e131 stack returns an error and that gets logged but otherwise ignored so everything just keeps chugging along, which is the right behaviour here I think.

Yeah, again I'd wonder from a support perspective if the interface should present when it's down, that doesn't match the USB behaviour and could potentially be confusing (why's it not sending when I've patched it).

See also this issue: #1546 I suspect the long term fix is to dynamically see interfaces appearing and connect to them.

That and IPv6 support would be great yeah. OLA just seems to have a very '90s ish attitude to networking, "IP address don't change" and "there is exactly one IP per interface" are such old-school assumptions ;)

Well I need to add an IPv6 class for some of the E1.33/LLRP support, which will be a step in the right direction. Hopefully adding more support after that won't be too tough.

I think the netlink listener will catch the IP address not changing issues, although as mentioned in the other issue, I think firing a NOHUP at us when an interface changes would be a potentially suitable workaround in the short term.

Where do you feel we've got a limit of one IP per interface?

Anyway patches are certainly welcome for all these other features too.

@DanielG
Copy link
Contributor Author

DanielG commented Mar 27, 2020 via email

@peternewman
Copy link
Member

On Fri, Mar 27, 2020 at 08:28:21AM -0700, Peter Newman wrote: My main concern is if we have two different ways of specifying which IP/interface to use, it gets very confusing, which is the preferred one if they're both set etc.

That's actually right there in the docs: "Overrides the "ip" option if given." the code also prints a warning about this :)

But that means you've got to read the docs or the logs, if it's just one overriden field, all that stuff goes away. Even more so if/when we add some dynamic UI which matches the existing preferences, we don't need to add some magic way to allow you to select only one of the two options or deal with validating just one at a time. The config behaviour becomes explicit. I realise you do have to get your head around the fact you can enter an IP or an interface name in the same field, but I feel that has less edge cases than dealing with two duplicate fields.

I guess there's no real reason not to just error out if both ip and interface are set. Then there's no opportunity for confusion. I generally prefer erroring out over continuing in ambigous cases but so far my impression what that that wasn't the done thing in OLA.

I assume you're referring to the fallback to a default IP? I'd suggest there's a difference between picking a sensible default if something isn't set or is set to an invalid value (while still accepting your point of view) and the ambiguity that arises from being able to set two conflicting settings.

I appreciate that the current name of "ip" for the setting is a bit confusing, but I think the solution is to potentially migrate to a different name and continue the overloading, rather than the two options thing. Much in the same way as ideally you'd hope you might be able to use a hostname/FQDN or IP in a field in a program and get the same result.

I see your point and indeed if specifying an IP address to select the multicast interface were a good idea I'd be inclined to agree we should continue overloading them, but I'm actually starting to think we should just deprecate the use of IP address to identify the multicast interface completely because of a point you bring up below: DHCP.

We should perhaps encourage people to use an interface name by default rather than an IP when the protocol is multicast, but given we have non-multicast protocols too (e.g. Art-Net), I think it's best to offer both sets of options everywhere rather than artificially restrict the available methods for some plugins.

It's a different use case though, if you take a standard Raspberry Pi and move the SD card from one machine to another, the persistent network stuff will mean the second Pi has a different network interface name.

To be fair, on these sorts of installations I'd consider it pretty unlikely that people are messing with the ip setting in the first place because you are likely to only have one network interface to begin with, no?

Probably, although a Pi could do Wi-Fi (a few people have asked about this for DMX in the past) and wired, or a USB NIC.

Likewise if you do it by DHCP obviously the IP will change.

Honestly what even is the point of matching on IPs here? It really just makes things complicated in the face of changing addresses. For unicast an explicit IP makes senes because you can have interfaces with a number of addresses but want to select a specific one -- but for multicast there really is no point in going more granular than an interface!

Yeah it was a bit of a random comment. But a little bit towards the fact that in a DHCP scenario you might want it to just find an active interface (e.g. on my laptop I may be wired or wireless, I just want the plugins to use the active interface).

I fully appreciate in a permanent install or an appliance you probably don't want it spontaneously changing, but for a lot of people just experimenting, you just want Art-Net or E1.31 to work on the connected interfaces.

I totally get that, but I think the right thing to do to make that use-case easy would be send packets on all multicast capable interfaces by default. That way you wouldn't even have to touch the config unless you want to restrict where packets get sent explicitly. IMO it's really a failing of the OS APIs that something called _multi_cast can only send on a single interface at a time..

I'd certainly welcome sending on all interfaces as a separate PR. Although I think if we're going to support multiple interfaces simultaneously (which is essentially what that would entail), then presenting them as multiple devices is the ideal solution, as it then enables for example mapping between two networks and potentially changing properties between them, or a play/record setup.

Interface indecies are absolutely necessary. There is no real other way to reference an interface in the face of changing or missing IP address otherwise.

Surely an interface name will achieve the same thing as an index? Admittedly I haven't looked into the finer detail of how interface aliases with alternative IPs behave on Linux for example compared to indices.

At the OS syscall level everything is about interface indices (or IP addresses, but that's mostly discouraged these days). The syscalls that actually configure the IP stack don't take interface names but interface-indices.

Yeah fair enough, I think we're probably both agreeing, but I think we should use the interface name within settings, and map it to the interface index before we do a syscall.

I would actually consider ip = iface not using iface in some circumstances a bug.

It's the documented behaviour

Is it documented? Sure wasn't in the option docs, I added that bit. Are there some other docs I haven't seen?

Possibly not as explicitly as I first thought. It's noted here:
http://docs.openlighting.org/ola/doc/latest/classola_1_1network_1_1_interface_picker.html

and the log mirrors it.

I'm just going to trust you on that one I don't remember seeing logs about it shrug.

It's not as explicit as it could be there it seems, the key bit is here if you want to add something to your PR:
https://github.com/OpenLightingProject/ola/blob/master/common/network/InterfacePicker.cpp#L134-L138

I can see your point, but it feels like a big behavioural change and to the core of the code which currently hasn't been widely tested on the huge array of platforms and OSes we support to just drop in as a bug fix.

That's a fair point. As is it probably doesn't even work on windows yet :)

Cool, glad we're on the same wavelength there.

As a side question, if the interface is down, have you tested what happens if OLAd tries to send to it?

Yeah some function further down in the e131 stack returns an error and that gets logged but otherwise ignored so everything just keeps chugging along, which is the right behaviour here I think. Yeah, again I'd wonder from a support perspective if the interface should present when it's down, that doesn't match the USB behaviour and could potentially be confusing (why's it not sending when I've patched it).

Hmm, yeah I guess we could not register the device until it comes up. Not sure how to do that though. Getting notifications about interfaces going up/down is even more tricky across platforms than the rest of the networking calls. Probably above my paygrade. ;) Honestly given that the logs will be spammed with failed to send messages I don't really think this would be a huge problem in practice. It would really just need a cursory reading of the logs to see whats going on.

Yeah, hence my proposal of the fallback as it currently does where it doesn't register the device. Sending a NOHUP or reloading plugins would then find it the next time around. Or as I proposed adding an option (which defaults to off) to create interfaces which are down.

Unfortunately through bitter experience I know people don't read the logs most of the time, so having sane defaults is the key.

Where do you feel we've got a limit of one IP per interface?

It's not so much of a limit as it is assuming an interface has an IP at all. For example by passing an IP address to IP_MULTICAST_IF we assume it's got one! It could very well be unconfigured and have zero though. That's where interface-indices come to the rescue.

Ah yeah gotcha.

So looking at moving this forward, I think your PR does the following, and my comments on each part in summary. After the discussion so far, I don't think you've particularly managed to convince me to change my mind sorry. So if you can either change it to match the comments below, or alternatively open a new PR with just the bits we agree on in it and we can get them merged and then you can try and convince me of the other bits!

  1. Add the ability to return down interfaces via InterfacePicker - No objection to this, apart from the down parameter should be optional (defaulting to false) for backwards compatibility, and we should add something to the Interface object to indicate it's status up/down
  2. SetMulticastInterface by Interface rather than IP - This seems a sensible change, but we should keep the old method available for backwards compatibility as a basic wrapper (and deprecate it)
  3. Add a force option to E1.31 plugin/node so it doesn't automatically find an alternative - No objection to the principal of this
  4. Add a separate interface option to set the interface which is used if set, never falls back to an alternative - As discussed above, I'm reluctant to do this as I think it will generate more confusion than it solves. We can look at coming up with a less confusing name for the setting though, but I think it should all be in one setting as it's just two different ways of telling it which NIC to use (and I want to keep the IP option for parity with other plugins).
  5. Make E1.31 plugin/node pick a down interface if forcing it - I think this should be optional and default to off to avoid confusion with how the rest of the system works, but I can see the potential utility for adding it as a workaround for properly detecting up/down interfaces. I still think this may as well be combined with the force option, so it either uses your preference but results in a a default interface if not, or results in no device if it's down, or results in the down one (and will then start working when it comes up). We can then potentially add this behaviour to the other plugins too.

@DanielG
Copy link
Contributor Author

DanielG commented Apr 6, 2020 via email

@peternewman
Copy link
Member

Sorry for going a bit quiet on this one for a while @DanielG !

On Thu, Apr 02, 2020 at 06:42:04PM -0700, Peter Newman wrote: 1. Add the ability to return down interfaces via InterfacePicker - No objection to this, apart from the down parameter should be optional (defaulting to false) for backwards compatibility

Backwards compatibility for who? Does ola have out-of-tree plugins? Otherwise I really don't see the point of this if everything in-tree builds fine.

There are a few random odd builds. It's basically only an = false or another wrapper. Having been bitten by this in the past, particularly with stuff like libftdi, and the amount of cotton wool we have to add around it, I'd rather not leave other people in the same situation. A number of other systems link against OLA, and may choose to use bits of our API such as this.

and we should add something to the Interface object to indicate it's status up/down

I don't think caching the interface state there is a good idea since that'd go back to the network fallacies I mentioned before -- the state can and will change. I could imagine having a method to check the up/down status on the spot instead, but getting that supported for all platforms would be significant extra engineering work I'm not really prepared to do just to get this merged. Seems like this should be future work to me

Fair point, but how do we handle and report on this information in the meantime? Is the compromise to add this method with a note in the documentation saying it may be stale, and when the new magic function comes along we can replace this with a call to that function.

Or something similar to this function where we effectively have two returns, the data and whether we've been able to fetch it. In the case of this, it could return the data and true where we can, or false where we can't and show unknown in any GUI:
http://docs.openlighting.org/ola/doc/latest/classola_1_1rdm_1_1_network_manager.html#a6a45d7eef57f14158675f03fd2cee442

  1. SetMulticastInterface by Interface rather than IP - This seems a sensible change, but we should keep the old method available for backwards compatibility as a basic wrapper (and deprecate it)

I'm actually hoping we can just support all platforms with iidxes. Windos' networking stack, which is derrived from the BSDs, has that 0.x.x.x/8 trick, so I'm hoping the real BSDs (including OSX) support that too and on Linux it isn't a problem anyways.

That's great, I'd still like to keep the wrapper for compatibility though.

  1. Add a separate interface option to set the interface [...] but I think it should all be in one setting as it's just two different ways of telling it which NIC to use (and I want to keep the IP option for parity with other plugins).

My one gripe with that is that adding some boolean options to control this makes things a lot more complicated to document and understand. Consider that introducing a boolean option means you pull more logic into that single ip option. I see your point about overloading the option with the ability to specify interface and IP as well as keeping the ip name for consistency across plugins though. To move this forward, how about using the syntax ip=interface:<iface> instead of a separate option. I can live with overloading the ip option like that since at least it's symbolic and that should satisfy your request to not introduce a new option too. I could also imagine having something like ip=force-interface:<iface> for the other behaviour I want instead of having a boolean there.

The problem is that breaks all the existing documentation/notes/behaviour! Which is many and varied (even on our website).

Also I'd like to try and reduce the amount of overriding and this is adding more to it, not less. If we add another option, we can present it in a future GUI as a simple tickbox, we can't do that for this. This is no more overriding of the IP field (and any future validation), than having to deal with an IP or a FQDN or an interface name is.

  1. Make E1.31 plugin/node pick a down interface if forcing it - I think this should be optional and default to off to avoid confusion with how the rest of the system works, but I can see the potential utility for adding it as a workaround for properly detecting up/down interfaces.

I think it makes sense to just conflate the use-it-if-it's-down behavior with force-interface for now. Once we have proper support for dynamically detecting interfaces we can simply remove that behaviour since it would just be subsumed anyways.

The reason I think this bit needs a second boolean is around the plugin behaviour. Until we can dynamically track the status of an interface, then setting force should produce no interface if it's currently down (with a HUP/reload of plugins allowing it to appear, e.g. via an if up/down script). Apart from that doesn't fit your particular use case, but is hugely less confusing for most people and generates less "support" requests.

I still think this may as well be combined with the force option, so it either uses your preference but results in a a default interface if not, or results in no device if it's down, or results in the down one (and will then start working when it comes up). We can then potentially add this behaviour to the other plugins too.

I would much prefer conflating the use-it-if-it's-down behavior with forcing an interface. If I'm reading that right you want that to be a separate toggle though? Adding this to the other network plugins seems easy enough once we can finally agree on how to do it ;)

Indeed it should just be a few mechanical edits, hence why I'd like to spend a bit more time getting it right once before we roll it out across everything.

Copy link

@Saputra0905 Saputra0905 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants