Description
When I remove a secondary NIC from one of my SmartOS systems, the admin NIC doesn't get fully configured in the GZ. The global zone doesn't come online; however, VMs configured on that same admin NIC do show up on the network.
From reading the code (it's a headless system built from random consumer-grade parts I had lying around, so there is no IPMI to easily debug the problem in situ while the admin NIC is down), my problem is probably caused by the following: the NIC I removed was part of an aggr with a custom MTU set in /usbkey/config. Configuring this MTU happens before the admin NIC is initialized, and if configuring that MTU fails, we exit with a fatal error.
Note that failing to create the aggr does not in itself seem to trigger an immediate exit; we only end with a fatal error when a custom MTU is configured. It would probably be nice if, when configuring some other NIC fails, the admin NIC were still fully configured, so that the GZ is at least accessible over the network.
Context: This is where the aggr setup and MTU setup happen, before the admin NIC configuration:
illumos-joyent/usr/src/cmd/svc/milestone/net-physical
Lines 476 to 482 in 5760e8d
    # Create aggregations
    create_aggrs

    # Make any mtu adjustments that may be necessary
    setup_mtu

    # Setup admin NIC
To fix this, we could perhaps skip setting the MTU on the aggr if creating that aggr failed, since that failure in itself is (apparently) not a fatal error. It would be as simple as moving the MTU part into the if-check above it here:
illumos-joyent/usr/src/cmd/svc/milestone/net-physical
Lines 277 to 289 in 5760e8d
    echo "Creating aggr: ${aggr} (mode=${mode}, links=${links})"
    dladm create-aggr -l ${links//,/ -l } -L ${mode} ${aggr}
    if [[ $? -eq 0 ]]; then
        add_active_aggr_links ${aggr} ${macs}
    fi
    if [[ -n "$mtu" ]]; then
        dladm set-linkprop -p mtu=${mtu} ${aggr}
        if [[ $? -ne 0 ]]; then
            echo "Failed to set mtu on aggr ${aggr} to ${mtu}"
            exit $SMF_EXIT_ERR_FATAL
        fi
    fi
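A rough sketch of what that reordering could look like, not a tested patch. The `create_one_aggr` wrapper, the `FAKE_CREATE_FAILS` switch, and the stubbed `dladm` and `add_active_aggr_links` are hypothetical stand-ins so the snippet runs outside a SmartOS global zone; in net-physical the real commands would be used and the body would sit inside the existing loop. An MTU failure on a successfully created aggr is assumed to stay fatal, as it is today.

```shell
# Same value that /lib/svc/share/smf_include.sh defines; hard-coded here
# so the sketch runs standalone.
SMF_EXIT_ERR_FATAL=95

# Hypothetical stubs (assumption: not part of net-physical). Setting
# FAKE_CREATE_FAILS=1 simulates an aggr whose member NIC was removed.
dladm() {
    echo "dladm $*"
    if [[ "$1" == "create-aggr" && -n "$FAKE_CREATE_FAILS" ]]; then
        return 1
    fi
    return 0
}
add_active_aggr_links() { :; }

# Hypothetical wrapper around the per-aggr part of create_aggrs.
create_one_aggr() {
    typeset aggr=$1 mode=$2 links=$3 macs=$4 mtu=$5

    echo "Creating aggr: ${aggr} (mode=${mode}, links=${links})"
    dladm create-aggr -l ${links//,/ -l } -L ${mode} ${aggr}
    if [[ $? -ne 0 ]]; then
        # Still non-fatal, as before; the aggr does not exist, so
        # there is nothing to set an MTU on.
        echo "Failed to create aggr ${aggr}; skipping MTU setup"
        return 0
    fi

    add_active_aggr_links ${aggr} ${macs}

    if [[ -n "$mtu" ]]; then
        dladm set-linkprop -p mtu=${mtu} ${aggr}
        if [[ $? -ne 0 ]]; then
            echo "Failed to set mtu on aggr ${aggr} to ${mtu}"
            exit $SMF_EXIT_ERR_FATAL
        fi
    fi
}
```

With the stubs in place, `FAKE_CREATE_FAILS=1 create_one_aggr aggr0 off e1000g0,e1000g1 "" 9000` returns without ever reaching `set-linkprop`, so the script would carry on to the admin NIC setup.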
(And perhaps something can be said for also moving setup_mtu so that MTU failures don't impact the admin interface being brought up, though that can also be a separate change.)