Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update NIC management strategy #402

Merged
merged 1 commit into from
Jul 31, 2024

Conversation

a-crate
Copy link
Contributor

@a-crate a-crate commented Jul 17, 2024

See commit messages for details. Short version: bring back OS rules with a configuration option to ignore them.
Stop removing the default configuration on Debian 12, write a config with a higher priority.
Add a network recovery strategy, executed when then MDS is inaccessible on startup. Remove all guest agent configuration and go back to the OS default, exit with a failure if the MDS is still inaccessible with the default configuration.

/hold
/cc @ChaitanyaKulkarni28 @drewhli

@a-crate
Copy link
Contributor Author

a-crate commented Jul 17, 2024

Passes CIT network test:

2024/07/17 19:06:13 finished test network/debian-12-netplan-newagent (ID 187rw) in project acrate-dev, time spent: 26m 24s
2024/07/17 19:06:13 cleaning up after test network/debian-12-netplan-newagent (ID 187rw) in project acrate-dev, progress: total: 1, running: 0, finished: 1, delta: 0
2024/07/17 19:07:11 test network/debian-12-netplan-newagent had 16 leftover resources
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-wisboqeuj6al4v4p26vj2d3n from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-sdx6cy5czikkcypafr5rrpke from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-ba5h4uquy4cktbsldj6ba2g3 from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-ba5h4uquy4cktbsldj6ba2g3-v6 from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-ygmk6bsxwqwslgva56uurekb from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-odm2l5xrgkhkgaqqdwsg4qbt from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-1-network-187rw-e7mxb3iqxpacep2yufaua5xe from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-ygmk6bsxwqwslgva56uurekb from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-ba5h4uquy4cktbsldj6ba2g3 from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-odm2l5xrgkhkgaqqdwsg4qbt from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-e7mxb3iqxpacep2yufaua5xe from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-wisboqeuj6al4v4p26vj2d3n from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-ba5h4uquy4cktbsldj6ba2g3-v6 from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/firewalls/network-2-network-187rw-sdx6cy5czikkcypafr5rrpke from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/networks/network-1-network-187rw from test network/debian-12-netplan-newagent
2024/07/17 19:07:11 deleted resource projects/acrate-dev/global/networks/network-2-network-187rw from test network/debian-12-netplan-newagent
<?xml version="1.0" encoding="UTF-8"?>
<testsuites time="312.600" tests="10" skipped="1">
	<testsuite name="network-debian-12-netplan-newagent" tests="10" failures="0" errors="0" id="0" skipped="1" time="312.600">
		<testcase name="TestDHCP" classname="" time="0.000">
			<skipped message="Skipped"><![CDATA[    dhcp_test.go:45: DHCP test not supported on: projects/acrate-dev/global/images/debian-12-netplan-newagent]]></skipped>
		</testcase>
		<testcase name="TestDefaultMTU" classname="" time="0.000"></testcase>
		<testcase name="TestSendPing" classname="" time="190.990"></testcase>
		<testcase name="TestNTP" classname="" time="0.040"></testcase>
		<testcase name="TestAliases" classname="" time="30.010"></testcase>
		<testcase name="TestAliasAfterReboot" classname="" time="30.010"></testcase>
		<testcase name="TestAliasAgentRestart" classname="" time="61.550"></testcase>
		<testcase name="TestGVNIC" classname="" time="0.000"></testcase>
		<testcase name="TestWaitForPing" classname="" time="0.000"></testcase>
		<testcase name="TestStaticIP" classname="" time="0.000">
			<system-out><![CDATA[    static_ip_test.go:68: found addr 10.128.0.2/32 on interface 0
    static_ip_test.go:68: found addr 192.168.0.3/32 on interface 1]]></system-out>
		</testcase>
	</testsuite>
</testsuites>

@a-crate
Copy link
Contributor Author

a-crate commented Jul 17, 2024

Routing is functional but not correct, there are martian packets coming from the MDS in kernel logs with this build.

@a-crate a-crate changed the title Fix netplan configuration on Debian 12. Update NIC management strategy Jul 22, 2024
@a-crate
Copy link
Contributor Author

a-crate commented Jul 23, 2024

I'll re-request review when I'm ready, want to make some more changes.

@google-oss-prow google-oss-prow bot removed the lgtm label Jul 25, 2024
@a-crate
Copy link
Contributor Author

a-crate commented Jul 30, 2024

Updated rollback strategy, NetworkManager and Netplan were not applying changes on rollback. Added unit test for FallbacktoDefault.

@a-crate a-crate force-pushed the netplan branch 2 times, most recently from f1c2aee to e8c4b80 Compare July 31, 2024 03:44
@a-crate
Copy link
Contributor Author

a-crate commented Jul 31, 2024

Reworked, see commit message. MDS reachability is WAI (only on primary NIC).

2024/07/31 02:37:17 Running in project compute-image-test-pool-001 zone us-central1-a. Tests will run in projects: [compute-image-test-pool-001]
2024/07/31 02:37:17 using -filter ^(network|loadbalancer)$
2024/07/31 02:37:17 Add test workflow for test loadbalancer on image projects/acrate-dev/global/images/debian-12-netplan-newagent
2024/07/31 02:37:18 Add test workflow for test network on image projects/acrate-dev/global/images/debian-12-netplan-newagent
2024/07/31 02:37:18 Done with setup
2024/07/31 02:37:20 Storing artifacts and logs in gs://compute-image-test-pool-001-cloud-test-outputs/2024-07-31T02:37:20Z
2024/07/31 02:37:20 running test loadbalancer/debian-12-netplan-newagent (ID yrn5w) in project compute-image-test-pool-001, progress: total: 2, running: 1, finished: 0, delta: 2
2024/07/31 02:38:20 running test network/debian-12-netplan-newagent (ID t82st) in project compute-image-test-pool-001, progress: total: 2, running: 2, finished: 0, delta: 2
2024/07/31 02:44:32 finished test loadbalancer/debian-12-netplan-newagent (ID yrn5w) in project compute-image-test-pool-001, time spent: 07m 11s
2024/07/31 02:44:32 cleaning up after test loadbalancer/debian-12-netplan-newagent (ID yrn5w) in project compute-image-test-pool-001, progress: total: 2, running: 1, finished: 1, delta: 1
2024/07/31 02:50:14 finished test network/debian-12-netplan-newagent (ID t82st) in project compute-image-test-pool-001, time spent: 11m 54s
2024/07/31 02:50:14 cleaning up after test network/debian-12-netplan-newagent (ID t82st) in project compute-image-test-pool-001, progress: total: 2, running: 0, finished: 2, delta: 0
<?xml version="1.0" encoding="UTF-8"?>
<testsuites time="1172.600" tests="16" skipped="1">
        <testsuite name="loadbalancer-debian-12-netplan-newagent" tests="6" failures="0" errors="0" id="0" time="794.110">
                <testcase name="TestL3Backend" classname="" time="76.150"></testcase>
                <testcase name="TestL3Backend" classname="" time="79.650"></testcase>
                <testcase name="TestL3Client" classname="" time="108.250"></testcase>
                <testcase name="TestL7Backend" classname="" time="146.580"></testcase>
                <testcase name="TestL7Backend" classname="" time="149.640"></testcase>
                <testcase name="TestL7Client" classname="" time="233.840"></testcase>
        </testsuite>
        <testsuite name="network-debian-12-netplan-newagent" tests="10" failures="0" errors="0" id="0" skipped="1" time="378.490">
                <testcase name="TestDHCP" classname="" time="0.000">
                        <skipped message="Skipped"><![CDATA[    dhcp_test.go:45: DHCP test not supported on: projects/acrate-dev/global/images/debian-12-netplan-newagent]]></skipped>
                </testcase>
                <testcase name="TestDefaultMTU" classname="" time="0.000"></testcase>
                <testcase name="TestSendPing" classname="" time="256.500"></testcase>
                <testcase name="TestNTP" classname="" time="0.030"></testcase>
                <testcase name="TestAliases" classname="" time="30.010"></testcase>
                <testcase name="TestAliasAfterReboot" classname="" time="30.010"></testcase>
                <testcase name="TestAliasAgentRestart" classname="" time="61.940"></testcase>
                <testcase name="TestGVNIC" classname="" time="0.000"></testcase>
                <testcase name="TestWaitForPing" classname="" time="0.000"></testcase>
                <testcase name="TestStaticIP" classname="" time="0.000">
                        <system-out><![CDATA[    static_ip_test.go:68: found addr 10.128.0.3/32 on interface 0
    static_ip_test.go:68: found addr 192.168.0.3/32 on interface 1]]></system-out>
                </testcase>
        </testsuite>
</testsuites>

@a-crate
Copy link
Contributor Author

a-crate commented Jul 31, 2024

/unhold

@a-crate a-crate force-pushed the netplan branch 2 times, most recently from 678686a to e538357 Compare July 31, 2024 17:35
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it doesn't seem we're using osConfigRule, we can probably remove it, along with the no-op test for TestFindOSRule in manager_test.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, forgot to remove osConfigRule. I think the test is already gone though. PTAL.

Don't delete default netplan config on Debian 12. Write netplan configs to /run/netplan. Don't manage primary NIC unless the user has configured the new manage_primary_nic toggle.  Disable use-domains on secondary NICs with netplan. This avoids making the MDS reachable over secondary NICs instead of primary NICs when managing the primary interface. If the MDS is inaccessible during startup, rollback all agent configuration on all interfaces and try again. Apply changes during netplan and network manager rollbacks.
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-crate, ChaitanyaKulkarni28, drewhli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ChaitanyaKulkarni28,a-crate,drewhli]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 3aa08e9 into GoogleCloudPlatform:main Jul 31, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants