
After running tcpdump.py on RHEL 8.2 with Avocado (version 80.0), it hangs. #1863

Gene-Lo opened this issue Aug 19, 2020 · 8 comments

@Gene-Lo

Gene-Lo commented Aug 19, 2020

After running tcpdump.py on RHEL 8.2 with Avocado (version 80.0), it hangs. Take the results below as examples:

【Example 1】
※Network Card: Marvell_QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)
※SOL log:
(SOL console screenshot attached)
※job-log:
20200816-Network9-tcpdump-job.log
※Configuration:
20200816-Network9-Configuration.zip

【Example 2】
※Network Card: Marvell_2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
※SOL log:
(SOL console screenshot attached)
※job-log:
20200817-Network7-tcpdump-job.log
※Configuration:
20200817-Network7-Configuration.zip

【Example 3】
※Network Card: Mellanox_2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
※SOL log:
(SOL console screenshot attached)
※job-log:
20200818-Network6-tcdump-job.log
※Configuration:
20200818-Network6-Configuration.zip

In the past, we also used tcpdump.py to test these network cards on RHEL 7.6 and RHEL 8.1, and all results were PASS. Please kindly check whether this is a script error in the latest version of tcpdump.py. Many thanks!

@vaishnavibhat
Contributor

Hi,
From the attached logs, I see that the nping_count value is not set. It needs to be around 15-20 higher than the count value; count is set to 100 in this run. Can you please set nping_count to 115 and try the test again?

2020-08-16 21:31:21,237 parameters       L0143 DEBUG| PARAMS (key=count, path=*, default=500) => 100
2020-08-16 21:31:21,237 parameters       L0143 DEBUG| PARAMS (key=nping_count, path=*, default=) => ''
2020-08-16 21:31:21,237 parameters       L0143 DEBUG| PARAMS (key=peer_ip, path=*, default=) => '192.168.10.2'
2020-08-16 21:31:21,237 parameters       L0143 DEBUG| PARAMS (key=peer_public_ip, path=*, default=) => '192.168.10.2'
2020-08-16 21:31:21,237 parameters       L0143 DEBUG| PARAMS (key=drop_accepted, path=*, default=10) => 10
2020-08-16 21:31:21,238 parameters       L0143 DEBUG| PARAMS (key=host_ip, path=*, default=) => '192.168.10.1'
2020-08-16 21:31:21,238 parameters       L0143 DEBUG| PARAMS (key=option, path=*, default=) => ''
2020-08-16 21:31:21,239 parameters       L0143 DEBUG| PARAMS (key=netmask, path=*, default=) => '255.255.255.0'

From the different screenshots attached, this is my explanation. I would recommend retrying the tests with nping_count set and checking whether the issue is recreated.
Example 1) tcpdump with the tcp option succeeded, and the hang appeared when the udp option was tested. There was most likely already some TCP traffic on the machine, which is why that option succeeded; with udp there was no UDP traffic on the machine, so the capture hung.
Example 2) From the log of Example 2, the test was actually running, just very slowly. When the test was aborted, 64 packets had been captured, whereas a real hang would have shown zero packets captured.

2020-08-17 23:28:26,401 process          L0437 DEBUG| [stderr] tcpdump: listening on enP1p9s0f0, link-type EN10MB (Ethernet), capture size 262144 bytes
2020-08-17 23:28:57,745 process          L0437 DEBUG| [stderr] 64 packets captured
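
To make the relationship concrete, here is a small illustrative sketch (assumed command shapes and values, not the actual tcpdump.py code): tcpdump with -c exits only after capturing count packets, so the sender has to generate a few more than that.

```python
# Illustrative sketch only, not the actual tcpdump.py code.
# "tcpdump -c <count>" blocks until it has captured that many matching packets,
# so nping must send a little more than count; if nping_count is left unset and
# there is no other traffic of that protocol (e.g. UDP), the capture never
# reaches the count and the test appears to hang.
count = 100               # packets tcpdump must capture before it exits
nping_count = 115         # packets nping sends; ~15 extra to cover losses
interface = "enP1p9s0f0"  # interface name taken from the attached logs
peer_ip = "192.168.10.2"  # peer_ip from the PARAMS lines above

tcpdump_cmd = "tcpdump -i %s -c %d udp" % (interface, count)
nping_cmd = "nping --udp -c %d %s" % (nping_count, peer_ip)
print(tcpdump_cmd)
print(nping_cmd)
```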

@abdhaleegit
Collaborator

I am also seeing similar issues; this started after the recent enhancement to tcpdump. The test never completed: after around 80 of 333 scenarios it hung here:

[36611.768878] bnx2x 0010:01:00.0 enP16p1s0f0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
(079/333) avocado-misc-tests/io/net/tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-IPv4_numeric-0292: PASS (81.79 s)
2020-08-27 04:23:27,047:op-test.common.OpTestUtil:try_sendcontrol:WARNING:OpTestSystem detected something, working on recovery
(080/333) avocado-misc-tests/io/net/tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-IPv4_numeric-9255: ^C^C
Interrupt requested. Waiting 2 seconds for test to finish (ignoring new Ctrl+C until then)
^M~.
2020-08-27 04:23:39,300:op-test.common.OpTestUtil:try_recover:WARNING:OpTestSystem detected something, working on recovery
command-line line 0: Unsupported option "afstokenpassing"^M
This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Activate the web console with: systemctl enable --now cockpit.socket

@Gene-Lo
Author

Gene-Lo commented Sep 21, 2020

Dear @vaishnavibhat

After we edited the yaml file and ran tcpdump.py again on RHEL 8.2 with Avocado (version 82.0), it no longer hangs. However, some items show "ERROR: Could not get operational link state".

【Test Step】
Step 1. Prepare two systems, each equipped with one network card, and connect the two network cards to each other.
※Network-Card: Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP

Step 2. Edit the yaml file: /root/tests/tests/avocado-misc-tests/io/net/tcpdump.py.data/tcpdump.yaml
(screenshot of the edited tcpdump.yaml attached)

Step 3. Run tcpdump.py with the command: avocado run tcpdump.py -m tcpdump.py.data/tcpdump.yaml
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
JOB ID : fddfeb5f01f284c237bd009e27f3123a6ace79e8
JOB LOG : /root/tests/results/job-2020-09-18T21.34-fddfeb5/job.log
(01/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-generic-ef26: PASS (50.44 s)
(02/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-generic-e768: ERROR: Could not get operational link state. (6.23 s)
(03/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-generic-3773: PASS (75.19 s)
(04/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-generic-2d59: PASS (74.44 s)
(05/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-generic-cc54: PASS (74.52 s)
(06/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-generic-9181: PASS (69.99 s)
(07/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-generic-2312: PASS (73.53 s)
(08/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-generic-38aa: PASS (73.46 s)
(09/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-generic-26e5: PASS (69.98 s)
(10/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-generic_host-9627: PASS (48.19 s)
(11/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-generic_host-d9f4: PASS (74.42 s)
(12/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-generic_host-2bfa: PASS (74.31 s)
(13/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-generic_host-1105: PASS (70.96 s)
(14/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-generic_host-99c5: ERROR: Could not get operational link state. (6.20 s)
(15/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-generic_host-80b3: PASS (76.26 s)
(16/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-generic_host-3886: ERROR: Could not get operational link state. (6.20 s)
(17/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-generic_host-9086: ERROR: Could not get operational link state. (8.92 s)
(18/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-generic_host-f4f7: ERROR: Could not get operational link state. (9.37 s)
(19/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-generic_src-e048: PASS (106.73 s)
(20/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-generic_src-68f9: PASS (116.69 s)
(21/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-generic_src-1256: PASS (122.23 s)
(22/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-generic_src-14c1: PASS (116.67 s)
(23/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-generic_src-2dfe: PASS (121.21 s)
(24/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-generic_src-d3eb: PASS (117.77 s)
(25/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-generic_src-c061: ERROR: Could not get operational link state. (6.17 s)
(26/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-generic_src-6b42: PASS (125.05 s)
(27/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-generic_src-7ddd: ERROR: Could not get operational link state. (6.19 s)
(28/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-generic_dst-ee71: PASS (107.28 s)
(29/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-generic_dst-ec1c: PASS (117.68 s)
(30/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-generic_dst-c58d: ERROR: Could not get operational link state. (6.05 s)
(31/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-generic_dst-f64c: ERROR: Could not get operational link state. (9.17 s)
(32/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-generic_dst-5201: PASS (117.72 s)
(33/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-generic_dst-2747: ERROR: Could not get operational link state. (6.17 s)
(34/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-generic_dst-f21c: PASS (119.45 s)
(35/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-generic_dst-2c71: ERROR: Could not get operational link state. (6.12 s)
(36/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-generic_dst-138c: ERROR: Could not get operational link state. (9.03 s)
(37/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-protocol_tcp-1905: PASS (59.03 s)
(38/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-protocol_tcp-ff15: ERROR: Could not get operational link state. (6.10 s)
(39/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-protocol_tcp-1ce1: PASS (78.33 s)
(40/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-protocol_tcp-d753: ERROR: Could not get operational link state. (6.16 s)
(41/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-protocol_tcp-5366: PASS (75.91 s)
(42/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-protocol_tcp-427f: ERROR: Could not get operational link state. (6.13 s)
(43/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-protocol_tcp-e933: PASS (77.51 s)
(44/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-protocol_tcp-3cd9: ERROR: Could not get operational link state. (6.13 s)
(45/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-protocol_tcp-3a9f: ERROR: Could not get operational link state. (9.10 s)
(46/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-protocol_udp-b14a: PASS (114.43 s)
(47/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-protocol_udp-fd6d: ERROR: Could not get operational link state. (6.18 s)
(48/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-protocol_udp-9371: ERROR: Could not get operational link state. (10.94 s)
(49/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-protocol_udp-bdd3: ERROR: Could not get operational link state. (9.16 s)
(50/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-protocol_udp-5f54: PASS (126.18 s)
(51/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-protocol_udp-aa51: ERROR: Could not get operational link state. (6.16 s)
(52/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-protocol_udp-192f: PASS (125.81 s)
(53/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-protocol_udp-e445: ERROR: Could not get operational link state. (6.08 s)
(54/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-protocol_udp-2fdc: ERROR: Could not get operational link state. (9.11 s)
(55/63) tcpdump.py:TcpdumpTest.test;run-mtu-1500-options-protocol_icmp-d70d: PASS (61.02 s)
(56/63) tcpdump.py:TcpdumpTest.test;run-mtu-2000-options-protocol_icmp-a4d3: ERROR: Could not get operational link state. (6.23 s)
(57/63) tcpdump.py:TcpdumpTest.test;run-mtu-3000-options-protocol_icmp-23ed: PASS (76.05 s)
(58/63) tcpdump.py:TcpdumpTest.test;run-mtu-4000-options-protocol_icmp-03aa: PASS (73.13 s)
(59/63) tcpdump.py:TcpdumpTest.test;run-mtu-5000-options-protocol_icmp-4146: ERROR: Could not get operational link state. (6.23 s)
(60/63) tcpdump.py:TcpdumpTest.test;run-mtu-6000-options-protocol_icmp-5857: PASS (79.38 s)
(61/63) tcpdump.py:TcpdumpTest.test;run-mtu-7000-options-protocol_icmp-a58e: PASS (73.40 s)
(62/63) tcpdump.py:TcpdumpTest.test;run-mtu-8000-options-protocol_icmp-950d: ERROR: Could not get operational link state. (6.19 s)
(63/63) tcpdump.py:TcpdumpTest.test;run-mtu-9000-options-protocol_icmp-860d: PASS (75.87 s)
RESULTS : PASS 37 | ERROR 26 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/tests/results/job-2020-09-18T21.34-fddfeb5/results.html
JOB TIME : 3479.48 s
↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

【Test log】: job log & Manual-Test-log
Test log.zip

【Configuration】
《SUT6》
[Rhel8.2 Kernel]
4.18.0-193.14.3.el8_2.ppc64le

[FW config]
BMC: op940.00.mih-5-0-g86f9791c2
PNOR: OP9-v2.4-4.37-prod

[HW config]
CPU DD2.3 20core *2
Micron (MTA18ASF2G72PZ-2G9E1) 16G *16
Samsung PM985 960GB *1
PSU ACBEL 2000w *2
Slot1: Network2 - Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Network7 - Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot3: Network5 - Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Network6 - Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot5: Network10 - Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot6: Network3 - Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Network9 - Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)

《SUT8》
[Rhel8.2 Kernel]
4.18.0-193.19.1.el8_2.ppc64le

[FW config]
BMC: op940.00.mih-5-0-g86f9791c2
PNOR: OP9-v2.4-4.37-prod

[HW config]
CPU DD2.3 12core *2
Micron (MTA18ASF2G72PZ-2G9E1) 16G *16
Samsung PM985 960GB *1
PSU ACBEL 2000w *2
Slot1: Network2 - Mellanox 2-PORT EDR 100Gb IB CONNECTX-5 GEN4 PCIe x16 CAPI CAPABLE LP ADAPTER
Slot2: Network7 - Marvell 2-PORT E'NET (2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER 10GBase-T)
Slot3: Network5 - Mellanox 2-PORT 10Gb NIC&ROCE ConnectX-4Lx SR/Cu PCIe 3.0 LP CAPABLE ADAPTER
Slot4: Network6 - Mellanox 2-PORT 25/10Gb NIC&ROCE SR/Cu PCIe 3.0 (25/10Gb EVERGLADES EN)
Slot5: Network10 - Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP
Slot6: Network3 - Mellanox 2-PORT 100Gb ROCE EN CONNECTX-5 GEN4 PCIe x16 LP CAPABLE ADAPTER
Slot7: Network9 - Marvell QUAD E'NET (2X1 + 2X10 10Gb), PCIe Gen 2 X8/SHORT LP CAPABLE (SHINER SFP+ SR COPPER)

【Note】
Although several types of network cards are installed in the system, the card under test is the "Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP", and a static IP is configured only on that card.

@vaishnavibhat
Contributor

Hi,

This mostly looks like a timing issue during the MTU change. When an MTU change occurs, the link goes down and then comes back up. Currently there is a 30 s timeout for this process, and here it appears to be taking longer than that. I have posted a patch to increase the timeout; I am addressing the review comments and waiting for it to be accepted:
avocado-framework/avocado#4252
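
For illustration, a minimal sketch of the kind of wait involved (assumed logic, with the interface name taken from the logs above; not the actual avocado/tcpdump.py code): after the MTU change the link flaps, and the test polls the operational state until it reads "up" again or the timeout expires.

```python
# Sketch of the link-recovery wait after an MTU change; assumed logic, not the
# actual tcpdump.py/avocado implementation.
import time


def wait_for_link_up(interface, timeout=30, step=1):
    """Poll /sys/class/net/<interface>/operstate until it reads 'up'."""
    path = "/sys/class/net/%s/operstate" % interface
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(path) as state:
                if state.read().strip() == "up":
                    return True
        except OSError:
            pass  # interface not present (yet)
        time.sleep(step)
    return False


# With a fixed 30 s wait, an adapter that takes longer to renegotiate the link
# after the MTU change ends up reporting the error seen in the results above.
if not wait_for_link_up("enP1p9s0f0", timeout=30):
    raise RuntimeError("Could not get operational link state")
```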

@Gene-Lo
Author

Gene-Lo commented Oct 22, 2020

Tagging [email protected]
Sachin, please kindly track this issue.

@vaishnavibhat
Contributor

The yaml parameter mtu_timeout can be used to extend the wait for cases where we see a timing issue with the adapter. Currently there is no driver-specific wait time; each driver is designed differently and behaves differently.
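
As a hedged sketch of how such a yaml parameter would typically be consumed in an Avocado test (the parameter name comes from this thread; the default value, the hypothetical "interface" key, and the exact wiring inside tcpdump.py are assumptions):

```python
# Sketch only, not the actual tcpdump.py code: an mtu_timeout value supplied via
# tcpdump.yaml replaces a fixed link-recovery wait.
from avocado import Test
from avocado.utils import wait


class LinkStateSketch(Test):
    def test(self):
        # Hypothetical parameter names; the real ones come from tcpdump.py/tcpdump.yaml.
        interface = self.params.get("interface", default="enP1p9s0f0")
        mtu_timeout = int(self.params.get("mtu_timeout", default=30))

        def link_is_up():
            with open("/sys/class/net/%s/operstate" % interface) as state:
                return state.read().strip() == "up"

        # wait.wait_for polls link_is_up every `step` seconds, up to `timeout` seconds.
        if not wait.wait_for(link_is_up, timeout=mtu_timeout, step=1):
            self.fail("Could not get operational link state")
```

A larger mtu_timeout set in tcpdump.yaml (and passed with -m as in the run above) would then give slower adapters more time before the error is raised.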

@Gene-Lo
Author

Gene-Lo commented Nov 11, 2020

Dear all,
Here are the results of setting mtu_timeout to different values and then running tcpdump.py on RHEL 8.2 with Avocado (version 82.0):

1. Set mtu_timeout to: 10
RESULTS : PASS 33 | ERROR 30 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log & Manual-Test-log】
job.log
Manual-Test-log for INIT 02.txt

【tcpdump.yaml】
Yaml_10

2. Set mtu_timeout to: 10000
RESULTS : PASS 34 | ERROR 29 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log & Manual-Test-log】
job.log
Manual-Test-log for INIT 03.txt

3. Set mtu_timeout to: 10000000
RESULTS : PASS 37 | ERROR 26 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log & Manual-Test-log】
job.log
job_debug for INIT 02.txt

4. Set mtu_timeout to: 1000000000000000
RESULTS : PASS 38 | ERROR 25 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log】
job.log

5. Set mtu_timeout to: 10000000000000000000000000000000000000000
RESULTS : PASS 35 | ERROR 28 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log】
job.log

6. Set mtu_timeout to: 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
RESULTS : PASS 40 | ERROR 23 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

【job.log】
job.log

【tcpdump.yaml】
(screenshot of tcpdump.yaml attached)

From the above results, mtu_timeout does not appear to be the root cause of the issue. Please kindly check. Many thanks!

@Gene-Lo
Author

Gene-Lo commented Nov 11, 2020

In addition, in the past, when we used the same network card (Broadcom 5719 QP 1G (1G/100M/10M) Network Interface Card PCIe x4 LP) to run tcpdump.py on RHEL 8.1 with Avocado (version 75.1), the results were all PASS.

【job.log】
20200302_Network10_tcpdump_job.txt

【tcpdump.yaml】
We only set the parameters below, leaving the other parameters at their default values:
(yaml snippet attached)

Please kindly check. Many thanks!!
