-
Notifications
You must be signed in to change notification settings - Fork 3.9k
/
syscount_example.txt
184 lines (161 loc) · 6.27 KB
/
syscount_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
Demonstrations of syscount, the Linux/eBPF version.
syscount summarizes syscall counts across the system or a specific process,
with optional latency information. It is very useful for general workload
characterization, for example:
# syscount
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:39:04]
SYSCALL COUNT
write 10739
read 10584
wait4 1460
nanosleep 1457
select 795
rt_sigprocmask 689
clock_gettime 653
rt_sigaction 128
futex 86
ioctl 83
^C
These are the top 10 entries; you can get more by using the -T switch. Here,
the output indicates that the write and read syscalls were very common, followed
immediately by wait4, nanosleep, and so on. By default, syscount counts across
the entire system, but we can point it to a specific process of interest:
# syscount -p $(pidof dd)
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:40:21]
SYSCALL COUNT
read 7878397
write 7878397
^C
Indeed, dd's workload is a bit easier to characterize. Occasionally, the count
of syscalls is not enough, and you'd also want an aggregate latency:
# syscount -L
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:41:32]
SYSCALL COUNT TIME (us)
select 16 3415860.022
nanosleep 291 12038.707
ftruncate 1 122.939
write 4 63.389
stat 1 23.431
fstat 1 5.088
[unknown: 321] 32 4.965
timerfd_settime 1 4.830
ioctl 3 4.802
kill 1 4.342
^C
The select and nanosleep calls are responsible for a lot of time, but remember
these are blocking calls. This output was taken from a mostly idle system. Note
the "unknown" entry -- syscall 321 is the bpf() syscall, which is not in the
table used by this tool (borrowed from strace sources).
Another direction would be to understand which processes are making a lot of
syscalls, thus responsible for a lot of activity. This is what the -P switch
does:
# syscount -P
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:58:13]
PID COMM COUNT
13820 vim 548
30216 sshd 149
29633 bash 72
25188 screen 70
25776 mysqld 30
31285 python 10
529 systemd-udevd 9
1 systemd 8
494 systemd-journal 5
^C
This is again from a mostly idle system over an interval of a few seconds.
Sometimes, you'd only care about failed syscalls -- these are the ones that
might be worth investigating with follow-up tools like opensnoop, execsnoop,
or trace. Use the -x switch for this; the following example also demonstrates
the -i switch, for printing at predefined intervals:
# syscount -x -i 5
Tracing failed syscalls, printing top 10... Ctrl+C to quit.
[09:44:16]
SYSCALL COUNT
futex 13
getxattr 10
stat 8
open 6
wait4 3
access 2
[unknown: 321] 1
[09:44:21]
SYSCALL COUNT
futex 12
getxattr 10
[unknown: 321] 2
wait4 1
access 1
pause 1
^C
Similar to -x/--failures, sometimes you only care about certain syscall
errors like EPERM or ENONET -- these are the ones that might be worth
investigating with follow-up tools like opensnoop, execsnoop, or
trace. Use the -e/--errno switch for this; the following example also
demonstrates the -e switch, for printing ENOENT failures at predefined intervals:
# syscount -e ENOENT -i 5
Tracing syscalls, printing top 10... Ctrl+C to quit.
[13:15:57]
SYSCALL COUNT
stat 4669
open 1951
access 561
lstat 62
openat 42
readlink 8
execve 4
newfstatat 1
[13:16:02]
SYSCALL COUNT
lstat 18506
stat 13087
open 2907
access 412
openat 19
readlink 12
execve 7
connect 6
unlink 1
rmdir 1
^C
Sometimes, you'd only care about a single syscall rather than all syscalls.
Use the --syscall option for this; the following example also demonstrates
the --syscall option, for printing at predefined intervals:
# syscount --syscall stat -i 1
Tracing syscall 'stat'... Ctrl+C to quit.
[12:51:06]
SYSCALL COUNT
stat 310
[12:51:07]
SYSCALL COUNT
stat 316
^C
USAGE:
# syscount -h
usage: syscount.py [-h] [-p PID] [-t TID] [-i INTERVAL] [-d DURATION] [-T TOP]
[-x] [-e ERRNO] [-L] [-m] [-P] [-l] [--syscall SYSCALL]
Summarize syscall counts and latencies.
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace only this pid
-t TID, --tid TID trace only this tid
-c PPID, --ppid PPID trace only child of this pid
-i INTERVAL, --interval INTERVAL
print summary at this interval (seconds)
-d DURATION, --duration DURATION
total duration of trace, in seconds
-T TOP, --top TOP print only the top syscalls by count or latency
-x, --failures trace only failed syscalls (return < 0)
-e ERRNO, --errno ERRNO
trace only syscalls that return this error (numeric or
EPERM, etc.)
-L, --latency collect syscall latency
-m, --milliseconds display latency in milliseconds (default:
microseconds)
-P, --process count by process and not by syscall
-l, --list print list of recognized syscalls and exit
--syscall SYSCALL trace this syscall only (use option -l to get all
recognized syscalls)