Document how to configure kdump on CoreOS #28164
Thanks, this looks good.
As we expect administrators to use MCs instead of relying on SSH for node setup, I'm wondering if we should first provide a full example using a MachineConfig with a filter to select only one or some of the nodes and keep the manual setup instructions in a second section.
See the docs for kargs changes for MC examples.
Yes...but I think it will be common for admins to want to enable kdump on just one node (or just a subset), and we don't support machine specific machineconfigs yet. I think this is probably OK for now; an admin who actually wants to enable kdump on multiple nodes could indeed use a MachineConfig, and it'd probably be worth at least mentioning that.
I'll mention this, but since it's unlikely that admins would want to enable kdump on all nodes, I'll omit the full example of using MachineConfigs for now, especially since there will soon be better kdump support through FCC.
41f3e03
to
57a4edc
Compare
This is planned for 4.7.
I tried to enable it and got an error:
@cynepco3hahue Did you set the
This is required on RHCOS, currently. It will no longer be required once
@kelvinfan001 Thanks for the information!
@cynepco3hahue Can you confirm that the documented steps work for you? Thanks
@openshift/team-documentation
57a4edc
to
3f6848b
Compare
. Ensure that `kdump` has loaded a crash kernel by checking that `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.

=== Enabling kdump on day-1
I would consider making this a new module, especially if it ends up with additional procedures.
== Testing the kdump configuration

ifdef::openshift-enterprise[]
Please refer to the link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/installing-and-configuring-kdump_system-design-guide#testing-the-kdump-configuration_installing-and-configuring-kdump["Testing the kdump configuration" section] over at the {op-system-base} documentation for `kdump`.
The following apply throughout:
s/Please refer to the/See
s/over at the/in the
No quotations needed for titles
@kelvinfan001 thank you, this is looking great! I did a first pass review and left some suggestions. Also tagging @bobfuru since he works directly with CoreOS content.
Added a few more comments and agree with suggestions from @bmcelvee. This is great work, @kelvinfan001 - thank you!
3f6848b
to
f41b230
Compare
. Ensure that `kdump` has loaded a crash kernel by checking that `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.

== Enabling kdump on day-1
`kdump` is intended to be enabled per-node to debug kernel problems. It is not recommended to enable `kdump` on all of your nodes in the cluster. Although machine-specific `MachineConfigs` are not yet supported, it is possible to do the above through a systemd unit in a `MachineConfig` object on day-1 and have kdump enabled on all nodes in the cluster. You can create a `MachineConfig` object and inject that object into the set of manifest files used by Ignition during cluster setup. See "Customizing nodes" in the _Installing -> Installation configuration_ section for more information and examples on how to use Ignition configs.
A few suggestions in this paragraph:
- So that the sentence doesn't start lowercase: s/`kdump` is intended/The `kdump` service is intended/
- s/Although machine-specific `MachineConfigs`/Although machine-specific machine configs/ (do not pluralize an object ref, according to docs guidelines)
- s/it is possible to do the above through a systemd unit/you can perform the previous step through a `systemd` unit/
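As a rough illustration of the day-1 approach discussed in this thread, the sketch below writes a `MachineConfig` manifest that adds the `crashkernel` argument and enables `kdump.service` on every worker node. The output path, the object name `99-worker-kdump`, the `worker` role label, and the Ignition version `3.1.0` are assumptions made for illustration and do not come from the PR; adjust them to match the cluster being installed.

```shell
# Hypothetical output location; in practice the file would go into the
# installer's manifest directory before the cluster is created.
manifest="${TMPDIR:-/tmp}/99-worker-kdump.yaml"

cat > "$manifest" <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kdump            # assumed name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - crashkernel=256M             # same value as the manual step
  config:
    ignition:
      version: 3.1.0               # assumed; use the version your release expects
    systemd:
      units:
        - name: kdump.service
          enabled: true
EOF

echo "wrote $manifest"
```

Because the role label selects a whole pool, this enables `kdump` on all nodes in that pool, which is the limitation the paragraph under review calls out.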
+
[source, terminal]
----
sudo rpm-ostree kargs --append='crashkernel=256M'

Add `$` prompts at the beginning of terminal commands (or `#` as root)
To reserve memory for the crash kernel during the first kernel booting, provide kernel arguments by entering the following command:
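For anyone following along, a minimal sketch of confirming after the reboot that the argument actually reached the running kernel. The helper name `has_crashkernel` is hypothetical; running `rpm-ostree kargs` with no options also prints the arguments of the booted deployment.

```shell
# Return success if a kernel command line string contains a crashkernel= argument.
# has_crashkernel is a hypothetical helper, not part of any tool.
has_crashkernel() {
  case " $1 " in
    *" crashkernel="*) return 0 ;;
    *) return 1 ;;
  esac
}

# /proc/cmdline holds the arguments the running kernel was booted with.
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"

if has_crashkernel "$cmdline"; then
  echo "crashkernel argument is present"
else
  echo "crashkernel argument is missing; has the node been rebooted?"
fi
```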
.Procedure

The following steps are needed to enable `kdump` on {op-system}.

Let's make this active voice. s/The following steps are needed to enable `kdump` on {op-system}./Perform the following steps to enable `kdump` on {op-system}:/
sudo rpm-ostree kargs --append='crashkernel=256M'
----

.  By default, the path in which the vmcore will be saved is `/var/crash`. It is also possible to write the dump over the network or to some other location on the local system by editing `/etc/kdump.conf`. For example, assuming `/var/usrlocal/cores` exists, enter the following command to edit `/etc/kdump.conf` to save the vmcore to `/var/usrlocal/cores`:

Tiny nit to single-space after `.` for list items: s/.  By default/. By default/

Ah, I wonder where I got the idea to put two spaces.
sed -i "s/^path.*/path \/var\/usrlocal\/cores/" /etc/kdump.conf
----
+
For additional information, see `kdump.conf`, a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options, and the comments in `/etc/kdump.conf` and `/etc/sysconfig/kdump`.

Maybe s/and the comments/and note the comments/ ?
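A small aside on the `sed` step, offered as a sketch rather than a requested change: using `|` as the delimiter avoids escaping each slash in the target path. The example below runs against a scratch file with made-up sample content instead of the real `/etc/kdump.conf`, and assumes GNU `sed` (as shipped on {op-system}) for the `-i` option.

```shell
# Work on a scratch copy so the example is safe to run anywhere.
# The file content here is made-up sample data, not a real kdump.conf.
conf="$(mktemp)"
printf '%s\n' '# sample kdump.conf' 'path /var/crash' > "$conf"

# With | as the delimiter, the slashes in the path need no escaping.
new_path=/var/usrlocal/cores
sed -i "s|^path.*|path ${new_path}|" "$conf"

# Show the resulting path directive.
grep '^path' "$conf"
```

The substitution only rewrites a line that already starts with `path`, so the comment line is left untouched.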
sudo systemctl reboot
----

. Ensure that `kdump` has loaded a crash kernel by checking that `kdump.service` has started and exited successfully and that `cat /sys/kernel/kexec_crash_loaded` prints `1`.

s/that `kdump.service`/that the `kdump.service`/
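The verification step above could be sketched as a short script. The helper name `crash_kernel_loaded` is hypothetical, and the `systemctl` call assumes `kdump.service` stays in the active state after its start command exits.

```shell
# Return success if the given sysfs file reports that a crash kernel is loaded.
# crash_kernel_loaded is a hypothetical helper; the path defaults to the one
# named in the procedure.
crash_kernel_loaded() {
  flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
  [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Report the unit state where systemd is available (assumption: the unit
# remains active after it has exited successfully).
if command -v systemctl >/dev/null 2>&1; then
  systemctl is-active kdump.service || true
fi

if crash_kernel_loaded; then
  echo "crash kernel is loaded"
else
  echo "crash kernel is NOT loaded"
fi
```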
endif::[]
* link:https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html[Linux kernel documentation for kdump]
* kdump.conf(5) — a manual page for the `/etc/kdump.conf` configuration file containing the full documentation of available options
* kexec(8) — a manual page for kexec

s/kexec/`kexec`/
Thanks for the updates, @kelvinfan001 - just a few more minor nits and otherwise LGTM.
f41b230
to
71782fb
Compare
RHCOS 4.7 includes `kexec-tools` (required for `kdump`) so investigating kernel crashes through `kdump` is now supported.
71782fb
to
ad48071
Compare
Thanks again, @bobfuru. I've updated the PR with your additional suggestions.
LGTM!! 👍
/cherrypick enterprise-4.7
@bobfuru: new pull request created: #28657