-
Notifications
You must be signed in to change notification settings - Fork 23
Adopt the Module API in drgn-tools #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Drgn 0.0.31 brings huge improvements, especially with the module API and plugins. It also is the last release to officially support Python 3.6 upstream. As a result, it should serve as an excellent long-term minimum requirement, allowing us to assume these features are available and drop support code for older versions. Signed-off-by: Stephen Brennan <[email protected]>
Now that drgn 0.0.31 is our minimum, we can assume these helpers are provided by drgn. Drop their implementations and import them directly from drgn. Signed-off-by: Stephen Brennan <[email protected]>
This helper was upstreamed into d_path(). The sb_first_mount_point() function needs to remain, because it is used by the fsnotify helpers. Signed-off-by: Stephen Brennan <[email protected]>
The frame_name() implementation uses KernelModule to find the module corresponding to an address in order to label the stack frame with the module. This can now be done more efficiently with drgn's module API. frame_name() also implements fallingback logic for finding the frame name. When DWARF is not present, it falls back to a symbol, and if that's not found, it falls back to an address, or worst case, the hexadecimal code address. But since drgn 0.0.31, this exact fallback sequince is now built-in to StackFrame.name. The only difference is that our version doesn't require that the user has first loaded the "module_kallsyms" symbol finder, whereas drgn's does, because it only uses the symbol finder API to lookup symbols. As a result, we'll need to make sure that all callers have loaded it. The major ones will be (a) corelens, (b) drgn_tools.cli, and (c) any caller who directly uses "drgn" to execute a script. (a) and (b) can be solved easily in drgn-tools, but (c) will require a new drgn plugin to handle proper debuginfo handling. Signed-off-by: Stephen Brennan <[email protected]>
Drgn 0.0.31 introduces the module API, and every kernel module will have an automatically-detected drgn.Module associated with it by drgn. This Module is the key to seeing information about whether debuginfo is loaded, and it also has efficient lookup APIs. Remove duplicate and unused functionality drgn_tools.module and rely on drgn.Module for what we need. Some functionality, like get_symbol(), is now totally unused. That code also relates to the ModuleSymbolFinder, which is similarly unused, and even if it were, it could be replaced by the SymbolIndex which I've upstreamed to drgn. The only symbol-releated functionality which still belongs is module_exports() / KernelModule.exports(). Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Drgn's logging, in general, is quite useful, and in the future we may include a way to control the log level used by corelens & drgn. However, drgn logs a warning for missing debug files when you use the load_default_debug_info() function, which ruins the corelens output. We really don't want this, because corelens already handles module debuginfo. Add a filter to the drgn logger in order to handle this. While we're at it, we can simplify our logging in Corelens by quite a bit, by using the logging module instead of the existing bespoke system. Thankfully, logging has all the handler components that are necessary to easily reproduce the same behavior. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
In the happy case, every kernel module which is not OOT-tainted will be found in a kernel RPM. This is true in the overwhelming majority of cases, but a few edge cases can cause undesirable behavior. 1. Ksplice cold-patch modules are expected to be mismatched. The shipped debuginfo doesn't apply for them. 2. Some old kernels had module build ID mismatches which cannot be corrected. In either case, drgn will reject the debuginfo and continue calling debuginfo finders. This can result in our finders being called multiple times, pointlessly downloading and/or extracting RPMs multiple times. To handle cold-patches, we just don't consider them in-tree modules. To handle the remaining build ID mismatches, we record all module names for which we've attempted RPM extraction. If we're later called, we check each module against that set and avoid re-doing the work. Signed-off-by: Stephen Brennan <[email protected]>
Refactor the finders into methods of a single class. This allows sharing important cached info. Establish consistent arguments for corelens and drgn_tools.cli. The config file can only configure drgn -- corelens and drgn_tools.cli have their own set behaviors. While we're at it, update the corelens manpage. Signed-off-by: Stephen Brennan <[email protected]>
Debug info searches go in two steps: (1) kernel, (2) modules. You cannot know which modules must be loaded until the kernel is loaded. This is a problem for RPM downloading and extraction. While we can't avoid the double-extraction, it is pretty wasteful to download the RPM once for each step. The rpm_cache configuration was originally conceived as a band-aid over this issue. It lets the RPM get saved persistently, but it has the nice upside that the ol-local-rpm would find the saved RPM and ensure that the ol-download finder doesn't get called again. That's nice, but in reality we probably don't want rpm_cache enabled all that often. Instead, persist the downloaded file until the program exits. Ensure that even if ol-local-rpm is disabled, the ol-download finder will still call it with the cached RPM as necessary. Signed-off-by: Stephen Brennan <[email protected]>
Stop using KernelModule, add support for ksplice cold-patched modules, and add a verbose printing mode that shows all the modules. Signed-off-by: Stephen Brennan <[email protected]>
89485e5 to
f7e5829
Compare
f301ddc to
871697b
Compare
Now that we've migrated to the Module API, the KernelModule was nothing but a less efficient wrapper over the Module / struct module objects. Let's not keep it around. Signed-off-by: Stephen Brennan <[email protected]>
This is both unused, and can trivially be provided by the Module.build_id attribute. Signed-off-by: Stephen Brennan <[email protected]>
The "module" module is much smaller now! But some of the remaining members are no longer included in __all__. Fix that. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Drgn 0.0.32 was unexpectedly released with some bug fixes and improvements. It is still Python 3.6 compatible. We will adopt it as the new minimum version going forward. Signed-off-by: Stephen Brennan <[email protected]>
This reverts commit 1e20fbe. Now that our minimum drgn version is 0.0.32, we don't need the fix for skipping cursors. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Using found_count == max_pages as the loop break condition can fail, if we hit the "inode_path(inode) is None" case on line 238 when found_count == max_pages. Use a >= comparison so that if we've overshot, we still break out of the loop. This fixes uncommon issues where the test framework spends a long time dumping pages. Signed-off-by: Stephen Brennan <[email protected]>
The "maxitems" parameter set at 10,000 is tracked per mount, which means that the list_lru test can output far more than that. Drop this down to 100 per mount, only allow the test to process 20 mounts before completing. The purpose of the smoke test is just to verify that we have iterated over the data structures correctly. Having small limits is fine for that purpose. Custom (or zero) amounts can still be specified for those interested in running the full operation. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
ea697b9 to
b6c29d7
Compare
Signed-off-by: Stephen Brennan <[email protected]>
For objects with file size less than 100 MiB, we use a direct PUT request to send the data, rather than a multipart upload. However, rather than passing the block of data that we already read from the file (first_chunk), we are reading from the now exhausted file, so we always upload empty data. Fix this so that we get correct uploads. Signed-off-by: Stephen Brennan <[email protected]>
b6c29d7 to
deb8183
Compare
|
Ok, finally got all tests passing internal & external. I think this is ready for review. |
|
Hey @Asphaltt - you made some changes to debuginfo lookup paths in #166. This is the big refactor I mentioned. I know it's way too large to ask you to read through, but the core change is that now, we'll rely on drgn's built-in debuginfo finder, and we just add on our own logic for anything Oracle specific. So ideally, this should work for other distros with no extra effort. If you're able to test it out for your use cases, I'd love to hear your feedback. No worries if you can't though. |
biger410
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I apologize in advance for the size of this pull request, as well as the relative disorganization of the commits. My design for this changed one or twice during the implementation, and I didn't end up squashing those changes down. The result is that the branch shows the evolution of the design, which is reasonable, but I wouldn't necessarily recommend reviewing individual commits, except to read the messages with rationales.
This adopts the module API and moves our minimum drgn version to 0.0.32 (despite what the branch name says). Since this is the latest, and the last version supporting Python 3.6, we will likely be using this as our minimum for quite a long time. As such, I didn't see any reason to keep support for any older version.
The goals here were:
corelensanddrgn_tools.cli, instead relying on drgn debug info finders, and making their behavior more consistent.This adds a drgn plugin, with several debug info finders. It revamps the (hardly ever used) drgn-tools config file. It does modify some elements of the corelens & drgn_tools.cli arguments. It removes the
KernelModuleclass. And there are several other major changes. I've written a document that goes into far more detail than I feel comfortable writing in a PR description. I'll share it to those curious.