Skip to content

[foreman] Add collection of hammer output #3994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

TurboTurtle
Copy link
Member

This commit adds collection of hammer output for various components within Foreman/Satellite. These are first listed and then individually collected for any entries listed.


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3994
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@TurboTurtle
Copy link
Member Author

@pmoravec here's the first pass at what we talked about on IRC. Right now it just collects subscriptions, lifecycle-environments, locations, and optionally content views for each org. Can definitely add or reduce that list here before merging.

Was thinking about capsules (I believe upstream foreman simply calls them proxies?), but unsure of the support benefit there. These are just the main ones I've been asked for in some recent interactions with RH support.

This commit adds collection of `hammer` output for various components
within Foreman/Satellite. These are first listed and then individually
collected for any entries listed.

Signed-off-by: Jake Hunsaker <[email protected]>
@TurboTurtle
Copy link
Member Author

@pmoravec I don't have easy access to a Debian box with Foreman 3.7 where the CI is failing. It seems like the output for hammer may be a different format given the error from the logs? It's failing to find name in the line from hammer organization list in order to get the index of org names:

Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/sos/report/__init__.py", line 1254, in setup
    plug.setup()
  File "/tmp/cirrus-ci-build/sos/report/plugins/foreman.py", line 159, in setup
    n_idx = [
ValueError: 'name' is not in list

How does 3.7 map to current Satellite 6.16 (which is where I did my testing for this)?

@pmoravec
Copy link
Contributor

Foreman 3.7 should have the same hammer output, at least on Satellite 6.14 on RHEL8 - let me check it on an Ubuntu system.

This PR can be tricky to assess what detailed info is worth to collect or not. Since hammer commands can take a while (esp. when some service is timeouting (so worth lowering the cmd timeout even more?)) - we can use direct postgres commands for some stuff, but that is not a stable solution over releases that alter foreman's ERD.. Also some collected data can be too seldomly used and this is hard to assess.

On a first draft, the "verbosity level" sounds good to me, let me think more about it (and ask internally for more broader feedback).

@sayan3296
Copy link
Contributor

While this looks intriguing but can end-up causing more problems than we can think of right away.

p1. Right now we might be implementing couple of hammer commands for couple of objects. In future, Tech_support Engineers can keep on asking to include more and more hammer commands for different things justifying their needs and uses for case investigation. And that will continue to make sos more and more bulky whenever executed on a satellite system

p2. Hammer depends on API. The API response time for each customer's satellite is not same. Pretty good number of users can have slower API response rates than expected. Which means, the more number of hammer commands we use, the slower the sosreport generation would be. In a scenario where the sos is only needed for basic OS level stuff and logs, That extended time spent on generating the sos to collect hammer-based data, cannot be appreciated as we are extending the investigation duration here.

p3. If at all this is implemented, then execution of sos report should not execute any of these collections. It has to be a separate plugin or option, which user needs to pass and then only this data would be collected or else sos report would only collect the sos as it does today.

p4. DB queries are much faster to use, instead of relying on hammer. If we come up with the data required for a set of objects and prepare their queries, then it's quite a lot faster to execute during the sos report data collection, compared with hammer\api. Ofcourse, this suggestion can only be implemented by keeping the fact in mind that database ERD can change in future.

In simpler terms, I would prefer to not suggesting the inclusion of something that may slow down our most important data collection tool, in the hours of need.

@pmoravec
Copy link
Contributor

p4. DB queries are much faster to use, instead of relying on hammer.

Relying on stable ERD (which should be stable for basic stuff but can easily change, like we saw in dynflow stuff a year ago) is kind of a risk. And I dont want to enter the rabbit hole of having different DB queries for different foreman versions.

Moreover, foreman devels often repeat the DB is just an internal implementation detail that users should not directly access until needed. Which means that e.g. there is no guarantee of stable ERD and imho no announcement of ERD changes either.

Collecting whole several tables is an option to get the required data in a way independent on foreman version - but we might get ridiculous or even sensitive data.

Yet another option is grabbing the orgs/locs/CVs/.. lists from within foreman-rake console that should be stable over releases. But one single execution of foreman-rake console takes 20+ seconds each and every time. So having multiple such calls, one for each current hammer command, would take many minutes. While executing a bunch of commands within one single call of one console might sometimes fail in the middle, causing just half of data are obtained (or none if return code would be nonzero) - and still, even here the execution time will be 20+ seconds each and every time.

Yet yet another option is calling API directly, but that approach suffers same problems like hammer does, and we lack the abstraction of API parameters changes among releases.

There is no efficient way of reliable getting the data in a reasonable time (on 95+% cases, at least), I am afraid. From the options (hammer, API, console, direct psql), hammer seems to be the least painful yet not great for the reasons above.

What about putting all the hammer commands (apart of ping and status, these are great) under a plugin option that is disabled by default? (or several plugopts)?

@TurboTurtle
Copy link
Member Author

I'll generally echo @pmoravec's responses, and add just one more thing.

While we are sensitive the execution times, I think in this scenario the concern is a little overblown. Plugins have a maximum execution, defaulting to 5 minutes. The foreman plugin currently has a timeout of 30 minutes, and this PR does not change that. When sticking in the foreman plugin, the worst-case scenario is no different than without the hammer command collections. If we put this in a new plugin (completely fine by me, I only put this in the foreman plugin since there was already a hammer command collection present), then the maximum additional execution time can be directly controlled (again, defaulting to 5 minutes).

@pmoravec
Copy link
Contributor

Logically, the data should be collected by foreman plugin as the commands outputs describe the foreman instance deployment. Anyway we also have a precedent with networking / firewalld / firewall_tables / networkmanager where we have more plugins than the areas they focus on.

Some time ago, I run some stats over sosreports I had access to, and - looking into the notices from the time - the running times of foreman plugin (enabled on 1400ish sosreports) were:

  • 1.66s minimum
  • 15.35s average
  • 36.31s mean
  • 1800s max (the timeout)

Which means the plugin completed within one minute the most of the time.

What about:

  • decreasing the foreman plugin timeout to 10 minutes since the above stats show the 30mins is too high (I think it was set to accommodate html task export that we replaced by CSV export of tasks)
  • having hammer commands (all as optional, accessible via a plugin option?) with a lower probability, so they are collected at the end of the plugin, with some reasonably small timeout on each call (60s the most?)

Then (if/when enabled, until enabled by default) the negative impact of possibly long running hammer commands will be limited.

.. or we can have them really in a separate hammer(?) plugin with dedicated 5mins timeout, rather to apply timeouts separately and cut potential long running commands even sooner?

No strong preference from me.

@TurboTurtle TurboTurtle added Status/Need More Info Feedback is required to reproduce issue or to continue work Kind/Collection New or updated command or file collection labels Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Kind/Collection New or updated command or file collection Status/Need More Info Feedback is required to reproduce issue or to continue work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants