Skip to content

Add DR with active-passive and external storage #3679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 6, 2025

Conversation

aneta-petrova
Copy link
Member

@aneta-petrova aneta-petrova commented Feb 18, 2025

What changes are you introducing?

Adding a new disaster recovery scenario: Active and passive server with external storage.

Why are you introducing these changes? (Explanation, links to references, issues, etc.)

A follow-up to #3558
The new scenario follows the structure established in #3615 for consistency

Anything else to add? (Considerations, potential downsides, alternative solutions you have explored, etc.)

Now that I'm working on the second scenario, I can already see opportunities for reuse or snippets. However, I think it will be better to deal with this after all the planned scenarios are in place.

Checklists

  • I am okay with my commits getting squashed when you merge this PR.
  • I am familiar with the contributing guidelines.

Please cherry-pick my commits into:

  • Foreman 3.14/Katello 4.16
  • Foreman 3.13/Katello 4.15 (EL9 only)
  • Foreman 3.12/Katello 4.14 (Satellite 6.16)
  • Foreman 3.11/Katello 4.13 (orcharhino 6.11 on EL8 only; orcharhino 7.0 on EL8+EL9)
  • Foreman 3.10/Katello 4.12
  • Foreman 3.9/Katello 4.11 (Satellite 6.15; orcharhino 6.8/6.9/6.10)
  • Foreman 3.8/Katello 4.10
  • Foreman 3.7/Katello 4.9 (Satellite 6.14)
  • We do not accept PRs for Foreman older than 3.7.

@github-actions github-actions bot added Needs tech review Requires a review from the technical perspective Needs style review Requires a review from docs style/grammar perspective Needs testing Requires functional testing labels Feb 18, 2025
@aneta-petrova aneta-petrova removed the Needs testing Requires functional testing label Feb 18, 2025
Copy link

github-actions bot commented Feb 18, 2025

The PR preview for f95ffc1 is available at theforeman-foreman-documentation-preview-pr-3679.surge.sh

The following output files are affected by this PR:

show diff

show diff as HTML

@aneta-petrova
Copy link
Member Author

@ehelms Can you please review another DR scenario? I am once again making a lot of barely-educated guesses and I added several questions as comments so please keep that in mind and read carefully :) Also, I'm not sure what level of detail we are aiming for with this scenario: Do we want to keep things high-level like the virtualization scenario or can we/should we provide more detailed instructions?

@aneta-petrova aneta-petrova requested a review from rh-max February 19, 2025 10:28
Copy link
Contributor

@rh-max rh-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear and well-structured overall, I have only a few suggestions and considerations. 👍

Also, I'll just paste here what I found in section 21.1. Overview of recommended disaster recovery plans:

Choose a disaster recovery plan that best helps ensure the continuity of Satellite services in your deployment.

Virtualizing your Satellite Server ...

This communicates the idea "my disaster recovery plan is virtualizing the Satellite Server", but virtualizing by itself is not the solution. I'd make it more precise:

Snapshots of virtualized Satellite Server

This also aligns better with the non-action style of the subsequent plan suggestions.

This method is suitable if you can run Satellite on top of virtualization.

This is partly subjective, but to me choosing this phrasing:

run Satellite on top of virtualization (phrasing 1)

instead of something like this:

run Satellite virtualized (phrasing 2a)

or

run Satellite in a virtual machine (phrasing 2b)

implies that scenario described in phrasing 1 has some additional layer between Satellite and virtualization, and scenario described in phrasings 2a and 2b does not. But maybe most readers have more context, which rules out that extra non-existent layer.

@pr-processor pr-processor bot added the Waiting on contributor Requires an action from the author label Feb 26, 2025
@aneta-petrova
Copy link
Member Author

Also, I'll just paste here what I found in section 21.1. Overview of recommended disaster recovery plans:

Let's track these in #3704.

@pr-processor pr-processor bot added Needs re-review and removed Waiting on contributor Requires an action from the author Needs re-review labels Feb 28, 2025
@aneta-petrova
Copy link
Member Author

@ehelms @evgeni Thanks for all your feedback so far! I implemented most of it, except for one comment thread that remains open with a follow-up question. Can you please re-review? This scenario feels more complex to me than the virtualization one, so I can't really tell whether we are getting close enough to the desired outcome.

@aneta-petrova
Copy link
Member Author

aneta-petrova commented Feb 28, 2025

It looks like the preview in #3679 (comment) didn't get updated after the latest set of commits. Just FYI for potential reviewers.

@aneta-petrova aneta-petrova marked this pull request as ready for review March 5, 2025 12:13
@pr-processor pr-processor bot added the Waiting on contributor Requires an action from the author label Mar 5, 2025
@pr-processor pr-processor bot added Needs re-review and removed Waiting on contributor Requires an action from the author Needs re-review labels Mar 5, 2025
@aneta-petrova
Copy link
Member Author

Feedback from style review implemented, which means style review done.

Rebased, which pulled in the change in #3705 and allowed me to include the procedure on retrieving status of services cleanly.

@aneta-petrova aneta-petrova added style review done No issues from docs style/grammar perspective and removed Needs style review Requires a review from docs style/grammar perspective labels Mar 5, 2025
@aneta-petrova
Copy link
Member Author

@ehelms and/or @evgeni, do you have any further comments?

Comment on lines +9 to +10
. Clone your active {ProjectServer}.
See xref:cloning_satellite_server[].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized: the cloning process involves a DB-restore, which is probably not what people want.
What would probably work is: clone with /var/lib/pgsql being local, then turn things off and replace it with shared storage.
But that's fragile :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say that it's "fragile", you're actually kind of saying that that's probably not what we want either :) So what would be a good solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah… The deployment consists of three things: packages (rpms), config (/etc) and data (/var/lib/). To get a working deployment you need all of them. As the data is quickly moving, we place it on shared storage so it can be picked up from various places. The packages and config are not, so we can keep them on "traditional" storage, but they still need to match.

The easiest way to create a matching set is a "clone" (in the general meaning of the term), but satellite-clone does more than cloning rpms/configs, it also does data.

How we can get to a good clone of the slowly moving parts only: I need to think more about it :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created issue #3714 where we can track this. According to #3679 (review), this open item doesn't block merging.

Copy link
Member

@ehelms ehelms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good to merge the current state.

@aneta-petrova aneta-petrova added tech review done No issues from the technical perspective and removed Needs tech review Requires a review from the technical perspective labels Mar 6, 2025
@aneta-petrova aneta-petrova merged commit 79a2600 into theforeman:master Mar 6, 2025
8 of 9 checks passed
@aneta-petrova aneta-petrova deleted the dr-2-external branch March 6, 2025 08:55
aneta-petrova added a commit that referenced this pull request Mar 6, 2025
---------

Co-authored-by: Eric Helms <[email protected]>
Co-authored-by: Evgeni Golov <[email protected]>
(cherry picked from commit 79a2600)
@aneta-petrova
Copy link
Member Author

Merged to "master" and cherry-picked:

a8cd1ff..f564f69 3.14 -> 3.14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
style review done No issues from docs style/grammar perspective tech review done No issues from the technical perspective
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants