exercises/ansible_ripu/2.2-snapshots/README.md (+12 -12)
@@ -62,25 +62,29 @@ The following sections explain the pros and cons in detail.
The Logical Volume Manager (LVM) is a set of tools included in RHEL that provide a way to create and manage virtual block devices known as logical volumes. LVM logical volumes are typically used as the block devices from which RHEL OS filesystems are mounted. The LVM tools support creating and rolling back logical volume snapshots. Automating these actions from an Ansible playbook is relatively simple.
-Logical volumes are contained in a storage pool known as a volume group. The storage available in a volume group comes from one or more physical volumes, that is, block devices underlying actual disks or disk partitions. Typically, the logical volumes where the RHEL OS is installed will be in a "rootvg" volume group. If best practices are followed, logical volumes for applications and app data will be isolated in a separate volume group, "appvg" for example.
+> **Note**
+>
+> The snapshot and rollback automation capability implemented for our workshop lab environment creates LVM snapshots managed using Ansible roles from the [`infra.lvm_snapshots`](https://github.com/swapdisk/infra.lvm_snapshots#readme) collection.
+
+Logical volumes are contained in a storage pool known as a volume group. The storage available in a volume group comes from one or more physical volumes, that is, block devices underlying actual disks or disk partitions. Typically, the logical volumes where the RHEL OS is installed will be in a "rootvg" volume group. If best practices are followed, applications and app data will be isolated in their own logical volumes, either in the same volume group or in a separate volume group, "appvg" for example.
To create logical volume snapshots, there must be free space in the volume group. That is, the total size of the logical volumes in the volume group must be less than the total size of the volume group. The `vgs` command can be used to query volume group free space. For example:
```
# vgs
-  VG         #PV #LV #SN Attr   VSize   VFree
-  rootvg       1   3   0 wz--n- 950.06g 422.06g
+  VG         #PV #LV #SN Attr   VSize  VFree
+  VolGroup00   1   7   0 wz--n- 29.53g  9.53g
```
-In the example above, the rootvg volume group total size is about 950 Gb and there is about 422 Gb of free space in the volume group. There is plenty of free space to allow for creating snapshot volumes in this volume group.
+In the example above, the `VolGroup00` volume group total size is 29.53 GiB and there is 9.53 GiB of free space in the volume group. This should be enough free space to support rolling back a RHEL upgrade.
If there is not enough free space in the volume group, there are a few ways we can make space available (a shell sketch of the first two follows this list):
- Adding another physical volume to the volume group (i.e., `pvcreate` and `vgextend`). For a VM, you would first configure an additional virtual disk.
- Temporarily remove a logical volume you don't need. For example, on bare metal servers, there is often a large /var/crash empty filesystem. Removing this filesystem from `/etc/fstab` and then using `lvremove` to remove the logical volume from which it was mounted will free up space in the volume group.
-- Reducing the size of one or more logical volumes. This is tricky because first the filesystem in the logical volume needs to be shrunk. XFS filesystems do not support shrinking. EXT filesystems do support shrinking, but not while the filesystem is mounted. This option can be difficult and should only be considered as a last resort and trusted to a very experienced Linux admin.
+- Reducing the size of one or more logical volumes. This is tricky because first the filesystem in the logical volume needs to be shrunk. XFS filesystems do not support shrinking. EXT filesystems do support shrinking, but not while the filesystem is mounted. Until recently, this way of freeing up volume group space was considered a last resort to be attempted by only the most skilled Linux admin, but it is now possible to safely automate shrinking logical volumes using the [`shrink_lv`](https://github.com/swapdisk/infra.lvm_snapshots/tree/main/roles/shrink_lv#readme) role of the aforementioned `infra.lvm_snapshots` collection.
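A minimal shell sketch of the first two options, assuming the new disk shows up as `/dev/sdb` and that `/var/crash` is mounted from a hypothetical `crashlv` logical volume:

```
# Option 1: grow the volume group with an additional physical volume
pvcreate /dev/sdb          # initialize the new disk for LVM
vgextend rootvg /dev/sdb   # add its space to the rootvg volume group

# Option 2: reclaim space from an unneeded logical volume
umount /var/crash                    # unmount the filesystem first
sed -i '/\/var\/crash/d' /etc/fstab  # drop its /etc/fstab entry
lvremove -y rootvg/crashlv           # return its space to the volume group

vgs rootvg                           # confirm the increased VFree
```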
-After a snapshot is created, COW data will start to utilize the free space of the snapshot logical volume as blocks are written to the origin logical volume. Unless the snapshot is create with the same size as the origin, there is a chance that the snapshot could fill up and become invalid. Testing should be performed during the development of the LVM snapshot automation to determine snapshot sizings with enough cushion to prevent this. The `snapshot_autoextend_percent` and `snapshot_autoextend_threshold` settings in lvm.conf can also be used to reduce the risk of snapshots running out of space.
+After a snapshot is created, COW data will start to utilize the free space of the snapshot logical volume as blocks are written to the origin logical volume. Unless the snapshot is created with the same size as the origin, there is a chance that the snapshot could fill up and become invalid. Testing should be performed during the development of the LVM snapshot automation to determine snapshot sizings with enough cushion to prevent this. The `snapshot_autoextend_percent` and `snapshot_autoextend_threshold` settings in lvm.conf can also be used to reduce the risk of snapshots running out of space. The [`lvm_snapshots`](https://github.com/swapdisk/infra.lvm_snapshots/tree/main/roles/lvm_snapshots#readme) role of the `infra.lvm_snapshots` collection supports variables that may be used to automatically configure the autoextend settings.
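For illustration, the manual equivalent of what such snapshot automation performs (volume names and sizes here are hypothetical):

```
# Create a 5 GiB copy-on-write snapshot of the root logical volume
lvcreate --snapshot --size 5G --name rootlv_snap /dev/VolGroup00/rootlv

# Watch Data%: if it reaches 100%, the snapshot becomes invalid
lvs -o lv_name,origin,data_percent VolGroup00

# Roll back by merging the snapshot into its origin; for an in-use
# root volume the merge is deferred until the next activation (reboot)
lvconvert --merge VolGroup00/rootlv_snap
```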
Unless you have the luxury of creating snapshots with the same size as their origin volumes, LVM snapshot sizing needs to be thoroughly tested and free space usage carefully monitored. However, if that challenge can be met, LVM snapshots offer a reliable snapshot solution without the headache of depending on external infrastructure such as VMware.
@@ -102,13 +106,9 @@ VMware snapshots work great when they can be automated. If you are considering t
Amazon Elastic Block Store (Amazon EBS) provides the block storage volumes used for the virtual disks attached to AWS EC2 instances. When a snapshot is created for an EBS volume, the COW data is written to Amazon S3 object storage.
-> **Note**
->
-> The snapshot and rollback automation capability implemented for our workshop lab environment uses EBS snapshots.
-
While EBS snapshots operate independently from the guest OS running on the EC2 instance, the similarity to VMware snapshots ends there. An EBS snapshot saves the data of the source EBS volume, but does not save the state or memory of the EC2 instance to which the volume is attached. Also unlike with VMware, EBS snapshots can be created for an OS volume only while leaving any separate application volumes as is.
-Automating EBS snapshot creation and rollback is fairly straightforward assuming your playbooks can access the required AWS APIs. The tricky bit of the automation is identifying the EC2 instance and attached EBS volume that corresponds to the target host in the Ansible inventory managed by AAP. For the snapshot automation we implemented for our workshop lab environment, we solved this by setting tags on our EC2 instances.
+Automating EBS snapshot creation and rollback is fairly straightforward assuming your playbooks can access the required AWS APIs. The tricky bit of the automation is identifying the EC2 instance and attached EBS volume that corresponds to the target host in the Ansible inventory managed by AAP, but this can be solved by setting identifying tags on your EC2 instances.
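A hedged sketch of that lookup using the AWS CLI, assuming each instance carries a `Name` tag equal to its Ansible inventory hostname (the tag scheme and the IDs below are illustrative):

```
# Find the EC2 instance whose Name tag matches the inventory hostname
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=cute-bedbug" \
  --query 'Reservations[].Instances[].InstanceId' --output text

# List the EBS volumes attached to that instance
aws ec2 describe-volumes \
  --filters "Name=attachment.instance-id,Values=i-0123456789abcdef0" \
  --query 'Volumes[].VolumeId' --output text

# Snapshot the OS volume before the upgrade
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "pre-upgrade snapshot"
```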
#### Break Mirror
@@ -128,7 +128,7 @@ Read the article [ReaR: Backup and recover your Linux server with confidence](ht
### Step 3 - Snapshot Scope
-The best practice for allocating the local storage of a RHEL servers is to configure volumes that separate the OS from the apps and app data. For example, the OS filesystems would be under a "rootvg" volume group while the apps and app data would be in an "appvg" volume group. This separation helps isolate the storage usage requirements of these two groups so they can be manged based on their individual requirements and are less likely to impact each other. For example, the backup profile for the OS is likely different than for the apps and app data.
+The best practice for allocating the local storage of a RHEL server is to configure volumes that separate the OS from the apps and app data. For example, the OS filesystems would be under a "rootvg" volume group while the apps and app data would be in an "appvg" volume group or at least in their own dedicated logical volumes. This separation helps isolate the storage usage requirements of these two groups so they can be managed based on their individual requirements and are less likely to impact each other. For example, the backup profile for the OS is likely different than for the apps and app data.
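To make the layout concrete, here is hypothetical `lvs` output for a server following this practice (names invented for illustration, annotations added):

```
# lvs -o lv_name,vg_name,lv_size
  LV      VG     LSize
  rootlv  rootvg  20.00g   <- OS filesystems: covered by the upgrade snapshot
  varlv   rootvg  10.00g
  applv   appvg   50.00g   <- app binaries and data: left untouched
  datalv  appvg  100.00g
```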
This practice helps to enforce a key tenet of the RHEL in-place upgrade approach: the OS upgrade should leave the applications untouched, with the expectation that system library forward compatibility and middleware runtime abstraction reduce the risk of the RHEL upgrade impacting app functionality.

exercises/ansible_ripu/3.1-rm-rf/README.md (+2 -39)
@@ -51,32 +51,7 @@ In the next exercise, we will be rolling back the RHEL upgrade on one of our ser
Verify you see a root prompt like the example above.
-### Step 2 - Choose your Poison
-
-The `rm -rf /*` command appears frequently in the urban folklore about Unix disasters. The command recursively and forcibly tries to delete every directory and file on a system. When it is run with root privileges, this command will quickly break everything on your pet app server and render it unable to reboot ever again. However, there are much less spectacular ways to mess things up.
-
-Mess up your app server by choosing one of the following suggestions or dream up your own.
-
-#### Delete everything
-
-- As mentioned already, `rm -rf /*` can be fun to try. Expect to see lots of warnings and error messages. Even with root privileges, there will be "permission denied" errors because of read-only objects under pseudo-filesystem like `/proc` and `/sys`. Don't worry, irreparable damage is still being done.
-
-You might be surprised that you will get back to a shell prompt after this. While all files have been deleted from the disk, already running processes like your shell will continue to be able to access any deleted files to which they still have an open file descriptor. Built-in shell commands may even still work, but most commands will result in a "command not found" error.
-
-If you want to reboot the instance to prove that it will not come back up, you will not be able to use the `reboot` command, however, the `echo b > /proc/sysrq-trigger` might work.
-
-#### Uninstall glibc
-
-- The command `rpm -e --nodeps glibc` will uninstall the glibc package, removing the standard C library upon which all other libraries depend. The damage done by this command is just as bad as the the previous example, but without all the drama. This package also provides the dynamic linker/loader, so now commands will fail with errors like this:
-
-```
-[root@cute-bedbug ~]# reboot
--bash: /sbin/reboot: /lib64/ld-linux-x86-64.so.2: bad ELF interpreter: No such file or directory
-```
-
-If you want to do a `reboot` command, use `echo b > /proc/sysrq-trigger` instead.
-
-#### Break the application
+### Step 2 - Break your application
- In [Exercise 1.6: Step 5](../1.6-my-pet-app/README.md#step-5---run-another-pre-upgrade-report), we observed a pre-upgrade finding warning of a potential risk that our `temurin-17-jdk` 3rd-party JDK runtime package might be removed during the upgrade in case it had unresolvable dependencies. Of course, we know this did not happen because our pet app is still working perfectly.
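One hypothetical way to inflict the kind of damage that finding warned about (the exercise's actual commands fall outside this hunk; the service name below is a placeholder):

```
# Forcibly remove the third-party JDK the pet app depends on
rpm -e --nodeps temurin-17-jdk

# The app can no longer find its Java runtime and fails on restart
systemctl restart pet-app.service
```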
@@ -97,23 +72,11 @@ Mess up your app server by choosing one of the following suggestions or dream up
This is a realistic example of application impact that can be reversed by rolling back the upgrade.
-#### Wipe the boot record
-
-- The `dd if=/dev/zero of=/sys/block/* count=1` command will clobber the master boot record of your instance. It's rather insidious because you will see that everything continues to function perfectly after running this command, but after you do a `reboot` command, the instance will not come back up again.
-
-#### Fill up your disk
-
-- Try the `while fallocate -l9M $((i++)); do true; done; yes > $((i++))` command. While there are many ways you can consume all the free space in a filesystem, this command gets it done in just a couple seconds. Use a `df -h /` command to verify your root filesystem is at 100%.
-
-#### Set off a fork bomb
-
-- The shell command `:(){ :|:& };:` will trigger a [fork bomb](https://en.wikipedia.org/wiki/Fork_bomb). When this is done with root privileges, system resources will be quickly exhausted resulting in the server entering a "hung" state. Use the fork bomb if you want to demonstrate rolling back a server that has become unresponsive.
-
## Conclusion
Congratulations, you have trashed one of your app servers. Wasn't that fun?
-In the next exercise, you will untrash it by rolling back.
+In the next exercise, you will untrash it by rolling back the upgrade.

exercises/ansible_ripu/3.2-rollback/README.md (+8 -6)
@@ -6,7 +6,7 @@
-[Table of Contents](#table-of-contents)
-[Objectives](#objectives)
-[Guide](#guide)
--[Step 1 - Launch the Rollback Workflow Job Template](#step-1---launch-the-rollback-workflow-job-template)
+-[Step 1 - Launch the Rollback Job Template](#step-1---launch-the-rollback-job-template)
-[Step 2 - Observe the Rollback Job Output](#step-2---observe-the-rollback-job-output)
-[Step 3 - Check the RHEL Version](#step-3---check-the-rhel-version)
-[Conclusion](#conclusion)
@@ -26,21 +26,23 @@ We are now here in our exploration of the RHEL in-place automation workflow:
After rolling back, the pet app server will be restored to the state it was in just before entering the upgrade phase of the workflow.
-### Step 1 - Launch the Rollback Workflow Job Template
+### Step 1 - Launch the Rollback Job Template
In this step, we will be rolling back the RHEL in-place upgrade on one of our pet application servers.
- Return to the AAP Web UI tab in your web browser. Navigate to Resources > Templates and then open the "AUTO / 03 Rollback" job template. Here is what it looks like:

-- Click the "Launch" button which will bring up a the survey prompt. We only want to do a rollback of one server. To do this, choose the "ALL_rhel" option under "Select inventory group" and then enter the hostname of your chosen pet app server under the "Enter server name" prompt. For example:
+- Click the "Launch" button which will bring up the prompts for submitting the job, starting with the limit and variables prompts. We only want to do a rollback of one server. To do this, enter the hostname of your chosen pet app server under the "Limit" prompt. For example:
-![rollback survey](../images/rollback-survey.svg)
+![rollback limit](../images/rollback-limit.svg)
Click the "Next" button to proceed.
-- Next you will see the job preview prompt, for example:
+![rollback vars](../images/rollback-vars.svg)
+
+- Next we see the job template survey prompt asking us to select an inventory group. We already limited the job to one server, so just choose the "ALL_rhel" option and click the "Next" button. This will bring you to the preview of the selected job options and variable settings, for example:

@@ -56,7 +58,7 @@ After launching the rollback playbook job, the AAP Web UI will navigate automati

-Notice in the example above, rolling back was done in just under 2 minutes.
+Notice in the example above, we see the job completed in just under 3 minutes. However, most of that time was spent in the final "Wait for the snapshot to drain" task, which holds the job until the snapshot merges finish in the background. The instance was actually rolled back and service ready in just under a minute. Impressive, right?
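If you are curious what the "drain" looks like on the instance itself, you can watch the merge progress by hand, assuming an LVM-based rollback with a merging root snapshot like the hypothetical one below:

```
# Each snapshot's data_percent counts down as its merge drains;
# the snapshot LVs disappear once the merge completes
watch -n5 'lvs -o lv_name,origin,data_percent VolGroup00'
```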