- Upgrade
- Introduction
- Online upgrade
- Offline upgrade
- Additional parameters
- Run apply after upgrade
- Kubernetes applications
- How to upgrade Kafka
- Migration from Open Distro for Elasticsearch & Kibana to OpenSearch and OpenSearch Dashboards
- Open Distro for Elasticsearch upgrade
- Node exporter upgrade
- RabbitMQ upgrade
- Kubernetes upgrade
- PostgreSQL upgrade
- Terraform upgrade from Epiphany 1.x to 2.x
From Epicli 0.4.2 onward, the CLI can perform upgrades of certain components of a cluster. The components it can currently upgrade or add are:
NOTE
There is an assertion to check whether the K8s version is supported before running the upgrade.
- Kubernetes (master and nodes). Supported versions: v1.18.6 (Epiphany 0.7.1+), v1.22.4 (Epiphany 1.3.0+)
- common: Upgrades all common configurations to match them to current Epiphany version
- repository: Adds the repository role needed for component installation in current Epiphany version
- image_registry: Adds the image_registry role needed for offline installation in current Epiphany version
The component upgrade takes the existing Ansible build output and, based on that, performs the upgrade of the currently supported components. If you need to re-apply your entire Epiphany cluster, the input yaml must first be manually adjusted to the latest specification and then applied with `epicli apply ...`. Please see the Run apply after upgrade chapter for more details.
Note about upgrade from pre-0.8 Epiphany:

- If you need to upgrade a cluster deployed with `epicli` in a version earlier than 0.8, make sure that you have enough disk space on the master (which is used as the repository). If you didn't extend the OS disk on the master during deployment, you probably have only a 32 GB disk, which is not enough to properly upgrade the cluster (we recommend at least 64 GB). Before you run the upgrade, extend the OS disk on the master machine according to the cloud provider documentation: AWS, Azure.
- If you already use logging machine(s) in your cluster, scale up those machines before running the upgrade. This ensures you have enough resources to run the ELK stack in the newer version. We recommend at least a DS2_v2 Azure size (2 CPUs, 7 GB RAM) machine, or its equivalent on AWS and on-prem installations. The required size is closely related to the amount of data you will store. Please see the logging documentation for more details.
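The exact resize procedure depends on your cloud provider. As an illustration only, the sketch below grows the master's OS disk with the Azure CLI; the resource group, VM name and target size are placeholders, not values taken from Epiphany.

```shell
# Placeholder names - replace with your actual resource group and master VM name
RG=prefix-clustername-rg
VM=prefix-clustername-kubernetes-master-vm-0

# The VM must be deallocated before its OS disk can be resized
az vm deallocate --resource-group "$RG" --name "$VM"

# Look up the OS disk attached to the VM and grow it to the recommended 64 GB
OS_DISK=$(az vm show --resource-group "$RG" --name "$VM" \
  --query "storageProfile.osDisk.name" -o tsv)
az disk update --resource-group "$RG" --name "$OS_DISK" --size-gb 64

# Start the VM again; the partition/filesystem may still need to be extended inside the OS
az vm start --resource-group "$RG" --name "$VM"
```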
Your existing cluster should meet the following requirements:
- The cluster machines/VMs are connected by a network or virtual network of some sort, can communicate with each other, and have access to the internet.
- The cluster machines/VMs are upgraded to the following versions:
- AlmaLinux 8.4+
- RedHat 8.4+
- Ubuntu 20.04
- The cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
- A provisioning machine that:
- Has access to the SSH keys
- Has access to the build output from when the cluster was first created.
- Is on the same network as your cluster machines
  - Runs Epicli 0.4.2 or newer. Note: to run Epicli, check the Prerequisites.
Start the upgrade with:
epicli upgrade -b /buildoutput/
This will back up and upgrade the Ansible inventory in the provided build folder `/buildoutput/`, which will then be used to perform the upgrade of the components.
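As a quick illustration, a minimal online upgrade run could look like the sketch below; the build folder path is only an example and should point at the output of the original `epicli apply` run.

```shell
# Example build output location from the original `epicli apply` run
BUILD_DIR=/buildoutput/

# Run the upgrade; the existing Ansible inventory is backed up inside the build folder first
epicli upgrade -b "$BUILD_DIR"
```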
Your airgapped existing cluster should meet the following requirements:
- The airgapped cluster machines/VMs are connected by a network or virtual network of some sort and can communicate with each other.
- The airgapped cluster machines/VMs are upgraded to the following versions:
- AlmaLinux 8.4+
- RedHat 8.4+
- Ubuntu 20.04
- The airgapped cluster machines/VMs should be accessible through SSH with a set of SSH keys you provided and configured on each machine yourself.
- A requirements machine that:
  - Runs the same distribution as the airgapped cluster machines/VMs (AlmaLinux 8, RedHat 8, Ubuntu 20.04)
- Has access to the internet.
- A provisioning machine that:
- Has access to the SSH keys
- Has access to the build output from when the cluster was first created.
- Is on the same network as your cluster machines
  - Runs Epicli 0.4.2 or newer.
NOTE
Before running `epicli`, check the Prerequisites.
To upgrade the cluster components, run the following steps:

- First we need to get the tooling to prepare the requirements for the upgrade. On the provisioning machine, run:

  epicli prepare --os OS --arch ARCH

  Where:
  - OS should be `almalinux-8`, `rhel-8` or `ubuntu-20.04`
  - ARCH should be `x86_64` or `arm64`

  This will create a directory called `prepare_scripts` with the needed files inside.
- The scripts in `prepare_scripts` will be used to download all requirements. To do that, copy the `prepare_scripts` folder over to the requirements machine and run the following command:

  download-requirements.py /requirementsoutput/ OS

  Where:
  - OS should be `almalinux-8`, `rhel-8`, `ubuntu-20.04` or `detect`
  - /requirementsoutput/ is where the downloaded requirements will be saved

  This will run the download-requirements script for the target OS type and save the requirements under `/requirementsoutput/`. Once run successfully, the `/requirementsoutput/` folder needs to be copied to the provisioning machine to be used later on.
- Finally, start the upgrade with:

  epicli upgrade -b /buildoutput/ --offline-requirements /requirementsoutput/

  This will back up and upgrade the Ansible inventory in the provided build folder `/buildoutput/`, which will be used to perform the upgrade of the components. The `--offline-requirements` flag tells Epicli where to find the folder with requirements (`/requirementsoutput/`) prepared in steps 1 and 2, which is needed for the offline upgrade.
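Putting the three steps together, an end-to-end offline upgrade could look like the sketch below. The host names, user and paths are placeholders and will differ in your environment.

```shell
# 1. Provisioning machine: generate the prepare scripts (example: AlmaLinux 8 on x86_64)
epicli prepare --os almalinux-8 --arch x86_64
scp -r prepare_scripts/ user@requirements-machine:~/

# 2. Requirements machine (internet access): download the requirements for the detected OS
cd ~/prepare_scripts && ./download-requirements.py /requirementsoutput/ detect
scp -r /requirementsoutput/ user@provisioning-machine:/requirementsoutput/

# 3. Provisioning machine: run the offline upgrade against the existing build output
epicli upgrade -b /buildoutput/ --offline-requirements /requirementsoutput/
```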
The `epicli upgrade` command has additional flags:

- `--wait-for-pods`. When this flag is added, the Kubernetes upgrade will wait until all pods are in the ready state before proceeding. This can be useful when a zero-downtime upgrade is required. Note that this can also cause the upgrade to hang indefinitely.
- `--upgrade-components`. Specify comma-separated component names so that the upgrade procedure only processes the specified ones. The list cannot be empty, otherwise execution will fail. By default, if this parameter is not provided, the upgrade will process all components.

Example:

epicli upgrade -b /buildoutput/ --upgrade-components "kafka,filebeat"
NOTE
The `--upgrade-components` parameter should be used only for development or testing purposes. Do not use this option for production environments, since it may generate additional errors when running `apply` after that kind of upgrade.
Currently, Epiphany does not fully support `apply` after `upgrade`. It is possible to re-apply the configuration from a newer version of Epicli, but this needs some manual work from the Administrator. Re-apply on an already upgraded cluster needs to be called with the `--no-infra` option to skip the Terraform part of the configuration. If `apply` after `upgrade` is run with `--no-infra`, the system images used by the older Epiphany version are preserved to prevent the destruction of the VMs. If you plan to modify any infrastructure unit (e.g., add a Kubernetes Node), you need to create the machine yourself and attach it to the configuration yaml. While running `epicli apply ...` on an already upgraded cluster, you should use the yaml config files generated by the newer version of Epiphany and re-apply the changes you had in the older one. If the cluster is upgraded to version 0.8 or newer, you also need to add an additional feature mapping for the repository role, as shown in the example below:
---
kind: epiphany-cluster
name: clustername
provider: azure
specification:
admin_user:
key_path: /path/to/id_rsa
name: operations
components:
repository:
count: 0 # Set repository to 0 since it's introduced in v0.8
kafka:
count: 1
kubernetes_master:
count: 1
kubernetes_node:
count: 2
load_balancer:
count: 1
logging:
count: 1
monitoring:
count: 1
postgresql:
count: 1
rabbitmq:
count: 0
opensearch:
count: 0
name: clustername
prefix: 'prefix'
title: Epiphany cluster Config
---
kind: configuration/feature-mappings
title: "Feature mapping to components"
provider: azure
name: default
specification:
mappings:
kubernetes_master:
- kubernetes-master
- helm
- applications
- node-exporter
- filebeat
- firewall
- repository # add repository here
- image-registry # add image-registry here
...
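Assuming the adjusted configuration above is saved as, say, `newcluster.yml` (a placeholder name), the re-apply on the already upgraded cluster is then run with the Terraform part skipped:

```shell
# Re-apply the configuration generated with the newer Epiphany version;
# --no-infra skips Terraform so the existing VMs are left untouched
epicli apply -f newcluster.yml --no-infra
```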
To upgrade applications on Kubernetes to the desired version after `epicli upgrade` you have to:

- generate a new configuration manifest using `epicli init`
- in case of generating a minimal configuration manifest (without the `--full` argument), copy and paste the default configuration into it
- run `epicli apply`
NOTE
The above link points to the develop branch. Please choose the branch that suits the Epiphany version you are using.
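A rough sketch of that flow is shown below; the provider, cluster name and file name are placeholders, and the exact `epicli init` flags may differ between Epiphany versions.

```shell
# Generate a full configuration manifest for the current Epiphany version
epicli init -p azure -n clustername --full

# Edit the generated manifest so it matches your existing cluster (counts, applications, ...),
# then re-apply it; --no-infra keeps the existing infrastructure untouched
epicli apply -f clustername.yml --no-infra
```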
Kafka will be automatically updated to the latest version supported by Epiphany. You can check the latest supported version here. Kafka brokers are updated one by one - but the update procedure does not guarantee "zero downtime" because it depends on the number of available brokers, topics, and partitioning configuration. Note that old Kafka binaries are removed during upgrade.
A redundant ZooKeeper configuration is also recommended, since a service restart is required during the upgrade, which can cause ZooKeeper unavailability. With at least two ZooKeeper services in the ZooKeeper ensemble, you can upgrade one and then proceed with the rest one by one.
More detailed information about ZooKeeper can be found in the ZooKeeper documentation.
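Before and during the rolling update it is worth confirming that no partitions are under-replicated, since restarting a broker that hosts the only in-sync replica causes downtime. A minimal check is sketched below; the Kafka installation path and bootstrap address are assumptions based on a default installation.

```shell
# Any output here means some partitions are under-replicated and restarting a broker
# would reduce availability for those partitions
/opt/kafka/bin/kafka-topics.sh --describe --under-replicated-partitions \
  --bootstrap-server localhost:9092
```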
NOTE
Make sure you have a backup before proceeding to the migration steps described below!
Following the decision of Elastic NV[1] to cease the open source options available for Elasticsearch and Kibana and to release them under the Elastic license (more info here), the Epiphany team decided to implement a mechanism of automatic migration from Elasticsearch 7.10.2 to OpenSearch 1.2.4.
It is important to remember that while the new platform makes an effort to continue supporting a broad set of third-party tools (e.g. Beats), there can be some drawbacks or even malfunctions, as not everything has been tested or explicitly added to the OpenSearch compatibility scope[2]. Additionally, some of the components (e.g. Elasticsearch Curator) or some embedded service accounts (e.g. kibanaserver) can still be found in the OpenSearch environment, but they will be phased out.
Keep in mind that for the current version of OpenSearch and OpenSearch Dashboards it is necessary to include the `filebeat` component along with the `logging` one in order to implement the workaround for the Kibana API not available bug.
Upgrading ESS/ODFE versions not shipped with previous Epiphany releases is not supported. If your environment is customized, it needs to be standardized (as described in this table) prior to running the migration described here.
Migration of Elasticsearch Curator is not supported. More info on the use of Curator in an OpenSearch environment can be found here.
[2] https://opensearch.org/docs/latest/clients/agents-and-ingestion-tools/index/
The upgrade will be performed automatically when the upgrade procedure detects your `logging`, `opensearch` or `kibana` hosts.
The upgrade of Elasticsearch uses API calls (GET, PUT, POST) which require an admin TLS certificate. By default, Epiphany generates self-signed certificates for this purpose, but if you use your own, you have to provide the admin certificate's location. To do that, edit the following settings, changing `cert_path` and `key_path`.
logging:
upgrade_config:
custom_admin_certificate:
cert_path: /etc/elasticsearch/custom-admin.pem
key_path: /etc/elasticsearch/custom-admin-key.pem
opensearch:
upgrade_config:
custom_admin_certificate:
cert_path: /etc/elasticsearch/custom-admin.pem
key_path: /etc/elasticsearch/custom-admin-key.pem
These settings are available in the defaults of the `upgrade` role (`/usr/local/epicli/data/common/ansible/playbooks/roles/upgrade/defaults/main.yml`).
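To verify that the certificate and key you point to actually grant admin access, you can query the cluster health endpoint manually. The host, port and paths below are assumptions matching the example configuration above; `--insecure` is only used because the cluster CA is self-signed.

```shell
# A successful response with the cluster status confirms the admin certificate works
curl --insecure \
  --cert /etc/elasticsearch/custom-admin.pem \
  --key /etc/elasticsearch/custom-admin-key.pem \
  "https://localhost:9200/_cluster/health?pretty"
```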
NOTE
Before the upgrade procedure, make sure you have a data backup and that you are familiar with the breaking changes.
Starting from Epiphany v0.8.0 it is possible to upgrade the node exporter to the version listed in the components file. The upgrade will be performed automatically when the upgrade procedure detects node exporter hosts.
NOTE
Before the upgrade procedure, make sure you have a data backup. Check that the node or cluster is in a good state: no alarms are in effect, there are no ongoing queue synchronisation operations, and the system is otherwise under a reasonable load. For more information, visit the RabbitMQ site.
With Epiphany 0.9 it is possible to upgrade RabbitMQ to v3.8.9. This requires an upgrade of the Erlang system packages, which is done automatically (to v23.1.4). The upgrade is performed in offline mode after stopping all RabbitMQ nodes. A rolling upgrade is not supported by Epiphany, and this approach is advised against when Erlang needs to be upgraded.
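A minimal pre-upgrade sanity check on one of the RabbitMQ nodes could look like the sketch below (run with sufficient privileges on the node itself); it is only a starting point, not a full health check.

```shell
# Confirm cluster membership and that all nodes are currently running
rabbitmqctl cluster_status

# Node status; the output includes any resource alarms (memory, disk) currently in effect
rabbitmqctl status

# Check queue backlogs before stopping the nodes
rabbitmqctl list_queues name messages
```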
Before a K8s version upgrade, make sure that deprecated API versions are not used by your workloads; a simple check is sketched below.
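This rough, non-exhaustive check assumes a placeholder manifest path, and the API groups listed are only examples of APIs removed in Kubernetes v1.22.

```shell
# List the API versions the cluster currently serves
kubectl api-versions

# Search your own manifests/Helm charts for API groups removed in v1.22,
# e.g. extensions/v1beta1 and networking.k8s.io/v1beta1 Ingress
grep -R "extensions/v1beta1\|networking.k8s.io/v1beta1" /path/to/your/manifests
```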
NOTE
Before the upgrade procedure, make sure you have a data backup.
Epiphany upgrades PostgreSQL 10 to 13 with the following extensions (for versions, see COMPONENTS.md):
- PgAudit
- PgBouncer
- PgPool
- repmgr
The prerequisites below are checked by the preflight script before upgrading PostgreSQL. Nevertheless, it's good to check these manually before doing any upgrade:
- Disk space: When Epiphany upgrades PostgreSQL 10 to 13 it will make a copy of the data directory on each node to ensure easy recovery in the case of a failed data migration. It is up to the user to make sure there is enough space available. The rule used is:

  total storage used on the data volume + total size of the data directory < 95% of the total size of the data volume

  The threshold is 95% rather than 100% because some additional space is still needed during the upgrade after the data directory has been copied. A manual check is sketched after this list.
- Cluster health: Before starting the upgrade, the state of the PostgreSQL cluster needs to be healthy. This means that executing:

  repmgr cluster show
  repmgr node check
  test $(repmgr node check | grep -c CRITICAL) -eq 0

  should not fail and should return 0 as the exit code.
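To verify the disk space rule by hand (as mentioned in the list above), something along these lines can be used on each PostgreSQL node; the data directory path below is the Ubuntu default and is an assumption, adjust it for RHEL.

```shell
# Size of the PostgreSQL 10 data directory (Ubuntu default path, adjust if needed)
sudo du -sh /var/lib/postgresql/10/main

# Used and total space on the volume holding the data directory
df -h /var/lib/postgresql
```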
The upgrade procedure is based on the PostgreSQL documentation and requires downtime, as the old service(s) need to be stopped and the new one(s) started.
It is possible to provide a custom configuration for the upgrade with `epicli upgrade -f`, and there are a few limitations related to specifying parameters for the upgrade:

- If non-default values were provided for the installation (`epicli apply`), they have to be provided again so they are not overwritten by the defaults.
- The `wal_keep_segments` parameter for replication is replaced by `wal_keep_size` with a default value of 500 MB. The previous parameter is not supported.
- The `archive_command` parameter for replication is set to `/bin/true` by default. It was planned to disable archiving, but changes to `archive_mode` require a full PostgreSQL server restart, while `archive_command` changes can be applied via a normal configuration reload. See the documentation.
- There is no possibility to disable an extension after installation, so a `specification.extensions.*.enabled: false` value will be ignored during the upgrade if it was set to `true` during installation.
Epiphany runs `pg_upgrade` (on the primary node only) from a dedicated location (`pg_upgrade_working_dir`). For Ubuntu this is `/var/lib/postgresql/upgrade/$PG_VERSION` and for RHEL `/var/lib/pgsql/upgrade/$PG_VERSION`. Epiphany saves the output from `pg_upgrade` there as logs, which should be checked after the upgrade.
As the "Post-upgrade processing" step in PostgreSQL documentation
states if any post-upgrade processing is required, pg_upgrade
will issue warnings as it completes. It will also
generate SQL script files that must be run by the administrator. There is no clear description in which cases they are
created, so please check logs in pg_upgrade_working_dir
after the upgrade to see if additional steps are required.
Because optimizer statistics are not transferred by `pg_upgrade`, you may need to run a command to regenerate that information after the upgrade. For this purpose, consider running the `analyze_new_cluster.sh` script (created in `pg_upgrade_working_dir`) as the `postgres` user.
For safety, Epiphany does not remove the old PostgreSQL data. It is the user's responsibility to decide whether the data is ready to be removed and to take care of that. Once you are satisfied with the upgrade, you can delete the old cluster's data directories by running the `delete_old_cluster.sh` script (created in `pg_upgrade_working_dir` on the primary node) on all nodes. The script is not created if you have user-defined tablespaces inside the old data directory. You can also delete the old installation directories (e.g., `bin`, `share`). You may delete `pg_upgrade_working_dir` on the primary node once the upgrade is completely over.
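A hedged example of these post-upgrade steps on Ubuntu is shown below; the working directory version path is an assumption based on the layout described above, so check your own `pg_upgrade_working_dir` and logs before running anything.

```shell
# pg_upgrade working directory on Ubuntu as described above (the version in the path may differ)
cd /var/lib/postgresql/upgrade/13

# Regenerate optimizer statistics on the new cluster
sudo -u postgres ./analyze_new_cluster.sh

# Only after verifying the upgrade on all nodes: remove the old cluster's data directories
sudo -u postgres ./delete_old_cluster.sh
```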
From Epiphany 1.x to 2.x the Terraform stack received the following major updates:
- Terraform 0.12.6 to 1.1.3
- Azurerm provider 1.38.0 to 2.91.0
- AWS provider 2.26 to 3.71.0
- Removal of auto-scaling-groups in favor of plain EC2 instances on AWS.
These introduce some breaking changes which require manual steps to upgrade an existing 1.x cluster. As this is not straightforward, we recommend deploying a new cluster on 2.x and migrating the data instead.
If you do want to upgrade a 1.x cluster manually, it will require you to upgrade through a few different versions of Terraform. The basic steps are described below for each provider. As always, make sure you have a backup of any required data.
A final note: you can also leave the Terraform state in place and use the `--no-infra` flag to skip applying the new Terraform scripts. This, however, makes it impossible to make any changes to your cluster layout.
Notes:

- If you made any manual changes to your cluster infrastructure outside of Terraform, this might cause issues.
- Some resources might be changed or added; these are usually security groups, security-group associations or NICs. This is normal behaviour and is ok, just make sure no other resources like VMs are included when reviewing the `terraform plan` output.
- Only run `terraform apply` if `terraform plan` shows that your infrastructure does not match the configuration.
- The manual Terraform upgrade up to v1.0.x should be completed before running the `epicli apply` command with Epiphany 2.x.
- Terraform can be installed as a binary package or by using package managers, see more: https://learn.hashicorp.com/tutorials/terraform/install-cli
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-13
General steps:
- Download the latest Terraform v0.13.x: https://releases.hashicorp.com/terraform/
- Run the following set of commands in the `build/clustername/terraform` folder and follow the steps if asked:

  terraform init
  terraform 0.13upgrade
  terraform plan
  terraform apply (if needed)
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/0-14
General steps:
- Download the latest Terraform v0.14.x: https://releases.hashicorp.com/terraform/
- Run the following set of commands in the `build/clustername/terraform` folder and follow the steps if asked:

  terraform init
  terraform plan
  terraform apply (if needed)
Note: From v0.14.x we can upgrade straight to v1.0.x. No need to upgrade to v0.15.x first.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-0
General steps:
- Download the latest Terraform v1.0.x: https://releases.hashicorp.com/terraform/
- Run the following set of commands in the `build/clustername/terraform` folder and follow the steps if asked:

  terraform init
  terraform plan
  terraform apply (if needed)
In this step we also force the upgrade of the Azurerm provider from 1.38.0 to 2.91.0, which requires a few more steps to resolve some pending issues. At this point, the steps assume that you are already running the Epiphany 2.x image.
The official documentation can be found here: https://www.terraform.io/language/upgrade-guides/1-1
General steps:
- Run epicli to generate the new Azurerm provider Terraform scripts:

  epicli apply -f data.yml

  After the Terraform scripts generation, `terraform init ...` will result in the following error:

  Error: Failed to query available provider packages
- To fix the issue from the previous step, manually run the following from the epicli container in `build/clustername/terraform`:

  terraform init -upgrade
- Now re-run epicli again:

  epicli apply -f data.yml

  This might succeed, but depending on the cluster configuration it might lead to several errors, one for each NIC:

  Error: A resource with the ID "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prefix-name-rg/providers/Microsoft.Network/networkInterfaces/prefix-name-component-nic-x|/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prefix-name-rg/providers/Microsoft.Network/networkSecurityGroups/prefix-name-component-nsg-x" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_network_interface_security_group_association" for more information.
- To resolve the above error, we need to import the azurerm_network_interface_security_group_association for each NIC by using the `terraform import` command for each of the above errors, using the ID and the proper name from the xxx_prefix-name-component-nsga-x.tf Terraform resources, like so:

  terraform import 'azurerm_network_interface_security_group_association.prefix-name-component-nsga-0' '/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prefix-name-rg/providers/Microsoft.Network/networkInterfaces/prefix-name-component-nic-x|/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/prefix-name-rg/providers/Microsoft.Network/networkSecurityGroups/prefix-name-component-nsg-x'
- Once all azurerm_network_interface_security_group_association resources are imported successfully, you can re-run epicli apply one more time:

  epicli apply -f data.yml
The Terraform for AWS deployments is not compatible between Epiphany 1.x and 2.x, and migration is not possible without destroying the environment. The only option is to deploy a new cluster and migrate the data.