
NC | Lifecycle | GPFS ILM integration #8923

Open: romayalon wants to merge 1 commit into base: master

@romayalon (Contributor) commented Apr 1, 2025

Describe the Problem

Today, the NooBaa NC lifecycle worker runs a directory scan on both POSIX and GPFS file systems.
On GPFS, the scan can be optimized by generating and applying GPFS ILM policies derived from each bucket's lifecycle configuration.

Explain the Changes

Note - currently only expiration + filter are supported
(noncurrent days is not supported yet).
Implementation changes -

  1. Changed config.NC_LIFECYCLE_GPFS_ILM_ENABLED to be true by default.
  2. Added a directory for storing the ILM policies - const ILM_POLICIES_TMP_DIR = path.join(config.NC_LIFECYCLE_LOGS_DIR, 'lifecycle_ilm_policies'); - note that it is inside the lifecycle logs dir, which means it is not shared between nodes.
  3. Added a directory for storing the candidates files - const ILM_CANDIDATES_TMP_DIR = path.join(config.NC_LIFECYCLE_LOGS_DIR, 'lifecycle_ilm_candidates'); - note that it is inside the lifecycle logs dir, which means it is not shared between nodes.
  4. Added the following new functions for GPFS ILM policies usage -
  • create_gpfs_candidates_files() - creates a candidates file per mount point that is used by at least one bucket
      1. creates a map of mount point to buckets
      2. for each bucket -
        • 2.1. finds the mount point it belongs to
        • 2.2. converts the bucket's lifecycle policy to a GPFS ILM policy
        • 2.3. concatenates the bucket's GPFS ILM policy to the mount point policy file string
      3. for each mount point -
        • 3.1. writes the ILM policy to a tmp file
      4. creates the candidates file by applying the ILM policy
  • get_candidates_by_expiration_rule_gpfs() - parses the candidates from the ILM candidates file
  • convert_lifecycle_policy_to_gpfs_ilm_policy() - converts the base definition + path search, and then calls the next 2 functions to convert the filter and expiry lifecycle configuration properties to an ILM policy.
  • convert_expiry_rule_to_gpfs_ilm_policy()
  • convert_filter_to_gpfs_ilm_policy()
  • write_tmp_ilm_policy() - writes the generated ILM policy to a tmp file under ILM_POLICIES_TMP_DIR.
  • get_candidates_by_gpfs_ilm_policy() - applies the policy using mmapplypolicy; this command generates the candidates file under ILM_CANDIDATES_TMP_DIR.
  • parse_candidates_from_gpfs_ilm_policy() - reads 1000 lines of the ILM candidates file; if the rule state has a candidates_file_offset and the rule is not finished, it restarts reading from that offset.
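
To illustrate the conversion functions listed above, here is a minimal sketch of how an expiration rule with a prefix filter could be turned into ILM policy text. It is not the code in this PR: the names sketch_rule_to_ilm_policy and quote_ilm, the list name candidates_<rule id>, the use of MODIFICATION_TIME for the age check, and the >= comparison are all assumptions made for the example.

```javascript
'use strict';

// Quote a value as an ILM (SQL-style) string literal; single quotes are doubled.
function quote_ilm(value) {
    return `'${String(value).replace(/'/g, "''")}'`;
}

// Hypothetical sketch: build ILM policy text for one expiration rule of one bucket.
// Returns '' when the rule has no expiration or its Date is still in the future.
function sketch_rule_to_ilm_policy(bucket_path, rule, now = new Date()) {
    const expiration = rule.expiration;
    if (!expiration) return '';
    const conditions = [];

    // Path search + prefix filter. '%' and '_' inside the prefix are LIKE
    // wildcards and are NOT escaped in this sketch.
    const base = bucket_path.replace(/\/+$/, '') + '/';
    const prefix = rule.filter?.prefix || '';
    conditions.push(`PATH_NAME LIKE ${quote_ilm(base + prefix + '%')}`);

    if (expiration.days !== undefined) {
        // Assumption: object age is measured from the file modification time.
        conditions.push(`(DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) >= ${Number(expiration.days)}`);
    } else if (expiration.date !== undefined) {
        // A Date rule selects every matching file once the date has passed.
        if (new Date(expiration.date) > now) return '';
    } else {
        return '';
    }

    const list_name = quote_ilm(`candidates_${rule.id}`);
    return [
        `RULE EXTERNAL LIST ${list_name} EXEC ''`,
        `RULE ${quote_ilm(rule.id)} LIST ${list_name} WHERE ${conditions.join(' AND ')}`,
    ].join('\n') + '\n';
}

// Usage with values similar to the manual test below (rule id is made up).
const policy_text = sketch_rule_to_ilm_policy(
    '/mnt/gpfs0/romy/bucket1_storage/',
    { id: 'rule1', filter: { prefix: 'a' }, expiration: { date: '2022-07-12' } }
);
console.log(policy_text);
```

Keeping the conversion a pure function of the bucket path and the rule makes it easy to concatenate the output of several buckets into one policy file per mount point, as step 2.3 describes.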

Issues: Fixed #xxx / Gap #xxx

  1. Directory objects tests/nested objects - implemented and checked
  2. Check the --continue implementation after NC | lifecycle | continue last run #8925 is merged - according to Nadav it should work
  3. Gap - optimization of the GPFS ILM policy - instead of the mount point it is better to use a fileset (the lowest common fileset of the buckets on the same file system)
  4. Gap - noncurrent_version_expiration
  5. Gap - Content dir size 0 optimization will not be found by these code changes

Testing Instructions:

  1. sudo jest --testRegex=jest_tests/test_nc_lifecycle_gpfs
    Manual tests -
  2. Create a directory for storage - mkdir /mnt/gpfs0/romy/
  3. Create an account - noobaa-cli account add --name account1 --user root --new_buckets_path=/mnt/gpfs0/romy/
  4. Create a directory for objects stored in the bucket bucket1 - mkdir /mnt/gpfs0/romy/bucket1_storage/
  5. Create a bucket - noobaa-cli bucket add --name bucket1 --owner account1 --path /mnt/gpfs0/romy/bucket1_storage/
  6. Install AWS CLI -
sudo dnf install unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
  7. Create an alias to the S3 command - alias s3-nb='AWS_ACCESS_KEY_ID=<access_key> AWS_SECRET_ACCESS_KEY=<secret_key> aws --endpoint https://localhost:6443 --no-verify-ssl'
  8. Create a lifecycle policy file -
    vi lifecycle.json -
{
    "Rules": [
        {
            "ID": "Expiration Rule",
            "Status": "Enabled",
            "Filter": {
                "Prefix": "a"
            },
            "Expiration": {
                "Date": "2022-07-12"
            }
        }
    ]
}
  9. Put the policy on bucket1 - s3-nb s3api put-bucket-lifecycle-configuration --bucket bucket1 --lifecycle-configuration file://lifecycle.json
  10. Check that the policy was configured - s3-nb s3api get-bucket-lifecycle-configuration --bucket bucket1
  11. Create objects -
echo "mdalmdlamdlakd" > /mnt/gpfs0/romy/bucket1_storage/a.txt
echo "mdalmdlamdlakd" > /mnt/gpfs0/romy/bucket1_storage/a1.txt
echo "mdalmdlamdlakd" > /mnt/gpfs0/romy/bucket1_storage/a2.txt
echo "mdalmdlamdlakd" > /mnt/gpfs0/romy/bucket1_storage/b.txt
  12. Run the lifecycle worker manually - noobaa-cli lifecycle --disable_runtime_validation
  13. Check that the ILM policy was created under /var/log/noobaa/lifecycle/lifecycle_ilm_policies/
  14. Check that a candidates file was created under /var/log/noobaa/lifecycle/lifecycle_ilm_candidates/
  15. Check that the files starting with 'a' were deleted and the file starting with 'b' was not deleted.
  • Doc added/updated
  • Tests added
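
The expected result of the manual test can be reasoned through with a small sketch. It only mirrors the rule in lifecycle.json (prefix "a", expiration Date already in the past) against the four object keys created above; the helper name is_expected_to_expire is made up for this example and is not part of the PR.

```javascript
'use strict';

// Keys created in the manual test, relative to the bucket path.
const keys = ['a.txt', 'a1.txt', 'a2.txt', 'b.txt'];

// The rule from lifecycle.json, reduced to the fields used here.
const rule = { filter: { prefix: 'a' }, expiration: { date: '2022-07-12' } };

// A key is expected to expire when it matches the prefix and the Date has passed.
function is_expected_to_expire(key, lifecycle_rule, now = new Date()) {
    const matches_prefix = key.startsWith(lifecycle_rule.filter.prefix || '');
    const date_passed = new Date(lifecycle_rule.expiration.date) <= now;
    return matches_prefix && date_passed;
}

const expected_deleted = keys.filter(key => is_expected_to_expire(key, rule));
const expected_kept = keys.filter(key => !is_expected_to_expire(key, rule));
console.log({ expected_deleted, expected_kept });
```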

@romayalon romayalon force-pushed the romy-lifecycle-gpfs-integration branch 7 times, most recently from 2280eb5 to 4d1e32f Compare April 8, 2025 08:00
@romayalon romayalon force-pushed the romy-lifecycle-gpfs-integration branch 4 times, most recently from c7e3ab5 to 6dcabf2 Compare April 10, 2025 09:19
const in_versions_dir = path.join(bucket_path, '/.versions/%');
const in_nested_versions_dir = path.join(bucket_path, '/%/.versions/%');
let path_policy = ``;
if (expiration && !noncurrent_version_expiration?.days) {
Contributor

what if we have both expiration and noncurrent_version_expiration?

Contributor Author

on this PR I don't plan to support noncurrent_version_expiration really, removing it

if (file_key.startsWith(config.NSFS_FOLDER_OBJECT_NAME)) {
file_key = file_path.replace(bucket_json.path, '').replace(config.NSFS_FOLDER_OBJECT_NAME, '');
}
parsed_candidates_array.push({ key: file_key });
Contributor

note that for notifications you need more information like size and eTag

Contributor Author

are these mandatory? I don't have the ETag in the candidate line, I can head object if you think it's needed
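
As background for this thread, here is a rough sketch of parsing one candidates line into an entry that also carries a size. It assumes the list file layout "<inode> <generation> <snapid> [show text] -- <full path>" and that the generated LIST rule would add a SHOW clause printing the file size as the first show field; neither is confirmed by this PR. The function name is hypothetical, escaped characters in path names are not handled, and folder objects are ignored. An ETag would still need another source, for example a head object call as suggested above.

```javascript
'use strict';

// Hypothetical sketch: parse one line of an ILM candidates (list) file.
// Assumed layout: "<inode> <generation> <snapid> [show text] -- <full path>"
function sketch_parse_candidate_line(line, bucket_path) {
    const separator = ' -- ';
    const sep_index = line.indexOf(separator);
    if (sep_index === -1) return undefined;

    const head_fields = line.slice(0, sep_index).trim().split(/\s+/);
    const file_path = line.slice(sep_index + separator.length);

    // Only keep files that live under the bucket path.
    const base = bucket_path.replace(/\/+$/, '') + '/';
    if (!file_path.startsWith(base)) return undefined;

    const candidate = { key: file_path.slice(base.length) };
    // Assumption: the first field after inode/generation/snapid is the size
    // printed by a SHOW clause; it is absent when the rule has no SHOW.
    if (head_fields.length > 3 && /^\d+$/.test(head_fields[3])) {
        candidate.size = Number(head_fields[3]);
    }
    return candidate;
}

// Usage with made-up inode and generation numbers.
console.log(sketch_parse_candidate_line(
    '48 1741777473 0 15 -- /mnt/gpfs0/romy/bucket1_storage/a.txt',
    '/mnt/gpfs0/romy/bucket1_storage/'
));
```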

@@ -526,6 +796,7 @@ class NCLifecycle {
} else {
rule_state.is_finished = true;
}
this.lifecycle_run_status.buckets_statuses[bucket_json.name].rules_statuses[lifecycle_rule.id].state = rule_state;
Contributor

doesn't do anything

Contributor Author

what do you mean? it sets the rule state
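
For context on the rule state discussed in this thread, the following is a minimal sketch of how a state with candidates_file_offset and is_finished could let a run resume where the previous batch stopped. It works on an in-memory array of lines and uses a line index as the offset; the actual offset unit and state shape in the PR may differ, and the function name is hypothetical.

```javascript
'use strict';

// Hypothetical sketch: take the next batch of candidate lines and return the
// updated rule state. The offset is a line index in this example.
function sketch_next_batch(candidate_lines, rule_state = {}, batch_size = 1000) {
    const start = rule_state.candidates_file_offset || 0;
    const batch = candidate_lines.slice(start, start + batch_size);
    const next_offset = start + batch.length;
    const is_finished = next_offset >= candidate_lines.length;
    return {
        batch,
        rule_state: is_finished ?
            { is_finished: true } :
            { is_finished: false, candidates_file_offset: next_offset },
    };
}

// Usage: 2500 fake lines are consumed in three calls.
const fake_lines = Array.from({ length: 2500 }, (_, i) => `line ${i}`);
let state = {};
let calls = 0;
do {
    const res = sketch_next_batch(fake_lines, state);
    state = res.rule_state;
    calls += 1;
} while (!state.is_finished);
console.log({ calls, state });
```

Persisting the returned state after every batch is what allows a later run to continue from the stored offset instead of re-reading the candidates file from the start.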

@romayalon romayalon force-pushed the romy-lifecycle-gpfs-integration branch 3 times, most recently from eadca77 to ff156da Compare April 10, 2025 14:09
@romayalon romayalon requested a review from nadavMiz April 10, 2025 14:26
@romayalon romayalon force-pushed the romy-lifecycle-gpfs-integration branch 3 times, most recently from 0367811 to 7e207e0 Compare April 14, 2025 09:21
@romayalon romayalon force-pushed the romy-lifecycle-gpfs-integration branch from 7e207e0 to d91d515 Compare April 14, 2025 17:04