Commit 9ed3e57

Merge pull request #978 from juanpablosalas/fi_tool
Moving FI tool scripts to CMSRucio project
2 parents 2d15214 + cfdd064 commit 9ed3e57

File tree

12 files changed

+1591
-0
lines changed

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
name: File Invalidation Tool Docker Image CI
2+
3+
on:
4+
push:
5+
tags:
6+
- 'fileinvalidation-*'
7+
8+
jobs:
9+
build:
10+
runs-on: ubuntu-latest
11+
steps:
12+
- name: Checkout code
13+
uses: actions/checkout@v3
14+
15+
- name: Get tag name
16+
id: tagname
17+
run: echo "tag=${GITHUB_REF#refs/tags/fileinvalidation-}" >> "$GITHUB_ENV"
18+
19+
- name: Login to CERN Harbour
20+
uses: docker/login-action@v2
21+
with:
22+
registry: registry.cern.ch
23+
username: ${{ secrets.HARBOR_USERNAME }}
24+
password: ${{ secrets.HARBOR_TOKEN }}
25+
26+
- name: Build the Docker Image
27+
run: |
28+
cd DMOps/file_invalidation_tool && docker build . \
29+
--file Dockerfile \
30+
--tag registry.cern.ch/${{ vars.HARBOR_REPOSITORY }}/file_invalidation_tool:${{ env.tag }}
31+
32+
- name: Push Image to CERN Harbour
33+
run: docker push registry.cern.ch/${{ vars.HARBOR_REPOSITORY }}/file_invalidation_tool:${{ env.tag }}
34+
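The "Get tag name" step relies on the shell expansion `${VAR#prefix}`, which removes a leading prefix only when it actually matches. A minimal Python sketch of that behaviour follows; the tag `fileinvalidation-1.0.0` is a hypothetical example, and the function handles literal prefixes only (the shell form also accepts glob patterns).

```python
def strip_ref_prefix(github_ref: str, prefix: str) -> str:
    """Mimic ${GITHUB_REF#prefix} for a literal prefix: drop it only if it matches."""
    if github_ref.startswith(prefix):
        return github_ref[len(prefix):]
    return github_ref  # no match: the value is returned unchanged


# Hypothetical ref produced by pushing a tag that matches 'fileinvalidation-*'
ref = "refs/tags/fileinvalidation-1.0.0"

print(strip_ref_prefix(ref, "refs/tags/fileinvalidation-"))  # matching prefix is removed
print(strip_ref_prefix(ref, "refs/tags/loadtest-"))          # non-matching prefix leaves the ref intact
```

When the prefix does not match, the full ref (including its `/` characters) is what ends up in `env.tag`, and a Docker tag cannot contain `/`, so the prefix in the workflow has to agree with the tag pattern in the `on.push.tags` filter.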
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
FROM registry.cern.ch/cmsmonitoring/cmsmon-spark:latest
2+
3+
# Set environment variables
4+
ENV PYCURL_SSL_LIBRARY=nss \
5+
X509_USER_CERT=/certs/usercert.pem \
6+
X509_USER_KEY=/certs/userkey.pem \
7+
USER=dmtops \
8+
RUCIO_CONFIG=/cvmfs/cms.cern.ch/rucio/rucio.cfg \
9+
RUCIO_ACCOUNT=transfer_ops \
10+
DRIVERPORT=5001 \
11+
BMPORT=5002 \
12+
UIPORT=5003
13+
14+
ADD http://repository.egi.eu/sw/production/cas/1/current/repo-files/egi-trustanchors.repo /etc/yum.repos.d/egi.repo
15+
16+
17+
# Install dependencies for Rucio, DBS and Gfal
18+
RUN dnf install -y libcurl-devel openssl-devel libffi-devel ca-policy-egi-core \
19+
&& dnf install -y cmake gfal2-devel libcurl-devel \
20+
&& dnf install -y gfal2-all python3-gfal2 python3-gfal2-util \
21+
&& dnf -y groupinstall "Development Tools" || true \
22+
&& pip3 install cx-Oracle SQLAlchemy==1.4.49 dbs3-client rucio-clients \
23+
&& pip3 install --compile --global-option="--with-nss" --no-cache-dir pycurl
24+
25+
# Expose ports
26+
EXPOSE 5001
27+
EXPOSE 5002
28+
EXPOSE 5003
29+
30+
# Copy code
31+
COPY ./src /src/
32+
33+
# Set working directory
34+
WORKDIR /src
35+
36+
RUN chmod 755 /src/submit_invalidation.sh
37+
38+
# Set entrypoint
39+
ENTRYPOINT ["python3", "run_invalidations.py"]
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
# File Invalidation Tool for Rucio and DBS
2+
## Overview
3+
4+
This guide outlines the steps to run the file invalidation tool for Rucio and DBS using a Docker image. The tool invalidates specific files, datasets or containers within these systems to ensure data consistency. It also has a running mode that checks the integrity of files at a given RSE (checksum validation) and invalidates the corrupted replicas. Finally, the tool can invalidate all files at a given site.
5+
6+
## Prerequisites, Folder Structure and tool input
7+
8+
### Tool Input
9+
10+
The tool has 5 running modes. Your certificate and (decrypted) key must have sufficient permissions to invalidate files on DBS and to declare replicas as corrupted on Rucio. In addition, each mode requires the following inputs and parameters:
11+
12+
| Running Mode | Description | Tool Mode | Input File | Params | Auth Requirements |
13+
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
14+
| Global Invalidation | Invalidate all files from received files, datasets or containers list on Rucio and DBS | `global` | `<filename>.txt`: txt file containing list of files, datasets or containers | `--reason <reason>`: comment for invalidation<br>`--dry-run`(**optional**): Simulate the execution without actually performing the file invalidation<br>`--erase-mode`(**optional**): Erase empty DIDs | `./certs/usercert.pem`<br>`./certs/userkey.pem`<br>`./secrets/dmtops.keytab`|
15+
| DBS Invalidation | Invalidate all files from received files, datasets or containers list only on DBS | `only-dbs` | `<filename>.txt`: txt file containing list of files, datasets or containers | `--reason <reason>`: comment for invalidation<br>`--dry-run`(**optional**): Simulate the execution without actually performing the file invalidation<br>`--erase-mode`(**optional**): Erase empty DIDs | `./certs/usercert.pem`<br>`./certs/userkey.pem`<br>`./secrets/dmtops.keytab`|
16+
| Rucio Invalidation | Invalidate all files from received files, datasets or containers list only on Rucio | `only-rucio` | `<filename>.txt`: txt file containing list of files, datasets or containers | `--reason <reason>`: comment for invalidation<br>`--dry-run`(**optional**): Simulate the execution without actually performing the file invalidation<br>`--erase-mode`(**optional**): Erase empty DIDs | `./certs/usercert.pem`<br>`./certs/userkey.pem`<br>`./secrets/dmtops.keytab`|
17+
| Integrity Validation | Validate integrity of files in the given RSE | `integrity-validation` | `<filename>.csv`: csv file containing list of files and RSE [FILENAME,RSE_EXPRESSION] | `--dry-run`(**optional**): Simulate the execution without actually performing the file invalidation in case of being corrupted | `./certs/usercert.pem`<br>`./certs/userkey.pem`|
18+
| Site Invalidation | Invalidate in Rucio all files from received list at a specific site | `site-invalidation` | `<filename>.txt`: txt file containing list of files, datasets or containers | `--rse <rse>`: RSE to invalidate at<br>`--reason <reason>`: comment for invalidation<br>`--dry-run`(**optional**): Simulate the execution without actually performing the file invalidation | `./certs/usercert.pem`<br>`./certs/userkey.pem`<br>`./secrets/dmtops.keytab`|
19+
20+
> **Note:** The userkey.pem should be decrypted.
21+
22+
??? Example
23+
**USERKEY decryption**
24+
`openssl rsa -in <encrypted_userkey> -out userkey.pem`
25+
26+
You will be asked to enter the key's password.
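Before mounting `certs/`, it can help to confirm that the key really is decrypted. The sketch below is an illustration only (the function name is hypothetical): it looks for the two usual PEM encryption markers, the `Proc-Type: 4,ENCRYPTED` header of the traditional OpenSSL format and the `BEGIN ENCRYPTED PRIVATE KEY` armor line of PKCS#8.

```python
def pem_key_is_encrypted(pem_text: str) -> bool:
    """Return True if the PEM text carries one of the usual encryption markers."""
    for line in pem_text.splitlines():
        line = line.strip()
        # PKCS#8: -----BEGIN ENCRYPTED PRIVATE KEY-----
        if line.startswith("-----BEGIN") and "ENCRYPTED" in line:
            return True
        # Traditional OpenSSL format: Proc-Type: 4,ENCRYPTED
        if line.startswith("Proc-Type:") and "ENCRYPTED" in line:
            return True
    return False


def key_file_is_encrypted(path: str) -> bool:
    """Read a key file (for example certs/userkey.pem) and apply the check above."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return pem_key_is_encrypted(handle.read())
```

If `key_file_is_encrypted("certs/userkey.pem")` returns `True`, run the `openssl rsa` command above first.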
27+
28+
??? Info
29+
**Checksum Validation Mode**
30+
31+
Some files are large and may cause you to exceed your lxplus quota. If you see the error below, move your working directory to `/eos/user/<first_username_letter>/<username>/`.
32+
```Bash
33+
gfal-copy error: 122 (Disk quota exceeded) - errno reported by local system call Disk quota exceeded
34+
```
35+
36+
### Environment
37+
38+
This tool is intended to be run on **lxplus** or another CERN server with access to `registry.cern.ch` and the `/cvmfs/` directory.
39+
40+
Putting it all together, the working directory structure may vary slightly, but it should look like this:
41+
working_directory/
42+
├── dids.txt / replicas_validation.csv
43+
├── certs/
44+
│ ├── usercert.pem
45+
│ └── userkey.pem
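A quick pre-flight check of the working directory can catch a missing certificate or keytab before the container starts. The sketch below is hypothetical helper code, not part of the tool; the required paths are taken from the "Auth Requirements" column of the table above.

```python
from pathlib import Path

# Auth files per mode, as listed in the table above.
CERT_FILES = ["certs/usercert.pem", "certs/userkey.pem"]
KEYTAB_FILE = "secrets/dmtops.keytab"
MODES_NEEDING_KEYTAB = {"global", "only-dbs", "only-rucio", "site-invalidation"}


def missing_files(workdir, mode, input_file):
    """Return the required paths (relative to workdir) that are not regular files."""
    required = [input_file] + CERT_FILES
    if mode in MODES_NEEDING_KEYTAB:
        required.append(KEYTAB_FILE)
    root = Path(workdir)
    return [rel for rel in required if not (root / rel).is_file()]


# Example: list whatever is missing for a global invalidation run from the current directory.
print(missing_files(".", "global", "dids.txt"))
```

An empty list means every file the chosen mode needs is in place.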
46+
47+
## Run File Invalidation tool
48+
49+
### 1. CERN Registry Authentication
50+
51+
1. Visit [cern registry](https://registry.cern.ch/).
52+
2. Login via OIDC Provider.
53+
3. Click on your username located in the top right.
54+
4. Click on **User Profile**
55+
5. Copy the **CLI Secret**; it is used in the next step.
56+
57+
### 2. Log in to the CERN Registry
58+
```Bash
59+
docker login registry.cern.ch -u <username>
60+
```
61+
- `docker login`: Logs in to the Docker registry.
62+
- `registry.cern.ch`: CERN registry URL.
63+
- `-u <username>`: CERN registry username.
64+
65+
It will ask you to enter your password. **Enter your CLI Secret.**
66+
67+
### 3. Run the container
68+
69+
```Bash
70+
docker run -P \
71+
-v "$(pwd)/<input_file>:/input/<input_file>" \
72+
-v "$(pwd)/certs:/certs" \
73+
[-v "$(pwd)/secrets:/secrets" \]
74+
--mount type=bind,source=/cvmfs/,target=/cvmfs/,readonly \
75+
--mount type=bind,source=/etc/gfal2.d/,target=/etc/gfal2.d/,readonly \
76+
--mount type=bind,source=/etc/grid-security/certificates/,target=/etc/grid-security/certificates/,readonly \
77+
--network host --rm registry.cern.ch/cmsrucio/file_invalidation_tool [Tool_Mode_Options]
78+
```
79+
- `docker run`: Executes a Docker container.
80+
- `-P`: Publishes all exposed ports to the host interfaces.
81+
- Volumes mounted:
82+
- `-v "$(pwd)/<input_file>:/input/<input_file>"`: Mounts the input file from the host to /input/<input_file> within the container.
83+
- `-v "$(pwd)/certs:/certs"`: Mounts the certs directory from the host to /certs within the container. It must contain the usercert.pem and userkey.pem.
84+
- `-v "$(pwd)/secrets:/secrets"`: Mounts the secrets directory from the host to /secrets within the container. It must contain the keytab file.
85+
- `--mount type=bind,source=/cvmfs/,target=/cvmfs/,readonly`: Binds the /cvmfs/ directory on the host as read-only within the container.
86+
- `--mount type=bind,source=/etc/gfal2.d/,target=/etc/gfal2.d/,readonly`: Binds the /etc/gfal2.d/ directory on the host as read-only within the container. Necessary for the integrity-validation mode.
87+
- `--mount type=bind,source=/etc/grid-security/certificates/,target=/etc/grid-security/certificates/,readonly`: Binds the /etc/grid-security/certificates/ directory on the host as read-only within the container. Necessary for the proxy-init command.
88+
- `--network host`: Uses the host's network stack within the container.
89+
- `--rm`: Automatically removes the container when it exits.
90+
- `registry.cern.ch/cmsrucio/file_invalidation_tool`: Name of the Docker image to run.
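The options listed above can also be assembled programmatically, which makes the per-mode differences explicit. This is a sketch under assumptions: the helper names are hypothetical, the keytab and `/etc/gfal2.d/` mounts are added only for the modes whose examples in this guide use them, and the RSE name in the usage line is a placeholder.

```python
import shlex

IMAGE = "registry.cern.ch/cmsrucio/file_invalidation_tool"
MODES_NEEDING_KEYTAB = {"global", "only-dbs", "only-rucio", "site-invalidation"}


def bind_mount(path):
    """Read-only bind mount of a host directory at the same path in the container."""
    return ["--mount", f"type=bind,source={path},target={path},readonly"]


def build_docker_run(mode, input_file, workdir, tool_args=()):
    """Build the docker run argument list for one tool mode."""
    cmd = ["docker", "run", "-P",
           "-v", f"{workdir}/{input_file}:/input/{input_file}",
           "-v", f"{workdir}/certs:/certs"]
    if mode in MODES_NEEDING_KEYTAB:
        cmd += ["-v", f"{workdir}/secrets:/secrets"]
    cmd += bind_mount("/cvmfs/")
    if mode == "integrity-validation":
        cmd += bind_mount("/etc/gfal2.d/")  # needed for the gfal-based checksum validation
    cmd += bind_mount("/etc/grid-security/certificates/")
    cmd += ["--network", "host", "--rm", IMAGE, mode, *tool_args]
    return cmd


# Print a shell-quoted command line; T2_XX_Example is a placeholder RSE name.
print(shlex.join(build_docker_run(
    "site-invalidation", "dids.txt", "/tmp/work",
    ["--rse", "T2_XX_Example", "--reason", "corrupted files"])))
```

Returning a list rather than a string keeps arguments such as the `--reason` text intact when the command is passed to `subprocess.run`, and `shlex.join` is used only for display.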
91+
92+
??? Example
93+
94+
```Bash
95+
docker run -P \
96+
-v "$(pwd)/<input_file>.txt:/input/<input_file>.txt" \
97+
-v "$(pwd)/certs:/certs" \
98+
-v "$(pwd)/secrets:/secrets" \
99+
--mount type=bind,source=/cvmfs/,target=/cvmfs/,readonly \
100+
--mount type=bind,source=/etc/grid-security/certificates/,target=/etc/grid-security/certificates/,readonly \
101+
--network host --rm registry.cern.ch/cmsrucio/file_invalidation_tool [global | only-dbs | only-rucio] --reason <reason>
102+
```
103+
104+
```Bash
105+
docker run -P \
106+
-v "$(pwd)/<input_file>.csv:/input/<input_file>.csv" \
107+
-v "$(pwd)/certs:/certs" \
108+
--mount type=bind,source=/cvmfs/,target=/cvmfs/,readonly \
109+
--mount type=bind,source=/etc/gfal2.d/,target=/etc/gfal2.d/,readonly \
110+
--mount type=bind,source=/etc/grid-security/certificates/,target=/etc/grid-security/certificates/,readonly \
111+
--network host --rm registry.cern.ch/cmsrucio/file_invalidation_tool integrity-validation
112+
```
113+
114+
```Bash
115+
docker run -P \
116+
-v "$(pwd)/<input_file>.txt:/input/<input_file>.txt" \
117+
-v "$(pwd)/certs:/certs" \
118+
-v "$(pwd)/secrets:/secrets" \
119+
--mount type=bind,source=/cvmfs/,target=/cvmfs/,readonly \
120+
--mount type=bind,source=/etc/grid-security/certificates/,target=/etc/grid-security/certificates/,readonly \
121+
--network host --rm registry.cern.ch/cmsrucio/file_invalidation_tool site-invalidation --rse <rse> --reason <reason>
122+
```
123+
## Additional Notes
124+
125+
- The tool's output will provide details about the invalidation process.
126+
- User Authorization: Ensure you have the necessary permissions to invalidate on DBS.
127+
- The provided certificates are used for DBS invalidation. If an authorization error occurs, the Rucio invalidation is not executed.
128+
- Rucio invalidation is done using the dmtops certificate and the transfer_ops account, since many users do not have permission to perform this operation.
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
import click
2+
from rucio.client import Client
3+
import pandas as pd
4+
import numpy as np
5+
from pyspark.sql.functions import col, collect_list, concat_ws
6+
from CMSSpark.spark_utils import get_spark_session
7+
from hadoop_queries import get_df_rse_locks, get_df_rse_replicas, get_df_contents
8+
from pyspark.sql.window import Window
9+
10+
11+
@click.command()
12+
@click.option('--filename', required=True, default=None, type=str,
13+
help='Name of the text file having the datasets names')
14+
@click.option('--rse', required=False, default=None, type=str,
15+
help='RSE to look at')
16+
@click.option('--mode', required=False, type=click.Choice(['rucio','spark']), default='rucio', help='List generation mode')
17+
def invalidate_containers(filename,rse, mode):
18+
#TODO: Check rse option
19+
20+
if mode=='rucio':
21+
#Start Rucio Client
22+
rucio_client = Client()
23+
24+
# Read the containers to delete
25+
with open('/input/'+filename, 'r') as file:
26+
containers_to_delete = file.readlines()
27+
containers_to_delete = list(map(str.strip,containers_to_delete))
28+
29+
# Get the list of containers
30+
dict_delete = [{'scope':'cms','name':lfn} for lfn in containers_to_delete]
31+
32+
#Get the content of the containers to delete (content includes filename, dataset and container)
33+
container_files = list(rucio_client.bulk_list_files(dict_delete))
34+
35+
#Files to invalidate on DBS
36+
df_container_files = pd.DataFrame(columns=['name','parent_name'])
37+
df_container_files = pd.concat([df_container_files,pd.DataFrame(container_files)])[['name','parent_name']].rename(columns={'parent_name':'CONTAINER','name':'FILENAME'})
38+
df_container_files['FILENAME'].drop_duplicates().to_csv('/input/dbs_files_inv.txt',index=False, header = False)
39+
40+
#Replicas to declare as bad
41+
file_list = [{'scope':'cms','name':name} for name in df_container_files['FILENAME']]
42+
df_replicas = pd.DataFrame(columns=['name','states'])
43+
if len(file_list)>0:
44+
for arr in np.array_split(file_list,indices_or_sections=int(len(file_list)/1000)+1):
45+
list_replicas_i = list(rucio_client.list_replicas(dids=arr.tolist(),all_states=True))
46+
df_replicas_i = pd.DataFrame(list_replicas_i)[['name','states']]
47+
df_replicas = pd.concat([df_replicas,df_replicas_i])
48+
49+
df_replicas['rses']=df_replicas['states'].apply(dict.keys)
50+
df_replicas['rses']=df_replicas['rses'].apply(lambda s: ';'.join(s))
51+
52+
if rse is not None:
53+
df_replicas = df_replicas[df_replicas.rses.str.contains(rse)]
54+
df_replicas.loc[:,'rses'] = rse
55+
56+
df_replicas[['name','rses']].rename(columns={'name':'FILENAME','rses':'RSES'}).drop_duplicates().to_csv('/input/rucio_replicas_inv.csv',index=False)
57+
58+
#Datasets to erase from Rucio
59+
info_dataset = rucio_client.get_locks_for_dids(dict_delete)
60+
if len(info_dataset)>0:
61+
df_datasets = pd.DataFrame(info_dataset)
62+
if rse is not None:
63+
df_datasets = df_datasets[df_datasets['rse']==rse]
64+
else:
65+
df_datasets = pd.DataFrame(columns=['name'])
66+
df_datasets[['name']].rename(columns={'name':'DATASET'}).drop_duplicates().to_csv('/input/datasets_inv.txt',index=False,header=False)
67+
68+
#Rules to erase
69+
#RSE is exported in case it's tape and require purge_replicas
70+
dataset_list = [{'scope':'cms','name':n,'type':'dataset'} for n in df_datasets['name'].drop_duplicates().values]
71+
df_rules = pd.DataFrame(columns=['rse','rule_id'])
72+
if len(dataset_list)>0:
73+
for arr in np.array_split(dataset_list,indices_or_sections=int(len(dataset_list)/400)+1):
74+
info_rules_i = rucio_client.get_locks_for_dids(arr.tolist())
75+
if len(info_rules_i)>0:
76+
df_rules_i = pd.DataFrame(info_rules_i)[['rse','rule_id']]
77+
df_rules = pd.concat([df_rules,df_rules_i])
78+
79+
if rse is not None:
80+
df_rules['includes_rse'] = df_rules['rse'] == rse  # df_rules keeps only 'rse' and 'rule_id' (no 'rse_expression' column), so match on the lock's RSE
81+
df_rules = df_rules.loc[df_rules.includes_rse]
82+
83+
df_rules.columns = df_rules.columns.str.upper()
84+
df_rules[['RULE_ID','RSE']].drop_duplicates().to_csv('/input/rucio_rules_delete.csv',index=False)
85+
else:
86+
spark = get_spark_session(app_name='global_containers_invalidation')
87+
88+
#Read the containers to delete
89+
filename = f'/user/dmtops/{filename}'
90+
df_delete = spark.read.text(filename)
91+
df_delete = df_delete.withColumnRenamed('value','CONTAINER')
92+
93+
#Get the basic df
94+
df_locks = get_df_rse_locks(spark)
95+
df_replicas = get_df_rse_replicas(spark,rse)
96+
df_contents = get_df_contents(spark).alias('co')
97+
98+
#Get the content of the containers to delete (content includes filename, dataset and container)
99+
df_delete = df_delete.join(df_contents,df_delete.CONTAINER==df_contents.CONTAINER,how='inner').select(['co.*']).alias('de')
100+
101+
#Replicas to declare as bad
102+
df_delete = df_delete.join(df_replicas,df_delete.FILENAME==df_replicas.NAME,how='inner').select(['de.*','RSE','REPLICA_STATE']).alias('de')
103+
104+
#Rules protecting the replicas
105+
df_delete = df_delete.join(df_locks,(df_delete.FILENAME==df_locks.NAME) & (df_delete.RSE == df_locks.RSE),how='left').select(['de.*','RULE_ID']).alias('de')
106+
df_delete.cache()
107+
108+
#Files to invalidate on DBS
109+
df_delete.select('FILENAME').distinct().toPandas().to_csv('/input/dbs_files_inv.txt',index=False, header = False)
110+
111+
windowSpec = Window.partitionBy('FILENAME')
112+
df_delete.withColumn("RSES", collect_list(col("RSE")).over(windowSpec)) \
113+
.select(['FILENAME','RSES']).withColumn("RSES", concat_ws(";", "RSES")).distinct().toPandas().to_csv('/input/rucio_replicas_inv.csv',index=False)
114+
115+
#Datasets to erase from Rucio
116+
df_delete.select('DATASET').distinct().toPandas().to_csv('/input/datasets_inv.txt',index=False,header=False)
117+
118+
#RSE is exported in case it's tape and require purge_replicas
119+
df_delete.filter(col('RULE_ID').isNotNull()).select(['RULE_ID','RSE']).distinct()\
120+
.toPandas().to_csv('/input/rucio_rules_delete.csv',index=False)
121+
122+
123+
if __name__ == "__main__":
124+
invalidate_containers()
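The Rucio branch of the script batches its client calls with `np.array_split(items, int(len(items)/N)+1)`, which yields near-equal chunks that never exceed `N` entries. A dependency-free sketch of the same splitting rule is shown below; the function name is hypothetical and the sample list merely stands in for a list of DIDs.

```python
def split_into_chunks(items, max_size):
    """Split items into len(items)//max_size + 1 near-equal chunks, as np.array_split does."""
    n_chunks = len(items) // max_size + 1
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# 2500 placeholder entries with a limit of 1000 per request give three chunks.
sizes = [len(chunk) for chunk in split_into_chunks(list(range(2500)), 1000)]
print(sizes)  # [834, 833, 833]
```

Because the chunk count is always one more than `len(items) // max_size`, an exact multiple such as 2000 items is split into three chunks rather than two, and an empty list produces a single empty chunk, which is why the script guards each loop with a length check.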
