Adding Script to Generate a Summary of Docker Images Used by Workflows #410

Open
wants to merge 19 commits into
base: main

bshifaw
Collaborator

@bshifaw bshifaw commented May 19, 2023

Added a script that generates a TSV listing the Docker images used in our WDL files. The TSV contains the following columns, shown here with sample rows:

DOCKER_NAME	LATEST_TAG	USED_TAG	FILE_LINE	WDL_PATH
biocontainers/samtools	v1.9-4-deb_cv1	1.3.1	180	/wdl/deprecated/tasks/dsde_pipelines_tasks/variantcalling.wdl
broadinstitute/gatk	gatkbase-3.1.0	4.2.6.1	217	/wdl/tasks/Transcriptomics/MASSeq.wdl
us.gcr.io/broad-dsp-lrma/lr-10x	0.1.18	0.1.18	139	/wdl/tasks/Transcriptomics/MASSeq.wdl
us.gcr.io/broad-dsp-lrma/lr-transcript_utils	0.0.15	0.0.14	276	/wdl/tasks/Transcriptomics/MASSeq.wdl
us.gcr.io/broad-dsp-lrma/lr-transcript_utils	0.0.15	0.0.14	335	/wdl/tasks/Transcriptomics/MASSeq.wdl
us.gcr.io/broad-dsp-lrma/lr-transcript_utils	0.0.15	0.0.14	45	/wdl/tasks/Transcriptomics/MASSeq.wdl
broadinstitute/picard	latest	2.23.7	1484	/wdl/tasks/Utility/Utils.wdl
gcr.io/cloud-marketplace/google/ubuntu2004	latest	latest	1307	/wdl/tasks/Utility/Utils.wdl
gcr.io/cloud-marketplace/google/ubuntu2004	latest	latest	2261	/wdl/tasks/Utility/Utils.wdl
gcr.io/cloud-marketplace/google/ubuntu2004	latest	latest	2337	/wdl/tasks/Utility/Utils.wdl
quay.io/broad-long-read-pipelines/lr-pacasus	0.3.0	0.3.0	1151	/wdl/tasks/Utility/Utils.wdl
ubuntu	None	19.10	2609	/wdl/tasks/Utility/Utils.wdl
ubuntu	None	hirsute-20210825	638	/wdl/tasks/Utility/Utils.wdl
us.gcr.io/broad-dsp-lrma/lr-align	0.1.28	0.1.26	701	/wdl/tasks/Utility/Utils.wdl
us.gcr.io/broad-dsp-lrma/lr-align	0.1.28	0.1.28	1087	/wdl/tasks/Utility/Utils.wdl
us.gcr.io/broad-dsp-lrma/lr-basic	0.1.2	0.1.1	1367	/wdl/tasks/Utility/Utils.wdl
us.gcr.io/broad-dsp-lrma/lr-basic	0.1.2	0.1.1	1421	/wdl/tasks/Utility/Utils.wdl
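
As a rough illustration of how the summary could be consumed, the sketch below reads TSV text with the columns above and lists rows whose USED_TAG differs from LATEST_TAG. The helper name `outdated_rows` is hypothetical and not part of this PR. Treating "None" and "NA" as "no latest tag known" is an assumption. The sample rows are copied from the table above.

```python
import csv
import io


def outdated_rows(tsv_text):
    """Return rows whose USED_TAG differs from a known LATEST_TAG."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        # Assumption: "None" / "NA" mean no latest tag could be determined.
        if row["LATEST_TAG"] not in ("None", "NA")
        and row["USED_TAG"] != row["LATEST_TAG"]
    ]


sample = (
    "DOCKER_NAME\tLATEST_TAG\tUSED_TAG\tFILE_LINE\tWDL_PATH\n"
    "us.gcr.io/broad-dsp-lrma/lr-10x\t0.1.18\t0.1.18\t139\t/wdl/tasks/Transcriptomics/MASSeq.wdl\n"
    "us.gcr.io/broad-dsp-lrma/lr-align\t0.1.28\t0.1.26\t701\t/wdl/tasks/Utility/Utils.wdl\n"
    "ubuntu\tNone\t19.10\t2609\t/wdl/tasks/Utility/Utils.wdl\n"
)

for row in outdated_rows(sample):
    print(row["DOCKER_NAME"], row["USED_TAG"], "->", row["LATEST_TAG"])
```

A plain string comparison only flags a difference. It does not say which tag is newer, so the result is best read as a list of candidates to review.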

@bshifaw bshifaw self-assigned this May 19, 2023
@bshifaw bshifaw requested a review from SHuang-Broad May 19, 2023 21:59
Collaborator Author

Not sure whether this should be kept if there is an updated Python version.

Collaborator

Let's delete it when this PR gets merged.

Collaborator Author

done

Collaborator Author

bshifaw commented Aug 23, 2023

dockers.in_use.txt

logging.basicConfig(level=logging.INFO)


# A script to collect which dockers are in use and which latest dockers are available

Collaborator Author

Add to a help message

Collaborator Author

done

OUT_SUMMARY_TSV = os.path.join(current_dir, "dockers.in_use.tsv")

if os.path.exists(OUT_SUMMARY_TSV):
    os.remove(OUT_SUMMARY_TSV)

Collaborator

Can we back up the existing one?

Collaborator Author

done
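
The thread does not show how the backup was implemented. A minimal sketch of one way to do it is to rename the existing summary with a timestamp suffix instead of deleting it. The function name `back_up_if_exists` and the `.bak` naming scheme are assumptions for illustration only.

```python
import os
import shutil
import time


def back_up_if_exists(path):
    """Move an existing file to '<path>.<timestamp>.bak'.

    Returns the backup path, or None if there was nothing to back up.
    """
    if not os.path.exists(path):
        return None
    backup_path = f"{path}.{time.strftime('%Y%m%d-%H%M%S')}.bak"
    shutil.move(path, backup_path)
    return backup_path
```

Calling `back_up_if_exists(OUT_SUMMARY_TSV)` before writing would then take the place of the `os.remove` call in the snippet under review.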


with open(wdl_path, "r") as file_content:
    content = file_content.read()
pattern = re.compile(r'.*docker:.*"')

Collaborator

@SHuang-Broad SHuang-Broad Sep 27, 2023

^\s+docker:\s+\"

Collaborator Author

Anchoring with ^ yielded no matches, so I removed it.
Some instances have no space between : and ", so I changed + to *.

New pattern: \s*docker:\s*"
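
One likely explanation for ^ yielding no matches, assuming the pattern was applied with `search()` to the whole file contents as in the snippet above: without `re.MULTILINE`, ^ matches only at the very start of the string, not at the start of each line. The sketch below shows the difference. The WDL text is a made-up sample and the capture group is added for illustration.

```python
import re

# Made-up WDL fragment; the second task has no space after "docker:".
wdl_text = '''task Foo {
    runtime {
        docker: "us.gcr.io/broad-dsp-lrma/lr-basic:0.1.1"
        memory: "4 GiB"
    }
}
task Bar {
    runtime {
        docker:"ubuntu:19.10"
    }
}
'''

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
no_flag = re.compile(r'^\s*docker:\s*"')

# With re.MULTILINE, ^ also anchors at the start of every line.
multiline = re.compile(r'^\s*docker:\s*"([^"]+)"', re.MULTILINE)

print(no_flag.search(wdl_text))     # None
print(multiline.findall(wdl_text))  # both image strings
```

Matching line by line, for example with `enumerate(content.splitlines(), start=1)`, would also make ^ work and gives the line number for free.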

Collaborator Author

@bshifaw bshifaw Nov 1, 2023

Added the caret (^).


# Clear the previous line and print the progress
print(f"Progress: {progress:.2f}%\r", end="")
with open(OUT_SUMMARY_TSV, "a") as tsv_file:

Collaborator

Write as opposed to append?

Collaborator Author

fixed

print(f"Progress: {progress:.2f}%\r", end="")
with open(OUT_SUMMARY_TSV, "a") as tsv_file:
    # Add header
    tsv_file.write(f"DOCKER_NAME\tLATEST_TAG\tUSED_TAG\tFILE_LINE\tWDL_PATH")

Collaborator

There should be a \n after WDL_PATH.

Collaborator Author

Not sure why, but adding a \n creates an empty line between the header and the first line with the docker-wdl info.
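
The cause is not visible in this excerpt. One common source of such a blank line is that each row already starts with its own newline, so a newline after the header doubles up. A minimal sketch that avoids this whatever the rows look like is to strip newlines from each row and end every line, header included, with exactly one \n. The names `HEADER` and `write_summary` are hypothetical.

```python
HEADER = "DOCKER_NAME\tLATEST_TAG\tUSED_TAG\tFILE_LINE\tWDL_PATH"


def write_summary(path, rows):
    """Write the header and rows, one per line, with no blank lines."""
    # Mode "w" truncates any previous content, so the header appears once.
    with open(path, "w") as tsv_file:
        tsv_file.write(HEADER + "\n")
        for row in rows:
            # Drop stray leading/trailing newlines, then add exactly one.
            tsv_file.write(row.strip("\n") + "\n")
```

Opening in write mode also addresses the earlier question about writing as opposed to appending.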

# If the latest tag is not found locally, try to get it from remote
latest_tag = get_latest_remote_docker_tag(docker_name) if latest_tag == "NA" else latest_tag
docker_detail.append(f"{docker_name}\t{latest_tag}\t{used_tag}\t{line_num}\t{wdl_path_sum}")
else:

Collaborator Author

Remove the else/pass branch.

Collaborator Author

done

@bshifaw bshifaw changed the title from "docker usage sum" to "Adding Script to Generate a Summary of Docker Images Used by Workflows" Nov 28, 2023
2 participants