Transparent PGP Encryption #757


Open
mark-weghorst opened this issue Apr 2, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@mark-weghorst

Background

When we transfer our datasets outbound from z/OS, our information security policies require that the datasets first be PGP encrypted prior to transfer.

When the datasets arrive in our large data platform, we first decrypt them and then run our Spark job. It would be more secure if Cobrix could decrypt the data in the byte stream rather than decrypting the files in situ and then running the Spark job against the decrypted data.

Feature

Add support to enable Cobrix to read a PGP-encrypted dataset when provided with a valid decryption key.

Ideally, this feature should not allow the key to be read from a filesystem or embedded in code; it should only support key storage in a secure key vault.

As for which key vaults should be supported, I would suggest the following list, which would cover the most commonly used commercial solutions (see the sketch after the list):

  • Amazon Web Services Key Manager
  • Azure Key Vault
  • Google Cloud Platform Secret Manager (my own needs)
  • HashiCorp Vault
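
To illustrate, here is a minimal sketch of how this could look from the caller's side. The pgp.* option names are purely hypothetical and do not exist in Cobrix today; only the cobol format and the copybook option are existing Cobrix features. The key reference points at a vault entry rather than containing any key material.

    // Hypothetical sketch only: the pgp.* options below are not part of Cobrix.
    val df = spark.read
      .format("cobol")                                    // existing Cobrix data source
      .option("copybook", "/copybooks/record.cpy")        // existing Cobrix option
      .option("pgp.key.provider", "gcp-secret-manager")   // hypothetical: which key vault to query
      .option("pgp.key.reference",
        "projects/my-project/secrets/pgp-private-key/versions/latest") // hypothetical: a vault reference, never the key itself
      .load("gs://my-bucket/encrypted_dataset.pgp")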
@yruslan
Collaborator

yruslan commented Apr 10, 2025

Hi @mark-weghorst,

This is an interesting request. A couple of questions.

  1. Is there a Spark source that supports this? This would let us look at an existing implementation and potentially do something similar.

  2. Usually, secrets are managed externally to Spark and provided as options. Something like:

    spark.read.option("secret", "xyz").parquet("s3://bucket/path")

    (more a suggestion than a question)
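
    As a rough sketch of that pattern, the key could be fetched from a vault outside of Spark (here Google Cloud Secret Manager, since that matches your use case) and passed in as an ordinary option. The "pgp.key" option name is hypothetical; only the Secret Manager calls are real:

    import com.google.cloud.secretmanager.v1.{SecretManagerServiceClient, SecretVersionName}

    // Fetch the PGP private key from the vault before Spark is involved.
    val pgpKey = {
      val client = SecretManagerServiceClient.create()
      try {
        val name = SecretVersionName.of("my-project", "pgp-private-key", "latest")
        client.accessSecretVersion(name).getPayload.getData.toStringUtf8
      } finally client.close()
    }

    // Hand the secret to the reader as a plain option ("pgp.key" is hypothetical).
    spark.read
      .format("cobol")
      .option("copybook", "/copybooks/record.cpy")
      .option("pgp.key", pgpKey)
      .load("s3://bucket/path")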
