Transparent PGP Encryption #757


Open
mark-weghorst opened this issue Apr 2, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@mark-weghorst

Background

When we transfer our datasets outbound from z/OS, our information security policies require that the datasets first be PGP encrypted prior to transfer.

When the datasets arrive in our large data platform, we first decrypt them and then run our Spark job. It would be more secure if Cobrix could decrypt the data in the byte stream rather than decrypting the files in situ and then running the Spark job against the decrypted data.

Feature

Add support to enable Cobrix to read a PGP-encrypted dataset when provided with a valid decryption key.

Ideally, this feature should not allow the key to be read from a filesystem or embedded in code; it should only support key storage in a secure key vault.

As for which key vaults should be supported, I would suggest the following list, which would cover the most commonly used commercial solutions (see the sketch after the list):

  • Amazon Web Services Key Manager
  • Azure Key Vault
  • Google Cloud Platform Secret Manager (my own needs)
  • HashiCorp Vault
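
To illustrate, here is a minimal sketch of how this could look from the caller's side. The pgp.* option names are purely hypothetical and do not exist in Cobrix today; only the cobol format and the copybook option are existing Cobrix features. The key reference points at a vault entry rather than containing any key material.

    // Hypothetical sketch only: the pgp.* options below are not part of Cobrix.
    val df = spark.read
      .format("cobol")                                    // existing Cobrix data source
      .option("copybook", "/copybooks/record.cpy")        // existing Cobrix option
      .option("pgp.key.provider", "gcp-secret-manager")   // hypothetical: which key vault to query
      .option("pgp.key.reference",
        "projects/my-project/secrets/pgp-private-key/versions/latest") // hypothetical: a vault reference, never the key itself
      .load("gs://my-bucket/encrypted_dataset.pgp")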
@yruslan
Collaborator

yruslan commented Apr 10, 2025

Hi @mark-weghorst,

This is an interesting request. A couple of questions.

  1. Is there a Spark source that supports this? This would let us look at an existing implementation and potentially do something similar.

  2. Usually, secrets are managed externally to Spark and provided as options. Something like:

    spark.read.option("secret", "xyz").parquet("s3://bucket/path")

    (more a suggestion than a question)
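
    As a rough sketch of that pattern, the key could be fetched from a vault outside of Spark (here Google Cloud Secret Manager, since that matches your use case) and passed in as an ordinary option. The "pgp.key" option name is hypothetical; only the Secret Manager calls are real:

    import com.google.cloud.secretmanager.v1.{SecretManagerServiceClient, SecretVersionName}

    // Fetch the PGP private key from the vault before Spark is involved.
    val pgpKey = {
      val client = SecretManagerServiceClient.create()
      try {
        val name = SecretVersionName.of("my-project", "pgp-private-key", "latest")
        client.accessSecretVersion(name).getPayload.getData.toStringUtf8
      } finally client.close()
    }

    // Hand the secret to the reader as a plain option ("pgp.key" is hypothetical).
    spark.read
      .format("cobol")
      .option("copybook", "/copybooks/record.cpy")
      .option("pgp.key", pgpKey)
      .load("s3://bucket/path")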
