
Commit 7018a2e

improve support for hashing field values
1 parent e467747 commit 7018a2e


7 files changed, +536 -390 lines changed


docs/sensitive.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Handling Sensitive Data

`unclogger` automatically masks sensitive information such as PII, login credentials and the like. By default, masked values are replaced with a generic string; this can be configured to use a custom string or a hashing function instead.

## Sensitive Fields

If the name of any field in the structured log message matches one of the configured sensitive field names, the value of that field is replaced with a safe value. Matching is applied recursively, so sensitive fields nested inside other fields are masked as well:

```python
>>> from unclogger import get_logger
>>> logger = get_logger("test logger")
>>> logger.info("clean password", password="blabla", foo={"Email": "test@example.xcom"})
{
    "password": "********",
    "foo": {"Email": "********"},
    "event": "clean password",
    "logger": "test logger",
    "level": "info",
    "timestamp": "2022-02-02T10:53:52.245833Z"
}
```
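
The recursive replacement can be pictured with a minimal, standalone sketch. It is not `unclogger`'s implementation: `mask_sensitive` and `DEFAULT_REPLACEMENT` are hypothetical names, and the sketch assumes that only nested dictionaries are traversed and that field names are compared in lowercase.

```python
from typing import Any, Iterable

DEFAULT_REPLACEMENT = "********"  # hypothetical; mirrors the masked output above


def mask_sensitive(
    data: Any,
    sensitive_keys: Iterable[str],
    replacement: str = DEFAULT_REPLACEMENT,
) -> Any:
    """Return a copy of `data` with the values of sensitive keys replaced.

    Assumption: keys are compared in lowercase and only nested dictionaries
    are processed recursively; all other values are returned unchanged.
    """
    keys = {key.lower() for key in sensitive_keys}
    if not isinstance(data, dict):
        return data
    cleaned = {}
    for key, value in data.items():
        if isinstance(key, str) and key.lower() in keys:
            cleaned[key] = replacement  # mask the whole value, nested or not
        else:
            cleaned[key] = mask_sensitive(value, keys, replacement)
    return cleaned


event = {"password": "blabla", "foo": {"Email": "test@example.xcom"}, "level": "info"}
print(mask_sensitive(event, ["password", "email"]))
# {'password': '********', 'foo': {'Email': '********'}, 'level': 'info'}
```

The original key spelling (`Email`) is kept in the output; only the comparison is done in lowercase.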

A basic list of sensitive field names is included in `unclogger`:

```python
>>> from unclogger.processors.clean_data import SENSITIVE_FIELDS
>>> SENSITIVE_FIELDS
['password', 'email', 'email_1', 'firstname', 'lastname', 'currentpassword', 'newpassword', 'tmppassword', 'authentication', 'refresh', 'auth', 'http_refresh', 'http_x_forwarded_authorization', 'http_x_endpoint_api_userinfo', 'http_authorization', 'idtoken', 'oauthidtoken', 'publickey', 'privatekey']
```

!!! Note

    The list is case-insensitive: `unclogger` lowercases field names before comparing them, so e.g. `email` and `Email` are treated the same.

The list of sensitive field names can be configured with a custom set of names:

```python
>>> from unclogger import get_logger
>>> logger = get_logger("test logger")
>>> logger.config.sensitive_keys = {"foo", "bar"}
>>> logger.config.sensitive_keys.add("foobar")
>>> payload = {"foo": "1234", "bar": "5678", "fooBar": "9876"}
>>> logger.info("clean sensitive values", payload=payload)
{
    "payload": {
        "foo": "********",
        "bar": "********",
        "fooBar": "********"
    },
    "event": "clean sensitive values",
    "logger": "test logger",
    "level": "info",
    "timestamp": "2022-02-02T11:08:01.260019Z"
}
```

Because matching is case-insensitive, the `fooBar` key is masked by the `foobar` entry.

### Configurable Replacement Value

A custom string can be used instead of the default replacement value:

```python
>>> from unclogger import get_logger
>>> logger = get_logger("test logger")
>>> logger.config.replacement = "blablabla"
>>> logger.info("clean password", password="blabla", foo={"Email": "test@example.xcom"})
{
    "password": "blablabla",
    "foo": {"Email": "blablabla"},
    "event": "clean password",
    "logger": "test logger",
    "level": "info",
    "timestamp": "2022-12-13T20:02:38.520599Z"
}
```

### Hashing Sensitive Data

Instead of a replacement string, `config.replacement` can be set to a hash function, such as one of the constructors in the `hashlib` standard library module:

```python
>>> import hashlib
>>> from unclogger import get_logger
>>> logger = get_logger("test logger")
>>> logger.config.replacement = hashlib.sha256
>>> logger.info("clean password", password="blabla", foo={"Email": "test@example.xcom"})
{
    "password": "ccadd99b16cd3d200c22d6db45d8b6630ef3d936767127347ec8a76ab992c2ea",
    "foo": {"Email": "77b6427267ac7638fd0cd49f2f64fd619ade2ab21d4a3891234293671c1d14b3"},
    "event": "clean password",
    "logger": "test logger",
    "level": "info",
    "timestamp": "2022-12-13T20:06:37.542212Z"
}
```

This way, the data can still be identified (e.g. the same email address always produces the same hash value) without the actual data being sent to the log.
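
How a replacement that is either a fixed string or a hash function might be applied can be sketched with a small helper. This is an illustration under stated assumptions, not `unclogger`'s code: the hypothetical `replace_value` converts the value to text, encodes it as UTF-8, passes the bytes to the callable, and returns the hex digest of the resulting hash object. That matches the format of the 64-character values shown for `hashlib.sha256`, but the exact conversion used by `unclogger` is not confirmed here.

```python
import hashlib
from typing import Any, Callable, Union

# Assumption: a replacement is either a fixed string or a hashlib-style
# constructor that takes bytes and returns an object with .hexdigest().
Replacement = Union[str, Callable[[bytes], Any]]


def replace_value(value: Any, replacement: Replacement) -> str:
    """Return the masked form of `value` (hypothetical helper)."""
    if callable(replacement):
        # Hash the UTF-8 encoded text form of the value.
        return replacement(str(value).encode("utf-8")).hexdigest()
    return replacement


print(replace_value("blabla", "********"))  # ********
first = replace_value("test@example.xcom", hashlib.sha256)
second = replace_value("test@example.xcom", hashlib.sha256)
print(first == second, len(first))  # True 64
```

Because hashing is deterministic, equal values produce equal digests, which is what allows them to be correlated across log entries.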

!!! Warning

    This functionality is intended to work out of the box with the constructors provided by the `hashlib` standard library module. A custom hash function must accept a bytestring value and return a hash object as described in the [documentation](https://docs.python.org/3.10/library/hashlib.html). For typing purposes, `unclogger` provides a `Protocol` class for hash objects:

    ```python
    from unclogger.processors.clean_data import HashObjectProtocol


    def custom_hash_function(data: bytes) -> HashObjectProtocol:
        ...  # compute and return a hash object for `data`
    ```
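
One reason to write a custom hash function is to use a keyed hash: plain hashes of low-entropy values such as email addresses can be reversed by hashing likely candidates, whereas a keyed hash cannot be recomputed without the key. The sketch below is an example under assumptions, not part of `unclogger`: `HashLike` is a hypothetical minimal stand-in for `HashObjectProtocol` (so the snippet runs without `unclogger` installed), `SECRET_KEY` is a placeholder, and the function relies on the `key` and `digest_size` parameters of `hashlib.blake2b`.

```python
import hashlib
from typing import Protocol


class HashLike(Protocol):
    """Hypothetical minimal stand-in for unclogger's HashObjectProtocol."""

    def hexdigest(self) -> str: ...


# Placeholder key: load a real secret from configuration instead of hard-coding it.
SECRET_KEY = b"replace-with-a-secret-key"


def keyed_blake2b(data: bytes) -> HashLike:
    """Return a keyed BLAKE2b hash object for `data` with a 32-byte digest."""
    return hashlib.blake2b(data, key=SECRET_KEY, digest_size=32)


digest = keyed_blake2b(b"test@example.xcom").hexdigest()
print(len(digest))  # 64
```

Such a function would be assigned in the same way as the `hashlib` constructors, e.g. `logger.config.replacement = keyed_blake2b`.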

## Sensitive Text Values

If the text of the log message itself contains a sensitive keyword, the whole message is replaced with a warning:

```python
>>> from unclogger import get_logger
>>> logger = get_logger("test logger")
>>> logger.info("'Authentication': 1234")
{
    "event": "#### WARNING: Log message replaced due to sensitive keyword: 'Authentication':",
    "logger": "test logger",
    "level": "info",
    "timestamp": "2022-02-02T11:22:21.997204Z"
}
```
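
A check of the message text for sensitive keywords can be sketched as follows. The matching rule is an assumption, since only one example is given: the hypothetical `clean_message` looks for a quoted word followed by a colon (such as `'Authentication':`), compares the word in lowercase against the sensitive field names, and returns a warning in the format shown above when it matches.

```python
import re
from typing import Iterable

# Assumed pattern: a single- or double-quoted word followed by a colon.
KEYWORD_PATTERN = re.compile(r"""(['"])(\w+)\1\s*:""")


def clean_message(message: str, sensitive_keys: Iterable[str]) -> str:
    """Replace `message` if it contains a quoted sensitive keyword (hypothetical)."""
    keys = {key.lower() for key in sensitive_keys}
    for match in KEYWORD_PATTERN.finditer(message):
        if match.group(2).lower() in keys:
            return (
                "#### WARNING: Log message replaced due to sensitive keyword: "
                + match.group(0)
            )
    return message


print(clean_message("'Authentication': 1234", ["authentication"]))
# #### WARNING: Log message replaced due to sensitive keyword: 'Authentication':
print(clean_message("user logged in", ["authentication"]))
# user logged in
```

Replacing the whole message, rather than only the matched value, avoids having to guess where the sensitive value ends in free-form text.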

*[PII]: Personally Identifiable Information

mkdocs.yml

Lines changed: 3 additions & 0 deletions

```diff
@@ -10,6 +10,8 @@ plugins:
   - search
   - mkapi
 markdown_extensions:
+  - abbr
+  - attr_list
   - pymdownx.highlight:
       anchor_linenums: true
   - pymdownx.inlinehilite
@@ -21,4 +23,5 @@ markdown_extensions:
       guess_lang: no
 nav:
   - index.md
+  - sensitive.md
   - reference.md
```
