dendrol parses STIX2 pattern expressions into basic Python structures
This iconic STIX2 Pattern visualization is based upon this example expression:
(
  [ipv4-addr:value = '198.51.100.1/32' OR
   ipv4-addr:value = '203.0.113.33/32' OR
   ipv6-addr:value = '2001:0db8:dead:beef:dead:beef:dead:0001/128']
  FOLLOWEDBY [
    domain-name:value = 'example.com']
) WITHIN 600 SECONDS
Using the formal STIX2 Pattern grammar, that expression is converted into this parse tree:
dendrol will convert that expression (by way of that parse tree) into this more human-readable and machine-actionable form:
pattern:
  expression:
    join: FOLLOWEDBY
    qualifiers:
      - within:
          value: 600
          unit: SECONDS
    expressions:
      - observation:
          objects:
            ? ipv4-addr
            ? ipv6-addr
          join: OR
          qualifiers:
          expressions:
            - comparison:
                object: ipv4-addr
                path: [value]
                negated:
                operator: '='
                value: 198.51.100.1/32
            - comparison:
                object: ipv4-addr
                path: [value]
                negated:
                operator: '='
                value: 203.0.113.33/32
            - comparison:
                object: ipv6-addr
                path: [value]
                negated:
                operator: '='
                value: 2001:0db8:dead:beef:dead:beef:dead:0001/128
      - observation:
          objects: {domain-name}
          join:
          qualifiers:
          expressions:
            - comparison:
                object: domain-name
                path: [value]
                negated:
                operator: '='
                value: example.com
dendrol provides an interface for parsing STIX2 Pattern Expressions much like cti-pattern-validator, with the dendrol.Pattern class. This class has a method, to_dict_tree(), which converts the ANTLR parse tree to a dict-based tree structure, PatternTree.
from dendrol import Pattern

pattern = Pattern("[domain-name:value = 'http://xyz.com/download']")
assert pattern.to_dict_tree() == {
    'pattern': {
        'observation': {
            'objects': {'domain-name'},
            'join': None,
            'qualifiers': None,
            'expressions': [
                {'comparison': {
                    'object': 'domain-name',
                    'path': ['value'],
                    'negated': None,
                    'operator': '=',
                    'value': 'http://xyz.com/download',
                }}
            ]
        }
    }
}
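Because 'objects' is a Python set, and values or qualifiers may hold datetimes, bytes, or slice() path components, the dict tree cannot be handed straight to json.dumps(). Below is a minimal sketch of a default= hook that makes it serializable, under these assumptions: sets become sorted lists, datetimes ISO 8601 strings, slices [n]-style strings, and bytes base64 text. The to_jsonable name is hypothetical, not part of dendrol, and the tree is a hand-written copy of the output above so the sketch runs without dendrol installed.

```python
import base64
import json
from datetime import datetime
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """json.dumps() default= hook for the non-JSON types a PatternTree may hold (hypothetical)."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # e.g. 'objects'; sorted so the output is deterministic
    if isinstance(obj, datetime):
        return obj.isoformat()  # e.g. start_stop qualifier bounds
    if isinstance(obj, slice):
        return f'[{obj.stop}]'  # list index path component, e.g. [1] or [*]
    if isinstance(obj, bytes):
        return base64.b64encode(obj).decode('ascii')  # binary literal values
    raise TypeError(f'cannot serialize {type(obj).__name__}')


# Hand-written copy of the to_dict_tree() output shown above
tree = {
    'pattern': {
        'observation': {
            'objects': {'domain-name'},
            'join': None,
            'qualifiers': None,
            'expressions': [
                {'comparison': {
                    'object': 'domain-name',
                    'path': ['value'],
                    'negated': None,
                    'operator': '=',
                    'value': 'http://xyz.com/download',
                }}
            ],
        }
    }
}

print(json.dumps(tree, default=to_jsonable))
```

Passing the hook via default= leaves ordinary JSON types to the standard encoder, so only the few PatternTree-specific types need handling.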
A specialized YAML representation is also proposed, to make visualization of this data a little less cumbersome:
from dendrol import Pattern

pattern = Pattern("[domain-name:value = 'http://xyz.com/download']")
assert str(pattern.to_dict_tree()) == '''\
pattern:
  observation:
    objects: {domain-name}
    join:
    qualifiers:
    expressions:
      - comparison:
          object: domain-name
          path: [value]
          negated:
          operator: '='
          value: http://xyz.com/download
'''
For more info, read The Spec below, or check out the tests.
To develop dendrol and run its tests, first clone the repo. Then install the dev and testing dependencies:
pip install '.[dev,test]'
pytest is used for testing:
pytest
Reported issues and pull requests are welcome! From new features and suggestions to typo fixes and poor naming choices, fresh eyes bolster software eternally in development.
If submitting a pull request, please add yourself to the CONTRIBUTORS file for a piece of that sweet, sweet street cred!
A PatternTree begins with a 'pattern' key. Below it is an observation expression, with an 'observation' or 'expression' key (an 'expression' may contain more observation expressions joined by AND/OR/FOLLOWEDBY). Below 'observation' keys are comparison expressions, marked by a 'comparison' or 'expression' key (an 'expression' may contain more comparison expressions joined by AND/OR). A 'comparison' key denotes a single comparison between an object property and a literal value.
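The nesting described above lends itself to a short recursive walk. Here is a sketch, assuming only the node shapes in this spec, that yields the body of every 'comparison' in a tree. iter_comparisons is a hypothetical helper, and the sample tree is a trimmed, hand-written version of the example at the top of this README.

```python
from typing import Any, Dict, Iterator


def iter_comparisons(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the body of every 'comparison' node beneath `node` (hypothetical helper)."""
    [(key, body)] = node.items()  # every node is a single-key dict
    if key == 'comparison':
        yield body
    elif key == 'pattern':
        yield from iter_comparisons(body)
    else:
        # 'observation' and both kinds of 'expression' keep children in 'expressions'
        for child in body['expressions']:
            yield from iter_comparisons(child)


tree = {'pattern': {'expression': {
    'join': 'FOLLOWEDBY',
    'qualifiers': [{'within': {'value': 600, 'unit': 'SECONDS'}}],
    'expressions': [
        {'observation': {
            'objects': {'ipv4-addr'}, 'join': None, 'qualifiers': None,
            'expressions': [{'comparison': {
                'object': 'ipv4-addr', 'path': ['value'], 'negated': None,
                'operator': '=', 'value': '198.51.100.1/32'}}],
        }},
        {'observation': {
            'objects': {'domain-name'}, 'join': None, 'qualifiers': None,
            'expressions': [{'comparison': {
                'object': 'domain-name', 'path': ['value'], 'negated': None,
                'operator': '=', 'value': 'example.com'}}],
        }},
    ],
}}}

print([c['value'] for c in iter_comparisons(tree)])  # ['198.51.100.1/32', 'example.com']
```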
{'pattern': {...}}
pattern:
  ...
A PatternTree is a dict with one top-level key, 'pattern'. This paradigm of a dict with a single key identifying its contents is seen throughout this spec.
The value of this 'pattern' key is an observation expression.
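Since every node is a single-key dict, consumers will often want to split a node into its identifying key and its body, rejecting anything malformed. A sketch of such a helper follows; the name unwrap is hypothetical.

```python
from typing import Any, Dict, Tuple


def unwrap(node: Dict[str, Any]) -> Tuple[str, Any]:
    """Split a single-key PatternTree node into (key, body); raise ValueError otherwise."""
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError(f'expected a single-key dict, got {node!r}')
    [(key, body)] = node.items()
    return key, body


key, body = unwrap({'pattern': {'observation': {}}})
print(key)              # pattern
print(unwrap(body)[0])  # observation
```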
An Observation Expression is a dict with a single key of either 'expression' or 'observation'. An 'expression' SHALL contain two or more observation expressions joined by AND/OR/FOLLOWEDBY, whereas an 'observation' SHALL contain only comparison expressions.
{'expression': {
    'join': oneOf('AND', 'OR', 'FOLLOWEDBY'),
    'qualifiers': [...],
    'expressions': [...],
}}

expression:
  join: AND | OR | FOLLOWEDBY
  qualifiers:
  expressions:
    - a
    - b
    - ...
An 'expression' is a container for other observation expressions, joined by an observation operator in 'join'. It MAY have a list of qualifiers in the 'qualifiers' key, or None if there are none.
Its children are in 'expressions', a list whose items SHALL be single-key dicts (keyed by either 'observation' or 'expression').
{'observation': {
    'objects': {'ipv4-addr', 'ipv6-addr', ...},
    'join': oneOf('AND', 'OR'),
    'qualifiers': [...],
    'expressions': [...],
}}

observation:
  objects:
    ? ipv4-addr
    ? ipv6-addr
    ? ...
  join: AND | OR
  qualifiers:
  expressions:
    - a
    - ...
An 'observation' is analogous to square brackets in STIX2 Pattern Expressions, e.g.: [ipv4-addr:value = '1.2.3.4']. Children of an observation (in the 'expressions' key) SHALL only be comparisons or comparison expressions.
An 'observation' MAY have qualifiers, but its children MUST NOT.
An 'observation' MAY have a join method, which denotes how its child comparison expressions are to be joined. This method MAY be AND or OR, but MUST NOT be FOLLOWEDBY, because the join method applies to comparison expressions, not observation expressions. If there is only a single child comparison expression, 'join' MAY be None.
An 'observation' SHALL contain, in its 'objects' key, a set of all the object types of its child comparison expressions. This is mainly for human consumption. A STIX2 observation is allowed to contain comparisons on disparate object types, provided they are joined by OR, which is why 'objects' is a set rather than a single string.
If 'objects' contains only a single object type, it MAY be compacted into set literal form:
observation:
  objects: {ipv4-addr}
  join: AND | OR
  qualifiers:
  expressions:
    - a
    - ...
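The observation rules above can be checked mechanically. The sketch below is a hypothetical validator, not something dendrol ships. It reports a bad join, non-comparison children, qualified children, and an 'objects' set that does not match the object types of the child comparisons.

```python
from typing import Any, Dict, List, Set


def comparison_objects(node: Dict[str, Any]) -> Set[str]:
    """Collect every object type referenced by a comparison expression node."""
    [(key, body)] = node.items()
    if key == 'comparison':
        return {body['object']}
    found: Set[str] = set()
    for child in body['expressions']:
        found |= comparison_objects(child)
    return found


def observation_problems(obs: Dict[str, Any]) -> List[str]:
    """Describe each observation rule that an 'observation' body breaks (empty if none)."""
    problems = []
    children = obs['expressions']
    if obs['join'] not in ('AND', 'OR', None):
        problems.append(f"join {obs['join']!r} is not AND or OR")
    elif obs['join'] is None and len(children) > 1:
        problems.append('join may only be None when there is a single child')
    for child in children:
        [(key, body)] = child.items()
        if key not in ('comparison', 'expression'):
            problems.append(f'child {key!r} is not a comparison expression')
        elif body.get('qualifiers'):
            problems.append('child comparison expressions must not have qualifiers')
    if not problems:
        # Only compare 'objects' once the children are known to be well-formed
        expected = set().union(*(comparison_objects(c) for c in children))
        if obs['objects'] != expected:
            problems.append(f'objects should be {sorted(expected)}')
    return problems


good = {
    'objects': {'ipv4-addr', 'ipv6-addr'}, 'join': 'OR', 'qualifiers': None,
    'expressions': [
        {'comparison': {'object': 'ipv4-addr', 'path': ['value'], 'negated': None,
                        'operator': '=', 'value': '198.51.100.1/32'}},
        {'comparison': {'object': 'ipv6-addr', 'path': ['value'], 'negated': None,
                        'operator': '=', 'value': '2001:db8::1/128'}},
    ],
}
print(observation_problems(good))                      # []
print(observation_problems(dict(good, join='FOLLOWEDBY')))
```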
A Qualifier is a dict with a single key identifying its qualifier type. Currently, this SHALL be one of 'start_stop', 'within', or 'repeats':
{'start_stop': {
    'start': datetime(2018, 10, 7, 0, 0, tzinfo=tzutc()),
    'stop': datetime(2018, 10, 8, 23, 59, tzinfo=tzutc()),
}}

start_stop:
  start: 2018-10-07T00:00:00Z
  stop: 2018-10-08T23:59:00Z
The 'start_stop' qualifier constrains the timeframe within which its associated observation expressions MUST occur to evaluate true. Unlike WITHIN, START ... STOP ... denotes absolute points in time, using datetime literals.
Example STIX2 expression:
[a:b = 12] START t'2018-10-07T00:00:00Z' STOP t'2018-10-08T23:59:00Z'
In STIX2 Pattern Expressions, all datetimes MUST be in RFC3339 format and MUST be in the UTC timezone. Datetime literals resemble Python strings with t as their prefix character (like an f-string or a bytestring). Because they must be in UTC, datetime literals MUST end with the Z character.
When parsed into Python, they SHALL have a tzinfo object whose utcoffset() is zero (e.g. dateutil's tzutc()).
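Here is a sketch of turning a datetime literal into a timezone-aware Python datetime with the standard library alone. It assumes the literal still carries its t'...' wrapper, and it swaps in datetime.timezone.utc where the example above uses dateutil's tzutc(). Both report a zero UTC offset. parse_timestamp is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_timestamp(literal: str) -> datetime:
    """Convert a STIX2 literal such as t'2018-10-07T00:00:00Z' into an aware datetime."""
    if not (literal.startswith("t'") and literal.endswith("Z'")):
        raise ValueError(f'not a UTC datetime literal: {literal!r}')
    text = literal[2:-2]  # drop the t' prefix and the trailing Z'
    # Before Python 3.11, fromisoformat() rejects a trailing 'Z' and accepts only
    # 3 or 6 fractional-second digits, so the UTC offset is appended explicitly.
    return datetime.fromisoformat(text + '+00:00').astimezone(timezone.utc)


start = parse_timestamp("t'2018-10-07T00:00:00Z'")
print(start)              # 2018-10-07 00:00:00+00:00
print(start.utcoffset())  # 0:00:00
```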
{'within': {
    'value': 600,
    'unit': 'SECONDS',
}}

within:
  value: 600
  unit: SECONDS
The 'within' qualifier constrains the timeframe within which its associated observation expressions MUST occur to evaluate true. Unlike START ... STOP ..., WITHIN denotes a relative timeframe: the latest observation expression MUST occur within the specified number of seconds of the earliest observation expression.
Example STIX2 expression:
[a:b = 12] WITHIN 600 SECONDS
SECONDS is hard-coded into the STIX2 Pattern Expression grammar, and MUST be included in pattern expressions. However, to avoid ambiguity for the reader, and to allow for future STIX2 spec changes, the unit is also included in the Pattern Tree.
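To make the relative semantics concrete, here is a sketch that checks a 'within' qualifier against the times at which the observation expressions matched. It assumes those times are already known and non-empty, and that a gap of exactly the stated number of seconds still counts as a match. satisfies_within is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable


def satisfies_within(qualifier: Dict[str, Any], times: Iterable[datetime]) -> bool:
    """True if the latest time falls no more than `value` SECONDS after the earliest."""
    within = qualifier['within']
    if within['unit'] != 'SECONDS':
        raise ValueError(f"unsupported unit {within['unit']!r}")
    stamps = list(times)  # assumed non-empty
    return max(stamps) - min(stamps) <= timedelta(seconds=within['value'])


qualifier = {'within': {'value': 600, 'unit': 'SECONDS'}}
t0 = datetime(2018, 10, 7, 12, 0, tzinfo=timezone.utc)
print(satisfies_within(qualifier, [t0, t0 + timedelta(minutes=9)]))   # True
print(satisfies_within(qualifier, [t0, t0 + timedelta(minutes=11)]))  # False
```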
{'repeats': {
    'value': 9000,
}}

repeats:
  value: 9000
The 'repeats' qualifier REQUIRES that its associated observation expressions evaluate true on different occasions, a specified number of times.
Example STIX2 expression:
[a:b = 12] REPEATS 9000 TIMES
TIMES is hard-coded into the STIX2 Pattern Expression grammar, and MUST be included in pattern expressions. However, since there is no obvious unit of multiplicity other than "X times", it has been omitted from the Pattern Tree output, unlike the SECONDS of WITHIN.
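The three qualifier shapes map directly back onto STIX2 syntax, with the unit the Pattern Tree omits (TIMES) restored. Here is a sketch, assuming UTC datetimes without fractional seconds. render_qualifier is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def render_qualifier(qualifier: Dict[str, Any]) -> str:
    """Turn a Qualifier dict back into STIX2 pattern syntax (hypothetical helper)."""
    [(kind, body)] = qualifier.items()
    if kind == 'within':
        return f"WITHIN {body['value']} {body['unit']}"
    if kind == 'repeats':
        # TIMES is not stored in the Pattern Tree, so it is re-added here
        return f"REPEATS {body['value']} TIMES"
    if kind == 'start_stop':
        fmt = '%Y-%m-%dT%H:%M:%SZ'  # assumes whole seconds; converted to UTC first
        start = body['start'].astimezone(timezone.utc).strftime(fmt)
        stop = body['stop'].astimezone(timezone.utc).strftime(fmt)
        return f"START t'{start}' STOP t'{stop}'"
    raise ValueError(f'unknown qualifier {kind!r}')


print(render_qualifier({'within': {'value': 600, 'unit': 'SECONDS'}}))  # WITHIN 600 SECONDS
print(render_qualifier({'repeats': {'value': 9000}}))                   # REPEATS 9000 TIMES
print(render_qualifier({'start_stop': {
    'start': datetime(2018, 10, 7, 0, 0, tzinfo=timezone.utc),
    'stop': datetime(2018, 10, 8, 23, 59, tzinfo=timezone.utc),
}}))  # START t'2018-10-07T00:00:00Z' STOP t'2018-10-08T23:59:00Z'
```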
A Comparison Expression is a dict with a single key of either 'expression' or 'comparison'. An 'expression' SHALL contain two or more comparison expressions joined by AND/OR, whereas a 'comparison' contains no children, and only marks a comparison of one variable to one literal value.
{'expression': {
    'join': oneOf('AND', 'OR'),
    'expressions': [a, b, ...],
}}

expression:
  join: AND | OR
  expressions:
    - a
    - b
    - ...
An 'expression' is a container for other comparison expressions, joined by either AND or OR in 'join'. Comparison expressions do not use FOLLOWEDBY, as they are intended to reference a single object at a single point in time.
An 'expression' MUST NOT have qualifiers.
Its children are in 'expressions', a list whose items SHALL be single-key dicts (keyed by either 'comparison' or 'expression').
{'comparison': {
    'object': 'email-message',
    'path': ['from_ref', 'value'],
    'negated': None,
    'operator': 'MATCHES',
    'value': '.+@malicio\\.us',
}}

comparison:
  object: email-message
  path:
    - from_ref
    - value
  negated:
  operator: MATCHES
  value: .+@malicio\.us
A 'comparison' represents a single comparison between a STIX2 object property and a literal value. A single string object type SHALL be placed in the 'object' key.
'path' SHALL be a list beginning with a top-level property of the object type denoted in 'object', as a string. Following this MAY be any number of child properties, as strings, or list index components/dereferences, denoted as Python slice() objects, where [1] is equivalent to slice(None, 1, None) (i.e. slice(1)). The special match-any list index from STIX2 (e.g. file:sections[*]) is equivalent to slice(None, '*', None).
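Here is a sketch that rebuilds STIX2 object-path syntax from 'object' and 'path', following the slice() convention above, where the list index is stored as the slice's stop. It assumes property names never need quoting, which does not hold for every STIX2 property. render_object_path is hypothetical.

```python
from typing import List, Union


def render_object_path(object_type: str, path: List[Union[str, slice]]) -> str:
    """Rebuild STIX2 object-path syntax from a comparison's 'object' and 'path'."""
    rendered = str(path[0])  # always a top-level property name
    for part in path[1:]:
        if isinstance(part, slice):
            rendered += f'[{part.stop}]'  # the list index lives in .stop: [1] or [*]
        else:
            rendered += f'.{part}'  # assumes the property name needs no quoting
    return f'{object_type}:{rendered}'


print(render_object_path('email-message', ['from_ref', 'value']))      # email-message:from_ref.value
print(render_object_path('file', ['sections', slice(None, '*', None)]))  # file:sections[*]
print(slice(1) == slice(None, 1, None))                                # True
```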
'negated' SHALL be a bool denoting whether the operator SHALL be negated during evaluation. STIX2 allows a NOT keyword before the operator: file:name NOT MATCHES 'james.*'. If the operator is not negated, 'negated' MAY be None. (This allows for a more compact YAML representation, in which the value may simply be omitted.)
'operator' SHALL be a string representing the operator, e.g. '>', 'LIKE', or '='.
'value' MAY be any static Python value. Currently, only strings, bools, ints, floats, datetimes, and bytes are output, but this could change in the future (e.g. if compiled regular expressions are deemed useful).
If 'path' contains only a single property, it MAY be compacted into list literal form:
comparison:
  object: domain-name
  path: [value]
  negated:
  operator: '='
  value: cnn.com
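To show how the comparison fields fit together, here is a sketch of evaluating a comparison expression against observed property values. Everything in it is an assumption rather than dendrol behavior. Observed data is a dict keyed by (object type, tuple of path strings), only a handful of operators are supported, MATCHES is approximated with re.search(), and a missing property never matches. evaluate is a hypothetical name.

```python
import operator
import re
from typing import Any, Callable, Dict, Tuple

# Observed data: (object type, property path) -> value; slice() path parts are not supported
Observed = Dict[Tuple[str, Tuple[str, ...]], Any]

# Only a subset of STIX2 operators; LIKE, IN, ISSUBSET, etc. are left out
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    '=': operator.eq,
    '!=': operator.ne,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    # Approximation: MATCHES succeeds if the regex is found anywhere in the value
    'MATCHES': lambda actual, regex: re.search(regex, actual) is not None,
}


def evaluate(node: Dict[str, Any], observed: Observed) -> bool:
    """Evaluate a 'comparison' or comparison-level 'expression' node."""
    [(key, body)] = node.items()
    if key == 'expression':
        results = (evaluate(child, observed) for child in body['expressions'])
        if body['join'] == 'AND':
            return all(results)
        if body['join'] == 'OR':
            return any(results)
        raise ValueError(f"comparison expressions join with AND or OR, not {body['join']!r}")
    if any(isinstance(part, slice) for part in body['path']):
        raise NotImplementedError('list index path components are not handled')
    actual = observed.get((body['object'], tuple(body['path'])))
    if actual is None:
        return False  # assumption: a missing property never matches, even when negated
    result = OPERATORS[body['operator']](actual, body['value'])
    return not result if body['negated'] else result


rule = {'expression': {
    'join': 'AND',
    'expressions': [
        {'comparison': {'object': 'email-message', 'path': ['from_ref', 'value'],
                        'negated': None, 'operator': 'MATCHES', 'value': '.+@malicio\\.us'}},
        {'comparison': {'object': 'email-message', 'path': ['subject'],
                        'negated': True, 'operator': '=', 'value': 'Newsletter'}},
    ],
}}
observed = {
    ('email-message', ('from_ref', 'value')): 'bob@malicio.us',
    ('email-message', ('subject',)): 'Invoice overdue',
}
print(evaluate(rule, observed))  # True
```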