Added support for JSON containing multiple events#2545
Added support for JSON containing multiple events#2545sebix merged 3 commits intocerttools:developfrom
Conversation
sebix
left a comment
There was a problem hiding this comment.
Do you have an example feed at hand (so we can extract an example for the tests, add it to the docs)? To my knowledge no documented feed is using such a format.
intelmq/bots/parsers/json/parser.py
Outdated
| report = self.receive_message() | ||
| if self.splitlines: | ||
| if self.multiple_events: | ||
| lines = [json.dumps(event) for event in json.loads(base64_decode(report['raw']))] |
There was a problem hiding this comment.
Converting the data forth and back appears to be inefficient.
There was a problem hiding this comment.
Yeah, could imagine this. Any tips to do this a proper way?
Currently this PR is running in our production and works just fine, but I agree on the double JSON conversion isn't the most efficient way to do this.
There was a problem hiding this comment.
I'd suggest just loading here, and then in L33 checking config (or maybe just the object type?), and using MessafeFactory.from_dict if we already loaded data :)
Our National Cyber Security Centre (NCSC) is sending us "IntelMQ JSON's" in a ZIP-file by mail. Here's an example (I tried to anonymise most values): |
|
I added a few changes here:
|
f9d8d81 to
9215f92
Compare
|
...and found & fixed another bug in intelmq.lib.message.Message.from_dict: |
|
As I wrote a major part of this PR, I won't merge it myself @aaronkaplan could you do the review instead? |
|
Rebased on develop to fix conflicts |
|
@kamil-certat maybe you can have a look? |
kamil-certat
left a comment
There was a problem hiding this comment.
It looks good except of non-handled case of both flags being set. Please either handle it or explicitly forbid
| lines = base64_decode(report["raw"]).splitlines() | ||
| else: | ||
| lines = [base64_decode(report['raw'])] | ||
| lines = [base64_decode(report["raw"])] |
There was a problem hiding this comment.
By having two flags and no enforcement if they are exclusive, we allow setting all combinations, including
splitlines = True
multiple_events = True
which is not supported by the code, but still makes a theoretically possible case of multiple lines, each of them containing a list of multiple event dictionaries.
I'd suggest on of following solutions:
- support that case,
- switch from two flags to a one
format(or similar) configuration, that supports options like e.g.single,splitlines,multiple - add a config validation forbidding starting a bot with both flags set.
There was a problem hiding this comment.
Thanks for catching this.
I chose the easiest option: Forbidding it.
There's no reason to implement it; switching to a different mode setting would also increase the required effort, and require upgrade functions. We can always make the switch to a format parameter later with the same effort as now.
add test cases for multiple events mode optimize runtime for multiple events mode add documentation add classification.type = undetermined if input data does not contain the field fix bugs in intelmq.lib.message.Message.from_dict: Do not modify the dict parameter by adding the `__type` field Raise a ValueError if message type is not determinable
Currently the
intelmq.bots.parsers.json.parseris only able to parse or single events in JSON, or multiple events in JSON, each on their own line.This PR contains an option to parse multiple events within a JSON, by adding the
multiple_events(boolean) to the config.