ENH: Using msgpack instead of json #1819
Draft
waldbauer-certat wants to merge 45 commits into develop from waldbauer/msgpack-poc
waldbauer-certat force-pushed the waldbauer/msgpack-poc branch 13 times, most recently on March 18, 2021 12:10, from 23cd283 to 9bce822
waldbauer-certat force-pushed the waldbauer/msgpack-poc branch 4 times, most recently on April 1, 2021 09:32, from 5c6bdd5 to 9ab334e
ghost reviewed 7 times on Apr 1, 2021
waldbauer-certat force-pushed the waldbauer/msgpack-poc branch 2 times, most recently on June 30, 2021 15:41, from 6d9e656 to 40e4ae1
ghost added the "needs: feedback" label on Aug 20, 2021
waldbauer-certat force-pushed the waldbauer/msgpack-poc branch on July 13, 2022 08:30, from 5215a89 to d32bdb8
Signed-off-by: Sebastian Waldbauer <[email protected]>
This commit adds license information to a lot of files and adds a .reuse/dep5 file that lists the license information for some folders.
The commit also changes the main license in setup.cfg from AGPL-3.0-only to AGPL-3.0-or-later, because only one file has AGPL-3.0-only as its license and multiple files have AGPL-3.0-or-later in the license header.
It also removes the cef_logo.png file, as there is no information about the license anywhere to be found. It is now included directly from the website of the European Union.
Closes #1633
and add legacy tag to shadowserver caida config
and add legacy tag to darknet config
and add legacy tag to the configs it replaces and update changelog and documentation accordingly
fix mapping use compromised type if the data indicates an active webshell plus add testcases add changelog update bots documentation
enhance mappings add 4/6 agnostic mapping for `Sinkhole-Events` as well document feeds with IPv4 and IPv6 better and shorter
This commit adds a license header or a license file to most of the files, or documents the license in the .reuse/dep5 license file.
Some of the process was automated, first by listing all the files that are not reuse lint compliant:
> reuse lint > ../reuse.lst
This list was then modified to remove metainformation and only list filenames. Also a couple of filenames that need manual modification were removed.
Then using git and reuse:
> for file in `cat ../reuse.lst`; do year=`git log --reverse --pretty="format:%ai" $file | head -1 | cut -d "-" -f 1`; author=`git log --reverse --pretty="format:%an" $file|head -1`; reuse addheader --copyright="$author" --year="$year" --license="AGPL-3.0-or-later" --skip-unrecognised $file; done
Then the same process was repeated for files reuse does not recognize, like csv and json files or REQUIREMENTS.txt files.
match with RSIT in the taxonomy
intrusions:
compromised -> system-compromise
unauthorized-command -> system-compromise
unauthorized-login -> system-compromise
adapt bots depending on the name
add changelog and news entries, including SQL update statements
merged into information-content-security > unauthorised-information-modification adapt bots depending on the name add changelog and news entries, including SQL update statements
was renamed and marked as deprecated in 2.0.0.beta1 #1404
Compatibility with the deprecated configuration format (before 1.0.0.dev7) was removed. #1404
The deprecated shell scripts
- `update-asn-data`
- `update-geoip-data`
- `update-tor-nodes`
- `update-rfiprisk-data`
have been removed in favor of the built-in update-mechanisms (see the bots' documentation). A crontab file for calling all new update commands can be found in `contrib/cron-jobs/intelmq-update-database`. #1404
add two n6 images directly to the repository, as they are not displayed on readthedocs otherwise: The other websites hosting the images block loading images if the referer does not match a whitelist. We can't add a noreferer HTML attribute in rst either. The only option left is to add the files, which only implies adding the licensing information and the AGPL-3.0 license text as well. add two illustrations on the flow from n6 to intelmq and vice versa, own work. some textual improvements in the document itself.
The Aggregate Expert might be used to aggregate events within a given timespan and threshold. Signed-off-by: Sebastian Waldbauer <[email protected]>
Using msgpack instead of json results in faster (de)serialize and less memory usage. Redis is also capable of msgpack within its lua api i.e. https://github.com/kengonakajima/lua-msgpack-native.

====== Benchmark =======
JSON median size: 387
MSGPACK median size: 329
------------------------
Diff: 16.20%

JSON
* Serialize: 39286
* Deserialize: 30713
MSGPACK
* Serialize: 23483
* Deserialize: 12602
---------------------
DIFF
* Serialize: 50.35%
* Deserialize: 83.62%

Data extracted from spamhaus-collector
Measurements based on deduplicator-expert
460 events in total processed by deduplicator-expert

Signed-off-by: Sebastian Waldbauer <[email protected]>
Signed-off-by: Sebastian Waldbauer <[email protected]>
waldbauer-certat force-pushed the waldbauer/msgpack-poc branch on July 15, 2022 13:31, from 202fb1b to 1253c3e
NOTE: This is a proof of concept and is still being heavily tested!
Introduction
Msgpack (MessagePack) is a (de)serialization format similar to JSON, but more optimized for machine-to-machine (M2M) communication. There are certainly better protocols such as protobuf, FlatBuffers, Cap'n Proto, SBE and so on, but these don't fit into IntelMQ very well. Msgpack uses a key-value pattern (like JSON), so there won't be any major change. The real "magic" is in how the data is stored: JSON is very human-readable because it is serialized as text, whereas msgpack packs data into a binary format, which results in smaller size and faster processing; see the benchmark below.
If you want to know some specs, check it out here.
Msgpack itself is available for multiple languages such as Go, Python, JavaScript, PHP and so on.
In addition, Redis, our internal message queue, is also capable of using msgpack within its Lua API.
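To make the size argument from the introduction concrete, here is a minimal sketch that encodes one event both ways. It assumes a hand-written encoder for a small subset of the MessagePack format; the `pack` helper is hypothetical, written only for illustration, and is neither the `msgpack` library nor the code in this PR. The sample event values are made up.

```python
import json
import struct


def pack(obj):
    """Encode a small MessagePack subset (hypothetical helper, illustration only).

    Supports None, bool, ints in [0, 2**32), str up to 255 UTF-8 bytes
    and dicts with up to 15 entries.
    """
    if obj is None:
        return b"\xc0"                               # nil
    if obj is True:
        return b"\xc3"                               # true
    if obj is False:
        return b"\xc2"                               # false
    if isinstance(obj, int):
        if obj < 0 or obj >= 2**32:
            raise ValueError("integer out of range for this sketch")
        if obj < 0x80:
            return bytes([obj])                      # positive fixint: value is the byte
        if obj < 0x100:
            return b"\xcc" + bytes([obj])            # uint 8
        if obj < 0x10000:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16, big-endian
        return b"\xce" + struct.pack(">I", obj)      # uint 32, big-endian
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) < 32:
            return bytes([0xA0 | len(raw)]) + raw    # fixstr: length in the type byte
        if len(raw) < 256:
            return b"\xd9" + bytes([len(raw)]) + raw  # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("dict too large for this sketch")
        out = bytes([0x80 | len(obj)])               # fixmap: entry count in the type byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Made-up event using IntelMQ-style field names.
event = {
    "source.ip": "192.0.2.1",
    "source.asn": 64496,
    "classification.type": "blocklist",
}

as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(event)

print(len(as_json), "bytes as compact JSON")
print(len(as_msgpack), "bytes as MessagePack")
```

The saving comes from dropping quotes, colons, commas and braces, and from storing the integer as two big-endian bytes plus a type byte instead of decimal digits. A real deployment would use a maintained MessagePack implementation, which also covers the remaining types (negative integers, floats, arrays, binary data and so on).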
What's the goal?
Benchmark
For the benchmark, data was extracted from spamhaus-drop-collector, parsed by spamhaus-drop-parser and measured in deduplicator-expert. 460 events were processed in total.
I've tested the bots above and they worked fine with this change. It might break other bots (which I haven't tested yet).
Serialize
Deserialize
To sum up, changing from JSON to msgpack results in faster (de)serialization and a lower memory footprint.
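The percentages in the benchmark commit message (16.20%, 50.35%, 83.62%) can be reproduced from its raw figures if the difference is taken relative to the mean of the two values. That reading is an inference from the numbers, not something the author states, and the units of the serialize and deserialize figures are not given. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def symmetric_diff_percent(a, b):
    """Difference between a and b relative to their mean, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


# Raw figures copied from the benchmark commit message: (JSON, MSGPACK).
benchmark = {
    "median size": (387, 329),
    "serialize": (39286, 23483),
    "deserialize": (30713, 12602),
}

for name, (json_value, msgpack_value) in benchmark.items():
    diff = symmetric_diff_percent(json_value, msgpack_value)
    print(f"{name}: {diff:.2f}%")
```

Measured against the JSON figure as the baseline instead, the same numbers correspond to reductions of roughly 15% (size), 40% (serialize) and 59% (deserialize).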