Datajson v1.0 #12102

Open. Wants to merge 8 commits into base: master

@regit regit commented Nov 7, 2024

Indicators of Compromise (IOCs) are a key element in a Security Operations Center. Datasets
have been a huge step forward in getting alerts on IOCs from Suricata, but the alerts produced
lack context. For example, if an IOC management software has a list of host names, they
will be linked to different threat actors, but at ingestion in Suricata they become
a simple list of strings. This list is then used via a rule like

alert tls any any -> any any (msg:"IOC hostname on TLS"; tls.sni; dataset:isset,hostname.lst,...; sid:1664;)

With this, an alert carries no information about the IOC itself, and a mapping has to be done
a posteriori to see which IOC was hit. The pseudo algorithm to run is:

  • Intercept the signature 1664
  • Extract the tls.sni
  • Check the tls.sni value in the IOC management software

This works, but it is not optimal: correlation and external processing have to be done for every match.
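
As a rough illustration of that a posteriori lookup, here is a minimal sketch in C. The table contents, the struct and the function name `correlate_alert` are hypothetical; a real deployment would query the IOC management software rather than a static array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local copy of the IOC management software's data. */
struct ioc_entry {
    const char *hostname;
    const char *actor;
};

static const struct ioc_entry ioc_table[] = {
    { "bad.example.com", "Bad Panda" },
    { "evil.example.org", "LSD kitten" },
};

#define IOC_SID 1664 /* signature id of the IOC rule above */

/* Return the actor linked to the SNI of an alert, or NULL when the alert
 * does not come from the IOC signature or the hostname is unknown. */
static const char *correlate_alert(int signature_id, const char *tls_sni)
{
    if (signature_id != IOC_SID || tls_sni == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof(ioc_table) / sizeof(ioc_table[0]); i++) {
        if (strcmp(ioc_table[i].hostname, tls_sni) == 0)
            return ioc_table[i].actor;
    }
    return NULL;
}
```

Every alert from signature 1664 has to go through a lookup of this kind outside Suricata, which is the extra processing datajson aims to remove.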

To fix this issue, we need to be able to ingest the IOCs without losing the contextual
information contained in the IOC management software.

Datajson is a proposed implementation that addresses this issue. Instead of injecting only
the value list into Suricata, we can attach to each value a JSON object that will end up in
the alert output.

The following example alerts on source and destination IPs found in datasets:

alert tls $HOME_NET any -> any any (msg:"Test dataset"; \
                              ip.src; datajson:isset,ip4list,type ipv6,load ip4-json.lst,key inventory; \
                              ip.dst; datajson:isset,actors_ip,type ipv6, load bad-json.lst, key bad_actors; sid:1;)

In ip4-json.lst, we have data from inventory:

10.7.5.5,{"user":"vjulien","rank": 1}

In bad-json.lst, we have data from the IOC management software:

185.117.73.76,["Bad Panda"]
144.217.50.240,["killer bear","LSD kitten"]

The result is an alert section that looks like:

  "alert": {
    "action": "allowed",
    "gid": 1,
    "signature_id": 1,
    "rev": 0,
    "signature": "Test dataset",
    "category": "",
    "severity": 3,
    "extra": {
      "inventory": {
        "user": "vjulien",
        "rank": 1
      },
      "bad_actors": [
        "killer bear",
        "LSD kitten"
      ]
    }
  },

Link to ticket: https://redmine.openinfosecfoundation.org/issues/7372

Describe changes:

  • Implement datajson keywords for all dataset types
  • Add documentation
  • Add unix socket commands

SV_REPO=https://github.com/regit/suricata-verify/
SV_BRANCH=datajson-v1.0

This patch introduces a new keyword datajson that is similar
to dataset with a twist. Where dataset allows matching against sets,
datajson allows the same but also adds JSON data to the alert
event. This data comes from the set definition itself.
For example, an ipv4 set will look like:

  10.16.1.11,{"test": "success","context":3}

Each line is a value and its JSON data, separated by a comma.
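
A minimal sketch of how such a line could be split, assuming the value itself never contains a comma (true for IP addresses; other set types would need their own check). The function name `split_datajson_line` is hypothetical and is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "value,json" in place at the FIRST comma, so that commas inside
 * the JSON part are preserved. Returns 0 on success, -1 when the comma,
 * the value or the JSON part is missing. */
static int split_datajson_line(char *line, char **value, char **json)
{
    char *comma = strchr(line, ',');
    if (comma == NULL || comma == line || comma[1] == '\0')
        return -1;
    *comma = '\0';     /* terminate the value part */
    *value = line;
    *json = comma + 1; /* everything after the first comma */
    return 0;
}
```

Splitting on the first comma only matters here because the JSON part of the example line contains a comma of its own.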

The syntax of the keyword is the following:

  datajson:isset,src_ip,type ip,load src.lst,key src_ip;

Compared to dataset, it just has a supplementary option key
that is used to indicate in which subobject the JSON value
should be added.

The information is added in the event under the alert.extra
subobject:

  "alert": {
    "extra": {
      "src_ip": {
        "test": "success",
        "context": 3
      },
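
To make the role of the key option concrete, the sketch below nests the JSON stored in the set under the key inside an "extra" object. It is only an illustration with a hypothetical `build_extra` helper working on strings; the patch builds the event through Suricata's own JSON output code, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write `"extra": {"<key>": <json>}` into out. The JSON text is inserted
 * verbatim, as it was stored next to the matched value in the set.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int build_extra(char *out, size_t out_size, const char *key, const char *json)
{
    int n = snprintf(out, out_size, "\"extra\": {\"%s\": %s}", key, json);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

With key `src_ip` and the set line shown earlier, this produces the same nesting as the alert excerpt above.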

The main interest of the feature is to be able to contextualize
a match. For example, if you have an IOC source, you can do

 value1,{"actor":"APT28","Country":"FR"}
 value2,{"actor":"APT32","Country":"NL"}

This way, a single dataset is able to add context to the
event, where previously this was not possible and multiple signatures
had to be used.

Ticket: OISF#7372

The previous code used an array, which introduced a limit on the
number of datajson keywords that can be used in a signature.

This patch uses a linked list instead to overcome the limit. By
making the first element of the list part of the structure,
we limit the cost of the feature to a single structure member added to
the PacketAlert structure. Only the PacketAlertFree function is
impacted, as it needs to iterate over the list to find potential allocations.
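
The layout described here can be sketched as follows. All names (`struct json_ctx`, `struct alert`, `alert_add_json`, `alert_free_json`) are hypothetical stand-ins and do not reproduce the actual PacketAlert code; the point is only that the first list element lives inside the alert structure and that extra nodes are allocated on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct json_ctx {
    char *json;            /* heap copy of the JSON context, or NULL */
    struct json_ctx *next; /* extra nodes, allocated only when needed */
};

struct alert {
    unsigned int sid;
    struct json_ctx first; /* embedded head: no allocation for the list itself */
};

/* Attach one JSON context to the alert. Returns 0 on success, -1 on OOM. */
static int alert_add_json(struct alert *a, const char *json)
{
    size_t n = strlen(json) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, json, n);

    if (a->first.json == NULL) { /* embedded slot is still free */
        a->first.json = copy;
        return 0;
    }
    struct json_ctx *node = calloc(1, sizeof(*node));
    if (node == NULL) {
        free(copy);
        return -1;
    }
    node->json = copy;
    struct json_ctx *tail = &a->first;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = node;
    return 0;
}

/* Release everything attached to the alert; returns the number of heap
 * nodes freed (the embedded head is never freed itself). */
static int alert_free_json(struct alert *a)
{
    int freed = 0;
    free(a->first.json);
    a->first.json = NULL;
    struct json_ctx *node = a->first.next;
    a->first.next = NULL;
    while (node != NULL) {
        struct json_ctx *next = node->next;
        free(node->json);
        free(node);
        node = next;
        freed++;
    }
    return freed;
}
```

In this sketch an alert with zero or one context never allocates a list node, and the free routine is the only place that has to walk the list.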

Ticket: OISF#7372

JSON values containing spaces were not handled correctly, as they
were seen as multiple arguments.

Ticket: OISF#7372

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 3.28685% with 971 lines in your changes missing coverage. Please review.

Project coverage is 67.43%. Comparing base (278dc24) to head (6061d01).

❗ There is a different number of reports uploaded between BASE (278dc24) and HEAD (6061d01).

HEAD has 2 uploads less than BASE
Flag             BASE (278dc24)  HEAD (6061d01)
suricata-verify  1               0
unittests        1               0
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #12102       +/-   ##
===========================================
- Coverage   83.23%   67.43%   -15.81%     
===========================================
  Files         906      846       -60     
  Lines      257647   156207   -101440     
===========================================
- Hits       214458   105332   -109126     
- Misses      43189    50875     +7686     
Flag             Coverage Δ
fuzzcorpus       60.88% <2.98%> (-0.33%) ⬇️
livemode         19.32% <2.88%> (-0.11%) ⬇️
pcap             44.16% <2.68%> (-0.27%) ⬇️
suricata-verify  ?
unittests        ?

@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field                   baseline  test  %
SURI_TLPR1_stats_chk
  .uptime               636       676   106.29%

Pipeline 23289


@victorjulien victorjulien left a comment

Some minor comments.

Bigger issue: I like this work, but I would like to understand why this needs a new set type and keyword type? Can we not overload the existing dataset/datarep facilities?

#ifndef __DATASETS_JSON_H__
#define __DATASETS_JSON_H__

#include <stdint.h>

should use suricata-common.h

{
    json_error_t jerror;
    json_t *msg;
    msg = json_loads(in, 0, &jerror);

declare here

    if (msg == NULL) {
        SCLogWarning("dataset: Invalid json: %s\n", jerror.text);
        return -1;
    } else {

unnecessary else branch, if branch is terminating

        json_decref(msg);
    }
    rep_out->len = ins;
    rep_out->value = SCStrndup(in, ins);

need an error check here
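
A minimal sketch of what such a check could look like, assuming a structure with `value` and `len` members like the one in the snippet. The names `struct json_value` and `json_value_set` are hypothetical, and plain `malloc` stands in for `SCStrndup` so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct json_value {
    char *value;
    size_t len;
};

/* Copy the first in_len bytes of `in` into a NUL-terminated buffer.
 * On allocation failure, return -1 and leave *out untouched so the
 * caller never sees a length paired with a NULL pointer. */
static int json_value_set(struct json_value *out, const char *in, size_t in_len)
{
    char *copy = malloc(in_len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, in, in_len);
    copy[in_len] = '\0';
    out->value = copy;
    out->len = in_len;
    return 0;
}
```

Setting the length only after the copy has succeeded keeps the two fields consistent on the error path.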

    if (set == NULL)
        return rrep;

    if (data_len != 16 && data_len != 4)

how would we get here with data_len == 4?

@@ -40,6 +40,7 @@
#include "util-radix-tree.h"
#include "util-file.h"
#include "reputation.h"
#include <stddef.h>

suricata-common.h should be used

@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field                          baseline  test  %
SURI_TLPW2_autofp_stats_chk
  .uptime                      139       144   103.6%
SURI_TLPR1_stats_chk
  .uptime                      636       670   105.35%

Pipeline 23316
