Transformer ADDOBSID added #354

Merged
subbyte merged 16 commits into opencybersecurityalliance:develop on Jun 14, 2023

Conversation

leila-rashidi
Contributor

No description provided.

@leila-rashidi
Contributor Author

@subbyte and @pcoccoli, please note that we should wait for the new version of firepit. Some of the tests are failing for this reason.

@leila-rashidi
Contributor Author

Hi @pcoccoli, I have addressed Dr. Shu's comments. Can you please review the pull request?

Collaborator

@pcoccoli pcoccoli left a comment

I still don't like this solution.

@@ -15,7 +15,7 @@

 LITERALS = {"CNAME", "LETTER", "DIGIT", "WS", "INT", "WORD", "ESCAPED_STRING", "NUMBER"}
 AGG_FUNCS = {"MIN", "MAX", "AVG", "SUM", "COUNT", "NUNIQUE"}
-TRANSFORMS = {"TIMESTAMPED"}
+TRANSFORMS = {"TIMESTAMPED", "ADDOBSID"}
Collaborator

This is an awkward name.

Contributor Author

Hi @pcoccoli and @subbyte, do you have a better suggestion for the name of the new transformer?

Member

OBSERVED?

Contributor Author

That was the initial name, and @pcoccoli asked for it to be changed to a better one.

Collaborator

Since the transform only adds the ID, I guess the current name is acceptable. I would encourage everyone to take a step back and look at what the use case is, though. IIRC, you're first using TIMESTAMPED and then this new ADDOBSID. Those two perform the exact same operation except for the attribute name. Couldn't you do it in a single operation by providing a list of the observed-data attributes you want in the result? And if so, what you're really doing is producing a list of "records", correct?

Contributor Author

Hi @pcoccoli, to use our Kestrel analytics, we no longer need the TIMESTAMPED transformer. However, the single operation that you suggested is great. Ideally, we would have a general transformer that takes a list of the observed-data attribute names to add. If you like this solution, I can implement it in Kestrel. If we do this, then in addition to TIMESTAMPED and ADDOBSID we would have another transformer, used as follows:

x=TRANSFORM(y) ATTR id, first-observed, created

Please let me know if you like this solution.
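
For readers following the thread, here is a minimal Python sketch of the generalized idea proposed above: one operation that attaches a caller-supplied list of observed-data attributes to each entity row. It is not Kestrel's or firepit's implementation. The function name, the `observation_id` link key, the `observed_` prefix, and the sample data are all hypothetical, and the sketch uses the STIX-style `first_observed` spelling rather than the `first-observed` shown in the proposed syntax.

```python
from typing import Any


def add_observation_attrs(
    entities: list[dict[str, Any]],
    observations: list[dict[str, Any]],
    attrs: list[str],
) -> list[dict[str, Any]]:
    """Return one record per entity row, extended with the requested
    observed-data attributes.

    Assumption: each entity row carries an "observation_id" key (hypothetical
    name) that matches the "id" of exactly one observed-data row.
    """
    obs_by_id = {obs["id"]: obs for obs in observations}
    records = []
    for entity in entities:
        obs = obs_by_id[entity["observation_id"]]
        record = dict(entity)  # copy so the input rows stay untouched
        for attr in attrs:
            # The prefix avoids clobbering the entity's own "id" or "created".
            record[f"observed_{attr}"] = obs[attr]
        records.append(record)
    return records


# Illustrative data only; not taken from any real data source.
observations = [
    {"id": "observed-data--1", "first_observed": "2023-06-01T10:00:00Z", "created": "2023-06-01T10:05:00Z"},
    {"id": "observed-data--2", "first_observed": "2023-06-01T11:00:00Z", "created": "2023-06-01T11:05:00Z"},
]
entities = [
    {"id": "process--a", "name": "cmd.exe", "observation_id": "observed-data--1"},
    {"id": "process--b", "name": "powershell.exe", "observation_id": "observed-data--2"},
]

# Requesting both attributes in one call covers what TIMESTAMPED and
# ADDOBSID would otherwise do in two separate steps.
records = add_observation_attrs(entities, observations, ["id", "first_observed"])
```

With this shape, TIMESTAMPED and ADDOBSID become special cases of one call with a single-element attribute list, which is the point raised in the review.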

Member

Interesting thought on "record". To me, it is a future direction: I view a "record" as an "event" that we need to bring to Kestrel, one that correlates two or more entities. By defining the event as a first-class citizen in Kestrel, we will be able to refer to it (such as in a variable) and pass the correlation information around.

In @leila-rashidi 's case, the data being used is already linked (as records in a single data source). However, we may think of more generic cases where users use Kestrel to correlate data from multiple data sources, then provide the linked entities to @leila-rashidi 's analytics, e.g., using process---user-account data from a source and process---network-traffic data from another source, we may output user-account---network-traffic data to the analytics.
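
The cross-source correlation described above can be sketched as a join on the shared process entity. This is only an illustration in plain Python, not a Kestrel feature or API. The function name, the tuple layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def correlate_users_with_traffic(
    process_users: list[tuple[str, str]],
    process_traffic: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Link user accounts to network traffic through the process they share.

    process_users: (process, user_account) pairs from one data source.
    process_traffic: (process, network_traffic) pairs from another source.
    """
    users_by_process = defaultdict(set)
    for process, user in process_users:
        users_by_process[process].add(user)

    linked = set()
    for process, traffic in process_traffic:
        # Processes seen in only one source produce no link.
        for user in users_by_process.get(process, ()):
            linked.add((user, traffic))
    return sorted(linked)


# Illustrative data only.
source_a = [("proc-1", "alice"), ("proc-2", "bob")]
source_b = [("proc-1", "10.0.0.5:443"), ("proc-3", "10.0.0.9:22")]

pairs = correlate_users_with_traffic(source_a, source_b)
```

Here only `proc-1` appears in both sources, so the single linked pair is `("alice", "10.0.0.5:443")`.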

@codecov

codecov bot commented Jun 12, 2023

Codecov Report

Patch coverage: 66.66% and project coverage change: -0.11 ⚠️

Comparison is base (a686960) 83.88% compared to head (615c8f1) 83.78%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #354      +/-   ##
===========================================
- Coverage    83.88%   83.78%   -0.11%     
===========================================
  Files           42       42              
  Lines         2967     2973       +6     
===========================================
+ Hits          2489     2491       +2     
- Misses         478      482       +4     
Impacted Files                        Coverage Δ
src/kestrel/semantics/completor.py    91.77% <ø> (-0.11%) ⬇️
src/kestrel/codegen/commands.py       89.45% <33.33%> (-1.04%) ⬇️
src/kestrel/syntax/parser.py          97.24% <100.00%> (+0.03%) ⬆️
src/kestrel/syntax/utils.py           100.00% <100.00%> (ø)

@subbyte
Member

subbyte commented Jun 12, 2023

My suggestion is to accept the current PR while planning for the more generalized design that treats records or events as first-class citizens.

For the current PR to be complete, we need a documentation update.

@leila-rashidi
Contributor Author

Hi @subbyte, I will update the documentation soon. For your information, @pcoccoli has already approved this pull request.

@subbyte subbyte merged commit 72ed6e1 into opencybersecurityalliance:develop Jun 14, 2023
3 participants