-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transformer ADDOBSID added #354
Transformer ADDOBSID added #354
Conversation
This reverts commit 83c5119.
Hi @pcoccoli, I applied comments of Dr. Shu. Can you please review the pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like this solution.
src/kestrel/syntax/utils.py
Outdated
@@ -15,7 +15,7 @@ | |||
|
|||
LITERALS = {"CNAME", "LETTER", "DIGIT", "WS", "INT", "WORD", "ESCAPED_STRING", "NUMBER"} | |||
AGG_FUNCS = {"MIN", "MAX", "AVG", "SUM", "COUNT", "NUNIQUE"} | |||
TRANSFORMS = {"TIMESTAMPED"} | |||
TRANSFORMS = {"TIMESTAMPED", "ADDOBSID"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awkward name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBSERVED
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the initial name, and @pcoccoli requested to change it to a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the transform only adds the ID, I guess the current name is acceptable. I would encourage everyone to take a step back and look at what the use case is, though. IIRC, you're first using TIMESTAMPED
and then this new ADDOBSID
. Those two perform the exact same operation except for the attribute name. Couldn't you do it in a single operation by providing a list of the observed-data
attributes you want in the result? And if so, what you're really doing is producing a list of "records" correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pcoccoli, for using our kestrel analytics, we no longer need to use TIMESTAMPED transformer. However, the single operation that you suggested is great. Ideally, we can have a general transformer which takes the name of attributes that should be added as a list of attributes. If you like this solution, I can implement it in Kestrel. If we do this, in addition to the TIMSTAMPED and ADDOBSID, we have another transformer used as follows.
x=TRANSFORM(y) ATTR id, first-observed, created
Please let me know if you like this solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting thought on "record". To me, it is a future direction and I am viewing "record" as "event" we need to bring to Kestrel that correlate two or more entities. By defining the event as a first-class citizen in Kestrel, we will be able to refer to it (such as in a variable) and pass the correlation information around.
In @leila-rashidi 's case, the data being used is already linked (as records in a single data source). However, we may think of more generic cases where users use Kestrel to correlate data from multiple data sources, then provide the linked entities to @leila-rashidi 's analytics, e.g., using process---user-account
data from a source and process---network-traffic
data from another source, we may output user-account---network-traffic
data to the analytics.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #354 +/- ##
===========================================
- Coverage 83.88% 83.78% -0.11%
===========================================
Files 42 42
Lines 2967 2973 +6
===========================================
+ Hits 2489 2491 +2
- Misses 478 482 +4
☔ View full report in Codecov by Sentry. |
My suggestion is to accept the current PR, while planning for the more generalized design that has For the currently PR, we need some documentation update to be complete. |
No description provided.