
Container with Elasticsearch #2550

Merged
drosetti merged 20 commits into develop from advanced-queries on Oct 31, 2024

Conversation

drosetti
Contributor

(Please add to the PR title the issue(s) that this PR would close if merged, using a GitHub keyword. Example: <feature name>. Closes #999. If your PR consists of a single commit, please add that clause to the commit message too. This is all required to automate the closure of related issues.)

Description

Please include a summary of the change and link to the related issue.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist

  • I have read and understood the rules about how to Contribute to this project
  • The pull request is for the branch develop
  • A new plugin (analyzer, connector, visualizer, playbook, pivot or ingestor) was added or changed, in which case:
    • I strictly followed the documentation "How to create a Plugin"
    • Usage file was updated.
    • Advanced-Usage was updated (in case the plugin provides additional optional configuration).
    • I have dumped the configuration from Django Admin using the dumpplugin command and added it in the project as a data migration. ("How to share a plugin with the community")
    • If a File analyzer was added and it supports a mimetype which is not already supported, you added a sample of that type inside the archive test_files.zip and you added the default tests for that mimetype in test_classes.py.
    • If you created a new analyzer and it is free (does not require any API key), please add it in the FREE_TO_USE_ANALYZERS playbook by following this guide.
    • Check if it could make sense to add that analyzer/connector to other freely available playbooks.
    • I have provided the resulting raw JSON of a finished analysis and a screenshot of the results.
    • If the plugin interacts with an external service, I have created an attribute called precisely url that contains this information. This is required for Health Checks.
    • If the plugin requires mocked testing, _monkeypatch() was used in its class to apply the necessary decorators.
    • I have added that raw JSON sample to the MockUpResponse of the _monkeypatch() method. This serves us to provide a valid sample for testing.
  • If external libraries/packages with restrictive licenses were used, they were added in the Legal Notice section.
  • Linters (Black, Flake8, Isort) gave 0 errors. If you have correctly installed pre-commit, it performs these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved (see tests folder). All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated (check CONTRIBUTE.md).
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.
  • After submitting the PR, if DeepSource, Django Doctors or other third-party linters triggered any alerts during the CI checks, I have solved those alerts.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • Every time you make changes to the PR and think the work is done, you should explicitly ask for a review. After being reviewed and receiving a "change request", you should explicitly ask for a review again once you have made the requested changes.

@drosetti drosetti requested a review from mlodic October 22, 2024 16:49
@drosetti drosetti marked this pull request as draft October 22, 2024 16:53
@drosetti
Contributor Author

I converted the PR to draft because the documentation is missing.

@drosetti drosetti marked this pull request as ready for review October 23, 2024 09:12
@drosetti
Contributor Author

Restored to a real PR; the documentation is in another repo.

@mlodic mlodic (Member) left a comment

It is important to write in the docs which Elasticsearch versions are supported / have been tested.


def _convert_report_to_elastic_document(_class: AbstractReport) -> List[Dict]:
    upper_threshold = now().replace(second=0, microsecond=0)
    lower_threshold = upper_threshold - datetime.timedelta(minutes=5)
Member

Timedeltas should not be calculated inside async functions; they should be calculated beforehand. That avoids this value changing in case of congestion.

Contributor Author

I think it's correct to calculate it inside the task: the alternative is to put it in the beat schedule, but that doesn't work because the function is called once when the schedule is defined, so the time range would be the same for all the scheduled tasks (see the sketch below). Am I wrong?
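
For illustration, a minimal sketch of why computing the range at schedule-definition time would not work, assuming a Celery beat schedule; the task path, schedule name, and keyword argument are hypothetical, not taken from this PR:

import datetime

from celery import Celery
from celery.schedules import crontab

app = Celery("intel_owl")

# Evaluated exactly once, when the module defining the schedule is imported.
frozen_upper = datetime.datetime.now()

app.conf.beat_schedule = {
    "send-reports-to-elastic": {
        "task": "intel_owl.tasks.send_plugin_report_to_elastic",  # hypothetical
        "schedule": crontab(minute="*/5"),
        # Every scheduled run would receive this same stale timestamp,
        # which is why the threshold must be computed inside the task.
        "kwargs": {"upper_threshold": frozen_upper},
    },
}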

Member

No, you're right. If I remember correctly, on another occasion we managed this case by calculating the time from an element of the database. That way there's no chance of getting this value wrong, because this task would change it only at execution time. So, if there are any downtimes, there would be no loss of data. (I am afraid of the sync becoming misaligned and causing us to lose some data from time to time. That would make data analysis really bad.)

(I would still get the time from "now" first; then, instead of subtracting 5 minutes, I would use as lower_threshold the last-update time retrieved from the database. See the sketch below.)
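
A minimal sketch of that suggestion, assuming a hypothetical LastElasticSync model that records the time of the last successful export; none of these names come from the PR:

import datetime

from django.db import models
from django.utils.timezone import now


class LastElasticSync(models.Model):
    executed_at = models.DateTimeField()


def get_thresholds():
    upper_threshold = now().replace(second=0, microsecond=0)
    last_sync = LastElasticSync.objects.order_by("-executed_at").first()
    # Use the last recorded sync as the lower bound; fall back to a fixed
    # five-minute window only when nothing has been recorded yet.
    if last_sync is not None:
        lower_threshold = last_sync.executed_at
    else:
        lower_threshold = upper_threshold - datetime.timedelta(minutes=5)
    return lower_threshold, upper_threshold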

Contributor Author

I'm not sure; I remember what you are talking about, but I didn't find the collections: I found some capped collections used to repeat a task in case of failure, which is similar but not the same. However, I found a way to do it with Postgres (sketched below), so I'll proceed with the merge.
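
Continuing the previous sketch, a hedged illustration of how a Postgres-backed cursor might be advanced only after a successful export, so that downtime widens the next window instead of losing data (all names remain hypothetical):

from django.db import transaction


def export_window_to_elastic():
    lower_threshold, upper_threshold = get_thresholds()
    # ... query the reports in [lower_threshold, upper_threshold) and
    # bulk-index them into Elasticsearch (omitted here) ...
    with transaction.atomic():
        # Advance the stored cursor only once indexing has succeeded, so a
        # failed or skipped run simply widens the next window.
        LastElasticSync.objects.create(executed_at=upper_threshold)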

@code-review-doctor code-review-doctor bot (Contributor) left a comment

Looks good. Worth considering though. View full project report here.

api_app/models.py (resolved)
@drosetti drosetti merged commit 4220e7e into develop Oct 31, 2024
11 of 12 checks passed
@drosetti drosetti deleted the advanced-queries branch October 31, 2024 15:22