add file retry events and adjust lock structure to fix eBPF file data loss #2292

Open
wants to merge 31 commits into
base: main
from

Conversation

xiongyunn
Contributor

No description provided.

yyuuttaaoo and others added 26 commits April 18, 2025 10:51
* feat: add instance labels into host monitor metrics
* feat: support send multivalue metrics to sls

* fix

* fix

* fix

* fix

* fix
* add e2e framework code for eBPF process collection

* e2e support ssh to host by private key

* fix: formatting errors
* feat: provide common cache for system information
* trim meaningless \0 when reading file

* remove redundant code
… function. (alibaba#2256)

* fix container path format

* fix lint

* fix commentf
* Disable process security local container meta

* add btf search path

* set logger for any bpf program

* stop process manager if not inited successfully
* upgrade go version from 1.19.10 to 1.23.10

* update dev image

* fix license

* remove goc

* update build script

* update go lint

* update go lint

* fix lint

* fix e2e

* speedup ci

* fix static check

* fix go version
…" (alibaba#2260)

* Revert "Feat: upgrade go version from 1.19.10 to 1.23.10 (alibaba#2254)"

This reverts commit f1712da.

* update workflow
* feat: windows build

* feat: add windows workflows

* fix: windows build script

---------

Co-authored-by: linrunqi08 <[email protected]>
Co-authored-by: Takuka0311 <[email protected]>
Fix crash caused by pointer moved to nullptr when camel concurrent queue is full
Use eBPF Server's polling thread to run process cache management
Use eBPF Server's handling thread to run plugin's aggregation and send
Refine eBPF Server's start and stop routine
* fix

* fix lint and ut memory leaks

* fix lint and ut

* fix memory leaks

* fix GCC diagnostic push

* polish

* add self-monitor benchmark

* remove test branch

* Revert "remove test branch"

This reverts commit f31a3bb.

* Revert "add self-monitor benchmark"

This reverts commit 87015b3.
* feat: windows build

* feat: add windows workflows

* fix: windows build script

---------

Co-authored-by: linrunqi08 <[email protected]>
Co-authored-by: Takuka0311 <[email protected]>
merge eBPF thread structure modifications
@xiongyunn xiongyunn closed this Jul 11, 2025
@xiongyunn xiongyunn reopened this Jul 11, 2025
@xiongyunn xiongyunn changed the title from "[WIP] file security" to "add file retry events and adjust lock structure to fix eBPF file data loss" Jul 11, 2025
@CLAassistant
CLAassistant commented Jul 15, 2025

CLA assistant check
All committers have signed the CLA.

@@ -14,6 +14,7 @@

#include "ebpf/Config.h"

#include <set>

Collaborator

This include isn't used.

size_t originalSize = thisFileFilter.mFilePathList.size();
std::unordered_set<std::string> uniquePaths;
std::vector<std::string> deduplicatedPaths;
deduplicatedPaths.reserve(originalSize);

Collaborator

Does this deduplication logic have a unit test?

Contributor Author

There is one in InputFileSecurityUnittest.
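
For reference, the deduplication step under discussion can be sketched as a standalone helper. This is only an illustration built from the variable names visible in the diff (`uniquePaths`, `deduplicatedPaths`, `originalSize`); the function name `DeduplicatePaths` and its return value are assumptions, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): removes duplicate paths in place,
// keeping the first occurrence of each path and preserving the original order.
// Returns the number of entries that were dropped.
inline std::size_t DeduplicatePaths(std::vector<std::string>& paths) {
    const std::size_t originalSize = paths.size();
    std::unordered_set<std::string> uniquePaths;
    std::vector<std::string> deduplicatedPaths;
    deduplicatedPaths.reserve(originalSize);
    for (auto& path : paths) {
        // insert().second is true only the first time a given path is seen.
        if (uniquePaths.insert(path).second) {
            deduplicatedPaths.push_back(std::move(path));
        }
    }
    paths = std::move(deduplicatedPaths);
    assert(paths.size() <= originalSize);
    return originalSize - paths.size();
}
```

A unit test for this behavior would typically check both the surviving order and the number of removed entries, for example that `{"/etc", "/usr", "/etc"}` becomes `{"/etc", "/usr"}` with one entry dropped.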

default:
LOG_ERROR(sLogger, ("unknown plugin type", int(type)));
return false;
}

if (pluginMgr->Init(options) != 0) {
LOG_ERROR(sLogger, ("plugin manager init failed", ""));
if ((type == PluginType::NETWORK_SECURITY || type == PluginType::PROCESS_SECURITY

Collaborator

The condition here needs to stay consistent with the one at the Init call above. Ideally, evaluate it once up there and store the result in a flag, then use that flag to decide whether to init and whether to stop the process cache manager.
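
A minimal sketch of the single-flag approach suggested above. Everything here is a stand-in for illustration: `FakeProcessCacheManager`, `NeedsProcessCacheManager`, and `StartPlugin` are hypothetical names, and the set of plugin types in the predicate is an assumption, since the real condition is truncated in the diff.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real types live in the eBPF server code.
enum class PluginType { NETWORK_OBSERVE, PROCESS_SECURITY, FILE_SECURITY, NETWORK_SECURITY };

struct FakeProcessCacheManager {
    int initCalls = 0;
    int stopCalls = 0;
    bool Init() { ++initCalls; return true; }
    void Stop() { ++stopCalls; }
};

// Assumption: which plugin types need the process cache manager. The actual
// condition in the PR is longer than what the diff excerpt shows.
inline bool NeedsProcessCacheManager(PluginType type) {
    switch (type) {
        case PluginType::NETWORK_SECURITY:
        case PluginType::PROCESS_SECURITY:
            return true;
        default:
            return false;
    }
}

// Evaluates the condition once and reuses the flag for both init and rollback,
// so the two sites cannot drift apart. pluginInitOk simulates pluginMgr->Init().
inline bool StartPlugin(PluginType type, FakeProcessCacheManager& pcm, bool pluginInitOk) {
    const bool needsProcessCache = NeedsProcessCacheManager(type);
    if (needsProcessCache && !pcm.Init()) {
        return false;
    }
    if (!pluginInitOk) {
        if (needsProcessCache) {
            pcm.Stop(); // roll back only what this call started
        }
        assert(pcm.stopCalls <= pcm.initCalls);
        return false;
    }
    return true;
}
```

With one flag, a plugin type that never initialized the process cache manager can never trigger a stop of it on the failure path.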


int processCacheEvents = mProcessCacheManager->PollPerfBuffers();

int currentMaxWaitTime = kDefaultMaxWaitTimeMS;

Collaborator

This doesn't quite match the expected behavior. The logic should look like this:

maxWaitTimeMs = kDefaultMaxWaitTimeMs
endtime = now()

starttime = endtime
pollperfbuffer(maxWaitTimeMs)
endtime = now()
maxWaitTimeMs -= (endtime - starttime)
if (maxWaitTimeMs < 0) maxWaitTimeMs = 1

starttime = endtime
……

Collaborator

I looked into it, and going forward this can indeed be changed to nested epoll logic.
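
The shared wait budget described in the pseudocode above can be sketched as follows. This is an illustration under assumptions: `Poller`, `PollAllWithBudget`, and the value of `kDefaultMaxWaitTimeMs` are hypothetical, and each poller stands in for a call such as `PollPerfBuffers` that accepts a timeout in milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical: a poller takes a timeout in ms and returns the number of
// events it handled.
using Poller = std::function<int(int timeoutMs)>;

constexpr int kDefaultMaxWaitTimeMs = 200; // assumed value, for illustration only

// Polls each source in turn. Time spent in one poll is subtracted from the
// budget handed to the next, so the whole round is bounded by roughly
// maxWaitTimeMs instead of maxWaitTimeMs per source.
inline int PollAllWithBudget(const std::vector<Poller>& pollers,
                             int maxWaitTimeMs = kDefaultMaxWaitTimeMs) {
    using Clock = std::chrono::steady_clock;
    assert(maxWaitTimeMs >= 0);
    int totalEvents = 0;
    auto endTime = Clock::now();
    for (const auto& poll : pollers) {
        const auto startTime = endTime;
        totalEvents += poll(maxWaitTimeMs);
        endTime = Clock::now();
        const auto elapsedMs
            = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
        maxWaitTimeMs -= static_cast<int>(elapsedMs);
        if (maxWaitTimeMs < 0) {
            maxWaitTimeMs = 1; // budget exhausted: keep later polls nearly non-blocking
        }
    }
    return totalEvents;
}
```

A nested epoll setup, as mentioned in the follow-up comment, would replace the sequential polls with a single wait over all sources, which removes the need for this bookkeeping.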

@@ -60,6 +66,8 @@ struct PluginState {
// (PollPerfBuffers/HandlerEvents/GetAllProjects), allowing them to safely interleave.
mutable std::atomic_bool mValid;
mutable std::shared_mutex mMtx;
std::atomic<LifecycleState> mLifecycleState{LifecycleState::STOPPED};

Collaborator

If the framework guarantees that stop(true) is called after a type change, can this logic be dropped?

for (auto* pb : pbs) {
auto* perfbuffer = static_cast<perf_buffer*>(pb);
if (perfbuffer) {
perf_buffer__free(perfbuffer);

Collaborator

gWrapper->

@@ -59,7 +59,7 @@ class RetryableEvent {
* @brief Checks if the event can be retried.
* @return true if there are retry attempts left, false otherwise.
*/
-[[nodiscard]] bool CanRetry() const { return mRetryLeft > 0; }
+[[nodiscard]] virtual bool CanRetry() const { return mRetryLeft > 0; }

Collaborator

This shouldn't need to be overridden.

@@ -228,7 +229,29 @@ void InitSecurityFileFilter(const Json::Value& config,
} else if (!GetOptionalListFilterParam<std::string>(
config, "FilePathFilter", thisFileFilter.mFilePathList, errorMsg)) {
// FilePathFilter has element of wrong type
} else {
// FilePathFilter succeeded, deduplication
size_t originalSize = thisFileFilter.mFilePathList.size();

Collaborator

Validate the configuration here. There should be an upper limit on the number of filters, or on their total length, that matches the limit in the eBPF program.
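
One way the requested validation could look, as a rough sketch. The limits `kMaxFilterCount` and `kMaxTotalFilterLength` are made-up values and `ValidateFilePathFilters` is a hypothetical name; the real bounds would have to be taken from the eBPF program's map sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed limits for illustration only; the real values must mirror the
// limits compiled into the eBPF program.
constexpr std::size_t kMaxFilterCount = 64;
constexpr std::size_t kMaxTotalFilterLength = 4096;

// Hypothetical validator: rejects a FilePathFilter list that exceeds either
// the entry-count limit or the total-length limit, and explains why.
inline bool ValidateFilePathFilters(const std::vector<std::string>& paths, std::string& errorMsg) {
    if (paths.size() > kMaxFilterCount) {
        errorMsg = "FilePathFilter has " + std::to_string(paths.size())
            + " entries, limit is " + std::to_string(kMaxFilterCount);
        return false;
    }
    std::size_t totalLength = 0;
    for (const auto& path : paths) {
        totalLength += path.size();
    }
    if (totalLength > kMaxTotalFilterLength) {
        errorMsg = "FilePathFilter total length is " + std::to_string(totalLength)
            + ", limit is " + std::to_string(kMaxTotalFilterLength);
        return false;
    }
    assert(paths.size() <= kMaxFilterCount && totalLength <= kMaxTotalFilterLength);
    errorMsg.clear();
    return true;
}
```

Running this after deduplication means duplicate entries do not count against the limits.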

8 participants