-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share host info between packages #5884
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Kind can be one of: | ||
# - breaking-change: a change to previously-documented behavior | ||
# - deprecation: functionality that is being removed in a later release | ||
# - bug-fix: fixes a problem in a previous version | ||
# - enhancement: extends functionality but does not break or fix existing behavior | ||
# - feature: new functionality | ||
# - known-issue: problems that we are aware of in a given version | ||
# - security: impacts on the security of a product or a user’s deployment. | ||
# - upgrade: important information for someone upgrading from a prior version | ||
# - other: does not fit into any of the other categories | ||
kind: enhancement | ||
|
||
# Change summary; a 80ish characters long description of the change. | ||
summary: Collect host info exactly once on startup | ||
|
||
# Long description; in case the summary is not enough to describe the change | ||
# this field accommodate a description without length limits. | ||
# NOTE: This field will be rendered only for breaking-change and known-issue kinds at the moment. | ||
#description: | ||
|
||
# Affected component; usually one of "elastic-agent", "fleet-server", "filebeat", "metricbeat", "auditbeat", "all", etc. | ||
component: elastic-agent | ||
|
||
# PR URL; optional; the PR number that added the changeset. | ||
# If not present is automatically filled by the tooling finding the PR where this changelog fragment has been added. | ||
# NOTE: the tooling supports backports, so it's able to fill the original PR number instead of the backport PR number. | ||
# Please provide it if you are adding a fragment for a different PR. | ||
#pr: https://github.com/owner/repo/1234 | ||
|
||
# Issue URL; optional; the GitHub issue related to this changeset (either closes or is part of). | ||
# If not present is automatically filled by the tooling with the issue linked to the PR number. | ||
#issue: https://github.com/owner/repo/1234 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,9 +6,11 @@ package util | |
|
||
import ( | ||
"context" | ||
"sync" | ||
"time" | ||
|
||
"github.com/elastic/elastic-agent/pkg/core/logger" | ||
"github.com/elastic/go-sysinfo" | ||
"github.com/elastic/go-sysinfo/types" | ||
) | ||
|
||
|
@@ -31,3 +33,61 @@ func GetHostName(isFqdnFeatureEnabled bool, hostInfo types.HostInfo, host types. | |
|
||
return fqdn | ||
} | ||
|
||
var _ types.Host = &threadSafeHost{} | ||
|
||
// threadSafeHost is a thread-safe wrapper around types.Host. | ||
// It exists so we can only create it once, as some of the setup it does is relatively expensive. | ||
type threadSafeHost struct { | ||
sync.Mutex | ||
inner types.Host | ||
} | ||
|
||
func newThreadSafeHost(inner types.Host) *threadSafeHost { | ||
return &threadSafeHost{inner: inner} | ||
} | ||
|
||
func (s *threadSafeHost) CPUTime() (types.CPUTimes, error) { | ||
s.Lock() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. my first reaction/thinking was that this thread-safety code most probably belongs to the go-sysinfo codebase?! but then looking again in the exposed functions we use, and from a quick look in go-sysinfo they do not mutate anything, so I am wondering do we need to hold a mutex to call them?! did I miss a mutation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. go-sysinfo doesn't guarantee that these are thread-safe, so I don't have much of a choice here. We could try to make them so in the library, but I'm not sure it's worth it. These functions aren't called very often, so in practice there shouldn't be any contention. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since these functions are not mutating anything I would propose to remove the mutex. Or again you may have spotted a mutation that I missed?! If not, I am not entirely convinced that losing the concurrency of calling these functions, given that this is now a shared instance across multiple places, is wise. But if you insist sure go with it 🙂 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The functions are not mutating anything right now, but go-sysinfo doesn't guarantee this interface is thread-safe, so this is an implementation detail that may change. If we want to remove the locks, we should make the change in go-sysinfo, and I don't think that's worth the hassle. This PR is a bit borderline already in my opinion when it comes to complexity introduced vs performance gained. At the very least, I'd have to see evidence of actual contention before looking into this further. |
||
defer s.Unlock() | ||
return s.inner.CPUTime() | ||
} | ||
|
||
func (s *threadSafeHost) Info() types.HostInfo { | ||
s.Lock() | ||
defer s.Unlock() | ||
return s.inner.Info() | ||
} | ||
|
||
func (s *threadSafeHost) Memory() (*types.HostMemoryInfo, error) { | ||
s.Lock() | ||
defer s.Unlock() | ||
return s.inner.Memory() | ||
} | ||
|
||
func (s *threadSafeHost) FQDNWithContext(ctx context.Context) (string, error) { | ||
s.Lock() | ||
defer s.Unlock() | ||
return s.inner.FQDNWithContext(ctx) | ||
} | ||
|
||
func (s *threadSafeHost) FQDN() (string, error) { | ||
s.Lock() | ||
defer s.Unlock() | ||
return s.inner.FQDN() | ||
} | ||
|
||
var ( | ||
sharedHost types.Host | ||
once sync.Once | ||
hostErr error | ||
) | ||
|
||
func GetHost() (types.Host, error) { | ||
once.Do(func() { | ||
var innerHost types.Host | ||
innerHost, hostErr = sysinfo.Host() | ||
sharedHost = newThreadSafeHost(innerHost) | ||
}) | ||
return sharedHost, hostErr | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we didn't have two different ways to get Host information where callers need to think about which one to use, or whether they actually need the parts of it that can change at runtime (FQDN for an obvious example).
Can the periodic updates be pushed into
util.GetHost()
? They could be unconditional updates at a fixed rate, or it could behave as a rate limit where it updates at most every X seconds.As a possible simplification for all of this, perhaps the network address information isn't valuable at all for containerized applications and we can add a way to filter it into https://github.com/elastic/go-sysinfo. There is an IsContainerized() already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some related context,
add_kubernetes_metadata
stopped including the host IPs by default:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could just filter out the link local addresses (assuming those are the problem here too) if just not having the addresses at all doesn't work, or its too risky to think being in a container always means you are in k8s (it doesn't). elastic/integrations#6674 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the periodicity of the host info checks in the provider should be a part of the provider's logic, as opposed to being hidden in the utils package. I could see adding a parameter to this API indicating if a cached value is fine, if that helps.
Honestly, what I'm really trying to do here is avoid recomputing the host info, which is only done at the start. All the other methods on this struct fetch the latest data anyway. Maybe the solution is to compute host info at call time, same as everything else? Then every component could have its own
types.Host
(as creating one would be cheap), and we'd separately cache host info for the packages that want it at startup.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all I really care about is that it's obvious when a call to get host info will potentially contain stale data. Keeping the periodicity of the host info check in the provider is fine if that is clear.