Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/thread safe logging #139

Open
wants to merge 31 commits into
base: develop
Choose a base branch
from
Open

Conversation

mcakircali
Copy link
Contributor

No description provided.

optional to use
enables safely handling lifetime of log channel objects
refactored Channel as interface (abstract class)
OutputChannel works as usual
EmptyChannel has no buffer and is no-op
fixes segfaults in multi-threaded
bonus: improves performance
handle any log calls from dtors of static objects
@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 30 lines in your changes missing coverage. Please review.

Project coverage is 63.70%. Comparing base (a8f0983) to head (19507a3).

Files with missing lines Patch % Lines
src/eckit/log/Log.cc 48.57% 18 Missing ⚠️
src/eckit/log/ChannelBuffer.cc 61.90% 8 Missing ⚠️
src/eckit/system/Library.cc 50.00% 3 Missing ⚠️
src/eckit/log/Channel.h 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #139      +/-   ##
===========================================
+ Coverage    63.69%   63.70%   +0.01%     
===========================================
  Files         1065     1066       +1     
  Lines        55140    55188      +48     
  Branches      4085     4100      +15     
===========================================
+ Hits         35122    35159      +37     
- Misses       20018    20029      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

virtual Channel* createChannel() { return new Channel(Main::instance().createWarningLogTarget()); }
};
if (Main::finalised()) {
thread_local Channel postMainError(new PrefixTarget("POST-MAIN-ERROR", new OStreamTarget(std::cout)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put in a comment:

This may still cause a segfault due to detruction order in library shutdown -
We may have to return an emptychannel that won't cause any harm when it has been destructed officially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hesitant about this comment (not clear to me) because it can be very misleading if we haven't seen a case before

which type of library, runtime or system ?

for example, calling Log::info/debug/error in a system::~Library will fail but nothing to do with what Main::finalised does.

Copy link

Private downstream CI failed.
Workflow name: private-downstream-ci-hpc
View the logs at https://github.com/ecmwf-actions/private-downstream-ci/actions/runs/10808625792.

Copy link

Private downstream CI succeeded.
Workflow name: private-downstream-ci
View the logs at https://github.com/ecmwf-actions/private-downstream-ci/actions/runs/10808794710.

Copy link

Private downstream CI failed.
Workflow name: private-downstream-ci-hpc
View the logs at https://github.com/ecmwf-actions/private-downstream-ci/actions/runs/10819467342.

Copy link

Private downstream CI succeeded.
Workflow name: private-downstream-ci
View the logs at https://github.com/ecmwf-actions/private-downstream-ci/actions/runs/10819456484.

1 similar comment
Copy link

Private downstream CI succeeded.
Workflow name: private-downstream-ci
View the logs at https://github.com/ecmwf-actions/private-downstream-ci/actions/runs/10819456484.

Copy link

@Ozaq Ozaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcakircali I am not sure if all my comments are relevant to you but I had to guess a bit what the intent of this PR is, i.e. what is the underlying bug to solve.

@@ -29,8 +30,7 @@ typedef void (*channel_callback_t)(void* data, const char* msg);
/// Singleton holding global streams for logging
///

class Log {

class Log: private NonCopyable {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You updated the Log struct to be non-copyable and non-instanciable. I.e. the struct Log is essentially a namespace, why not make it a namespace in the first place? Right now there are only function templates used which also removes this reason to use a struct. This change should also be transparent to the user as the sythax for invocation will be unchanged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct Log is essentially a namespace. If you look at the history of this code, it predates namespaces in C++...

I'm not sure if there would be knock-on impacts of changing it to be a namespace. There is a lot of code which makes use of the log functionality. And (in any case) it would leave us out of our general conventions, in that we would have to have a namespace called "Log" rather than "log" or we would really break a lot of stuff.

This PR already does far too much changing things (mostly for the better) at the same time as trying to fix an underlying issue. Makes it much harder to see if/where it actually fixes the issue. I'm generally inclined towards separating 'cleanup' PRs and and functional PRs for the ease of correctness reviewing. So if we want to play with this, it should be in a separate PR.

public: // methods
EmptyChannel();

~EmptyChannel() = default;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be made virtual with virtual or override

~AutoIndent() { channel_.unindent(); }
};

//----------------------------------------------------------------------------------------------------------------------

class EmptyChannel: public Channel {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two remarks:

1.) This should be named VoidChannel or NullChannel (as in > /dev/null), because it is more interesting that the messages disappear into the void than it is important that the channel is empty.

2.) Is this type actually needed? A VoidChannel could equally be constructed by a factory function, e.g. createVoidChannel.


ChannelBuffer* buffer_;
private: // members
ChannelBuffer* buffer_ {nullptr};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a std::unique_ptr<ChannelBuffer>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently explicitly deleted in the constructor. Please change to unique_ptr. Is a good change.

@@ -39,7 +39,7 @@ class Channel : public std::ostream, private NonCopyable {
bool operator!() const;
operator bool() const;

void indent(const char* prefix = "");
void indent(const char* space = "");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a first glance I disagree with the renaming because It looks like I can use an arbitrary character sequence here while I interpret indentation either as <space> vs <tab> or a numerical value e.g. 5 spaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also disagree with this renaming. Although, that said, it is plausible to use non-space indents (e.g. ">>>" or "...")


Channel& Log::info() {
if (!Main::ready()) {
static Channel empty(new PrefixTarget("PRE-MAIN-INFO", new OStreamTarget(std::cout)));
return empty;
thread_local Channel preMainInfo(new PrefixTarget("PRE-MAIN-INFO", new OStreamTarget(std::cout)));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there really pre-main threads running? I.e. does this need to be thread_local?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that there are ever pre-main threads running. Good point.

if (Main::finalised()) {
// This may still cause a SEGFAULT due to missing destruction order at the end of library life time
// - only solution is then to use an empty channel which is harmless in a destructed state
thread_local Channel postMainError(new PrefixTarget("POST-MAIN-ERROR", new OStreamTarget(std::cout)));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static objects are destructed in reverse order of construction. Function statics are constructed on first use, i.e. the Channel object will be destroyed before all objects that write log messages are destructed.

Ideally we do not rely on static initialization order at all and explicitly call all initialization code in main, then it is straight forward to start logging as one of the first systems.

For the shutdown scenario we could opt to leak memory and simply heap allocate all Channels and never destruct them.


//----------------------------------------------------------------------------------------------------------------------

CASE("Log: output " + std::to_string(logSize) + " logs by " + std::to_string(numThreads) + " threads") {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be extended with any additional log targets as they use multiple write calls if I understand the code correctly to produce a single log message

Comment on lines +18 to +22
ecbuild_add_test( TARGET eckit_test_log_threads_hammer
SOURCES test_log_threads_hammer.cc
ENVIRONMENT _TEST_ECKIT_HOME=/tmp/$ENV{USER}
LIBS eckit )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit torn about adding this to the default test set because this test may pass (or most likely will) even if there is a bug as it is about trying to trigger a race condition. I don't know what the current testing strategy is for test that do not reliably produce a clear good/bad distinction

static StaticMutex local_mutex;
static Main* instance_ = nullptr;

//----------------------------------------------------------------------------------------------------------------------

Main::Main(int argc, char** argv, const char* homeenv) :
taskID_(-1), argc_(argc), argv_(argv), home_("/"), debug_(false) {

std::atexit(finalise);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware that this may still interleave with static object destruction, see https://en.cppreference.com/w/cpp/utility/program/exit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants