Skip to content

Integrate with OpenTelemetry #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
May 28, 2025

Conversation

abicky
Copy link
Contributor

@abicky abicky commented Apr 15, 2025

This PR is the fifth phase of OpenTelmetry integration described on #153 and I have added the "OpenTelmetry integration" section to the REAME.md.
Although there are a lot of changes, I think you can easily understand them by looking at each commit one by one.
I will also create another PR to integrate with OpenTelemetry logs (#163).

image

Breaking changes

This PR also removes --prometheus-addr flag. To keep the old behavior, the following environment variables are required along with --enable-telemetry flag or YRLY_ENABLE_TELEMETRY=true:

OTEL_TRACES_EXPORTER=none
OTEL_METRICS_EXPORTER=prometheus
OTEL_LOGS_EXPORTER=none
OTEL_EXPORTER_PROMETHEUS_PORT=2223

@abicky abicky requested a review from a team as a code owner April 15, 2025 04:12
abicky added 10 commits April 15, 2025 13:12
If we use `cmd.Context()`, metric.Shutdown never flush metrics
when the process receives SIGINT or SIGTERM.

Signed-off-by: Takeshi Arabiki <[email protected]>
This commit also removes `--prometheus-addr` flag.
To keep the old behavior, the following environment variables
are required along with `--enable-telemetry` flag:

```
OTEL_TRACES_EXPORTER=none
OTEL_METRICS_EXPORTER=prometheus
OTEL_LOGS_EXPORTER=none
OTEL_EXPORTER_PROMETHEUS_PORT=2223
```

Signed-off-by: Takeshi Arabiki <[email protected]>
Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 6dc3b6c to 7031b9b Compare April 15, 2025 04:13
@abicky abicky requested a review from siburu April 15, 2025 04:16
@abicky abicky self-assigned this Apr 15, 2025
)

// ProvableChain represents a chain that is supported by the relayer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the code related to ProvableChain to core/provable-chain.go and added some methods for tracing.

Copy link
Contributor Author

@abicky abicky Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some methods for tracing.

I have removed these methods and added tracing bridges in b18cf1e


// ProvableChain represents a chain that is supported by the relayer.
//
// It wraps primary methods of the Chain and Prover interfaces with tracing.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see below, we can trace the methods defined in the interfaces without changing the tendermint module methods and we can know which module methods were called by referring to the package attribute:

image

Copy link

@mizuochik mizuochik Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design is a bit awkward from the following perspectives. Given the following background, it seems better to exclude this implementations from this PR.

  1. ProvableChain is not Prover nor Chain itself, so it is counterfactual to set the their spans as Prover's or Chain's.
  2. We want to set the span at the module boundary, but in this case, it is natural for the span to start on the module side.

Also, especially for chain modules, the trace implementation on the chain module side should have a high priority in order to output the tx.hash attribute. In this case, the usefulness of this design is reduced.

Copy link
Contributor Author

@abicky abicky Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I also feel this implementation is weird from the perspective of the responsibility of ProvableChain. At first, I tried to introduce a chain tracer bridge and prover tracer bridge as follows:

diff --git a/core/config.go b/core/config.go
index 9ab7d3c..89fc757 100644
--- a/core/config.go
+++ b/core/config.go
@@ -134,7 +134,7 @@ func (cc ChainProverConfig) Build() (*ProvableChain, error) {
        if err != nil {
                return nil, err
        }
-       return NewProvableChain(chain, prover), nil
+       return NewProvableChain(chainTracerBridge{chain}, proverTracerBridge{prover}), nil
 }

 func SyncChainConfigsFromEvents(ctx context.Context, pathName string, msgIDsSrc, msgIDsDst []MsgID, src, dst *ProvableChain) error {

But I found some modules assumed that ProvableChain.Chain and ProvableChain.Prover returns a struct pointer of a certain module: https://github.com/hyperledger-labs/yui-relayer/blob/v0.5.13/chains/tendermint/cmd/light.go#L43-L44

So, I decided to add tracing logic to ProvableChain.
As for the responsibility of ProvableChain, I tried to re-implement the tracing logic in another barnch: abicky@e5bd577.
Although we have to modify some code in other modules, I think this implementation is much better.

As for your point, I agree that each module should modify its code to add the tx_hash span attribute and the current implementation cannot take care of all the tracing logic.
However, it can reduce a lot of burden in implementing tracing logic in each module and keep the code tidy.
Additionally, we can see insightful information even if the developer does not implement the logic.

Copy link
Contributor Author

@abicky abicky Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To conclude, in my opinion, it would be better to apply the changes in abicky@e5bd577 to this PR, adding some tests of ProvableChain.As.
This approach seems to be natural considering that there are tracer bridges such as otelhttp.
What do you think about that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approximately good! The following improvements exist.

If you publish chainTracerBridge and proverTracerBridge and initialize them in each module's XxxConfig.Build, it looks good from the following points.

  • Modules are wrapped explicitly, so module developers can use type assertions without surprise
  • Each spans are started as module side code
  • Consistent implementation in modules that wrap other modules, such as lcp-go

Copy link
Contributor Author

@abicky abicky Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with you and I have re-implemented the tracing logic in 1b9c119.

Here is the summary of the discussion:

Final design

  • yui-relayer Introduces tracing bridges so that other modules can trace primary methods with minimum effort
  • External modules need to wrap a Chain struct and Prover struct in ChainConfig.Build and ProverConfig.Build to enable tracing
    • They must be careful if they use type assertions to access a Chain struct and Prover struct

Rejected approaches

  • Implement tracing in ProvableChain
    • This approach is the easiest and has no side effects, but violates the responsibility of ProvableChain
  • Wrap chain modules and prover modules with tracing bridges in NewProvableChain
    • This approach introduces a breaking change that can lead to runtime panics
      • For example, c.Chain.(*tendermint.Chain) panics at runtime even if a build is successful
  • Leave tracing up to each module
    • This approach shifts the burden of tracing implementation onto developers and can make the code messy

AttributeKeyDirection = attribute.Key("direction")
AttributeKeyPath = attribute.Key("path")
AttributeKeyRevisionNumber = attribute.Key("revision_number")
AttributeKeyRevisionHeight = attribute.Key("revision_height")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming revision.number and revision.height?

It seems natural to assign the revision namespace following the OpenTelemetry naming convention.

  1. A term revision has an independent meaning/object in IBC
  2. revision has multiple sub properties: revision.number and revision.height.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out that! I couldn't find the document on the naming convention when I implemented the code 💦

According to https://ibc.cosmos.network/main/ibc/overview/#ibc-client-heights, is it better to use the namespace "height" as follows?

diff --git a/core/attribute.go b/core/attribute.go
index 8f4f609..d3d5570 100644
--- a/core/attribute.go
+++ b/core/attribute.go
@@ -12,8 +12,8 @@ const (
        AttributeKeyPortID               = attribute.Key("port_id")
        AttributeKeyDirection            = attribute.Key("direction")
        AttributeKeyPath                 = attribute.Key("path")
-       AttributeKeyRevisionNumber = attribute.Key("revision_number")
-       AttributeKeyRevisionHeight = attribute.Key("revision_height")
+       AttributeKeyHeightRevisionNumber = attribute.Key("height.revision_number")
+       AttributeKeyHeightRevisionHeight = attribute.Key("height.revision_height")
        AttributeKeyPackage              = attribute.Key("package")
        AttributeKeyTxHash               = attribute.Key("tx_hash")
 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the attribute names in f6774c0


// ProvableChain represents a chain that is supported by the relayer.
//
// It wraps primary methods of the Chain and Prover interfaces with tracing.
Copy link

@mizuochik mizuochik Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design is a bit awkward from the following perspectives. Given the following background, it seems better to exclude this implementations from this PR.

  1. ProvableChain is not Prover nor Chain itself, so it is counterfactual to set the their spans as Prover's or Chain's.
  2. We want to set the span at the module boundary, but in this case, it is natural for the span to start on the module side.

Also, especially for chain modules, the trace implementation on the chain module side should have a high priority in order to output the tx.hash attribute. In this case, the usefulness of this design is reduced.

export OTEL_EXPORTER_CONSOLE_METRICS_WRITER=stderr
export OTEL_LOGS_EXPORTER=console
export OTEL_EXPORTER_CONSOLE_LOGS_WRITER=stderr
if [ -z "${OTEL_TRACES_EXPORTER-}" ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: Same to above.

export OTEL_METRICS_EXPORTER=console
export OTEL_EXPORTER_CONSOLE_METRICS_WRITER=stderr
fi
if [ -z "${OTEL_LOGS_EXPORTER-}" ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: Same to above.

export OTEL_TRACES_EXPORTER=console
export OTEL_EXPORTER_CONSOLE_TRACES_WRITER=stderr
fi
if [ -z "${OTEL_METRICS_EXPORTER-}" ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: Same to above.

@abicky abicky force-pushed the integrate-with-opentelemetry branch 3 times, most recently from 271d62d to 1b9c119 Compare April 17, 2025 08:23
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 1b9c119 to b18cf1e Compare April 17, 2025 08:57
@@ -258,6 +260,7 @@ func (c *Chain) rawSendMsgs(ctx context.Context, msgs []sdk.Msg) (*sdk.TxRespons
return res, false, nil
}

trace.SpanFromContext(ctx).SetAttributes(core.AttributeKeyTxHash.String(res.TxHash))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see below, this line sets tx_hash to the span created by the tracing bridge:

image

Copy link

@mizuochik mizuochik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good! I left the last comment.

}
}

func UnwrapChain(chain core.Chain) (core.Chain, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and UnwrapProver APIs seem redundant because:

  1. All use cases of UnwrapXxx are covered by core.AsXxx
  2. Users of UnwrapXxx has confidence what the concrete type is, so the users can use type assertion for unwrapping otelcore.Chain|Prover simply like chain.(*otelcore.Chain).Chain.(*userChain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wondered if we should add the functions, so I have deleted them and improved AsChain and AsProver in df80ae2

@abicky abicky force-pushed the integrate-with-opentelemetry branch from fb33323 to df80ae2 Compare April 18, 2025 06:59
@abicky abicky requested a review from mizuochik April 18, 2025 07:30
README.md Outdated

## OpenTelemetry integration

OpenTelemetry integration can be enabled by specifying the `--enable-telemetry` flag or by setting `YLRY_ENABLE_TELEMETRY` environment variable to true.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/YLRY_ENABLE_TELEMETRY/YRLY_ENABLE_TELEMETRY/

Copy link
Contributor Author

@abicky abicky May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you for pointing it out! I have fixed the typo in 09eee15.

Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 0e84e8b to 09eee15 Compare May 15, 2025 02:30
Copy link
Contributor

@siburu siburu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abicky I'm sorry for very late response...
Looks almost perfect, but I have put some comments, please check them.

@@ -40,7 +41,11 @@ func keysAddCmd(ctx *config.Context) *cobra.Command {
if err != nil {
return err
}
chain := c.Chain.(*tendermint.Chain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsChain and AsProver appear to be too flexible and may cause unexpected behaviour.
Why not simply rewrite this as follows?

chain := c.Chain.(*otelcore.Chain).Chain.(*tendermint.Chain)

Copy link
Contributor Author

@abicky abicky May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't mind the possibility of a panic, we can simply use type assertions.
However, if we want to provide a more user-friendly error message, the code becomes a bit more complex as follows:

var chain *tendermint.Chain
if otelcoreChain, ok := c.Chain.(*otelcore.Chain); ok {
	if tmChain, ok := otelcoreChain.Chain.(*tendermint.Chain); ok {
		chain = tmChain
	} else {
		return fmt.Errorf("Chain %q is not a tendermint.Chain", args[0])
	}
} else {
	if tmChain, ok := c.Chain.(*tendermint.Chain); ok {
		chain = tmChain
	} else {
		return fmt.Errorf("Chain %q is not a tendermint.Chain", args[0])
	}
}

Actually, without AsChain or AsProver, the changes in the PR datachainlab/lcp-go#51 would be much messier.

Additionally, AsChain and AsProver allow module developers to easily wrap other modules' methods without caring about breaking changes (Decorator pattern).
The same applies to the github.com/hyperledger-labs/yui-relayer module. If we want to introduce a wrapper (decorator) similar to the tracing bridges in the future, we can easily introduce it because we can assume that other developers use AsChain or AsProver, not type assertions.

Copy link
Contributor

@siburu siburu May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abicky Since each relayer module knows how to unwrap its own concrete struct, it can define a unwrapping function once and use it everywhere. Only unwrapping functions become messy even if many decorators are applied to Chain or Prover.

Example:

func unwrapChain(c *core.ProvableChain) (*tendermint.Chain, error) {
	if otelcoreChain, ok := c.Chain.(*otelcore.Chain); ok {
		if chain, ok := otelcoreChain.Chain.(*tendermint.Chain); ok {
			return chain, nil
		} else {
			return nil, fmt.Errorf("Not a tendermint.Chain")
		}
	} else {
		return nil, fmt.Errorf("Not a otelcore.Chain")
	}
}

Copy link
Contributor Author

@abicky abicky May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this design in an online meeting and decided to introduce AsChain and AsProver.

Here is a summary of our discussion:

Masa-san's concerns

AsChain and AsProver appear to be too flexible and may cause unexpected behaviour.

Masa-san feels a bit uncomfortable using reflection and string to access a struct field

Additionally, AsChain and AsProver allow module developers to easily wrap other modules' methods without caring about breaking changes (Decorator pattern).

The module that is most likely to introduce such wrappers is the yui-relayer core module, and if so, the core module can provide a helper function to unwrap all the structs. Here is example code:

// core module provides this function
func UnwrapChain(c *core.ProvableChain) (core.Chain, error) {
	otelcoreChain, ok := c.Chain.(*otelcore.Chain)
	if !ok {
		return nil, fmt.Errorf("Not a otelcore.Chain")
	}

	anotherWrappedChain, ok := otelcoreChain.Chain.(*another.Chain)
	if !ok {
		return nil, fmt.Errorf("Not a another.Chain")
	}

	return anotherWrappedChain.Chain, nil
}

// each module implements a helper function if needed
func someFunction(c core.Chain) (*my.Chain, error) {
	if unwrapped, err := core.UnwrapChain(c); err != nil {
		return nil, err
	}
	chain, ok := unwrapped.(*my.Chain)
	if !ok {
		return nil, fmt.Errorf("Not a my.Chain")
	}
	return chain, nil
}

if we want to provide a more user-friendly error message

Failure of type assertions is usually caused by incorrect implementation and module developers can fix it before releasing the new code using type assertions, so we can use type assertions without the second argument. That can make type assertion code simpler.

My perspective

Failure of type assertions is usually caused by incorrect implementation

With the previous implemention, a panic can happen when a user runs some commands such as yrly tendermint keys with unexpected commandline option, so we should provide a user-friendly error message if we use type assertions.

it can define a unwrapping function once and use it everywhere

AsChain, AsProver can return a user-friendly error similar to fmt.Errorf("expected chain type is %T, but got %T", &ethereum.Chain{}, chain), so we can replace existing code with AsChain and AsProver more easily.

Although I don't have a strong preference, I'd like to go with the current implementation if Masa-san feels either is okay.

abicky and others added 4 commits May 19, 2025 15:46
Co-authored-by: YOSHIDA Masanori <[email protected]>
Signed-off-by: Takeshi Arabiki <[email protected]>
This commit also initializes the OTel SDK before initializing
chain/prover modules to handle logs produced in ctx.InitConfig.

Signed-off-by: Takeshi Arabiki <[email protected]>
Signed-off-by: Takeshi Arabiki <[email protected]>
abicky added a commit to abicky/yui-relayer that referenced this pull request May 19, 2025
abicky added a commit to abicky/yui-relayer that referenced this pull request May 19, 2025
@abicky abicky force-pushed the integrate-with-opentelemetry branch from de90985 to 9d5cbde Compare May 19, 2025 10:41
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 9d5cbde to d9da792 Compare May 19, 2025 10:42
@abicky abicky requested a review from mizuochik May 20, 2025 01:03
@abicky
Copy link
Contributor Author

abicky commented May 20, 2025

@mizuochik I have made some changes based on Masa-san's feedback. Can you take a look? 09eee15...d9da792

abicky added a commit to abicky/yui-relayer that referenced this pull request May 22, 2025
This is the continuation of the PR
hyperledger-labs#162.

Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky force-pushed the integrate-with-opentelemetry branch from d1b0778 to 637b496 Compare May 22, 2025 08:23
abicky added a commit to abicky/yui-relayer that referenced this pull request May 22, 2025
This is the continuation of the PR
hyperledger-labs#162.

Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 637b496 to 2bb62df Compare May 22, 2025 08:48
abicky added a commit to abicky/yui-relayer that referenced this pull request May 22, 2025
This is the continuation of the PR
hyperledger-labs#162.

Signed-off-by: Takeshi Arabiki <[email protected]>
Because we found another way to provider similar functions
without using reflection.

Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky force-pushed the integrate-with-opentelemetry branch from 2bb62df to ac09998 Compare May 22, 2025 09:26
abicky added a commit to abicky/yui-relayer that referenced this pull request May 22, 2025
This is the continuation of the PR
hyperledger-labs#162.

Signed-off-by: Takeshi Arabiki <[email protected]>
@abicky abicky requested a review from siburu May 23, 2025 00:32
Copy link
Contributor

@siburu siburu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abicky LGTM. Thank you for your great work!

@siburu siburu merged commit 8317760 into hyperledger-labs:main May 28, 2025
9 checks passed
abicky added a commit to abicky/yui-relayer that referenced this pull request May 28, 2025
This is the continuation of the PR
hyperledger-labs#162.

Signed-off-by: Takeshi Arabiki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants