Skip to content

Implement SQLCommenter support #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Implement SQLCommenter support #176

wants to merge 3 commits into from

Conversation

iand675
Copy link
Owner

@iand675 iand675 commented Feb 13, 2025

This adds support for the sqlcommenter spec, as outlined by folks at Google. We're using it at https://github.com/MercuryTechnologies for improved correlation between SQL queries and the rest of the OTel ecosystem. The main area we're seeing benefits is via https://pganalyze.com/, which has native support for the format. Namely, we can have "explain analyze" traces added to our existing application traces to help us understand why certain queries behave badly in production.

We've used this in prod for about 10 months, so I'm not particularly concerned about the efficacy of the implementation, but I do want to ensure that it builds against API changes & is updated on a similar cadence to other packages in this repo.

Why should this package live in this repo? Because Google donated it to the OpenTelemetry community: https://cloud.google.com/blog/products/databases/sqlcommenter-merges-with-opentelemetry


intercalate :: (Monoid a, Foldable f) => a -> f a -> a
intercalate delim l = mconcat (intersperse delim l)
{-# INLINE intercalate #-}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are foldable versions of the list equivalent.

-- SQL Commenter spec wants them escaped with a slash, but this should
-- probably solve the same issue
unreservedQS :: Word8Set
unreservedQS = foldr insert SqlCommenter.empty $ map c2w "-_.~'"
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds a data structure that can perform fast checks to determine whether to escape a character or not for a URL. This is similar to @ekmett's charset package, but with a much more constrained purposed, and is intended to avoid incurring extra dependencies. https://hackage.haskell.org/package/charset-0.3.11/docs/Data-CharSet.html

Comment on lines +44 to +54
sqlCommenterKey :: Ctxt.Key (Map Text Text)
sqlCommenterKey = unsafePerformIO $ Ctxt.newKey "sqlcommenter-attributes"
{-# NOINLINE sqlCommenterKey #-}

lookupSqlCommenterAttributes :: Ctxt.Context -> Map Text Text
lookupSqlCommenterAttributes ctxt = case Ctxt.lookup sqlCommenterKey ctxt of
Nothing -> mempty
Just attrs -> attrs

getSqlCommenterAttributes :: IO (Map Text Text)
getSqlCommenterAttributes = lookupSqlCommenterAttributes <$> TL.getContext
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess sqlCommenterKey has to be a global variable b/c of how Vault works, and since it's an implementation detail that never leaks beyond the scope of this module there shouldn't be a problem here?

checking my understanding: we can't make sqlCommenterKey :: IO _ because we need lookupSqlCommenterAttributes to have a pure API for any callers who might need to use it in a pure context (even though getSqlCommenterAttributes :: IO _).

Copy link
Owner Author

@iand675 iand675 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The main thing is that sqlCommenterKey needs to be stable across uses. If we had users intitialize the key and pass it in any time they want to use it to look up the sqlcommenter attributes, then we could make it be an IO value. However, I don't think that's a great experience, and is relatively error prone compared to just making sure that there's a stable reference available.

And, while it's not exported currently, there might be a world in which we want to support adding attributes at a thread-local storage level, which necessitates the stable key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see this sort of context in newKey docs, so it might be worth adding to a Haddock on sqlCommenterKey at least. up to you, though

& M.insert "traceparent" (unsafeConvert traceparent)
& M.insert "tracestate" (unsafeConvert tracestate)
where
unsafeConvert = B.toText . B.unsafeFromByteString
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encodeSpanContext gives us two ByteStrings.

it looks like these are produced by traceparentHeader, which appears to construct these directly using a combination of primitives (char7 '-', word8HexFixed on the Word8 stored in TraceFlags) & Base16-encoded SpanId & TraceId values.

You've got tests for this, so presumably something upholds the invariant that all of these must be UTF-8 encoded text but I'm having trouble understand what exactly this is (if it is anything besides convention).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The W3C traceparent and tracestate headers by definition are US-ASCII encoded, so we don't have to think super hard about this assuming that import OpenTelemetry.Propagator.W3CTraceContext (encodeSpanContext) is implemented correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha okay, that's the missing link for me; I would maybe add this in a comment onto unsafeConvert in the same manner that Rust encourages documenting assumed invariants around unsafe blocks just so that it's clear.

Comment on lines +20 to +28
specify "Does not add comment if the query already has a comment" $ do
let queries =
[ "SELECT * FROM table -- noodle"
, "SELECT * FROM table -- noodle\n"
, "SELECT * FROM table /* noodle\n poodle\n*/"
]
someAttrs = M.fromList [("foo", "bar")]
for_ queries $ \query -> do
sqlCommenter query someAttrs `shouldBe` query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow what this is testing; someAttrs is just ignored entirely and the query is returned as-is because there is an existing comment?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Per the sqlcommenter spec:

If a comment already exists within a SQL statement, we MUST NOT mutate that statement.

Comment on lines +36 to +52
specify "Span attributes are picked up from thread-local context" $ do
tp <- createTracerProvider [] emptyTracerProviderOptions
let t = makeTracer tp (InstrumentationLibrary "test" "test") tracerOptions
inSpan t "test" defaultSpanArguments $ do
attrs <- getSqlCommenterAttributesWithTraceData
M.lookup "traceparent" attrs `shouldSatisfy` isJust
M.lookup "tracestate" attrs `shouldSatisfy` isJust
specify "Parsing a queries reads attributes from the first comment" $ do
let query1 = "SELECT * FROM table -- foo='bar'\n-- bar='baz'"
query2 = "SELECT * FROM table /* noodle='poodle',wibble='wobble'*/ /* foo='bar' */"
parseFirstSqlComment query1 `shouldBe` M.fromList [("foo", "bar")]
parseFirstSqlComment query2 `shouldBe` M.fromList [("noodle", "poodle"), ("wibble", "wobble")]
specify "Parsing a query decodes encoded bytes" $ do
hedgehog $ do
let attrList = M.fromList <$> Gen.list (Range.linear 0 30) ((,) <$> textValGen <*> textValGen)
kvs <- forAll attrList
kvs === parseFirstSqlComment (sqlCommenter "" kvs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these tests cover the case where sqlCommenter actually modifies the query, right?

I think it'd be useful to add an end-to-end to check that span attributes get picked up tacked onto a query in comments according to the sqlcommenter example documentation (or something).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. Will add tests for this.

@michaelpj
Copy link
Collaborator

I think this could do with a bit of explanation for how to actually use it? As far as I can tell, there are several things here:

  1. Support for adding sqlcommenter comments to queries. This seems useful and arguably deserves its own library, which need not depend on any OTel bits?
  2. Support for a named section of the thread-local context which is used for special attributes associated with sqlcommenter
  3. Support for adding some OTel specific attributes into the special section

I guess the expectation is that on top of this you:

  1. Modify the instrumentation libraries for databases to call getSqlCommenterAttributesWithTraceData and then use sqlcommenter to actually put them in queries.
  2. Tell people to add things into the special sqlcommenter section in order to get them into their queries.

I guess I'm a bit unsure about the "action-at-a-distance" transmission of the attributes. This requires knowing to put attributes in a special place, that is a) specific to databases, and b) specific to a particular method of adding information into queries. As a user, I maybe don't even know that the postgresql-simple instrumentation library is using sqlcommenter - should I need to know that? If we were going to have a special section for such attributes, perhaps it should be called postgres or database?

An alternative would be to use something like the existing support for "carry-on" attributes. Then if a higher-scope piece of code wants to add an attribute that should get into the attributes of a database span, it can add it to the carry-ons?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants