Feature/http sink #58
package com.snowplowanalytics.snowplow.eventgen

import com.snowplowanalytics.snowplow.eventgen.tracker.HttpRequest
import com.snowplowanalytics.snowplow.eventgen.tracker.HttpRequest.{Method => trackerMethod}
Unusual to rename from Upper case name to lower case name. More normal to write {Method => TrackerMethod}
}
*/

val address = Uri.fromString(properties.endpoint + generatedRequest.method.path.toString()) match {
A few comments about this little section:
- Easy one -- consider using IllegalArgumentException instead of Exception.
- Another easy one -- it's unusual to use the .format syntax in Scala. It's more normal to write it like
s"blah blah ${properties.endpoint} blah blah ${generatedRequest.method.path}"
- A nice change you could make is to move the Uri.fromString(...) outside of this buildRequest block, so you only run it once when the app first starts up, and you only need to handle the error scenario once.
Then, once you have generated a Uri, you can use syntax like:
val address = baseUri / generatedRequest.method.path.vendor / generatedRequest.method.path.version
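To make the three suggestions above concrete, here is a minimal sketch, assuming http4s 0.23.x on the classpath. The names AddressSketch, Path, parseBaseUri and address are hypothetical and only for illustration; Path stands in for generatedRequest.method.path, whose real type isn't shown in this diff.

```scala
import org.http4s.Uri

object AddressSketch {
  // Hypothetical stand-in for generatedRequest.method.path (assumed to expose vendor and version).
  final case class Path(vendor: String, version: String)

  // Parse the configured endpoint once, at startup, so the failure case is handled in one place.
  def parseBaseUri(endpoint: String): Uri =
    Uri.fromString(endpoint) match {
      case Right(uri) => uri
      case Left(failure) =>
        // String interpolation instead of .format, IllegalArgumentException instead of Exception.
        throw new IllegalArgumentException(s"Invalid endpoint '$endpoint': ${failure.sanitized}")
    }

  // Per request: append path segments to the already-parsed base Uri.
  def address(baseUri: Uri, path: Path): Uri =
    baseUri / path.vendor / path.version
}
```

With this shape, buildRequest would only ever see a valid Uri, so it wouldn't need its own error branch.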
val httpClient = EmberClientBuilder.default[F].build

st: Stream[F, Main.GenOutput] =>
st.map(_._3).map(buildRequesst).evalMap(req => httpClient.use(client => client.expect[String](req))).void
Here when you call httpClient.use { ..... } it creates and destroys the http client plus all of the internal objects needed to implement it, like a TCP connection pool etc.
The problem is, you call .use for every single request. So you're continually creating/destroying the resources, which is quite wasteful.
It's better to find a way to .use the http client just once, and then re-use it for every request. I'd be happy to show you a few ways to arrange the code for this.
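One possible arrangement, sketched under the assumption of http4s 0.23.x with the Ember client and fs2 3.x: lift the client Resource into the stream with Stream.resource, so it is acquired once and released when the stream ends. HttpSinkSketch and the generic build parameter are hypothetical; in the PR the input would be Main.GenOutput and build would be buildRequesst.

```scala
import cats.effect.Async
import fs2.{Pipe, Stream}
import org.http4s.Request
import org.http4s.ember.client.EmberClientBuilder

object HttpSinkSketch {
  // `build` is a placeholder for whatever turns a generated item into an http4s Request.
  def sink[F[_]: Async, A](build: A => Request[F]): Pipe[F, A, Unit] =
    in =>
      // The client (and its connection pool) is created once for the whole stream.
      Stream.resource(EmberClientBuilder.default[F].build).flatMap { client =>
        in.map(build)
          .evalMap(req => client.expect[String](req)) // every request re-uses the same client
          .map(_ => ())
      }
}
```

Another option would be to build the client in Main and pass it to the sink as an argument, which also makes it easier to swap in a stub client for tests.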
case (k, v) if k != "Origin" => List(k, v)
// We need to unpack multiple Origins in discrete key-value pairs
// to play nicely with [HttpRequest.builder().headers()]
case (k, v) => intersperse(k, v.split(",").toList)
I agree this is the right approach. I think we can make it neater though!
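As one possible way to make it neater (a sketch, not necessarily what was meant here): a single flatMap can do the unpacking without a separate intersperse helper. OriginHeadersSketch and flattenHeaders are hypothetical names, and the input type of a list of key/value pairs is an assumption, since the real headers type isn't visible in this diff.

```scala
object OriginHeadersSketch {
  // Flattens headers into the alternating key, value, key, value... list expected by
  // a varargs-style headers(...) builder. A comma-separated Origin value is unpacked
  // into one ("Origin", value) pair per origin; every other header is passed through as-is.
  def flattenHeaders(headers: List[(String, String)]): List[String] =
    headers.flatMap {
      case ("Origin", v) => v.split(",").toList.flatMap(origin => List("Origin", origin))
      case (k, v)        => List(k, v)
    }
}
```

For example, flattenHeaders(List("Origin" -> "http://a.com,http://b.com")) yields the Origin key twice, once per value.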
@@ -69,6 +69,7 @@ object Config {
case class File(path: URI) extends Output
case class PubSub(subscription: String) extends Output
case class Kafka(brokers: String, topic: String, producerConf: Map[String, String] = Map.empty) extends Output
case class Http(endpoint: String) extends Output
If you made endpoint an org.http4s.Uri instead of a String in the config, then you wouldn't need to handle parsing exceptions later on when generating requests.
e6d1caf to 6e80cce
9908aca to 720f2d5
668e5d2 to ad1698d
Note: HEAD requests are still not implemented. Using
Since I suspect that this is something to do with the request itself, rather than an expected error from the collector, I've left it for another day. I have also made a separate PR to make request paths configurable. With the current implementation, we can generate
@istreeter it doesn't need your urgent attention, I've also flagged it to @peel who will likely soon be using the tool - but as an FYI/because I think you might be interested, see last comment for where things lie.
Note: Formatting might be off. I realised that I hadn't run
The idea of being able to tune HTTP request formats is fantastic and overall this feature is massively helpful. Great job!
Please do run formatting and remove dead code. A few extra (hopefully helpful) comments below.

def sink[F[_]: Async](properties: Config.Output.Http): Pipe[F, Main.GenOutput, Unit] = {

val baseUri = Uri.fromString(properties.endpoint) match {
If there's one thing that I'd change, it's to use org.http4s.Uri instead of String in the config. This way you don't need to additionally validate the config here, so sink becomes a less defensive, purely behavioural function. But the config is not affected, so we can do that at a better time.
Sorry Peel I'm not sure I understand what you're saying here?
generatedRequest: HttpRequest
): Request[F] = {

// Origin headers are given to us a key and a comma separated string of values,
This is a nice example where you'd be able to use a massively helpful quickcheck/property-based testing approach that's often used within Scala/FP projects. We shouldn't do any of that now, but sharing for the sake of it.
The approach here would likely be:
- generate a collection of header names and their values, making sure Origin appears there at least once
- collect them into a desired format, where each header's values are comma-separated - that will be the headers argument value
- expected return value is the original collection where each key's value is a single Raw
- run through parseHeaders and compare input and output
The benefit of this approach is we cover all the scenarios and hint that we don't really care about the values but the shape of it. Next time we get around to looking at this kind of logic we'd know what the author cared about and the comment itself becomes a spec.
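The steps above could be sketched with ScalaCheck roughly as follows. This assumes ScalaCheck is available as a test dependency; ParseHeadersProps is a hypothetical name, and the parseHeaders below is only a stand-in with an assumed signature (it splits every header's comma-separated value), not the function from this PR.

```scala
import org.scalacheck.{Gen, Prop}

object ParseHeadersProps {
  // Stand-in for the function under test; signature and behaviour are assumptions.
  def parseHeaders(headers: Map[String, String]): List[(String, String)] =
    headers.toList.flatMap { case (k, v) => v.split(",").toList.map(k -> _) }

  // Non-empty values without commas, so splitting on "," is unambiguous.
  val value: Gen[String] = Gen.nonEmptyListOf(Gen.alphaNumChar).map(_.mkString)
  val values: Gen[List[String]] = Gen.nonEmptyListOf(value)

  // Arbitrary headers, with Origin always present.
  val headers: Gen[Map[String, List[String]]] =
    for {
      origins <- values
      others  <- Gen.mapOf(Gen.zip(Gen.oneOf("Accept", "Cookie", "User-Agent"), values))
    } yield others + ("Origin" -> origins)

  // Join each header's values with commas, parse, and expect one pair per original value.
  val roundTrip: Prop = Prop.forAllNoShrink(headers) { original =>
    val input    = original.map { case (k, vs) => k -> vs.mkString(",") }
    val expected = original.toList.flatMap { case (k, vs) => vs.map(k -> _) }
    parseHeaders(input).sorted == expected.sorted
  }
}
```

The property only talks about the shape of the data, which is the point: the generators document which inputs matter, and the comparison documents what the author expected.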
Yeah so I didn't touch the data generation part, but I did feel that tweaks to that implementation could make our lives easier here/make our approach more idiomatic - which I think is what you're getting at here, if I understand correctly.
}
// TODO: Some code repetition can probably be removed from this

return req
Again, not a reason for changing anything, and maybe it's a personal preference, but I've always found a function should have a single reason to change. I've never been a massive fan of internal methods, especially as it makes them hard to test individually.
I'd structure this function in a way that address, body, parseHeaders are separate functions, this way buildRequest could potentially look like this:
def buildRequesst(generatedRequest: HttpRequest): Request[F] = generatedRequest.method match {
case TrackerMethod.Post(_) => Request[F](...)
...
}
def address(_) = _
def body(_) = _
def parseHeaders(_) = _
So my approach was to implement something that works, then look at how to refactor. I don't think I disagree with your perspective but at the same time, as things stood I didn't see an obvious way that a refactor would simplify/improve things - at least from the point of view that actually we don't seem to be repeating anything necessarily.
Not that I think it can't be factored better/more nicely - I just didn't see a quick win and didn't want to let perfect be the enemy of good. If support rota is quiet for the rest of the week I might get the chance to revisit this. :)
Edit: I have just realised it's already got approval. I'll rebase and merge, apologies for unnecessary noise.
97e3a68 to fe5c408
Draft PR for http sink.
Notes
TODO this release:
To make tickets for:
7 character at the end of the b64 when decoded: Querystring data is often corrupt #60