Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to V4 protocol support #102

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nevans
Copy link
Contributor

@nevans nevans commented Jun 1, 2023

Recreating #101

lib/staccato/V4/event.rb Outdated Show resolved Hide resolved
It looks like it was meant to be moved by 20fd9a8, but only the copy
half of the rename was committed.
@nevans nevans mentioned this pull request Jun 1, 2023
nevans added 5 commits June 1, 2023 10:00
Add `api_secret` and `client_id` requirement to Tracker.  When they are
nil, NoopTracker will be used instead.

Require `client_id`.  This is a big change, but I believe it the most
appropriate approach to Google's v4 API.  Because access to standard
analytics `client_id` is at least partially a matter of end-user
consent, I believe it would be wrong for this library to automatically
generate a missing `client_id`.  Users of the gem may know that it is
reasonable and permissible to generate a missing `client_id`, but they
should manage that themselves.

Other general updates:
* Add many specs for V4::Tracker and V4::Event.
* Some changes to event method names to use nomenclature more similar to
  the Measurement API v4 documentation.
* Added `uri` methods to the Net::HTTP and Validation adapters, for both
  testing and debugging.
* Experimental support for auto-generating `timestamp_micros`.
* Moved all of the default_adapter related methods from V4 and
  V4::Tracker into V4::AdapterDefaults, and then extended V4 and
  V4::Tracker with them.  For V4::Tracker, this moves the methods up to
  the class level, but this also enables memoization at the class level.
* Updated `Staccato::Adapter::Validation` to accept a custom uri.

Updates to Event:
* Adds `Event.param` DSL, backward-compatible with `FIELDS`.  This is a
  step towards changing the `Event` module to a generic superclass.
* Return custom parameters from `#as_json`, in addition to the standard
  recommended parameters.
* Recursively encode `#as_json` parameter values, eg: `items: [...]`.
* Correctly encode Item, without requiring ActiveSupport.
* Add `#engagement_time_msec` and `#session_id` as recommended
  parameters.
* Add `#update`, for adding and updating parameters.
* Add `#apply_defaults`, for adding missing parameters to an event.

Updates to Tracker
* Add a bang-method version of each dynamically generated event type
  method, to immediately post all enqueued events.
* Add `tracker << event` as an alternative form of `#add`.
* Update `#add` to accept simple String, Symbol, and Hash events.
* Update `#add` to accept already constructed Event objects.
* Update `#add` to reverse merge with `#default_event_params`.
* Update `#event` to return a frozen duplicate array.  This protects
  against events being added directly.  `#add`, `#<<`, `#clear`, and
  `#trim` should provide adequate support for generic event data, so
  direct mutation of `#events` shouldn't be necessary.
* Update `#post` visibility to be public.
* Update `#post` to send in batches of 25 events, returning an array.
* Update `#post` to clear `#events` as they are posted.
* Add `#default_event_params`, which are applied to each event.
* Add basic support for `#user_id` and `#user_properties`.
* Convert `NoopTracker` to a subclass of `Tracker`.
* Clear `#events` after successfully calling `#track`.
* Update `#adapters` to memoize the default adapter on the first access.
  This simplifies adding a new adapter without losing the default.
* Document that `#validate!` sends only the first 25 events.
* Document that `#validate!` doesn't clear the #events array.
* Update `#validate!` to use the class-memoized validation adapter.
* Update `#add_adapter` to accept `:net_http`, `:validation`, or
  `:logger` with optional logger arguments.
* Add `tracker.clear` to clear the events array
* Add `tracker.trim` to drop the first events from the events array
Copy link
Owner

@tpitale tpitale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, have to comment here for them to not be pending? 🙃

lib/staccato/v4.rb Show resolved Hide resolved
# adds convert_booleans()
# The custom event parameters to be sent for this event
# @return (see params)
def custom_params
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be used yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for the end-user to use as they see fit, and for documentation. All event params (both "recommended" and "custom") are stored on params and encoded by as_json.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to hold off on this for now until there's a use for it before we add it to the public API.

lib/staccato/v4/events/share.rb Show resolved Hide resolved
lib/staccato/v4/events/item.rb Show resolved Hide resolved
lib/staccato/adapter/logger.rb Show resolved Hide resolved
lib/staccato/adapter/net_http.rb Show resolved Hide resolved
lib/staccato/adapter/net_http.rb Show resolved Hide resolved
.to_sym
end

def param(name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to discuss this DSL as a separate PR. Sorry if that's a pain …

I suspect if we do this now, we should just switch to this and get rid of FIELDS. I'd also like to see it documented in the README in a section on custom even classes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to have a DSL … maybe field AND fields? So we could replace the FIELDS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like field and fields and that matches the existing FIELDS constant, but the Measurement protocol API docs all use params, so I stuck with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect if we do this now, we should just switch to this and get rid of FIELDS. I'd also like to see it documented in the README in a section on custom even classes.

Yeah, I haven't documented anything in the README. 🙁

end

private
def ==(other) other.class == self.class && other.as_json == as_json end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used? I'm not seeing it, but I could be missing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used implicitly by the specs expect(actual).to eq(expected). It's just nice behavior to have on data container objects (whenever object identity matters less than values).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like as_json is an "expensive" method, but it's only really for the tests, cool.

Copy link
Contributor Author

@nevans nevans Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: expensive, I should probably add a short-circuit at the front of the boolean expression: super || ... to check object identity before checking the contents.

In this case, as_json is really just a way to iterate through all of the contents while also compacting away nils. And that's basically what's needed for checking value equality.

def self.validation_adapter
require 'staccato/adapter/validate'
Staccato::Adapter::Validate
def self.encode_params_json(params)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Is it just to remove empty hash/array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this was needed to recursively encode items: [Item.new] while also avoiding a dependency on ActiveSupport.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh … I figured as_json would recursively call as_json when it was being encoded by the JSON library. Maybe I didn't put as_json on Item? 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like struct has as_json https://ruby-doc.org/stdlib-2.5.1/libdoc/json/rdoc/Struct.html so hopefully that would simplify things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more that stdlib's as_json extensions behave very differently from ActiveSupport's as_json. In particular, stdlib's Struct#as_json won't work for Item. Also, I wanted to compact away any nil values.

Stdlib's as_json feels more appropriate for serializing data similarly to how me might YAML or Marshal. ActiveSupport's is generally more appropriate for APIs like this one. This allows us to not care whether either extension is loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't put as_json on Item? 🤔

I might have been able to get away with just adding as_json on Item. But this removes nil values, allows custom structs, and just seemed more generically useful.

self.options = Staccato::OptionSet.new(options)
module ClassMethods

attr_reader :recommended_params
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this meant to be an attr_writer? I see def recommended_params below, and a usage in the param DSL that maybe was expecting a writer?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh … this is in ClassMethods … the other is the instance.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be recommended_fields … since it'll just be the field keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the ClassMethods half of the DSL. It could be renamed to recommended_param_names to avoid confusion?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah … GA used to call it "fields" now events have "parameters" … so yeah, recommended_param_names (though I do often misspell recommended 🙃 )

def with_defaults!(defaults)
defaults.to_hash.each_pair do |key, value|
key = key.to_sym
options[key] = value unless options.to_h.key?(key)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would ||= work here?

If not, should options have a has_key??

Copy link
Contributor Author

@nevans nevans Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, but I might have copied this implementation from ActiveSupport. They need to do it this way because they need to distinguish between nil values and no matching key. We don't need to make that distinction, so ||= should suffice.

default_event_params: {})
raise ArgumentError, 'measurement_id is required' unless measurement_id
raise ArgumentError, 'api_secret is required' unless api_secret
raise ArgumentError, 'client_id is required' unless client_id
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think letting client_id be a random UUID as it does in V3 is acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I puzzled over this one.

Requiring client_id is a big change, but I believe it is necessary for to truly satisfy Google's v4 API. As far as I can tell, there is no documentation anywhere for using anything other than the gtag or gtm generated client_id. The big push to v4 seems to be at motivated by regulatory compliance concerns. And my interpretation of the docs is that access to standard analytics client_id could potentially be a matter of end-user consent (depending on what you're tracking).

In other words, I believe it might be wrong for the library to automatically generate a missing client_id. If users of the gem think that it is reasonable and permissible to generate a missing client_id, they should manage that themselves.

Also, using a randomly generated UUID can result in tremendously inflated session counts and your boss telling you to try something else. Ask me how I know. 😉

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how a random UUID would be a matter of end-user consent.

I could see it being a session issue, as it's going to make a new session for each post.

Is there a way we could split the difference? Maybe make it a configuration somewhere? require_client_id: true by default, and a user could disable that with false to get the random UUID back?

Copy link
Contributor Author

@nevans nevans Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how a random UUID would be a matter of end-user consent.

Being forced get the real client_id also forces the developer into a position where it should be easy for them to also check for consent. That might encourage the developer to consider whether they're allowed to send everything they send.

Not knowing the real client_id suggests that we haven't even attempted to check whether we have end-user consent. We might have both on the frontend but never communicated it to the backend. Or we might not have a client_id because we don't have consent.

Either way, that doesn't mean we can't send any data, but (as I understand it) it does mean we shouldn't send certain private or user identifying data. It might be that your app is fine sending everything you intended to send. In my case, we will send certain user_properties and custom parameters and custom events if we have consent, and we won't if we don't.

I'd prefer a config setting to an always on default. But personally, I think it's better to force the developer to make their own decision about how to handle it:

  • the above reason: force the developer to consider the consequences.
  • Google's docs never give any valid alternatives to the official client_id.
  • Unique client IDs for each tracker instance causes an explosion in session count.
  • The app probably has internal session identifiers of its own that are more stable.

IMO it would be fine to document that SecureRandom.uuid seems to more-or-less work. But, assuming you're using rails, the following would work much better and we'd be better off recommending something like it:

class Current < ActiveSupport::CurrentAttributes
  attribute :ga_real_cid
  attribute :ga_fake_cid
  attribute :app_session_id

  def ga_cid = ga_real_cid || self.ga_fake_cid ||= app_session_id || SecureRandom.uuid

end

...using before_filters in ApplicationController to get and set these values appropriately. They could even be forwarded into background jobs and used there as well, using Current.set(ga_real_cid:, ga_fake_cid:, app_session_id:) { the bg job code }.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the info. As you know I haven't been working with this for awhile, so this is all very useful. I appreciate the time you've taken to explain it.

Okay, as long as we document what they'll have to do in case they just need to get it working ASAP, I'm fine moving forward with this being required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpitale thanks! and there's some other bits of this PR that could use more docs too. not least of which is adding something to the README.md. 😉

# @param logger [Logger] used when `adapter` is `:logger`
# @param formatter [Proc] used when `adapter` is `:logger`
def add_adapter(adapter, logger = nil, formatter = nil)
case adapter
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting a symbol here is interesting. I'm not sure it's a necessary feature for this PR. I haven't had a feature request for this, and I'd like to consider it more before adding it to the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really isn't necessary for this PR.

But logging is such a useful and basic feature that I wanted to remove barriers to adding it. Also... it simplified my app's code. ;)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's a nice feature. I'm just trying to break down the PR. There's a LOT here 😄


# The required API Secret that is generated through the Google Analytics UI
# @return [String]
attr_reader :api_secret
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate having this documented. But I don't know that we need attr_reader for all of these used internally within the tracker. Again, this adds to the public API we would have to support going forward. Maybe not a huge deal since it matches what GA does … but still.

end
end

# A tracker which does no tracking
# Useful in testing
class NoopTracker
class NoopTracker < Tracker
attr_accessor :hit_defaults
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have removed this as it's not a hit anymore, just an event. Oops!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant to delete that. Oops.

@tpitale
Copy link
Owner

tpitale commented Jun 1, 2023

Thank you again @nevans. I left a bunch of comments. Appreciate all the work. You can address the comments here, or split the PR up. I'll do my best to review and merge this.

@nevans
Copy link
Contributor Author

nevans commented Jun 1, 2023

@tpitale thanks for your time! I don't know if I'll split it up this week, but there are some obvious chunks to break off as separate PRs.

when :logger
require 'staccato/adapter/logger'
uri = self.class.collection_uri
adapter = Staccato::Adapter::Logger.new(uri, logger, formatter)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this follow the same pattern as the other adapters in the defaults module? self.class.logging_adapter(uri, logger, formatter)?

Copy link
Contributor Author

@nevans nevans Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that. I decided against it because those were effectively accessors for shared singleton-ish objects, and this would need to generate a new logger adapter every time.

But, on reconsidering the matter, it can still be useful to delegate the decision of how to generate that logger to the class.

@tpitale
Copy link
Owner

tpitale commented Jun 1, 2023

My laptop is dying and I need to get back to my vacation. I'll try to check back later.

@nevans
Copy link
Contributor Author

nevans commented Jun 1, 2023

My laptop is dying and I need to get back to my vacation. I'll try to check back later.

I probably won't get to this again before Monday, so please: enjoy your vacation!

@grega
Copy link

grega commented Jun 22, 2023

Thank you @nevans and @tpitale for putting all of the time and effort into this! Is there anything I can help with getting it over the line (before the looming deadline of July 1st hits)?

@tpitale
Copy link
Owner

tpitale commented Jun 22, 2023

I'm happy to move forward with this if @nevans commits to helping me do README/docs updates after this 😄

@nevans
Copy link
Contributor Author

nevans commented Jun 26, 2023

@grega @tpitale sorry for the slow follow up. Barring any unforseen circumstances, I'll definitely get around to cleaning up this PR eventually--both splitting it into smaller PRs and adding documentation. I'll try to work on it a little bit more this week (maybe even today?), but I probably can't commit to completing it before July 1st! (sorry)

I can say that I've been running my branch in production for for over six weeks, and I only have a few issues to report:

  1. I'm using this PR's branch in my Gemfile with gem "stacatto", github: "nevans/staccato", ref: "97-v4-protocol". Every time I've rebased this branch, it's invalidated the commit SHA1 in my Gemfile.lock. This breaks my CI and (if the bundle install docker layer is invalidated) my production deploys too. But I always run bundle update stacatto immediately afterward anyway, so it hasn't been a huge issue. Since I'm going to split this PR into smaller PRs anyway, I will commit to no more rebasing of this branch, nor will I delete the branch until everything is merged and released. I will merge any updates into this branch, to keep it up-to-date. So you should be able to use this PR's branch in your Gemfile and I won't break it on you (any more).
  2. Because Google doesn't provide any error response when you send invalid events, I've really wanted a config option to send some percentage of all events to the validation endpoint, in order to log the errors. I've had to copy events from my logfiles and manually hit the validation endpoint to debug some invalid schema issues. The v4 Measurement Protocol API really doesn't like it when you send numbers where it expects a string!
  3. I've had some issues with Google Ads conversions not connecting with purchase events sent from the backend... but I'm not sure if that's ever worked the way I had assumed it had. It most likely broke due to another change we made to our frontend javascript at the same time as we switched to this in the backend: we switched from a bunch of different tags to gtag.js and then to gtm.js. I really don't understand how everything connects to everything else in this Ads/Analytics/Tag Manager ecosystem, so... 🤷🏻

@tpitale
Copy link
Owner

tpitale commented Jun 26, 2023

@nevans As long as you feel you can help with docs at some point, I'll go ahead and merge this.

@nevans
Copy link
Contributor Author

nevans commented Jul 18, 2023

As long as you feel you can help with docs at some point, I'll go ahead and merge this.

@tpitale Sorry, I missed this... and I never got back to finishing this otherwise. I've been in the middle of a completely different project and it sucked away all of my attention.

This does need more documentation and I should eventually get to that (ideally before I completely forget what I was trying to do with this code!).

I can't say whether merging this before or after I start on docs PR(s) is a good idea. Breaking it into separate PRs is a good idea and it isn't a burden... to the contrary, splitting off smaller focused chunks might help the documentation project. And I did want to implement your suggestions as we discussed. On the other hand, if you simply merged this behemoth and released a new version, and I could stop using my fork... I certainly won't complain. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants