Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#144): Handle null date in telemetry doc #145

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 23 additions & 20 deletions libs/medic-users-meta/migrations/202105171933.do.86.usersMeta.sql
Original file line number Diff line number Diff line change
Expand Up @@ -24,26 +24,29 @@ CREATE INDEX idx_useview_feedback_period_start_user ON useview_feedback(period_s
CREATE MATERIALIZED VIEW useview_telemetry AS
SELECT
doc->>'_id' AS uuid,
CONCAT_WS( --> Date concatenation from JSON fields, eg. 2021-5-17
'-',
doc#>>'{metadata,year}', --> year
CASE --> month of the year
WHEN
string_to_array(substring(doc#>>'{metadata,versions,app}' FROM '(\d+.\d+.\d+)'),'.')::int[] < '{3,8,0}'::int[]
THEN
(doc#>>'{metadata,month}')::int+1 --> Legacy, months zero-indexed (0 - 11)
ELSE
(doc#>>'{metadata,month}')::int --> Month is between 1 - 12
END,
CASE --> day of the month, else 1
WHEN
(doc#>>'{metadata,day}') IS NOT NULL
THEN
doc#>>'{metadata,day}'
ELSE
'1'
END
)::date AS period_start,
(CASE
Copy link
Member

@kennsippell kennsippell Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think migrations run twice, so you'll have to create new which REPLACE this view instead of editing this definition. Correct me if I'm wrong.

You can test by:

  1. Deploying production couch2pg. Seeing this bug
  2. Upgrading to this version.
  3. Is it fixed?

I believe the migraiton won't re-run on upgrade...

Hopefully we can replace the definition and not drop... because that causes a cascade drop :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've made the changes.

I had to use drop cascade since materialized views can not be replaced like view CREATE OR REPLACE.

Copy link
Member

@kennsippell kennsippell Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's pretty scary then... who knows what we are going to cascade delete...
i'm not sure how we message this - but i think we need to be really clear about this somehow. can you ask on #development or forum or something?

WHEN doc#>>'{metadata,year}' IS NULL THEN '1970-1-1'
ELSE
CONCAT_WS( --> Date concatenation from JSON fields, eg. 2021-5-17
'-',
doc#>>'{metadata,year}', --> year
CASE --> month of the year
WHEN
string_to_array(substring(doc#>>'{metadata,versions,app}' FROM '(\d+.\d+.\d+)'),'.')::int[] < '{3,8,0}'::int[]
THEN
(doc#>>'{metadata,month}')::int+1 --> Legacy, months zero-indexed (0 - 11)
ELSE
(doc#>>'{metadata,month}')::int --> Month is between 1 - 12
END,
CASE --> day of the month, else 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this isn't your code, but cleaner to use COALESCE here instead of CASE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to a function so that it's more readable and single place to fix if any issue.

WHEN
(doc#>>'{metadata,day}') IS NOT NULL
THEN
doc#>>'{metadata,day}'
ELSE
'1'
END)
END)::date AS period_start,
doc#>>'{metadata,user}' AS user_name,
doc#>>'{metadata,versions,app}' AS app_version,
doc#>>'{metrics,boot_time,min}' AS boot_time_min,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,26 +9,28 @@ SELECT
doc #>> '{_id}' AS telemetry_doc_id,
doc #>> '{metadata,deviceId}' AS device_id,
doc #>> '{metadata,user}' AS user_name,

concat_ws(
'-',
doc #>> '{metadata,year}',
CASE
WHEN
doc #>> '{metadata,day}' IS NULL
AND (
doc #>> '{metadata,versions,app}' IS NULL
OR string_to_array("substring"(doc #>> '{metadata,versions,app}', '(\d+.\d+.\d+)'), '.')::integer[] < '{3,8,0}'::integer[]
)
THEN (doc #>> '{metadata,month}')::integer + 1
ELSE (doc #>> '{metadata,month}')::integer
END,
CASE
WHEN doc #>> '{metadata,day}' IS NOT NULL
THEN doc #>> '{metadata,day}'
ELSE '1'
END
)::date AS period_start,
(CASE
WHEN doc#>>'{metadata,year}' IS NULL THEN '1970-1-1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be null instead of random date?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kennsippell , There'll be difference in results when using NULL vs 1970-1-1. Since we have done DISTINCT ON (doc #>> '{metadata,deviceId}', doc #>> '{metadata,user}'), If we put NULL on our field, the NON-NULL value will be selected by default which in our case is first record and old record. If we put 1970-1-1, it will consider the latest data aailable for those user and device. Since in our case, most of the records with null are recent, we will not be creating metadata for recent numbers.

With this, there'll be two cases.

  1. Keep NULL, and get older records, most likely prior to 4.5.0 upgrades, period_start will have good older date and old data will be available.
  2. Keep 1970-1-1 and get recent record for that user and device. period_start will be 1970-1-1 in this case and latest data available.

Let me know what'd you prefer here.

Copy link
Member

@kennsippell kennsippell Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC. The distinct on should pick the first value ordered by device_id, user_name, period_start. NULL is always sorted last by postgres. 1970 is likely also always going to be last, so they seem pretty close in functionality but not identical (did i miss something?)

A third option is that we could use the 4.5.0 release date, since we know the telemetry doc is "at least that old" ? Nov-20-2023. Maybe that is getting too weird.

I think all three are decent options. You probably know best since you're deeper in the data atm. I'd likely go with NULL just to avoid "magic values" - but have no strong opinion and would respect either choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have issue on cht-core to fix this problem of null telemetry dates, it doesn't really matter once that's fixed on what we've put here for useview_telemetry_devices. For other views, I opted to go with NULL.

ELSE
concat_ws(
'-',
doc #>> '{metadata,year}',
CASE
WHEN
doc #>> '{metadata,day}' IS NULL
AND (
doc #>> '{metadata,versions,app}' IS NULL
OR string_to_array("substring"(doc #>> '{metadata,versions,app}', '(\d+.\d+.\d+)'), '.')::integer[] < '{3,8,0}'::integer[]
)
THEN (doc #>> '{metadata,month}')::integer + 1
ELSE (doc #>> '{metadata,month}')::integer
END,
CASE
WHEN doc #>> '{metadata,day}' IS NOT NULL
THEN doc #>> '{metadata,day}'
ELSE '1'
END)
END)::date AS period_start,


doc #>> '{device,deviceInfo,hardware,manufacturer}' AS device_manufacturer,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,17 +8,20 @@ WITH telemetry_docs_with_metric_blob AS (
doc #>> '{metadata,deviceId}' AS device_id,
doc #>> '{_id}' AS telemetry_doc_id,
doc #>> '{metadata,user}' AS user_name,
concat_ws(
'-', doc #>> '{metadata,year}',
CASE
WHEN (doc #>> '{metadata,day}') IS NULL AND ((doc #>> '{metadata,versions,app}') IS NULL OR string_to_array("substring"(doc #>> '{metadata,versions,app}', '(\d+.\d+.\d+)'::text), '.'::text)::integer[] < '{3,8,0}'::integer[]) THEN ((doc #>> '{metadata,month}')::integer) + 1
ELSE (doc #>> '{metadata,month}')::integer
END,
CASE
WHEN (doc #>> '{metadata,day}') IS NOT NULL THEN doc #>> '{metadata,day}'
ELSE '1'::text
END
)::date AS period_start,
(CASE
WHEN doc#>>'{metadata,year}' IS NULL THEN '1970-1-1'
ELSE
concat_ws(
'-', doc #>> '{metadata,year}',
CASE
WHEN (doc #>> '{metadata,day}') IS NULL AND ((doc #>> '{metadata,versions,app}') IS NULL OR string_to_array("substring"(doc #>> '{metadata,versions,app}', '(\d+.\d+.\d+)'::text), '.'::text)::integer[] < '{3,8,0}'::integer[]) THEN ((doc #>> '{metadata,month}')::integer) + 1
ELSE (doc #>> '{metadata,month}')::integer
END,
CASE
WHEN (doc #>> '{metadata,day}') IS NOT NULL THEN doc #>> '{metadata,day}'
ELSE '1'::text
END)
END)::date AS period_start,
jsonb_object_keys(doc -> 'metrics'::text) AS metric,
doc -> 'metrics' -> jsonb_object_keys(doc -> 'metrics') AS metric_values
FROM couchdb_users_meta
Expand Down
Loading