Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#144): Handle null date in telemetry doc #145

Closed
wants to merge 3 commits into from
Closed

fix(#144): Handle null date in telemetry doc #145

wants to merge 3 commits into from

Conversation

1yuv
Copy link
Member

@1yuv 1yuv commented Jan 31, 2024

Fix #144 by assigning default date of 1970-1-1 where dates are not available.

@1yuv 1yuv requested a review from kennsippell January 31, 2024 23:25
Copy link
Member

@kennsippell kennsippell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contribution. period_start getting even wilder....

'1'
END
)::date AS period_start,
(CASE
Copy link
Member

@kennsippell kennsippell Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think migrations run twice, so you'll have to create new which REPLACE this view instead of editing this definition. Correct me if I'm wrong.

You can test by:

  1. Deploying production couch2pg. Seeing this bug
  2. Upgrading to this version.
  3. Is it fixed?

I believe the migraiton won't re-run on upgrade...

Hopefully we can replace the definition and not drop... because that causes a cascade drop :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've made the changes.

I had to use drop cascade since materialized views can not be replaced like view CREATE OR REPLACE.

Copy link
Member

@kennsippell kennsippell Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's pretty scary then... who knows what we are going to cascade delete...
i'm not sure how we message this - but i think we need to be really clear about this somehow. can you ask on #development or forum or something?

ELSE
(doc#>>'{metadata,month}')::int --> Month is between 1 - 12
END,
CASE --> day of the month, else 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this isn't your code, but cleaner to use COALESCE here instead of CASE

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this code to a function so that it's more readable and single place to fix if any issue.

END
)::date AS period_start,
(CASE
WHEN doc#>>'{metadata,year}' IS NULL THEN '1970-1-1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be null instead of random date?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kennsippell , There'll be difference in results when using NULL vs 1970-1-1. Since we have done DISTINCT ON (doc #>> '{metadata,deviceId}', doc #>> '{metadata,user}'), If we put NULL on our field, the NON-NULL value will be selected by default which in our case is first record and old record. If we put 1970-1-1, it will consider the latest data aailable for those user and device. Since in our case, most of the records with null are recent, we will not be creating metadata for recent numbers.

With this, there'll be two cases.

  1. Keep NULL, and get older records, most likely prior to 4.5.0 upgrades, period_start will have good older date and old data will be available.
  2. Keep 1970-1-1 and get recent record for that user and device. period_start will be 1970-1-1 in this case and latest data available.

Let me know what'd you prefer here.

Copy link
Member

@kennsippell kennsippell Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC. The distinct on should pick the first value ordered by device_id, user_name, period_start. NULL is always sorted last by postgres. 1970 is likely also always going to be last, so they seem pretty close in functionality but not identical (did i miss something?)

A third option is that we could use the 4.5.0 release date, since we know the telemetry doc is "at least that old" ? Nov-20-2023. Maybe that is getting too weird.

I think all three are decent options. You probably know best since you're deeper in the data atm. I'd likely go with NULL just to avoid "magic values" - but have no strong opinion and would respect either choice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have issue on cht-core to fix this problem of null telemetry dates, it doesn't really matter once that's fixed on what we've put here for useview_telemetry_devices. For other views, I opted to go with NULL.

@1yuv 1yuv requested a review from kennsippell February 9, 2024 01:32
@1yuv
Copy link
Member Author

1yuv commented Feb 16, 2024

Closing this Pull request without merging since the issue was with Nepali projects only and we wanted to avoid drop cascade. Any projects having this issue can apply changes on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle case when telemetry dates are null
2 participants