
fix: remove invalid hostnames #1259 #1260


Closed
wants to merge 2 commits into from


lydiapuric
Collaborator

@lydiapuric lydiapuric commented Apr 2, 2025

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

Related Issues

#1259
The DaaS team built the Run Helix Query API to load monthly Content Requests (CR) across all hostnames. When the API encounters hostnames containing non-printing or illegal Unicode characters, it returns an empty dataset.

The problem originates in the BigQuery function helix_rum.EVENTS_V5 when it is called with the url parameter set to “-”. In that mode the function returns data for all hostnames, and many of the records carry invalid hostname values (e.g., hostname?@a.png\u0027"\u003cbmt\u003e for host publish-p23952-e1363387.adobeaemcloud.net).
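
To illustrate the kind of check involved, here is a minimal Python sketch of a hostname filter. It is not the code in this PR: the helper names `is_valid_hostname` and `drop_invalid_hostnames` and the `hostname` row key are hypothetical, and the rule used (plain ASCII DNS labels, at most 253 characters overall) is an assumption about what should count as a valid hostname.

```python
import re

# One DNS label: 1-63 characters, ASCII letters/digits/hyphens,
# with no leading or trailing hyphen (assumed validity rule).
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def is_valid_hostname(hostname):
    """Return True if the value looks like a plain ASCII DNS hostname.

    Hypothetical helper: rejects anything containing non-printing,
    non-ASCII, or punctuation characters such as '?', '@', '<' or quotes.
    """
    if not isinstance(hostname, str) or not hostname or len(hostname) > 253:
        return False
    return _HOSTNAME_RE.fullmatch(hostname) is not None


def drop_invalid_hostnames(rows):
    """Keep only rows (dicts) whose 'hostname' value passes the check."""
    return [row for row in rows if is_valid_hostname(row.get("hostname"))]
```

Under this rule, hostnames written with raw Unicode characters or with a port suffix are rejected as well, while punycode (`xn--`) names pass. Whether that is the desired behaviour depends on what the consumers of the data expect.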

Thanks for reviewing!

@lydiapuric lydiapuric requested review from trieloff and langswei April 2, 2025 12:50

github-actions bot commented Apr 2, 2025

This PR will trigger a patch release when merged.

Contributor

@trieloff trieloff left a comment


not sure what the RF01 is about.


codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (c2a5831) to head (a314eb9).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1260   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          764       764           
=========================================
  Hits           764       764           


@lydiapuric
Collaborator Author

lydiapuric commented Apr 2, 2025

Seems to be a bug in sqlfluff, see sqlfluff/sqlfluff#6521
RF01 rule should be disabled by default for BigQuery, Databricks, Hive, Redshift, SOQL and SparkSQL due to the support of things like structs and lateral views which trigger false positives.

@langswei
Collaborator

langswei commented Apr 7, 2025

> The problem originates from the BigQuery function helix_rum.EVENTS_V5

@lydiapuric Then my suggestion is to fix it in EVENTS_V5 instead of in this run-query. That way all queries can benefit from the removal of invalid domains.

@lydiapuric
Collaborator Author

@langswei Agreed, I will adjust my PR.

@lydiapuric
Collaborator Author

Closing this PR in favor of #1263 to implement this check on EVENTS_V5

@lydiapuric lydiapuric closed this Apr 9, 2025

3 participants