AG-36910 Improve 'href-sanitizer' — add 'removeParam' and 'removeHash' values in 'transform' option.

Squashed commit of the following:

commit 737dda8
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 20:26:25 2024 +0300

    add comment

commit da34a66
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 20:04:41 2024 +0300

    update script

commit f3c3616
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 19:49:42 2024 +0300

    fix comments, update script

commit 99439d6
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 15:45:40 2024 +0300

    refactor

commit bafa849
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 15:16:15 2024 +0300

    update compatibility table

commit 83e1d96
Merge: 7fb9cf0 e4cb5f3
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 15:05:16 2024 +0300

    Merge branch 'fix/AG-36910' of ssh://bit.int.agrd.dev:7999/adguard-filters/scriptlets into fix/AG-36910

commit 7fb9cf0
Author: jellizaveta <[email protected]>
Date:   Mon Oct 28 15:01:33 2024 +0300

    update var names, docs, conditions

commit c601f8d
Author: jellizaveta <[email protected]>
Date:   Fri Oct 25 20:37:26 2024 +0300

    update docs

commit ff6f047
Author: jellizaveta <[email protected]>
Date:   Fri Oct 25 20:31:22 2024 +0300

    moved the calculations inside the function

commit e4cb5f3
Author: Slava Leleka <[email protected]>
Date:   Fri Oct 25 20:10:51 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 9736333
Author: jellizaveta <[email protected]>
Date:   Fri Oct 25 19:59:04 2024 +0300

    fix docs

commit 1ecfacb
Merge: 5ccadb6 a875fdf
Author: jellizaveta <[email protected]>
Date:   Fri Oct 25 19:47:47 2024 +0300

    merge master, resolve conflicts

commit 5ccadb6
Author: jellizaveta <[email protected]>
Date:   Fri Oct 25 19:37:30 2024 +0300

    AG-36910 Improve 'href-sanitizer' — add 'removeParam' and 'removeHash' values in 'transform' option. #460
jellizaveta committed Oct 30, 2024
1 parent a875fdf commit 33bc1bb
Showing 4 changed files with 183 additions and 11 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
 - `prevent-canvas` scriptlet [#451]
 - `parentSelector` option to search for nodes for `remove-node-text` scriptlet [#397]
 - `transform` option with `base64decode` value for `href-sanitizer` scriptlet [#455]
+- `removeParam` and `removeHash` values in `transform` option for `href-sanitizer` scriptlet [#460]
 - new values to `set-cookie` and `set-local-storage-item` scriptlets: `forbidden`, `forever` [#458]
 
 ### Changed
@@ -35,6 +36,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
 [#397]: https://github.com/AdguardTeam/Scriptlets/issues/397
 [#458]: https://github.com/AdguardTeam/Scriptlets/issues/458
 [#457]: https://github.com/AdguardTeam/Scriptlets/issues/457
+[#460]: https://github.com/AdguardTeam/Scriptlets/issues/460
 
 ## [v1.12.1] - 2024-09-20
 
2 changes: 1 addition & 1 deletion scripts/compatibility-table.json
@@ -212,7 +212,7 @@
         },
         {
             "adg": "set-attr",
-            "ubo": "set-attr.js"
+            "ubo": "set-attr.js (removed)"
         },
         {
             "adg": "set-constant",
132 changes: 126 additions & 6 deletions src/scriptlets/href-sanitizer.ts
@@ -31,8 +31,12 @@ import {
  *     - `text` — use the text content of the matched element,
  *     - `[<attribute-name>]` copy the value from attribute `attribute-name` on the same element,
  *     - `?<parameter-name>` copy the value from URL parameter `parameter-name` of the same element's `href` attribute.
- * - `transform` — optional, defaults to no transforming:
+ * - `transform` — optional, defaults to no transforming. Possible values:
  *     - `base64decode` — decode the base64 string from specified attribute.
+ *     - `removeHash` — remove the hash from the URL.
+ *     - `removeParam[:<parameters>]` — remove the specified parameters from the URL,
+ *       where `<parameters>` is a comma-separated list of parameter names;
+ *       if no parameter is specified, remove all parameters.
  *
  * > Note that in the case where the discovered value does not correspond to a valid URL with the appropriate
  * > http or https protocols, the value will not be set.
@@ -111,6 +115,60 @@ import {
  * </div>
  * ```
  *
+ * 5. Remove the hash from the URL:
+ *
+ * ```adblock
+ * example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeHash')
+ * ```
+ *
+ * ```html
+ * <!-- before -->
+ * <div>
+ *     <a href="http://www.foo.com/out/#aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw=="></a>
+ * </div>
+ *
+ * <!-- after -->
+ * <div>
+ *     <a href="http://www.foo.com/out/"></a>
+ * </div>
+ * ```
+ *
+ * 6. Remove all parameters from the URL:
+ *
+ * ```adblock
+ * example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeParam')
+ * ```
+ *
+ * ```html
+ * <!-- before -->
+ * <div>
+ *     <a href="https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main"></a>
+ * </div>
+ *
+ * <!-- after -->
+ * <div>
+ *     <a href="https://foo.com/123123"></a>
+ * </div>
+ * ```
+ *
+ * 7. Remove the specified parameter(s) from the URL:
+ *
+ * ```adblock
+ * example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeParam:utm_source,utm_medium')
+ * ```
+ *
+ * ```html
+ * <!-- before -->
+ * <div>
+ *     <a href="https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main"></a>
+ * </div>
+ *
+ * <!-- after -->
+ * <div>
+ *     <a href="https://foo.com/123123?utm_campaign=main"></a>
+ * </div>
+ * ```
+ *
  * @added v1.10.25.
  */
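
The `removeParam[:<parameters>]` syntax documented in this hunk reads as a marker followed by an optional comma-separated list of names. A minimal sketch of that parsing is given here, assuming a hypothetical helper `parseTransform` that is not part of the scriptlet; it follows the same "take the part after the first `:`" rule that the diff applies later.

```typescript
// Hypothetical helper for illustration only; the scriptlet does this inline.
interface ParsedTransform {
    marker: string;
    paramNames: string[];
}

const parseTransform = (transform: string): ParsedTransform => {
    const parts = transform.split(':');
    const marker = parts[0] ?? '';
    // Only the segment right after the first ':' is considered, as in the diff.
    const namesStr = parts[1];
    // An absent or empty list means "remove all parameters".
    const paramNames = namesStr ? namesStr.split(',') : [];
    return { marker, paramNames };
};

console.log(parseTransform('removeParam'));
console.log(parseTransform('removeParam:utm_source,utm_medium'));
```

An empty `paramNames` array stands for "no names given", which the documentation above maps to removing every parameter.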

@@ -125,7 +183,13 @@ export function hrefSanitizer(
         return;
     }
 
-    const BASE64_TRANSFORM_MARKER = 'base64decode';
+    // transform markers
+    const BASE64_DECODE_TRANSFORM_MARKER = 'base64decode';
+    const REMOVE_HASH_TRANSFORM_MARKER = 'removeHash';
+    const REMOVE_PARAM_TRANSFORM_MARKER = 'removeParam';
+    // separator markers
+    const MARKER_SEPARATOR = ':';
+    const COMMA = ',';
 
     // Regular expression to find not valid characters at the beginning and at the end of the string,
     // \x21-\x7e is a range that includes the ASCII characters from ! (hex 21) to ~ (hex 7E).
@@ -337,14 +401,64 @@ export function hrefSanitizer(
         return validEncodedHash ? decodeBase64SeveralTimes(validEncodedHash, DECODE_ATTEMPTS_NUMBER) : '';
     };
 
+    /**
+     * Removes the hash from the URL.
+     * @param url URL to remove the hash from
+     * @returns URL without the hash or empty string if no hash is found
+     */
+    const removeHash = (url: string) => {
+        const urlObj = new URL(url, window.location.origin);
+
+        if (!urlObj.hash) {
+            return '';
+        }
+
+        urlObj.hash = '';
+        return urlObj.toString();
+    };
+
+    /**
+     * Removes the specified parameter(s) from the URL.
+     * @param url URL to remove the parameter(s) from
+     * @param transformValue transform value with marker and optional parameter name(s) to remove
+     * @returns URL without the parameter(s) or empty string if none of the specified parameters is found
+     */
+    const removeParam = (url: string, transformValue: string) => {
+        const urlObj = new URL(url, window.location.origin);
+
+        // get the parameter names to remove
+        const paramNamesToRemoveStr = transformValue.split(MARKER_SEPARATOR)[1];
+
+        if (!paramNamesToRemoveStr) {
+            urlObj.search = '';
+            return urlObj.toString();
+        }
+
+        const initSearchParamsLength = urlObj.searchParams.toString().length;
+
+        const removeParams = paramNamesToRemoveStr.split(COMMA);
+        removeParams.forEach((param) => {
+            if (urlObj.searchParams.has(param)) {
+                urlObj.searchParams.delete(param);
+            }
+        });
+
+        // if none of the specified parameters is found, return empty string
+        if (initSearchParamsLength === urlObj.searchParams.toString().length) {
+            return '';
+        }
+
+        return urlObj.toString();
+    };
+
     /**
      * Extracts the base64 part from a string.
      * If no base64 string is found, `null` is returned.
      * @param url String to extract the base64 part from.
      * @returns The base64 part of the string, or `null` if none is found.
      */
     const decodeBase64URL = (url: string) => {
-        const { search, hash } = new URL(url);
+        const { search, hash } = new URL(url, document.location.href);
 
         if (search.length > 0) {
             return decodeSearchString(search);
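
The two helpers added in this hunk rely only on the standard `URL` and `URLSearchParams` APIs, so their behaviour can be tried outside the scriptlet. The sketch below is a standalone approximation, not the committed code: `stripHash`, `stripParams` and the explicit `base` argument are hypothetical names, with `base` standing in for `window.location.origin` so that the snippet does not need a browser.

```typescript
// Standalone sketch; `base` is an assumption replacing `window.location.origin`.
const stripHash = (url: string, base: string): string => {
    const urlObj = new URL(url, base);
    if (!urlObj.hash) {
        // Same convention as the diff: empty string means "nothing to change".
        return '';
    }
    urlObj.hash = '';
    return urlObj.toString();
};

const stripParams = (url: string, paramNames: string[], base: string): string => {
    const urlObj = new URL(url, base);
    if (paramNames.length === 0) {
        // No names given: drop the whole query string.
        urlObj.search = '';
        return urlObj.toString();
    }
    const before = urlObj.searchParams.toString();
    paramNames.forEach((name) => urlObj.searchParams.delete(name));
    // If nothing was deleted, signal it with an empty string.
    return before === urlObj.searchParams.toString() ? '' : urlObj.toString();
};

const base = 'https://example.org';
const tracked = 'https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main';

console.log(stripHash('https://example.org/?article#utm_source=Facebook', base));
console.log(stripParams(tracked, [], base));
console.log(stripParams(tracked, ['utm_source', 'utm_medium'], base));
```

The inputs are taken from the documentation examples and tests in this commit. One difference from the diff is that the sketch compares serialized query strings rather than their lengths, which is a stylistic choice here rather than a claim about the original.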
@@ -394,13 +508,19 @@ export function hrefSanitizer(
                     return;
                 }
                 let newHref = extractNewHref(elem, attribute);
-
                 // apply transform if specified
                 if (transform) {
-                    switch (transform) {
-                        case BASE64_TRANSFORM_MARKER:
+                    switch (true) {
+                        case transform === BASE64_DECODE_TRANSFORM_MARKER:
                             newHref = base64Decode(newHref);
                             break;
+                        case transform === REMOVE_HASH_TRANSFORM_MARKER:
+                            newHref = removeHash(newHref);
+                            break;
+                        case transform.startsWith(REMOVE_PARAM_TRANSFORM_MARKER): {
+                            newHref = removeParam(newHref, transform);
+                            break;
+                        }
                         default:
                             logMessage(source, `Invalid transform option: "${transform}"`);
                             return;
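
The move from `switch (transform)` to `switch (true)` appears to be needed because `removeParam` carries a variable suffix and so cannot be matched by plain equality. The following sketch isolates that dispatch; `classifyTransform` and `TransformKind` are hypothetical names used only to make the branches observable, not part of the scriptlet.

```typescript
// Hypothetical classification helper mirroring the branch order of the diff.
type TransformKind = 'base64decode' | 'removeHash' | 'removeParam' | 'invalid';

const classifyTransform = (transform: string): TransformKind => {
    switch (true) {
        case transform === 'base64decode':
            return 'base64decode';
        case transform === 'removeHash':
            return 'removeHash';
        // Prefix match: covers both 'removeParam' and 'removeParam:<names>'.
        case transform.startsWith('removeParam'):
            return 'removeParam';
        default:
            return 'invalid';
    }
};

console.log(classifyTransform('removeParam:utm_source'));
console.log(classifyTransform('unknown'));
```

Because the check is a prefix match, any value that merely begins with `removeParam` takes that branch as well; whether that is intended is not stated in the commit.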
58 changes: 54 additions & 4 deletions tests/scriptlets/href-sanitizer.test.js
@@ -27,10 +27,12 @@ const createElem = (href, text, attributeName, attributeValue) => {
 };
 
 const removeElem = () => {
-    const elem = document.getElementById('testHref');
-    if (elem) {
-        elem.remove();
-    }
+    const elem = document.querySelectorAll('#testHref');
+    elem.forEach((el) => {
+        if (el) {
+            el.remove();
+        }
+    });
 };
 
 const beforeEach = () => {
@@ -64,6 +66,54 @@ test('Checking if alias name works', (assert) => {
     assert.strictEqual(codeByAdgParams, codeByUboParams, 'ubo name - ok');
 });
 
+test('Sanitize href - remove all parameters from href', (assert) => {
+    const expectedHref = 'https://foo.com/123123';
+    const elem = createElem('https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main');
+    const selector = 'a[href^="https://foo.com/123123"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'all params were removed from href');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove parameters from href', (assert) => {
+    const expectedHref = 'https://foo.com/watch?utm_campaign=main';
+    const elem = createElem('https://foo.com/watch?v=dbjPnXaacAU&pp=ygUEdGVzdA%3D%3D&utm_campaign=main');
+    const selector = 'a[href^="https://foo.com/watch"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam:v,pp'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'v and pp params were removed from href');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove parameter from href', (assert) => {
+    const expectedHref = 'https://example.org/watch?v=dbjPnXaacAU';
+    const elem = createElem('https://example.org/watch?v=dbjPnXaacAU&pp=ygUEdGVzdA%3D%3D');
+    const selector = 'a[href^="https://example.org/watch"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam:pp'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'pp param was removed from href');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove hash', (assert) => {
+    const expectedHref = 'https://example.org/?article';
+    const elem = createElem('https://example.org/?article#utm_source=Facebook');
+    const selector = 'a[href]';
+
+    const scriptletArgs = [selector, '[href]', 'removeHash'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'hash was removed from href');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
 test('Sanitize href - no URL was found in base64', (assert) => {
     // encoded string is 'some text, no urls'
     const hrefWithBase64 = 'http://foo.com/#c29tZSB0ZXh0LCBubyB1cmxz';