Commit

AG-36282 Add 'transform' option with 'base64decode' value for 'href-sanitizer' scriptlet

Squashed commit of the following:

commit 66df93f
Author: jellizaveta <[email protected]>
Date:   Wed Oct 16 13:10:27 2024 +0300

    fix description, update script

commit 8d6f7f0
Author: jellizaveta <[email protected]>
Date:   Tue Oct 15 21:22:36 2024 +0300

    simplify code, add constant

commit 6f6d257
Author: jellizaveta <[email protected]>
Date:   Tue Oct 15 20:21:22 2024 +0300

    fix name, comment, simplify code

commit cd29a52
Author: Slava Leleka <[email protected]>
Date:   Tue Oct 15 19:50:38 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 6656964
Author: Slava Leleka <[email protected]>
Date:   Tue Oct 15 19:50:31 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit cc14769
Author: Slava Leleka <[email protected]>
Date:   Tue Oct 15 19:50:26 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 82ee60f
Merge: ffcfd67 901cb2e
Author: jellizaveta <[email protected]>
Date:   Tue Oct 15 16:13:54 2024 +0300

    merge master

commit ffcfd67
Merge: 83ee0aa dac16b7
Author: jellizaveta <[email protected]>
Date:   Mon Oct 14 17:08:24 2024 +0300

    resolve conflicts

commit 83ee0aa
Author: jellizaveta <[email protected]>
Date:   Mon Oct 14 17:03:26 2024 +0300

    add the ability to decode base64 string multiple times, fix typo, decode search params, add tests

commit 436985f
Author: jellizaveta <[email protected]>
Date:   Mon Oct 14 13:19:25 2024 +0300

    fix typo

commit 5eb91b9
Author: jellizaveta <[email protected]>
Date:   Fri Oct 11 21:40:00 2024 +0300

    fix indent in docs

commit f8384c6
Merge: c81eeb6 0ca98f3
Author: jellizaveta <[email protected]>
Date:   Fri Oct 11 20:18:16 2024 +0300

    merge master

commit c81eeb6
Author: jellizaveta <[email protected]>
Date:   Fri Oct 11 20:08:58 2024 +0300

    AG-36282 Add 'transform' option with 'base64decode'  value for 'href-sanitizer' scriptlet. #455
jellizaveta committed Oct 16, 2024
1 parent 901cb2e commit 1de3e88
Showing 3 changed files with 334 additions and 14 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic

- `prevent-canvas` scriptlet [#451]
- `parentSelector` option to search for nodes for `remove-node-text` scriptlet [#397]
- `transform` option with `base64decode` value for `href-sanitizer` scriptlet [#455]
- new values to `set-cookie` and `set-local-storage-item` scriptlets: `forbidden`, `forever` [#458]

### Changed
@@ -27,6 +28,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
[Unreleased]: https://github.com/AdguardTeam/Scriptlets/compare/v1.12.1...HEAD
[#451]: https://github.com/AdguardTeam/Scriptlets/issues/451
[#415]: https://github.com/AdguardTeam/Scriptlets/issues/415
[#455]: https://github.com/AdguardTeam/Scriptlets/issues/455
[#414]: https://github.com/AdguardTeam/Scriptlets/issues/414
[#441]: https://github.com/AdguardTeam/Scriptlets/issues/441
[#397]: https://github.com/AdguardTeam/Scriptlets/issues/397
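
To illustrate the changelog entry for `transform`, here is a minimal TypeScript sketch of what `base64decode` amounts to in the simplest, single-layer case: take the search string of an anchor's `href`, decode it with `atob`, and keep the result only if it parses as an http or https URL. The function name `decodeSearchAsUrl` is hypothetical and is not part of the scriptlet; the sample `href` is the one used in the scriptlet's documentation. It assumes a runtime where `atob` and `URL` are globals (browsers, recent Node.js).

```typescript
// Hypothetical helper for illustration only; it is not the scriptlet's code.
function decodeSearchAsUrl(href: string): string | null {
    const { search } = new URL(href);
    // Drop the leading "?" so only the encoded payload remains.
    const payload = search.slice(1);
    if (payload.length === 0) {
        return null;
    }
    try {
        // atob throws on input that is not valid base64.
        const decoded = atob(payload);
        // new URL throws if the decoded text is not a parseable URL.
        const { protocol } = new URL(decoded);
        // Accept only http and https targets.
        return protocol === 'http:' || protocol === 'https:' ? decoded : null;
    } catch {
        return null;
    }
}

const sanitized = decodeSearchAsUrl(
    'http://www.foo.com/out/?aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw==',
);
console.log(sanitized); // prints http://example.com/?v=123
```

Returning `null` for anything that is not an http or https URL mirrors the documented behavior that such values are not written back to `href`.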
218 changes: 214 additions & 4 deletions src/scriptlets/href-sanitizer.ts
@@ -22,15 +22,20 @@ import {
* ### Syntax
*
* ```text
- * example.org#%#//scriptlet('href-sanitizer', selector[, attribute])
+ * example.org#%#//scriptlet('href-sanitizer', selector[, attribute, [ transform]])
* ```
*
* - `selector` — required, a CSS selector to match the elements to be sanitized,
* which should be anchor elements (`<a>`) with `href` attribute.
* - `attribute` — optional, defaults to `text`:
* - `text` — use the text content of the matched element,
- * - `[attribute-name]` copy the value from attribute `attribute-name` on the same element,
- * - `?parameter` copy the value from URL parameter `parameter` of the same element's `href` attribute.
+ * - `[<attribute-name>]` copy the value from attribute `attribute-name` on the same element,
+ * - `?<parameter-name>` copy the value from URL parameter `parameter-name` of the same element's `href` attribute.
* - `transform` — optional, defaults to no transformation:
* - `base64decode` — decode the base64 string from the specified attribute.
*
* > Note that if the discovered value is not a valid URL with an
* > http or https protocol, the value will not be set.
*
* ### Examples
*
@@ -88,19 +93,40 @@ import {
* </div>
* ```
*
* 4. Decode the base64 string from the specified attribute:
*
* ```adblock
* example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'base64decode')
* ```
*
* ```html
* <!-- before -->
* <div>
* <a href="http://www.foo.com/out/?aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw=="></a>
* </div>
*
* <!-- after -->
* <div>
* <a href="http://example.com/?v=123"></a>
* </div>
* ```
*
* @added v1.10.25.
*/

export function hrefSanitizer(
source: Source,
selector: string,
attribute = 'text',
transform = '',
) {
if (!selector) {
logMessage(source, 'Selector is required.');
return;
}

const BASE64_TRANSFORM_MARKER = 'base64decode';

// Regular expression to find invalid characters at the beginning and at the end of the string,
// \x21-\x7e is a range that includes the ASCII characters from ! (hex 21) to ~ (hex 7E).
// This range covers numbers, English letters, and common symbols.
@@ -144,6 +170,21 @@ export function hrefSanitizer(
return '';
};

/**
* Validates whether a given string is a URL.
*
* @param url The URL string to validate.
* @returns `true` if the string is a valid URL, otherwise `false`.
*/
const isValidURL = (url: string): boolean => {
try {
new URL(url);
return true;
} catch {
return false;
}
};

/**
* Validates a URL, if valid return URL,
* otherwise return null.
@@ -177,6 +218,161 @@ export function hrefSanitizer(
return element.nodeName.toLowerCase() === 'a' && element.hasAttribute('href');
};

/**
* Recursively searches for the first valid URL within a nested object.
*
* @param obj The object to search for URLs.
* @returns The first found URL as a string, or `null` if none are found.
*/
const extractURLFromObject = (obj: Record<string, unknown>): string | null => {
for (const key in obj) {
if (!Object.prototype.hasOwnProperty.call(obj, key)) {
continue;
}

const value = obj[key];

if (typeof value === 'string' && isValidURL(value)) {
return value;
}

if (typeof value === 'object' && value !== null) {
const result = extractURLFromObject(value as Record<string, unknown>);
if (result) {
return result;
}
}
}

return null;
};

/**
* Checks if the given content has object format.
* @param content The content to check.
* @returns `true` if the content has object format, `false` otherwise.
*/
const isStringifiedObject = (content: string) => content.startsWith('{') && content.endsWith('}');

/**
* Decodes a base64 string several times. If the result is a valid URL, it is returned.
* If the result is a JSON object, the first valid URL within the object is returned.
* @param text The base64 string to decode.
* @param times The number of times to decode the base64 string.
* @returns The decoded URL, an empty string if the input cannot be decoded into a URL,
* or `null` if a decoded JSON object contains no valid URL.
*/
const decodeBase64SeveralTimes = (text: string, times: number): string | null => {
let result = text;
for (let i = 0; i < times; i += 1) {
try {
result = atob(result);
} catch (e) {
// Not valid base64 string
if (result === text) {
return '';
}
}
}
// if found valid URL, return it
if (isValidURL(result)) {
return result;
}
// if the result is an object, try to extract URL from it
if (isStringifiedObject(result)) {
try {
const parsedResult = JSON.parse(result);
return extractURLFromObject(parsedResult);
} catch (ex) {
return '';
}
}
logMessage(source, `Failed to decode base64 string: ${text}`);
return '';
};

// URL components markers
const SEARCH_QUERY_MARKER = '?';
const SEARCH_PARAMS_MARKER = '&';
const HASHBANG_MARKER = '#!';
const ANCHOR_MARKER = '#';
// decode attempts for base64 string
const DECODE_ATTEMPTS_NUMBER = 10;

/**
* Decodes the search string by removing the search query marker and decoding the base64 string.
* @param search Search string to decode
* @returns Decoded search string or empty string if no valid URL is found
*/
const decodeSearchString = (search: string) => {
const searchString = search.replace(SEARCH_QUERY_MARKER, '');
let decodedParam;
let validEncodedParam;
if (searchString.includes(SEARCH_PARAMS_MARKER)) {
const searchParamsArray = searchString.split(SEARCH_PARAMS_MARKER);
searchParamsArray.forEach((param) => {
decodedParam = decodeBase64SeveralTimes(param, DECODE_ATTEMPTS_NUMBER);
if (decodedParam && decodedParam.length > 0) {
validEncodedParam = decodedParam;
}
});
return validEncodedParam;
}
return decodeBase64SeveralTimes(searchString, DECODE_ATTEMPTS_NUMBER);
};

/**
* Decodes the hash string by removing the hashbang or anchor marker and decoding the base64 string.
* @param hash Hash string to decode
* @returns Decoded hash string or empty string if no valid URL is found
*/
const decodeHashString = (hash: string) => {
let validEncodedHash = '';

if (hash.includes(HASHBANG_MARKER)) {
validEncodedHash = hash.replace(HASHBANG_MARKER, '');
} else if (hash.includes(ANCHOR_MARKER)) {
validEncodedHash = hash.replace(ANCHOR_MARKER, '');
}

return validEncodedHash ? decodeBase64SeveralTimes(validEncodedHash, DECODE_ATTEMPTS_NUMBER) : '';
};

/**
* Decodes the base64 part of a URL, checking the search string first and then the hash.
* If the URL has neither a search string nor a hash, `null` is returned.
* @param url URL string that carries the base64 part.
* @returns The result of decoding the search string or hash, or `null` if the URL has neither.
*/
const decodeBase64URL = (url: string) => {
const { search, hash } = new URL(url);

if (search.length > 0) {
return decodeSearchString(search);
}

if (hash.length > 0) {
return decodeHashString(hash);
}

logMessage(source, `Failed to execute base64 from URL: ${url}`);
return null;
};

/**
* Decodes a base64 string from the given href.
* If the href is a valid URL, the base64 string is decoded.
* If the href is not a valid URL, the base64 string is decoded several times.
* @param href The href to decode.
* @returns The decoded base64 string.
*/
const base64Decode = (href: string): string => {
if (isValidURL(href)) {
return decodeBase64URL(href) || '';
}

return decodeBase64SeveralTimes(href, DECODE_ATTEMPTS_NUMBER) || '';
};

/**
* Sanitizes the href attribute of elements matching the given selector.
*
@@ -194,9 +390,23 @@ export function hrefSanitizer(
elements.forEach((elem) => {
try {
if (!isSanitizableAnchor(elem)) {
logMessage(source, `${elem} is not a valid element to sanitize`);
return;
}
- const newHref = extractNewHref(elem, attribute);
+ let newHref = extractNewHref(elem, attribute);

// apply transform if specified
if (transform) {
switch (transform) {
case BASE64_TRANSFORM_MARKER:
newHref = base64Decode(newHref);
break;
default:
logMessage(source, `Invalid transform option: "${transform}"`);
return;
}
}

const newValidHref = getValidURL(newHref);
if (!newValidHref) {
logMessage(source, `Invalid URL: ${newHref}`);
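
The layered decoding added in this diff (repeated `atob` calls, then a URL check, then a search for a URL inside a decoded JSON object) can be summarized with a standalone TypeScript sketch. All names here (`decodeLayers`, `findFirstUrl`, `isValidUrl`, `MAX_DECODE_ATTEMPTS`) are hypothetical. The sketch is also a simplified variant rather than a copy of the commit's logic: it stops at the first layer that yields a URL or a JSON object, whereas the code in the diff always runs its full decode loop before checking the result.

```typescript
// Illustrative sketch only; names and control flow are assumptions, not the commit's code.
const MAX_DECODE_ATTEMPTS = 10;

function isValidUrl(value: string): boolean {
    try {
        new URL(value);
        return true;
    } catch {
        return false;
    }
}

// Depth-first search for the first string value that parses as a URL.
function findFirstUrl(value: unknown): string | null {
    if (typeof value === 'string') {
        return isValidUrl(value) ? value : null;
    }
    if (typeof value === 'object' && value !== null) {
        const record = value as Record<string, unknown>;
        for (const key of Object.keys(record)) {
            const found = findFirstUrl(record[key]);
            if (found) {
                return found;
            }
        }
    }
    return null;
}

// Peels base64 layers one at a time, up to MAX_DECODE_ATTEMPTS.
function decodeLayers(text: string): string | null {
    let current = text;
    for (let i = 0; i < MAX_DECODE_ATTEMPTS; i += 1) {
        let decoded: string;
        try {
            decoded = atob(current);
        } catch {
            // The current layer is not valid base64, so give up.
            return null;
        }
        if (isValidUrl(decoded)) {
            return decoded;
        }
        if (decoded.startsWith('{') && decoded.endsWith('}')) {
            try {
                return findFirstUrl(JSON.parse(decoded));
            } catch {
                return null;
            }
        }
        current = decoded;
    }
    return null;
}

const doubleEncoded = btoa(btoa('https://example.com/'));
console.log(decodeLayers(doubleEncoded)); // prints https://example.com/
```

Capping the number of attempts keeps the loop bounded on input that keeps decoding to non-URL text, which is the same role `DECODE_ATTEMPTS_NUMBER` plays in the diff.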