Commit 1de3e88

AG-36282 Add 'transform' option with 'base64decode' value for 'href-sanitizer' scriptlet
Squashed commit of the following:

commit 66df93f
Author: jellizaveta <[email protected]>
Date: Wed Oct 16 13:10:27 2024 +0300

    fix description, update script

commit 8d6f7f0
Author: jellizaveta <[email protected]>
Date: Tue Oct 15 21:22:36 2024 +0300

    simplify code, add constant

commit 6f6d257
Author: jellizaveta <[email protected]>
Date: Tue Oct 15 20:21:22 2024 +0300

    fix name, comment, simplify code

commit cd29a52
Author: Slava Leleka <[email protected]>
Date: Tue Oct 15 19:50:38 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 6656964
Author: Slava Leleka <[email protected]>
Date: Tue Oct 15 19:50:31 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit cc14769
Author: Slava Leleka <[email protected]>
Date: Tue Oct 15 19:50:26 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 82ee60f
Merge: ffcfd67 901cb2e
Author: jellizaveta <[email protected]>
Date: Tue Oct 15 16:13:54 2024 +0300

    merge master

commit ffcfd67
Merge: 83ee0aa dac16b7
Author: jellizaveta <[email protected]>
Date: Mon Oct 14 17:08:24 2024 +0300

    resolve conflicts

commit 83ee0aa
Author: jellizaveta <[email protected]>
Date: Mon Oct 14 17:03:26 2024 +0300

    add the ability to decode base64 string multiple times, fix typo, decode search params, add tests

commit 436985f
Author: jellizaveta <[email protected]>
Date: Mon Oct 14 13:19:25 2024 +0300

    fix typo

commit 5eb91b9
Author: jellizaveta <[email protected]>
Date: Fri Oct 11 21:40:00 2024 +0300

    fix indent in docs

commit f8384c6
Merge: c81eeb6 0ca98f3
Author: jellizaveta <[email protected]>
Date: Fri Oct 11 20:18:16 2024 +0300

    merge master

commit c81eeb6
Author: jellizaveta <[email protected]>
Date: Fri Oct 11 20:08:58 2024 +0300

    AG-36282 Add 'transform' option with 'base64decode' value for 'href-sanitizer' scriptlet. #455
1 parent 901cb2e commit 1de3e88

3 files changed

+334
-14
lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic

 - `prevent-canvas` scriptlet [#451]
 - `parentSelector` option to search for nodes for `remove-node-text` scriptlet [#397]
+- `transform` option with `base64decode` value for `href-sanitizer` scriptlet [#455]
 - new values to `set-cookie` and `set-local-storage-item` scriptlets: `forbidden`, `forever` [#458]

 ### Changed
@@ -27,6 +28,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
 [Unreleased]: https://github.com/AdguardTeam/Scriptlets/compare/v1.12.1...HEAD
 [#451]: https://github.com/AdguardTeam/Scriptlets/issues/451
 [#415]: https://github.com/AdguardTeam/Scriptlets/issues/415
+[#455]: https://github.com/AdguardTeam/Scriptlets/issues/455
 [#414]: https://github.com/AdguardTeam/Scriptlets/issues/414
 [#441]: https://github.com/AdguardTeam/Scriptlets/issues/441
 [#397]: https://github.com/AdguardTeam/Scriptlets/issues/397

src/scriptlets/href-sanitizer.ts

Lines changed: 214 additions & 4 deletions
@@ -22,15 +22,20 @@ import {
  * ### Syntax
  *
  * ```text
- * example.org#%#//scriptlet('href-sanitizer', selector[, attribute])
+ * example.org#%#//scriptlet('href-sanitizer', selector[, attribute, [ transform]])
  * ```
  *
  * - `selector` — required, a CSS selector to match the elements to be sanitized,
  *   which should be anchor elements (`<a>`) with `href` attribute.
  * - `attribute` — optional, default to `text`:
  *     - `text` — use the text content of the matched element,
- *     - `[attribute-name]` copy the value from attribute `attribute-name` on the same element,
- *     - `?parameter` copy the value from URL parameter `parameter` of the same element's `href` attribute.
+ *     - `[<attribute-name>]` copy the value from attribute `attribute-name` on the same element,
+ *     - `?<parameter-name>` copy the value from URL parameter `parameter-name` of the same element's `href` attribute.
+ * - `transform` — optional, defaults to no transforming:
+ *     - `base64decode` — decode the base64 string from specified attribute.
+ *
+ * > Note that in the case where the discovered value does not correspond to a valid URL with the appropriate
+ * > http or https protocols, the value will not be set.
  *
  * ### Examples
  *
@@ -88,19 +93,40 @@ import {
  *     </div>
  *     ```
  *
+ * 4. Decode the base64 string from specified attribute:
+ *
+ *     ```adblock
+ *     example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'base64decode')
+ *     ```
+ *
+ *     ```html
+ *     <!-- before -->
+ *     <div>
+ *         <a href="http://www.foo.com/out/?aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw=="></a>
+ *     </div>
+ *
+ *     <!-- after -->
+ *     <div>
+ *         <a href="http://example.com/?v=123"></a>
+ *     </div>
+ *     ```
+ *
  * @added v1.10.25.
  */

 export function hrefSanitizer(
     source: Source,
     selector: string,
     attribute = 'text',
+    transform = '',
 ) {
     if (!selector) {
         logMessage(source, 'Selector is required.');
         return;
     }

+    const BASE64_TRANSFORM_MARKER = 'base64decode';
+
     // Regular expression to find not valid characters at the beginning and at the end of the string,
     // \x21-\x7e is a range that includes the ASCII characters from ! (hex 21) to ~ (hex 7E).
     // This range covers numbers, English letters, and common symbols.
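
The documented example in this hunk can be checked by hand. The sketch below is not part of the commit; it only decodes the query payload from the "before" markup with the standard `atob` global to confirm that it yields the "after" URL. The variable names are illustrative.

```typescript
// Encoded payload copied from the documented "before" markup.
const encodedTarget = 'aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw==';

// A single base64 pass is enough for this value.
const decodedTarget: string = atob(encodedTarget);

console.log(decodedTarget); // http://example.com/?v=123
console.log(new URL(decodedTarget).protocol); // http:
```
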
@@ -144,6 +170,21 @@ export function hrefSanitizer(
         return '';
     };

+    /**
+     * Validates whether a given string is a URL.
+     *
+     * @param url The URL string to validate.
+     * @returns `true` if the string is a valid URL, otherwise `false`.
+     */
+    const isValidURL = (url: string): boolean => {
+        try {
+            new URL(url);
+            return true;
+        } catch {
+            return false;
+        }
+    };
+
     /**
      * Validates a URL, if valid return URL,
      * otherwise return null.
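
One property of the `isValidURL` helper added above is worth noting: the `URL` constructor accepts any absolute URL regardless of scheme, so this check alone does not restrict values to http or https. The standalone sketch below repeats the helper outside the scriptlet and probes it with a few sample strings chosen for illustration.

```typescript
// Same shape as the helper in the diff, copied here so the sketch runs on its own.
const isValidURL = (url: string): boolean => {
    try {
        new URL(url);
        return true;
    } catch {
        return false;
    }
};

// Sample inputs are illustrative, not taken from the commit's tests.
const httpSample = isValidURL('http://example.com/?v=123');
const scriptSample = isValidURL('javascript:alert(1)');
const base64Sample = isValidURL('aHR0cDovL2V4YW1wbGUuY29t');
const relativeSample = isValidURL('/relative/path');

console.log(httpSample, scriptSample, base64Sample, relativeSample);
```

Strings without a scheme, including raw base64 payloads, are rejected, while non-http schemes pass; a separate protocol check is still needed before a value is written back to `href`.
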
@@ -177,6 +218,161 @@ export function hrefSanitizer(
         return element.nodeName.toLowerCase() === 'a' && element.hasAttribute('href');
     };

+    /**
+     * Recursively searches for the first valid URL within a nested object.
+     *
+     * @param obj The object to search for URLs.
+     * @returns The first found URL as a string, or `null` if none are found.
+     */
+    const extractURLFromObject = (obj: Record<string, unknown>): string | null => {
+        for (const key in obj) {
+            if (!Object.prototype.hasOwnProperty.call(obj, key)) {
+                continue;
+            }
+
+            const value = obj[key];
+
+            if (typeof value === 'string' && isValidURL(value)) {
+                return value;
+            }
+
+            if (typeof value === 'object' && value !== null) {
+                const result = extractURLFromObject(value as Record<string, unknown>);
+                if (result) {
+                    return result;
+                }
+            }
+        }
+
+        return null;
+    };
+
+    /**
+     * Checks if the given content has object format.
+     * @param content The content to check.
+     * @returns `true` if the content has object format, `false` otherwise.
+     */
+    const isStringifiedObject = (content: string) => content.startsWith('{') && content.endsWith('}');
+
+    /**
+     * Decodes a base64 string several times. If the result is a valid URL, it is returned.
+     * If the result is a JSON object, the first valid URL within the object is returned.
+     * @param text The base64 string to decode.
+     * @param times The number of times to decode the base64 string.
+     * @returns Decoded base64 string or empty string if no valid URL is found.
+     */
+    const decodeBase64SeveralTimes = (text: string, times: number): string | null => {
+        let result = text;
+        for (let i = 0; i < times; i += 1) {
+            try {
+                result = atob(result);
+            } catch (e) {
+                // Not valid base64 string
+                if (result === text) {
+                    return '';
+                }
+            }
+        }
+        // if found valid URL, return it
+        if (isValidURL(result)) {
+            return result;
+        }
+        // if the result is an object, try to extract URL from it
+        if (isStringifiedObject(result)) {
+            try {
+                const parsedResult = JSON.parse(result);
+                return extractURLFromObject(parsedResult);
+            } catch (ex) {
+                return '';
+            }
+        }
+        logMessage(source, `Failed to decode base64 string: ${text}`);
+        return '';
+    };
+
+    // URL components markers
+    const SEARCH_QUERY_MARKER = '?';
+    const SEARCH_PARAMS_MARKER = '&';
+    const HASHBANG_MARKER = '#!';
+    const ANCHOR_MARKER = '#';
+    // decode attempts for base64 string
+    const DECODE_ATTEMPTS_NUMBER = 10;
+
+    /**
+     * Decodes the search string by removing the search query marker and decoding the base64 string.
+     * @param search Search string to decode
+     * @returns Decoded search string or empty string if no valid URL is found
+     */
+    const decodeSearchString = (search: string) => {
+        const searchString = search.replace(SEARCH_QUERY_MARKER, '');
+        let decodedParam;
+        let validEncodedParam;
+        if (searchString.includes(SEARCH_PARAMS_MARKER)) {
+            const searchParamsArray = searchString.split(SEARCH_PARAMS_MARKER);
+            searchParamsArray.forEach((param) => {
+                decodedParam = decodeBase64SeveralTimes(param, DECODE_ATTEMPTS_NUMBER);
+                if (decodedParam && decodedParam.length > 0) {
+                    validEncodedParam = decodedParam;
+                }
+            });
+            return validEncodedParam;
+        }
+        return decodeBase64SeveralTimes(searchString, DECODE_ATTEMPTS_NUMBER);
+    };
+
+    /**
+     * Decodes the hash string by removing the hashbang or anchor marker and decoding the base64 string.
+     * @param hash Hash string to decode
+     * @returns Decoded hash string or empty string if no valid URL is found
+     */
+    const decodeHashString = (hash: string) => {
+        let validEncodedHash = '';
+
+        if (hash.includes(HASHBANG_MARKER)) {
+            validEncodedHash = hash.replace(HASHBANG_MARKER, '');
+        } else if (hash.includes(ANCHOR_MARKER)) {
+            validEncodedHash = hash.replace(ANCHOR_MARKER, '');
+        }
+
+        return validEncodedHash ? decodeBase64SeveralTimes(validEncodedHash, DECODE_ATTEMPTS_NUMBER) : '';
+    };
+
+    /**
+     * Extracts the base64 part from a string.
+     * If no base64 string is found, `null` is returned.
+     * @param url String to extract the base64 part from.
+     * @returns The base64 part of the string, or `null` if none is found.
+     */
+    const decodeBase64URL = (url: string) => {
+        const { search, hash } = new URL(url);
+
+        if (search.length > 0) {
+            return decodeSearchString(search);
+        }
+
+        if (hash.length > 0) {
+            return decodeHashString(hash);
+        }
+
+        logMessage(source, `Failed to execute base64 from URL: ${url}`);
+        return null;
+    };
+
+    /**
+     * Decodes a base64 string from the given href.
+     * If the href is a valid URL, the base64 string is decoded.
+     * If the href is not a valid URL, the base64 string is decoded several times.
+     * @param href The href to decode.
+     * @returns The decoded base64 string.
+     */
+    const base64Decode = (href: string): string => {
+        if (isValidURL(href)) {
+            return decodeBase64URL(href) || '';
+        }
+
+        return decodeBase64SeveralTimes(href, DECODE_ATTEMPTS_NUMBER) || '';
+    };
+
     /**
      * Sanitizes the href attribute of elements matching the given selector.
      *
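
The core idea of this hunk, peeling off nested base64 layers until a URL appears and falling back to a URL found inside a decoded JSON object, can be sketched in isolation. The version below is a simplified variant and not the commit's code: `findURL` and `decodeToURL` are hypothetical names, and unlike `decodeBase64SeveralTimes` it stops as soon as a URL is found instead of always running a fixed number of passes.

```typescript
const isValidURL = (value: string): boolean => {
    try {
        new URL(value);
        return true;
    } catch {
        return false;
    }
};

// Depth-first search for the first string value that parses as a URL.
const findURL = (value: unknown): string | null => {
    if (typeof value === 'string') {
        return isValidURL(value) ? value : null;
    }
    if (typeof value === 'object' && value !== null) {
        const record = value as Record<string, unknown>;
        const keys = Object.keys(record);
        for (let i = 0; i < keys.length; i += 1) {
            const found = findURL(record[keys[i]]);
            if (found) {
                return found;
            }
        }
    }
    return null;
};

// Decode up to maxAttempts layers; return '' when no URL can be recovered.
const decodeToURL = (text: string, maxAttempts: number): string => {
    let current = text;
    for (let i = 0; i < maxAttempts; i += 1) {
        if (isValidURL(current)) {
            return current;
        }
        if (current.startsWith('{') && current.endsWith('}')) {
            try {
                return findURL(JSON.parse(current)) || '';
            } catch {
                return '';
            }
        }
        try {
            current = atob(current);
        } catch {
            // Not a valid base64 string, so there is nothing left to peel off.
            return '';
        }
    }
    return isValidURL(current) ? current : '';
};

// Illustrative inputs: a doubly encoded URL and an encoded JSON object.
const doubleEncoded = btoa(btoa('http://example.com/?v=123'));
const jsonEncoded = btoa(JSON.stringify({ meta: { id: 1, target: 'https://example.org/page' } }));

console.log(decodeToURL(doubleEncoded, 10));
console.log(decodeToURL(jsonEncoded, 10));
```

Stopping early is safe here because the base64 alphabet contains no `:`, so an encoded layer can never be mistaken for an absolute URL.
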
@@ -194,9 +390,23 @@ export function hrefSanitizer(
         elements.forEach((elem) => {
             try {
                 if (!isSanitizableAnchor(elem)) {
+                    logMessage(source, `${elem} is not a valid element to sanitize`);
                     return;
                 }
-                const newHref = extractNewHref(elem, attribute);
+                let newHref = extractNewHref(elem, attribute);
+
+                // apply transform if specified
+                if (transform) {
+                    switch (transform) {
+                        case BASE64_TRANSFORM_MARKER:
+                            newHref = base64Decode(newHref);
+                            break;
+                        default:
+                            logMessage(source, `Invalid transform option: "${transform}"`);
+                            return;
+                    }
+                }
+
                 const newValidHref = getValidURL(newHref);
                 if (!newValidHref) {
                     logMessage(source, `Invalid URL: ${newHref}`);
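
After the transform runs, the decoded value still goes through `getValidURL`, whose body is not shown in this diff. The documentation note added by this commit says values without an http or https protocol are not set; a minimal sketch of such a filter, under the assumption that it is a plain protocol comparison, could look like the following. `toHttpURL` is a hypothetical name, not the scriptlet's function.

```typescript
// Hypothetical stand-in for the protocol filter described in the docs note.
const toHttpURL = (value: string): URL | null => {
    try {
        const url = new URL(value);
        // Only http and https targets are accepted as replacement hrefs.
        return url.protocol === 'http:' || url.protocol === 'https:' ? url : null;
    } catch {
        // The value did not parse as an absolute URL at all.
        return null;
    }
};

const accepted = toHttpURL('http://example.com/?v=123');
const rejectedScheme = toHttpURL('javascript:alert(1)');
const rejectedText = toHttpURL('');

console.log(accepted ? accepted.href : null, rejectedScheme, rejectedText);
```

An empty string, which is what the decode helpers return on failure, is rejected by the same check, so a failed decode leaves the original `href` untouched.
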
