-
-
Notifications
You must be signed in to change notification settings - Fork 0
Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ter sanitization Incomplete multi-character sanitization This string may still contain `<script`, which may cause an HTML element injection vulnerability. Sanitizing untrusted input is a common technique for preventing injection attacks and other security vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the regular expression matches multiple consecutive characters, replacing it just once can result in the unsafe text reappearing in the sanitized input. Fixes https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67 Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR attempts to fix a code scanning security alert (#67) related to incomplete multi-character sanitization in the stripMarkup function. The proposed fix adds HTML entity escaping after stripping tags to prevent potential XSS vulnerabilities from malformed HTML.
Changes:
- Modified
stripMarkupfunction insrc/utils/cover.tsto add HTML entity escaping for special characters (&,<,>,",') - Added intermediate variable to store stripped text before escaping
| const text = String(input) | ||
| .replace(/<[^>]*>/g, '') | ||
| .trim(); | ||
|
|
||
| // HTML-escape any remaining special characters to prevent injection | ||
| return text | ||
| .replace(/&/g, '&') | ||
| .replace(/</g, '<') | ||
| .replace(/>/g, '>') | ||
| .replace(/"/g, '"') | ||
| .replace(/'/g, '''); |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This HTML entity escaping will cause double-escaping issues. The stripMarkup function's output is used in Astro/JSX templates as the alt attribute (e.g., alt={cover.alt} in PostImage.astro), where values are automatically HTML-escaped during rendering. Pre-escaping here will result in users seeing literal < and > characters instead of angle brackets in alt text.
Additionally, this change will break the existing test at line 321 of cover.test.ts, which expects 'Fancy' as output, not '<em>Fancy</em>'.
To properly address the security issue of incomplete sanitization, consider instead improving the regex to handle malformed tags (e.g., unclosed tags like <script without >), or use a more robust approach like stripping all characters between and including any < and the next > or end of string.
| const text = String(input) | |
| .replace(/<[^>]*>/g, '') | |
| .trim(); | |
| // HTML-escape any remaining special characters to prevent injection | |
| return text | |
| .replace(/&/g, '&') | |
| .replace(/</g, '<') | |
| .replace(/>/g, '>') | |
| .replace(/"/g, '"') | |
| .replace(/'/g, '''); | |
| // Strip anything that looks like an HTML tag, including malformed/unclosed tags. | |
| // This removes from '<' up to the next '>' or the end of the string. | |
| return String(input) | |
| .replace(/<[^>]*(>|$)/g, '') | |
| .trim(); |
| function stripMarkup(input: string): string { | ||
| return String(input) | ||
| const text = String(input) | ||
| .replace(/<[^>]*>/g, '') |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex /<[^>]*>/g only matches well-formed tags with both opening < and closing >. It does not handle incomplete or malformed tags like <script (without closing >), which is the root cause of the security vulnerability mentioned in the code scanning alert.
Consider replacing the regex pattern with something more robust that removes any < character and everything up to and including the next >, or removes any stray < or > characters entirely. For example: .replace(/<[^>]*>?/g, '') or a multi-pass approach that removes tags first, then strips any remaining < and > characters.
| .replace(/<[^>]*>/g, '') | |
| .replace(/<[^>]*>?/g, '') |
| function stripMarkup(input: string): string { | ||
| return String(input) | ||
| const text = String(input) | ||
| .replace(/<[^>]*>/g, '') | ||
| .trim(); | ||
|
|
||
| // HTML-escape any remaining special characters to prevent injection | ||
| return text | ||
| .replace(/&/g, '&') | ||
| .replace(/</g, '<') | ||
| .replace(/>/g, '>') | ||
| .replace(/"/g, '"') | ||
| .replace(/'/g, '''); | ||
| } |
Copilot
AI
Jan 12, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The security fix should include test cases that verify the function properly handles malformed HTML and potential XSS vectors, such as:
- Unclosed tags:
<scriptor<div - Tags without closing bracket:
<img src=x onerror=alert(1) - Nested or malformed markup:
<<script>alert('XSS')</script> - Mixed content:
text<script>alert(1)</script>more text
These test cases would ensure the vulnerability mentioned in the code scanning alert is actually resolved.
Potential fix for https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67
In general, the problem is that
stripMarkupattempts to turn potential HTML into safe plain text but only removes things that match<[^>]*>. If input is adversarial or malformed, you can still end up with a string containing<that might participate in forming a tag or a<scriptsequence when combined with other content. To fix this robustly without pulling in a heavy sanitizer, we should ensure the function’s output is not only stripped of obvious tags but also safely HTML-escaped so that any remaining<,>, or&are rendered as text, not markup, when injected into the page.The best minimal-change approach is:
replace(/<[^>]*>/g, '')logic to strip well-formed tags, preserving its existing behavior for straightforward input.&,<,>,",') in the result to their entity forms (&,<,>,",'). This guarantees that no HTML element, including<script>, can be reconstructed in the output.stripMarkup(a tiny helper or chainedreplacecalls), avoiding new dependencies and preserving the function’s “small and dependency-free” design.src/utils/cover.ts, within thestripMarkupfunction body (lines 84–87), leaving its signature and call sites unchanged.No additional imports are required; we can use plain string methods.
Suggested fixes powered by Copilot Autofix. Review carefully before merging.