Skip to content

Conversation

@davidsneighbour
Copy link
Owner

Potential fix for https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67

In general, the problem is that stripMarkup attempts to turn potential HTML into safe plain text but only removes things that match <[^>]*>. If input is adversarial or malformed, you can still end up with a string containing < that might participate in forming a tag or a <script sequence when combined with other content. To fix this robustly without pulling in a heavy sanitizer, we should ensure the function’s output is not only stripped of obvious tags but also safely HTML-escaped so that any remaining <, >, or & are rendered as text, not markup, when injected into the page.

The best minimal-change approach is:

  1. Keep the current replace(/<[^>]*>/g, '') logic to strip well-formed tags, preserving its existing behavior for straightforward input.
  2. After trimming, escape special HTML characters (&, <, >, ", ') in the result to their entity forms (&amp;, &lt;, &gt;, &quot;, &#39;). This guarantees that no HTML element, including <script>, can be reconstructed in the output.
  3. Implement the escaping inline in stripMarkup (a tiny helper or chained replace calls), avoiding new dependencies and preserving the function’s “small and dependency-free” design.
  4. Make all changes only inside src/utils/cover.ts, within the stripMarkup function body (lines 84–87), leaving its signature and call sites unchanged.

No additional imports are required; we can use plain string methods.

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

…ter sanitization

Incomplete multi-character sanitization

This string may still contain `<script`, which may cause an HTML element injection vulnerability.

Sanitizing untrusted input is a common technique for preventing injection attacks and other security vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the regular expression matches multiple consecutive characters, replacing it just once can result in the unsafe text reappearing in the sanitized input.

Fixes https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@davidsneighbour davidsneighbour marked this pull request as ready for review January 12, 2026 21:54
Copilot AI review requested due to automatic review settings January 12, 2026 21:54
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to fix a code scanning security alert (#67) related to incomplete multi-character sanitization in the stripMarkup function. The proposed fix adds HTML entity escaping after stripping tags to prevent potential XSS vulnerabilities from malformed HTML.

Changes:

  • Modified stripMarkup function in src/utils/cover.ts to add HTML entity escaping for special characters (&, <, >, ", ')
  • Added intermediate variable to store stripped text before escaping

Comment on lines +85 to +95
const text = String(input)
.replace(/<[^>]*>/g, '')
.trim();

// HTML-escape any remaining special characters to prevent injection
return text
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
Copy link

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This HTML entity escaping will cause double-escaping issues. The stripMarkup function's output is used in Astro/JSX templates as the alt attribute (e.g., alt={cover.alt} in PostImage.astro), where values are automatically HTML-escaped during rendering. Pre-escaping here will result in users seeing literal &lt; and &gt; characters instead of angle brackets in alt text.

Additionally, this change will break the existing test at line 321 of cover.test.ts, which expects 'Fancy' as output, not '&lt;em&gt;Fancy&lt;/em&gt;'.

To properly address the security issue of incomplete sanitization, consider instead improving the regex to handle malformed tags (e.g., unclosed tags like <script without >), or use a more robust approach like stripping all characters between and including any < and the next > or end of string.

Suggested change
const text = String(input)
.replace(/<[^>]*>/g, '')
.trim();
// HTML-escape any remaining special characters to prevent injection
return text
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
// Strip anything that looks like an HTML tag, including malformed/unclosed tags.
// This removes from '<' up to the next '>' or the end of the string.
return String(input)
.replace(/<[^>]*(>|$)/g, '')
.trim();

Copilot uses AI. Check for mistakes.
function stripMarkup(input: string): string {
return String(input)
const text = String(input)
.replace(/<[^>]*>/g, '')
Copy link

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex /<[^>]*>/g only matches well-formed tags with both opening < and closing >. It does not handle incomplete or malformed tags like <script (without closing >), which is the root cause of the security vulnerability mentioned in the code scanning alert.

Consider replacing the regex pattern with something more robust that removes any < character and everything up to and including the next >, or removes any stray < or > characters entirely. For example: .replace(/<[^>]*>?/g, '') or a multi-pass approach that removes tags first, then strips any remaining < and > characters.

Suggested change
.replace(/<[^>]*>/g, '')
.replace(/<[^>]*>?/g, '')

Copilot uses AI. Check for mistakes.
Comment on lines 84 to 96
function stripMarkup(input: string): string {
return String(input)
const text = String(input)
.replace(/<[^>]*>/g, '')
.trim();

// HTML-escape any remaining special characters to prevent injection
return text
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')
.replace(/'/g, '&#39;');
}
Copy link

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The security fix should include test cases that verify the function properly handles malformed HTML and potential XSS vectors, such as:

  • Unclosed tags: <script or <div
  • Tags without closing bracket: <img src=x onerror=alert(1)
  • Nested or malformed markup: <<script>alert('XSS')</script>
  • Mixed content: text<script>alert(1)</script>more text

These test cases would ensure the vulnerability mentioned in the code scanning alert is actually resolved.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants