Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513

davidsneighbour · 2026-01-12T21:54:12Z

Potential fix for https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67

In general, the problem is that stripMarkup attempts to turn potential HTML into safe plain text but only removes things that match <[^>]*>. If input is adversarial or malformed, you can still end up with a string containing < that might participate in forming a tag or a <script sequence when combined with other content. To fix this robustly without pulling in a heavy sanitizer, we should ensure the function’s output is not only stripped of obvious tags but also safely HTML-escaped so that any remaining <, >, or & are rendered as text, not markup, when injected into the page.

The best minimal-change approach is:

Keep the current replace(/<[^>]*>/g, '') logic to strip well-formed tags, preserving its existing behavior for straightforward input.
After trimming, escape special HTML characters (&, <, >, ", ') in the result to their entity forms (&, <, >, ", '). This guarantees that no HTML element, including <script>, can be reconstructed in the output.
Implement the escaping inline in stripMarkup (a tiny helper or chained replace calls), avoiding new dependencies and preserving the function’s “small and dependency-free” design.
Make all changes only inside src/utils/cover.ts, within the stripMarkup function body (lines 84–87), leaving its signature and call sites unchanged.

No additional imports are required; we can use plain string methods.

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

…ter sanitization Incomplete multi-character sanitization This string may still contain `<script`, which may cause an HTML element injection vulnerability. Sanitizing untrusted input is a common technique for preventing injection attacks and other security vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the regular expression matches multiple consecutive characters, replacing it just once can result in the unsafe text reappearing in the sanitized input. Fixes https://github.com/davidsneighbour/kollitsch.dev/security/code-scanning/67 Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Copilot

Pull request overview

This PR attempts to fix a code scanning security alert (#67) related to incomplete multi-character sanitization in the stripMarkup function. The proposed fix adds HTML entity escaping after stripping tags to prevent potential XSS vulnerabilities from malformed HTML.

Changes:

Modified stripMarkup function in src/utils/cover.ts to add HTML entity escaping for special characters (&, <, >, ", ')
Added intermediate variable to store stripped text before escaping

Copilot · 2026-01-12T21:57:48Z

src/utils/cover.ts

+  const text = String(input)
    .replace(/<[^>]*>/g, '')
    .trim();
+
+  // HTML-escape any remaining special characters to prevent injection
+  return text
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;')
+    .replace(/"/g, '&quot;')
+    .replace(/'/g, '&#39;');


This HTML entity escaping will cause double-escaping issues. The stripMarkup function's output is used in Astro/JSX templates as the alt attribute (e.g., alt={cover.alt} in PostImage.astro), where values are automatically HTML-escaped during rendering. Pre-escaping here will result in users seeing literal < and > characters instead of angle brackets in alt text.

Additionally, this change will break the existing test at line 321 of cover.test.ts, which expects 'Fancy' as output, not '<em>Fancy</em>'.

To properly address the security issue of incomplete sanitization, consider instead improving the regex to handle malformed tags (e.g., unclosed tags like <script without >), or use a more robust approach like stripping all characters between and including any < and the next > or end of string.

Suggested change

const text = String(input)

.replace(/<[^>]*>/g, '')

.trim();

// HTML-escape any remaining special characters to prevent injection

return text

.replace(/&/g, '&')

.replace(/</g, '<')

.replace(/>/g, '>')

.replace(/"/g, '"')

.replace(/'/g, ''');

// Strip anything that looks like an HTML tag, including malformed/unclosed tags.

// This removes from '<' up to the next '>' or the end of the string.

return String(input)

.replace(/<[^>]*(>|$)/g, '')

.trim();

Copilot · 2026-01-12T21:57:49Z

src/utils/cover.ts

 function stripMarkup(input: string): string {
-  return String(input)
+  const text = String(input)
    .replace(/<[^>]*>/g, '')


The regex /<[^>]*>/g only matches well-formed tags with both opening < and closing >. It does not handle incomplete or malformed tags like <script (without closing >), which is the root cause of the security vulnerability mentioned in the code scanning alert.

Consider replacing the regex pattern with something more robust that removes any < character and everything up to and including the next >, or removes any stray < or > characters entirely. For example: .replace(/<[^>]*>?/g, '') or a multi-pass approach that removes tags first, then strips any remaining < and > characters.

Suggested change

.replace(/<[^>]*>/g, '')

.replace(/<[^>]*>?/g, '')

Copilot · 2026-01-12T21:57:49Z

src/utils/cover.ts

 function stripMarkup(input: string): string {
-  return String(input)
+  const text = String(input)
    .replace(/<[^>]*>/g, '')
    .trim();
+
+  // HTML-escape any remaining special characters to prevent injection
+  return text
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;')
+    .replace(/"/g, '&quot;')
+    .replace(/'/g, '&#39;');
 }


The security fix should include test cases that verify the function properly handles malformed HTML and potential XSS vectors, such as:

Unclosed tags: <script or <div

Tags without closing bracket: <img src=x onerror=alert(1)

Nested or malformed markup: <<script>alert('XSS')</script>

Mixed content: text<script>alert(1)</script>more text

These test cases would ensure the vulnerability mentioned in the code scanning alert is actually resolved.

davidsneighbour marked this pull request as ready for review January 12, 2026 21:54

Copilot AI review requested due to automatic review settings January 12, 2026 21:54

Copilot started reviewing on behalf of davidsneighbour January 12, 2026 21:55 View session

Copilot AI reviewed Jan 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513

Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513

Uh oh!

davidsneighbour commented Jan 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 12, 2026

Uh oh!

Copilot AI Jan 12, 2026

Uh oh!

Copilot AI Jan 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513

Are you sure you want to change the base?

Potential fix for code scanning alert no. 67: Incomplete multi-character sanitization #1513

Uh oh!

Conversation

davidsneighbour commented Jan 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants