Memory span #183

Open
wants to merge 30 commits into
base: dev
30 commits
ec48fcb
Improve performance by parsing HtmlColor with memory span
onizet Sep 28, 2024
a5048cd
Improve regex to avoid a call to HtmlDecode
onizet Sep 28, 2024
eabbce6
Unit and Margin also now use memory span
onizet Sep 28, 2024
1dff57b
Bug fix on Span polyfill method
onizet Sep 30, 2024
a1cd67e
Reuse existing .Net framework implementation of HtmlDecode if available
onizet Sep 30, 2024
b523d8b
Fix sln markup
onizet Sep 30, 2024
e79b024
Handle case sensitiveness on unit metric
onizet Oct 5, 2024
8ce2cec
Use regex compiled for performance boost
onizet Nov 11, 2024
ca5dff9
Set a timeout for regex
onizet Dec 10, 2024
129bebb
Optimise parsing of Style attributes
onizet Dec 11, 2024
66834d8
Ensure to catch regex timeout exception
onizet Dec 12, 2024
4f516aa
Prefer usage of await and async method
onizet Dec 12, 2024
19c263f
Optimise parsing of Style attributes
onizet Dec 14, 2024
07e3f2e
Update changelog
onizet Dec 14, 2024
2bd74d3
Add Benchmark project
onizet Dec 14, 2024
83851e9
Improve unit tests
onizet Dec 14, 2024
7a5322b
Allow to run multiple runtimes benchmarks
onizet Dec 14, 2024
52a3610
Include the csproj in the sln
onizet Dec 14, 2024
04e9302
Remove neat support for SearchValues as performant is worst than Read…
onizet Dec 14, 2024
2b14315
Merge branch 'dev' into memory-span
onizet Dec 16, 2024
dbfb53e
Merge branch 'memory-span' of github.com:onizet/html2openxml into mem…
onizet Dec 16, 2024
89f4e5f
Fix compilation error
onizet Dec 16, 2024
e62d833
Use FrozenDictionary for better performance (thanks Graham!)
onizet Dec 16, 2024
5dc31f7
Decrease warnings
onizet Dec 16, 2024
03bd94d
Code simplification
onizet Dec 20, 2024
9ec8636
Rewrite parsing to improve code readability
onizet Dec 21, 2024
935da6c
Do not allocate new array in a loop
onizet Jan 6, 2025
af8711e
AngleSharp update
onizet Jan 6, 2025
9b8f240
Ensure to trim input before parsing the color
onizet Jan 10, 2025
436a600
Use benefits of FrozenDictionary only for net8, no cumbersome net462 …
onizet Jan 10, 2025
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## 3.3.0

- Rewrite parsing to use `System.Span<T>` instead of Regex
- Set a timeout on the remaining Regex instances to mitigate DoS attacks

## 3.2.2

- Supports a feature to disable heading numbering #175
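
To illustrate the first 3.3.0 changelog entry, here is a minimal sketch of span-based parsing for a hex color such as the `#f2f2f2` used in the benchmark page. `TryParseHexColor` and `TryHexDigit` are hypothetical helpers written for this note; they are not the library's `HtmlColor` implementation, which may handle more formats and use different logic.

```csharp
using System;

// Illustrative only: TryParseHexColor and TryHexDigit are hypothetical helpers,
// not the library's HtmlColor implementation.
static bool TryHexDigit(char c, out int value)
{
    if (c >= '0' && c <= '9') { value = c - '0'; return true; }
    if (c >= 'a' && c <= 'f') { value = c - 'a' + 10; return true; }
    if (c >= 'A' && c <= 'F') { value = c - 'A' + 10; return true; }
    value = 0;
    return false;
}

static bool TryParseHexColor(ReadOnlySpan<char> input, out byte r, out byte g, out byte b)
{
    r = g = b = 0;

    // Trim and Slice return views over the original characters: no substring is allocated.
    input = input.Trim();
    if (input.Length == 0 || input[0] != '#') return false;

    ReadOnlySpan<char> hex = input.Slice(1);
    if (hex.Length != 3 && hex.Length != 6) return false;

    int rgb = 0;
    for (int i = 0; i < 6; i++)
    {
        // Shorthand "#abc" means "#aabbcc", so each digit is read twice.
        char c = hex.Length == 3 ? hex[i / 2] : hex[i];
        if (!TryHexDigit(c, out int digit)) return false;
        rgb = (rgb << 4) | digit;
    }

    r = (byte)((rgb >> 16) & 0xFF);
    g = (byte)((rgb >> 8) & 0xFF);
    b = (byte)(rgb & 0xFF);
    return true;
}

// Usage: a string converts implicitly to ReadOnlySpan<char>.
if (TryParseHexColor(" #f2f2f2 ", out byte red, out byte green, out byte blue))
    Console.WriteLine($"{red}, {green}, {blue}"); // prints "242, 242, 242"
```

Compared with a regex, this style avoids both the match objects and the intermediate strings, which is presumably where the allocation savings reported by `[MemoryDiagnoser]` would come from.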
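
The second 3.3.0 entry (a timeout on the remaining regexes, with the timeout exception caught) can be sketched as follows. The pattern, the 100 ms budget, and the `TryGetStyleValue` helper are assumptions made for this example; the values and patterns used in the pull request may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only: the pattern, the 100 ms budget, and TryGetStyleValue are
// assumptions made for this sketch, not the library's actual code.
static bool TryGetStyleValue(Regex regex, string input, out string value)
{
    value = string.Empty;
    try
    {
        Match match = regex.Match(input);
        if (!match.Success) return false;

        value = match.Groups["value"].Value.Trim();
        return true;
    }
    catch (RegexMatchTimeoutException)
    {
        // Treat a timed-out match as "no match" so hostile input cannot stall the conversion.
        return false;
    }
}

// The third constructor argument bounds how long a single match may run.
var styleDeclaration = new Regex(
    @"^\s*(?<name>[\w-]+)\s*:\s*(?<value>[^;]+)",
    RegexOptions.Compiled,
    TimeSpan.FromMilliseconds(100));

if (TryGetStyleValue(styleDeclaration, "color: blue; text-align: center;", out string first))
    Console.WriteLine(first); // prints "blue"
```

Catching `RegexMatchTimeoutException` at the call site keeps the failure local: the caller sees an unparsed value instead of an unhandled exception.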
66 changes: 37 additions & 29 deletions HtmlToOpenXml.sln
@@ -13,34 +13,42 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Demo", "examples\Demo\Demo.
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "HtmlToOpenXml.Tests", "test\HtmlToOpenXml.Tests\HtmlToOpenXml.Tests.csproj", "{CA0A68E0-45A0-4A01-A061-F951D93D6906}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Benchmark", "examples\Benchmark\Benchmark.csproj", "{143A3684-FAEB-43D0-A895-09BE5FDF85F6}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
EndGlobalSection
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Debug|Any CPU.Build.0 = Debug|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.ActiveCfg = Release|Any CPU
{EF700F30-C9BB-49A6-912C-E3B77857B514}.Release|Any CPU.Build.0 = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Debug|Any CPU.Build.0 = Debug|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.ActiveCfg = Release|Any CPU
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F}.Release|Any CPU.Build.0 = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Debug|Any CPU.Build.0 = Debug|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.ActiveCfg = Release|Any CPU
{CA0A68E0-45A0-4A01-A061-F951D93D6906}.Release|Any CPU.Build.0 = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Debug|Any CPU.Build.0 = Debug|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.ActiveCfg = Release|Any CPU
{143A3684-FAEB-43D0-A895-09BE5FDF85F6}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{EF700F30-C9BB-49A6-912C-E3B77857B514} = {58520A98-BA53-4BA4-AAE3-786AA21331D6}
{A1ECC760-B9F7-4A00-AF5F-568B5FD6F09F} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{CA0A68E0-45A0-4A01-A061-F951D93D6906} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
{143A3684-FAEB-43D0-A895-09BE5FDF85F6} = {84EA02ED-2E97-47D2-992E-32CC104A3A7A}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {14EE1026-6507-4295-9FEE-67A55C3849CE}
SolutionGuid = {194D4CBE-A20A-4E32-967B-E1BBD3922C29}
EndGlobalSection
EndGlobal
21 changes: 21 additions & 0 deletions examples/Benchmark/Benchmark.csproj
@@ -0,0 +1,21 @@
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFrameworks>net48;net8.0</TargetFrameworks>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<SonarQubeExclude>true</SonarQubeExclude>
<LangVersion>latest</LangVersion>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
<ProjectReference Include="..\..\src\Html2OpenXml\HtmlToOpenXml.csproj" />
</ItemGroup>

<ItemGroup>
<EmbeddedResource Include="*.html" />
</ItemGroup>

</Project>
35 changes: 35 additions & 0 deletions examples/Benchmark/Benchmarks.cs
@@ -0,0 +1,35 @@
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

[MemoryDiagnoser]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net48)]
[SimpleJob(runtimeMoniker: RuntimeMoniker.Net80, baseline: true)]
public class Benchmarks
{
    [Benchmark]
    public async Task ParseWithSpan()
    {
        // Load the embedded sample page that drives the benchmark.
        string html = ResourceHelper.GetString("benchmark.html");

        using (MemoryStream generatedDocument = new MemoryStream())
        using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
        {
            // A freshly created package has no main part yet, so add one with an empty body.
            MainDocumentPart? mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.RenderPreAsTable = true;

            // Run the HTML conversion being measured, then persist the document.
            await converter.ParseBody(html);
            mainPart.Document.Save();
        }
    }
}
3 changes: 3 additions & 0 deletions examples/Benchmark/Program.cs
@@ -0,0 +1,3 @@
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();
7 changes: 7 additions & 0 deletions examples/Benchmark/README.md
@@ -0,0 +1,7 @@
# Benchmarks

To run the benchmark tool, first build the project in Release mode:
`dotnet build -c Release`

Then run the performance test against multiple runtimes:
`dotnet run -c Release -f net8.0 --runtimes net48 net8.0`
40 changes: 40 additions & 0 deletions examples/Benchmark/ResourceHelper.cs
@@ -0,0 +1,40 @@
/*
* Copyright (c) 2017 Deal Stream sàrl. All rights reserved
*/
using System.IO;
using System.Reflection;
using System.Resources;

/// <summary>
/// Helper class to read embedded resources.
/// </summary>
public static class ResourceHelper
{
public static string GetString(string resourceName)
{
return GetString(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
}

public static string GetString(Assembly assembly, string resourceName)
{
using (var stream = GetStream(assembly, resourceName))
{
using (var reader = new StreamReader(stream))
return reader.ReadToEnd();
}
}

public static Stream GetStream(string resourceName)
{
return GetStream(typeof(ResourceHelper).GetTypeInfo().Assembly, resourceName);
}

public static Stream GetStream(Assembly assembly, string resourceName)
{
var stream = assembly.GetManifestResourceStream(assembly.GetName().Name + "." + resourceName);
if (stream == null)
throw new MissingManifestResourceException($"Requested resource `{resourceName}` was not found in the assembly `{assembly}`.");

return stream;
}
}
124 changes: 124 additions & 0 deletions examples/Benchmark/benchmark.html
@@ -0,0 +1,124 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample HTML Page</title>
</head>
<body>

<!-- Header Section -->
<h1 style="color: blue; text-align: center;">Welcome to My Sample Page</h1>

<!-- Paragraph with styling -->
<p style="font-family: Arial, 'Times New Roman', sans-serif; font-size: 14px; color: grey;">
This is a sample paragraph. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>

<!-- Link with styling -->
<a href="https://www.example.com" style="text-decoration: none; color: green;">Visit Example</a>

<!-- Image with styling -->
<img src="https://via.placeholder.com/150" alt="Placeholder Image" style="border: 2px solid black;">

<!-- List with styling -->
<ul style="list-style-type: square;">
<li style="color: red;">Item 1</li>
<li style="color: orange;">Item 2</li>
<li style="color: yellow;">Item 3</li>
</ul>

<!-- Table with styling -->
<table style="width: 100%; border-collapse: collapse;">
<tr style="background-color: #f2f2f2;">
<th style="border: 1px solid black;">Header 1</th>
<th style="border: 1px solid black;">Header 2</th>
</tr>
<tr>
<td style="border: 1px solid black;">Cell 1</td>
<td style="border: 1px solid black;">Cell 2</td>
</tr>
<tr style="background-color: #f2f2f2;">
<td style="border: 1px solid black;">Cell 3</td>
<td style="border: 1px solid black;">Cell 4</td>
</tr>
</table>

<!-- Form with styling -->
<form style="background-color: #f9f9f9; padding: 20px; border: 1px solid #ccc;">
<label for="name" style="font-weight: bold;">Name:</label>
<input type="text" id="name" name="name" style="width: 100%; padding: 10px; margin-top: 5px;">

<label for="email" style="font-weight: bold; margin-top: 10px; display: block;">Email:</label>
<input type="email" id="email" name="email" style="width: 100%; padding: 10px; margin-top: 5px;">

<button type="submit" style="background-color: blue; color: white; padding: 10px 20px; margin-top: 10px;">Submit</button>
</form>

<!-- Additional Content to Reach 300 Lines -->

<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce convallis, mauris imperdiet gravida bibendum, nisl turpis suscipit mauris, sed placerat ipsum ligula sed magna. Maecenas nisl est, ultrices nec, congue eget, auctor vitae, massa.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce luctus vestibulum augue ut aliquet. Nunc sagittis dictum nisi. Sed id blandit purus. Proin quis orci. Quisque convallis libero in sapien pharetra tincidunt.
</p>

<!-- Additional lines of paragraph to reach 300 lines -->

<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.
</p>
<p style="font-family: Arial, sans-serif; font-size: 14px; color: grey;">
Fusce convallis, mauris imperdiet gravida bibendum, nisl turpis suscipit mauris, sed placerat ipsum ligula sed magna. Maecenas nisl est, ultrices nec, congue eget, auctor vitae, massa.
</p>
<p style="font-family: Arial, 'Times New Roman', sans-serif; font-size: 14px; color: grey;">
Fusce luctus vestibulum augue ut aliquet. Nunc sagittis dictum nisi. Sed id blandit purus. Proin quis orci. Quisque convallis libero in sapien pharetra tincidunt.
</p>

</body>
</html>
1 change: 1 addition & 0 deletions examples/Demo/Demo.csproj
@@ -3,6 +3,7 @@
<PropertyGroup>
<TargetFramework>net8.0</TargetFramework>
<OutputType>Exe</OutputType>
<SonarQubeExclude>true</SonarQubeExclude>
</PropertyGroup>

<ItemGroup>
4 changes: 2 additions & 2 deletions examples/Demo/Program.cs
@@ -24,7 +24,7 @@ static async Task Main(string[] args)
// instead of creating it from scratch.
using (var buffer = ResourceHelper.GetStream("Resources.template.docx"))
{
- buffer.CopyTo(generatedDocument);
+ await buffer.CopyToAsync(generatedDocument);
}

generatedDocument.Position = 0L;
@@ -47,7 +47,7 @@ static async Task Main(string[] args)
AssertThatOpenXmlDocumentIsValid(package);
}

- File.WriteAllBytes(filename, generatedDocument.ToArray());
+ await File.WriteAllBytesAsync(filename, generatedDocument.ToArray());
}

Process.Start(new ProcessStartInfo(filename) { UseShellExecute = true });