Avoid three expensive allocations in UriHelper (dotnet/corefx#36056)
In this repro:
```C#
using System;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main()
    {
        string input = $"param1={GenerateUrlEncoded(40)}&param2={GenerateUrlEncoded(220)}";
        Console.WriteLine("Input length: " + input.Length);

        var sw = Stopwatch.StartNew();
        string result = Uri.UnescapeDataString(input);
        Console.WriteLine("Result length: " + result.Length);
        Console.WriteLine(sw.Elapsed);
    }

    private static string GenerateUrlEncoded(int rowsCount)
    {
        var sb = new StringBuilder();
        for (int i = 0x100; i < 0x999; i++)
        {
            sb.Append((char)i);
            if (i % 10 == 0) sb.Append('<');
            if (i % 20 == 0) sb.Append('>');
            if (i % 15 == 0) sb.Append('\"');
        }

        string escaped = Uri.EscapeDataString(sb.ToString());

        sb.Clear();
        for (int i = 0; i < rowsCount; i++)
        {
            sb.AppendLine(escaped);
        }

        return sb.ToString();
    }
}
```
on my machine it ends up allocating ~630GB of memory and takes ~14 seconds. Almost all of that ~14 seconds is spent in gc_heap::allocate_large, and most of that inside memset_repmovs. This ends up being due to some large allocations being done in a tight loop.

This PR contains three simple fixes that address the majority of the problem. There's still more that can be done here, but this is the lowest of the low-hanging fruit and makes the biggest impact:

1. In UnescapeString, the previous code was allocating a new char[bytes.Length] for each iteration. Stop doing that. Instead, just reuse the same array over and over and only grow it if it's smaller than is needed.
2. In MatchUTF8Sequence, the previous code was allocating a byte[] for each character or surrogate pair. Stop doing that. Instead, just use a reusable four-byte segment of stack.
3. In UnescapeString, the previous code was allocating a new UTF8Encoding for each iteration of the loop. Stop doing that. The object is thread-safe and can be used for all requests, so we just make it a static.

These changes drop that ~630GB to ~22MB and that ~14s to ~0.05s.
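For illustration, the three fixes can be sketched roughly as below. This is not the actual UriHelper code: the class `UnescapeSketch` and the methods `DecodeAll` and `EncodedLength` are hypothetical names, and the span-based `GetBytes` overload assumes .NET Core 2.1 or later.

```C#
using System;
using System.Text;

// Hypothetical sketch of the three allocation fixes; not the real UriHelper code.
static class UnescapeSketch
{
    // Fix 3: one shared instance instead of a new UTF8Encoding per loop iteration.
    // UTF8Encoding is thread-safe, so a static field is sufficient.
    private static readonly UTF8Encoding s_utf8 = new UTF8Encoding(false, true);

    // Fix 1: decode many byte runs while reusing a single char buffer.
    public static string DecodeAll(byte[][] runs)
    {
        var sb = new StringBuilder();
        char[] scratch = Array.Empty<char>();

        foreach (byte[] bytes in runs)
        {
            // Grow only when the buffer is too small. Well-formed UTF-8 never
            // decodes to more chars than it has bytes, so bytes.Length is enough.
            if (scratch.Length < bytes.Length)
            {
                scratch = new char[bytes.Length];
            }

            int charCount = s_utf8.GetChars(bytes, 0, bytes.Length, scratch, 0);
            sb.Append(scratch, 0, charCount);
        }

        return sb.ToString();
    }

    // Fix 2: encode one char or surrogate pair into four bytes of stack
    // rather than allocating a byte[] on the heap for each one.
    public static int EncodedLength(ReadOnlySpan<char> charOrPair)
    {
        Span<byte> utf8 = stackalloc byte[4];
        return s_utf8.GetBytes(charOrPair, utf8);
    }
}
```

The buffer in `DecodeAll` only ever grows, so after the first few iterations the loop typically stops allocating char arrays altogether.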
There are further memory-related changes that could be made in this code, e.g. using pooling or addressing some of the other allocations, but I've left those for the future.

Commit migrated from dotnet/corefx@5e84d5b