Reduce struct size from 56 bytes to the recommended 16 #20
Thank you for this. Could you link to where the recommendation of 16 bytes is? Side note: I knew I recognized your name from somewhere. I'm a big fan of ELMAH; thanks for making it!
This change is now on NuGet as v1.2.2: https://www.nuget.org/packages/ByteSize/1.2.2
Thanks for merging and publishing an update so quickly!
It's actually more of a guideline, and anything above it should be measured to understand the impact. Since you asked for a link mentioning that magical 16-byte number, see the passage on it in “Choosing Between Class and Struct”.
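Because the guideline is about instance size, a quick way to see where a struct stands is to ask the runtime. The following is a minimal sketch that assumes .NET 6 or later (on older targets, `Unsafe` comes from the System.Runtime.CompilerServices.Unsafe package). `ByteSizeBefore`, `ByteSizeAfter`, `TransferBefore` and `TransferAfter` are hypothetical stand-ins, not ByteSizeLib types. Their field lists are an assumption inferred from the 56- and 16-byte figures: one `long` of bits plus either six `double` values or one. The last two structs show how embedding one value type in another adds its full size to the outer type.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins, not ByteSizeLib types. The field lists are assumed
// from the 56- and 16-byte figures: one long plus six doubles, or one double.
struct ByteSizeBefore
{
    public long Bits;
    public double Bytes, KiloBytes, MegaBytes, GigaBytes, TeraBytes, PetaBytes;
}

struct ByteSizeAfter
{
    public long Bits;
    public double Bytes;
}

// Hypothetical outer types: an embedded struct contributes its whole size.
struct TransferBefore { public ByteSizeBefore Sent, Received; }
struct TransferAfter { public ByteSizeAfter Sent, Received; }

static class SizeReport
{
    static void Main()
    {
        // Unsafe.SizeOf<T>() returns the managed size of a value type in bytes.
        Console.WriteLine(Unsafe.SizeOf<ByteSizeBefore>()); // 56
        Console.WriteLine(Unsafe.SizeOf<ByteSizeAfter>());  // 16
        Console.WriteLine(Unsafe.SizeOf<TransferBefore>()); // 112
        Console.WriteLine(Unsafe.SizeOf<TransferAfter>());  // 32
    }
}
```

These sizes are what every by-value call pays per argument. In the 32-bit disassemblies in this thread, each `ByteSize` argument to `Add(total, total)` gets 56 bytes of stack (`sub esp,38h`) before the change and 16 bytes (`sub esp,10h`) after.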
Here's also a good old blog entry that goes into the overall optimization details around value types:
Also bear in mind that when you embed one value type in another, like in …
Benchmarks
On measuring the performance impact of this change, here are the numbers from before:
And after reducing the value size (which means less copying and less computation in the constructor), the result is an approximately threefold increase in performance:
Benchmark Code
```csharp
// ReSharper disable CheckNamespace
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using ByteSizeLib;

static class Program
{
    static void Main() =>
        Console.WriteLine(BenchmarkRunner.Run<Benchmark>());
}

[Config(typeof(Config))]
public class Benchmark
{
    [Benchmark]
    public static void Test()
    {
        // Pass ByteSize by value over and over so that the cost of copying it
        // shows up in the measurement.
        var total = ByteSize.FromBytes(8);
        for (var i = 0; i < 10000; i++)
            total = Add(total, total);
    }

    // Prevent inlining so that every call has to copy both ByteSize arguments.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static ByteSize Add(ByteSize a, ByteSize b) => a + b;

    sealed class Config : ManualConfig
    {
        public Config()
        {
            // Pair each JIT in Job.AllJits with the .NET Framework (CLR) runtime.
            var jobs =
                from jit in Job.AllJits
                from runtime in new[] { Job.Clr.Runtime }
                select jit.With(runtime);
            Add(jobs.ToArray());
        }
    }
}
```
Disassemblies
Following is the disassembly of the 32-bit x86 JIT-ed code of …:
00C2046D sub esp,70h
; ⁞
; var total = ByteSize.FromBytes(8);
00C20481 lea edi,[ebp-78h]
00C20484 xor eax,eax
00C20486 lea ecx,[eax+0Eh]
00C20489 rep stos dword ptr es:[edi]
00C2048B fld qword ptr ds:[0C20570h]
00C20491 sub esp,8
00C20494 fstp qword ptr [esp]
00C20497 lea ecx,[ebp-78h]
00C2049A call dword ptr ds:[785468h]
00C204A0 lea edi,[ebp-40h]
00C204A3 lea esi,[ebp-78h]
00C204A6 mov ecx,0Eh
00C204AB rep movs dword ptr es:[edi],dword ptr [esi]
; for (var i = 0; i < 10000; i++)
00C204AD xor esi,esi
; total = Add(total, total);
00C204AF lea eax,[ebp-40h]
00C204B2 sub esp,38h
00C204B5 movq xmm0,mmword ptr [eax]
00C204B9 movq mmword ptr [esp],xmm0
00C204BE movq xmm0,mmword ptr [eax+8]
00C204C3 movq mmword ptr [esp+8],xmm0
00C204C9 movq xmm0,mmword ptr [eax+10h]
00C204CE movq mmword ptr [esp+10h],xmm0
00C204D4 movq xmm0,mmword ptr [eax+18h]
00C204D9 movq mmword ptr [esp+18h],xmm0
00C204DF movq xmm0,mmword ptr [eax+20h]
00C204E4 movq mmword ptr [esp+20h],xmm0
00C204EA movq xmm0,mmword ptr [eax+28h]
00C204EF movq mmword ptr [esp+28h],xmm0
00C204F5 movq xmm0,mmword ptr [eax+30h]
00C204FA movq mmword ptr [esp+30h],xmm0
00C20500 lea eax,[ebp-40h]
00C20503 sub esp,38h
00C20506 movq xmm0,mmword ptr [eax]
00C2050A movq mmword ptr [esp],xmm0
00C2050F movq xmm0,mmword ptr [eax+8]
00C20514 movq mmword ptr [esp+8],xmm0
00C2051A movq xmm0,mmword ptr [eax+10h]
00C2051F movq mmword ptr [esp+10h],xmm0
00C20525 movq xmm0,mmword ptr [eax+18h]
00C2052A movq mmword ptr [esp+18h],xmm0
00C20530 movq xmm0,mmword ptr [eax+20h]
00C20535 movq mmword ptr [esp+20h],xmm0
00C2053B movq xmm0,mmword ptr [eax+28h]
00C20540 movq mmword ptr [esp+28h],xmm0
00C20546 movq xmm0,mmword ptr [eax+30h]
00C2054B movq mmword ptr [esp+30h],xmm0
00C20551 lea ecx,[ebp-40h]
00C20554 call dword ptr ds:[784D94h]
; for (var i = 0; i < 10000; i++)
00C2055A inc esi
00C2055B cmp esi,2710h
00C20561 jl 00C204AF
After applying this PR, the code is considerably smaller, and so is the allocated stack space (…):
0297046C sub esp,10h
; ⁞
; var total = ByteSize.FromBytes(8);
02970482 fld qword ptr ds:[2970500h]
02970488 sub esp,8
0297048B fstp qword ptr [esp]
0297048E call 729A4B83
02970493 sub esp,8
02970496 fstp qword ptr [esp]
02970499 call 728B1CD4
0297049E mov ecx,eax
029704A0 fld dword ptr ds:[2970508h]
029704A6 lea eax,[ebp-14h]
029704A9 mov dword ptr [eax],ecx
029704AB mov dword ptr [eax+4],edx
029704AE fstp qword ptr [eax+8]
; for (var i = 0; i < 10000; i++)
029704B1 xor esi,esi
; total = Add(total, total);
029704B3 lea eax,[ebp-14h]
029704B6 sub esp,10h
029704B9 movq xmm0,mmword ptr [eax]
029704BD movq mmword ptr [esp],xmm0
029704C2 movq xmm0,mmword ptr [eax+8]
029704C7 movq mmword ptr [esp+8],xmm0
029704CD lea eax,[ebp-14h]
029704D0 sub esp,10h
029704D3 movq xmm0,mmword ptr [eax]
029704D7 movq mmword ptr [esp],xmm0
029704DC movq xmm0,mmword ptr [eax+8]
029704E1 movq mmword ptr [esp+8],xmm0
029704E7 lea ecx,[ebp-14h]
029704EA call dword ptr ds:[10C4D94h]
; for (var i = 0; i < 10000; i++)
029704F0 inc esi
029704F1 cmp esi,2710h
029704F7 jl 029704B3
The effects are similar for 64-bit. Here's what the JIT compiled before:
00007FFA795708C2 sub rsp,108h
; ⁞
; var total = ByteSize.FromBytes(8);
00007FFA795408DF xor ecx,ecx
00007FFA795408E1 lea rax,[rsp+98h]
00007FFA795408E9 vxorpd xmm1,xmm1,xmm1
00007FFA795408EE vmovdqu xmmword ptr [rax],xmm1
00007FFA795408F3 vmovdqu xmmword ptr [rax+10h],xmm1
00007FFA795408F9 vmovdqu xmmword ptr [rax+20h],xmm1
00007FFA795408FF mov qword ptr [rax+30h],rcx
00007FFA79540903 lea rcx,[rsp+98h]
00007FFA7954090B vmovsd xmm1,qword ptr [7FFA79540A20h]
00007FFA79540914 call 00007FFA79540148
00007FFA79540919 vmovdqu xmm0,xmmword ptr [rsp+98h]
00007FFA79540923 vmovdqu xmmword ptr [rsp+0D0h],xmm0
00007FFA7954092D vmovdqu xmm0,xmmword ptr [rsp+0A8h]
00007FFA79540937 vmovdqu xmmword ptr [rsp+0E0h],xmm0
00007FFA79540941 vmovdqu xmm0,xmmword ptr [rsp+0B8h]
00007FFA7954094B vmovdqu xmmword ptr [rsp+0F0h],xmm0
00007FFA79540955 mov rcx,qword ptr [rsp+0C8h]
00007FFA7954095D mov qword ptr [rsp+100h],rcx
; for (var i = 0; i < 10000; i++)
00007FFA79540965 xor esi,esi
; total = Add(total, total);
00007FFA79540967 lea rcx,[rsp+0D0h]
00007FFA7954096F vmovdqu xmm0,xmmword ptr [rsp+0D0h]
00007FFA79540979 vmovdqu xmmword ptr [rsp+60h],xmm0
00007FFA79540980 vmovdqu xmm0,xmmword ptr [rsp+0E0h]
00007FFA7954098A vmovdqu xmmword ptr [rsp+70h],xmm0
00007FFA79540991 vmovdqu xmm0,xmmword ptr [rsp+0F0h]
00007FFA7954099B vmovdqu xmmword ptr [rsp+80h],xmm0
00007FFA795409A5 mov rdx,qword ptr [rsp+100h]
00007FFA795409AD mov qword ptr [rsp+90h],rdx
00007FFA795409B5 vmovdqu xmm0,xmmword ptr [rsp+0D0h]
00007FFA795409BF vmovdqu xmmword ptr [rsp+28h],xmm0
00007FFA795409C6 vmovdqu xmm0,xmmword ptr [rsp+0E0h]
00007FFA795409D0 vmovdqu xmmword ptr [rsp+38h],xmm0
00007FFA795409D7 vmovdqu xmm0,xmmword ptr [rsp+0F0h]
00007FFA795409E1 vmovdqu xmmword ptr [rsp+48h],xmm0
00007FFA795409E8 mov rdx,qword ptr [rsp+100h]
00007FFA795409F0 mov qword ptr [rsp+58h],rdx
00007FFA795409F5 lea rdx,[rsp+60h]
00007FFA795409FA lea r8,[rsp+28h]
00007FFA795409FF call 00007FFA79540098
; for (var i = 0; i < 10000; i++)
00007FFA79540A04 inc esi
00007FFA79540A06 cmp esi,2710h
00007FFA79540A0C jl 00007FFA79540967
And here is what it reduces to after the PR (stack allocation went from …):
00007FFA795704B2 sub esp,50h
; ⁞
; var total = ByteSize.FromBytes(8);
00007FFA795604C6 vmovsd xmm0,qword ptr [7FFA79560548h]
00007FFA795604CF call 00007FFAD8D11D04
00007FFA795604D4 vcvttsd2si rcx,xmm0
00007FFA795604D9 vmovsd xmm0,qword ptr [7FFA79560550h]
00007FFA795604E2 mov qword ptr [rsp+40h],rcx
00007FFA795604E7 vmovsd qword ptr [rsp+48h],xmm0
; for (var i = 0; i < 10000; i++)
00007FFA795604EE xor esi,esi
; total = Add(total, total);
00007FFA795604F0 lea rcx,[rsp+40h]
00007FFA795604F5 lea rdx,[rsp+30h]
00007FFA795604FA mov r8,qword ptr [rsp+40h]
00007FFA795604FF mov qword ptr [rdx],r8
00007FFA79560502 vmovsd xmm0,qword ptr [rsp+48h]
00007FFA79560509 vmovsd qword ptr [rdx+8],xmm0
00007FFA7956050F lea rdx,[rsp+20h]
00007FFA79560514 mov r8,qword ptr [rsp+40h]
00007FFA79560519 mov qword ptr [rdx],r8
00007FFA7956051C vmovsd xmm0,qword ptr [rsp+48h]
00007FFA79560523 vmovsd qword ptr [rdx+8],xmm0
00007FFA79560529 lea rdx,[rsp+30h]
00007FFA7956052E lea r8,[rsp+20h]
00007FFA79560533 call 00007FFA79560098
; for (var i = 0; i < 10000; i++)
00007FFA79560538 inc esi
00007FFA7956053A cmp esi,2710h
00007FFA79560540 jl 00007FFA795604F0
Thanks for the thorough response. It makes sense that the only data we need to store is the total number of bits.
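Taking that remark literally, the following is a hypothetical sketch of a type that stores only the bit count and derives everything else on read. This is not what this PR or ByteSizeLib does. `BitsOnlySize` and its members are illustrative names. The sketch assumes binary (1024-based) multiples and assumes that byte counts are rounded up to whole bits with `Math.Ceiling`.

```csharp
using System;

// Hypothetical type, not ByteSizeLib's API: only the bit count is stored,
// so each instance is 8 bytes. Assumes 1024-based multiples.
public readonly struct BitsOnlySize
{
    public const long BitsInByte = 8;
    public const double BytesInKiloByte = 1024;
    public const double BytesInMegaByte = BytesInKiloByte * 1024;

    public long Bits { get; }

    public BitsOnlySize(long bits) => Bits = bits;

    // Assumed rounding: a partial bit still counts as a whole bit.
    public static BitsOnlySize FromBytes(double bytes) =>
        new BitsOnlySize((long)Math.Ceiling(bytes * BitsInByte));

    // Every other unit is derived when read; nothing else is stored.
    public double Bytes => (double)Bits / BitsInByte;
    public double KiloBytes => Bytes / BytesInKiloByte;
    public double MegaBytes => Bytes / BytesInMegaByte;

    // Adding two sizes is a single integer addition.
    public static BitsOnlySize operator +(BitsOnlySize a, BitsOnlySize b) =>
        new BitsOnlySize(a.Bits + b.Bits);
}
```

The trade-off is an extra division whenever `Bytes` is read, in exchange for halving the copy size again.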
`ByteSize` weighs in at 56 bytes, which is far from the recommended 16 bytes for value types (the limit exists to avoid excessive copying and to avoid bloating other value types that embed it). This PR turns `KiloBytes`, `MegaBytes`, `GigaBytes`, `TeraBytes` and `PetaBytes` into pure computed properties, because these are trivial calculations and the user may not need all of them. This brings the size of `ByteSize` down to 16 bytes and also avoids the cost of computing those values during initialization if they're never used.
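The following is a minimal sketch of the layout this PR describes. It is not ByteSizeLib's actual source. `ByteSizeSketch` is a hypothetical name. The sketch assumes 1024-based multiples and assumes that `Bits` is derived from `Bytes` by rounding up. Only `Bits` and `Bytes` are stored, so the struct is 16 bytes. The larger units cost one division, and only when they are read.

```csharp
using System;

// Hypothetical sketch, not ByteSizeLib's source: two stored values (16 bytes),
// with the larger units computed on demand. Assumes 1024-based multiples.
public readonly struct ByteSizeSketch
{
    public const long BitsInByte = 8;
    public const double BytesInKiloByte = 1024;
    public const double BytesInMegaByte = BytesInKiloByte * 1024;
    public const double BytesInGigaByte = BytesInMegaByte * 1024;
    public const double BytesInTeraByte = BytesInGigaByte * 1024;
    public const double BytesInPetaByte = BytesInTeraByte * 1024;

    public long Bits { get; }    // 8 bytes
    public double Bytes { get; } // 8 bytes

    public ByteSizeSketch(double bytes)
    {
        Bytes = bytes;
        // Assumed rounding: up to a whole number of bits.
        Bits = (long)Math.Ceiling(bytes * BitsInByte);
    }

    public static ByteSizeSketch FromBytes(double bytes) => new ByteSizeSketch(bytes);

    // Computed when read instead of being stored in (and copied with) every instance.
    public double KiloBytes => Bytes / BytesInKiloByte;
    public double MegaBytes => Bytes / BytesInMegaByte;
    public double GigaBytes => Bytes / BytesInGigaByte;
    public double TeraBytes => Bytes / BytesInTeraByte;
    public double PetaBytes => Bytes / BytesInPetaByte;

    public static ByteSizeSketch operator +(ByteSizeSketch a, ByteSizeSketch b) =>
        new ByteSizeSketch(a.Bytes + b.Bytes);
}
```

With this layout the constructor does one multiplication and one rounding, where the 56-byte version had to fill in every unit up front.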