
Perf: Collection Count() is Slower in Core than the CLR #11971

Closed
RealDotNetDave opened this issue Feb 4, 2019 · 25 comments
Labels: area-VM-coreclr, tenet-performance (Performance related issue)
Milestone: 3.0

Comments

@RealDotNetDave

I have been benchmarking performance of Count() for a collection. I am seeing that Core is a lot slower than the CLR.

Benchmark | VM (ns) | Laptop (ns)
--- | ---: | ---:
Count() (Clr) | 8.138 | 4.847
Count() (Core) | 18.073 | 12.355
Count() with Predicate (Clr) | 275,516.47 | 183,284.90
Count() with Predicate (Core) | 307,900.18 | 217,629.30

The VM above is an Azure VM: Size: Standard D8s v3 (8 vcpus, 32 GB memory)
Benchmarks were done with BenchmarkDotNet.

@RealDotNetDave RealDotNetDave changed the title Collection Count() is Slower in Core than the CLR Perf: Collection Count() is Slower in Core than the CLR Feb 4, 2019
@stephentoub
Member

Which collection?

Which "Count()"? LINQ's?

What OS?

What version of .NET Core?

Can you share the benchmark?

@RealDotNetDave
Author

RealDotNetDave commented Feb 4, 2019

Thanks for the quick response. Here is the info you requested:

Test was done with List<T> using Count() from LINQ. The collection has 10,000 items in it.

Tests were done with Clr 4.7.2 and Core 2.2 using BenchmarkDotNet.

The benchmark code looks like this:
_personCollection.Count() > 0;
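A fuller version of that benchmark might look like the following (a hedged sketch: only the one line above was shared, so the `Person` type, the `HasItems` name, and the setup are assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Person { public string Name { get; set; } }

[ClrJob]
[CoreJob]
public class CountBenchmark
{
    private List<Person> _personCollection;

    [GlobalSetup]
    public void Setup() =>
        _personCollection = Enumerable.Range(0, 10_000)
                                      .Select(i => new Person { Name = i.ToString() })
                                      .ToList();

    // LINQ's Count() on a List<T> takes the ICollection<T> fast path,
    // so this mostly measures interface dispatch, not enumeration.
    [Benchmark]
    public bool HasItems() => _personCollection.Count() > 0;

    static void Main() => BenchmarkRunner.Run<CountBenchmark>();
}
```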

Here are the computer stats (the timings above are in ns):
Laptop
OS Name: Microsoft Windows 10 Pro Version 10.0.17134 Build 17134
System Manufacturer: Microsoft Corporation
System Model: Surface Laptop
System Type: x64-based PC
Processor: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz, 2496 Mhz, 2 Core(s), 4 Logical Processor(s)
BIOS Version/Date: Microsoft Corporation 137.2307.769, 8/3/2018
SMBIOS Version: 3.1
Locale: United States
Installed Physical Memory (RAM): 16.0 GB
Total Virtual Memory: 18.3 GB

Virtual Machine
Azure VM Size: Standard D8s v3 (8 vcpus, 32 GB memory)
OS Name: Microsoft Windows 10 Pro Version 10.0.17134 Build 17134
System Manufacturer: Microsoft Corporation
System Model: Virtual Machine
System Type: x64-based PC
Processor: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz, 2295 Mhz, 4 Core(s), 8 Logical Processor(s)
BIOS Version/Date: American Megatrends Inc. 090007, 6/2/2017
SMBIOS Version: 2.3
Locale: United States
Installed Physical Memory (RAM): 32.0 GB
Total Virtual Memory: 36.7 GB

@EgorBo
Member

EgorBo commented Feb 4, 2019

Wrote a small benchmark (not sure it's what @RealDotNetDave meant but anyway):

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[CoreJob(true)]
[ClrJob]
public class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    public IEnumerable<string> _list;

    [GlobalSetup]
    public void Setup() => _list = Enumerable.Range(0, 10000).Select(i => i.ToString()).ToList();

    [Benchmark]
    public bool Count() => _list.Count() > 0;

    [Benchmark]
    public bool IsICollectionT() => _list is ICollection<string>;
}
// * Summary *

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.523 (1803/April2018Update/Redstone4)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
Frequency=3609377 Hz, Resolution=277.0561 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host] : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3260.0
  Core   : .NET Core 2.2.1 (CoreCLR 4.6.27207.03, CoreFX 4.6.27207.03), 64bit RyuJIT


         Method |  Job | Runtime |      Mean |     Error |    StdDev | Ratio |
--------------- |----- |-------- |----------:|----------:|----------:|------:|
          Count |  Clr |     Clr |  4.300 ns | 0.0020 ns | 0.0019 ns |  0.42 |
          Count | Core |    Core | 10.226 ns | 0.0126 ns | 0.0105 ns |  1.00 |
                |      |         |           |           |           |       |
 IsICollectionT |  Clr |     Clr |  1.598 ns | 0.0018 ns | 0.0014 ns |  0.76 |
 IsICollectionT | Core |    Core |  2.103 ns | 0.0013 ns | 0.0011 ns |  1.00 |

^ .NET Core 2.2 (3.0-master has the same numbers)
The Count<T> implementation is almost the same (at least the fast path) in .NET Core and .NET Fx.

@RealDotNetDave
Author

So it looks like you are seeing the same thing as I am.

@jkotas
Member

jkotas commented Feb 7, 2019

Here are a few diffs that I have noticed:

  • Better optimizations of the loop in Count<> slow path (less stack spilling) are regressing the fast path (more registers saved).

  • Instruction mix used for setup of interface dispatch:

Before:

mov r11,7FF7DE100020h
mov     rax,qword ptr [r11]
cmp     dword ptr [rcx],ecx
add     rsp,20h

After (seems to be slower in microbenchmark):

mov r11,7FFF92EF0030h
mov     rax,qword ptr [00007fff`92ef0030]	<- notice that this instruction takes more bytes than before, but it avoids the data-flow dependency
cmp     dword ptr [rcx],ecx
add     rsp,20h

@AndyAyersMS
Member

Wonder if the interface dispatch change was a result of dotnet/coreclr#16267?

@jkotas are you still investigating?

@jkotas
Member

jkotas commented Feb 8, 2019

I do not plan to do any more investigations on this.

@AndyAyersMS AndyAyersMS self-assigned this Feb 8, 2019
@AndyAyersMS
Member

Looks more like the JIT has trouble with the current expression of Count. I'll have to dig in and figure out exactly what is tripping it up.

Looking at some variations in implementing Count -- Count2 is similar but peels off the slow and exception cases, Count3 is similar to what desktop does, Count4 is desktop plus peeling -- Core does better on my box for all but the one we use now (not a 100% fair comparison because I can't check for IIListProvider, but we don't get that far in this test anyway).

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.18323
Intel Core i7-4770HQ CPU 2.20GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host] : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.3740.0
  Core   : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT

 Method |  Job | Runtime |      Mean |     Error |    StdDev | Ratio | RatioSD |
------- |----- |-------- |----------:|----------:|----------:|------:|--------:|
  Count |  Clr |     Clr |  6.963 ns | 0.0359 ns | 0.0336 ns |  0.37 |    0.00 |
  Count | Core |    Core | 18.933 ns | 0.2310 ns | 0.2161 ns |  1.00 |    0.00 |
 Count2 |  Clr |     Clr |  7.528 ns | 0.1878 ns | 0.3032 ns |  1.11 |    0.05 |
 Count2 | Core |    Core |  6.927 ns | 0.1084 ns | 0.0961 ns |  1.00 |    0.00 |
 Count3 |  Clr |     Clr |  7.169 ns | 0.2585 ns | 0.2418 ns |  1.10 |    0.04 |
 Count3 | Core |    Core |  6.566 ns | 0.1467 ns | 0.1225 ns |  1.00 |    0.00 |
 Count4 |  Clr |     Clr |  7.166 ns | 0.0330 ns | 0.0293 ns |  1.11 |    0.03 |
 Count4 | Core |    Core |  6.509 ns | 0.1598 ns | 0.1709 ns |  1.00 |    0.00 |
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[CoreJob(true)]
[ClrJob]

public class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }

    public IEnumerable<string> _list;

    [GlobalSetup]
    public void Setup() => _list = Enumerable.Range(0, 10000).Select(i => i.ToString()).ToList();

    [Benchmark]
    public bool Count() => _list.Count() > 0;

    [Benchmark]
    public bool Count2() => _list.Count2() > 0;

    [Benchmark]
    public bool Count3() => _list.Count3() > 0;

    [Benchmark]
    public bool Count4() => _list.Count4() > 0;
}

static class Extensions
{
    public static int Count2<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
        {
            CountError();
        }
        
        if (source is ICollection<TSource> collectionoft)
        {
            return collectionoft.Count;
        }
        
        if (source is ICollection collection)
        {
            return collection.Count;
        }

        return Count2Slow(source);
    }

    public static int Count2Slow<TSource>(IEnumerable<TSource> source)
    {
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext())
                {
                    count++;
                }
            }
        }
        
        return count;
    }

    public static int Count3<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null) CountError();
        ICollection<TSource> collectionoft = source as ICollection<TSource>;
        if (collectionoft != null) return collectionoft.Count;
        ICollection collection = source as ICollection;
        if (collection != null) return collection.Count;
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext()) count++;
            }
        }
        return count;
    }

    public static int Count4<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null) CountError();
        ICollection<TSource> collectionoft = source as ICollection<TSource>;
        if (collectionoft != null) return collectionoft.Count;
        ICollection collection = source as ICollection;
        if (collection != null) return collection.Count;

        return Count4Slow<TSource>(source);
    }

    public static int Count4Slow<TSource>(IEnumerable<TSource> source)
    {
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext()) count++;
            }
        }
        return count;
    }

    public static void CountError()
    {
        throw new Exception("source");
    }
}

@stephentoub
Member

Count3 is similar to what desktop does

So since this basically differs in:

ICollection<TSource> collectionoft = source as ICollection<TSource>;
if (collectionoft != null) return collectionoft.Count;

vs

if (source is ICollection<TSource> collectionoft)
    return collectionoft.Count;

the JIT is stumbling over the IL pattern generated by the latter? That pattern has started to be used fairly aggressively across corefx, and coreclr in some places, so if the JIT is having more difficulties with it, it'd be great to get that fixed.
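As a side-by-side reference, the two source shapes under discussion behave identically; only the IL Roslyn emits for them differs (minimal sketch; the helper names are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

static class PatternShapes
{
    // Pattern-matching form: release-mode Roslyn of this era emits
    // isinst / dup / stloc / brfalse, introducing the extra dup.
    public static int CountPattern<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> c)
            return c.Count;
        return source.Count();
    }

    // Explicit as-cast form: compiles to isinst / stloc / ldloc / brfalse,
    // with no dup instruction for the JIT to clean up.
    public static int CountAsCast<T>(IEnumerable<T> source)
    {
        ICollection<T> c = source as ICollection<T>;
        if (c != null)
            return c.Count;
        return source.Count();
    }
}
```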

@jkotas
Member

jkotas commented Feb 9, 2019

the JIT is stumbling over the IL pattern generated by the latter

Right, the IL generated for this pattern looks like this:

ldarg.0
isinst     class [mscorlib]System.Collections.Generic.ICollection`1<!!0>
dup
stloc.0
brfalse.s  IL_XXX

ldloc.0
callvirt   instance int32 class [mscorlib]System.Collections.Generic.ICollection`1<!!0>::get_Count()
...

The problem is the dup instruction. The dup IL instruction is not as cheap as it looks. It causes the JIT to introduce a new local. As far as the JIT is concerned, the above IL is equivalent to:

.local temp;

ldarg.0
isinst     class [mscorlib]System.Collections.Generic.ICollection`1<!!0>
stloc temp
ldloc temp
ldloc temp
stloc.0
brfalse.s  IL_XXX

ldloc.0
callvirt   instance int32 class [mscorlib]System.Collections.Generic.ICollection`1<!!0>::get_Count()
...

The JIT optimizations have to work hard to get rid of the extra local and figure out that it is redundant with the other local that is there already. We can fix the JIT to do that, but it would be better if Roslyn avoids generating the problematic dup instruction here, i.e. generate the following IL:

ldarg.0
isinst     class [mscorlib]System.Collections.Generic.ICollection`1<!!0>
stloc.0
ldloc.0
brfalse.s  IL_XXX

ldloc.0
callvirt   instance int32 class [mscorlib]System.Collections.Generic.ICollection`1<!!0>::get_Count()
...

cc @jaredpar

@jaredpar
Member

jaredpar commented Feb 9, 2019

@agocke who has been looking at this recently.

@AndyAyersMS
Member

AndyAyersMS commented Feb 9, 2019

The generated code for Count and Count2 (which look like they should have the same IL constructs for the fast paths) shows more prolog/epilog pressure in Count, but given the path length we're testing here I'm surprised that matters so much for perf. So there may be something else going on here too.

@AndyAyersMS
Member

Ah, Count2 doesn't have dup in my case -- maybe I'm using a different CSC version.

@jaredpar
Member

jaredpar commented Feb 9, 2019

@AndyAyersMS make sure you specify /release otherwise the dup won't be emitted.

@AndyAyersMS
Member

I'm building the code above with the 3.0 preview 2 sdk, via dotnet build -c Release. It is invoking

 Microsoft (R) Visual C# Compiler version 3.0.19.6714 (359844cc)

Looks like this passes /debug- /optimize+ to csc ... is that not sufficient?

@AndyAyersMS
Member

AndyAyersMS commented Feb 10, 2019

With current master I don't see a lot of codegen or perf diffs for the dup and non-dup versions. Here Count2c is written to more or less exactly mimic Count.

Count2a and Count2b factor out the slow loop; Count2a throws via a helper; Count2 is as above.

;; 3.0 preview 2
D:\bugs\22400\ex>dotnet run -c Release
  Count: 1987.81ms
  Count2: 870.48ms
  Count2a: 1437.34ms
  Count2b: 1417.22ms
  Count2c: 1373.82ms

;; current master
  Count: 1555.89ms
  Count2: 933.90ms
  Count2a: 1385.73ms
  Count2b: 1417.63ms
  Count2c: 1532.44ms

Via perfview, the big difference seen in Count2 versus the others seemingly comes from it only testing for two interfaces instead of three -- once we add in a third interface test it seems like we start missing fast lookup and have to resolve via the JitGenericHandleCache and the cost of the runtime method handle access goes way up. It doesn't seem to matter that the first interface test succeeds and so we don't even execute the other tests.

@AndyAyersMS
Member

AndyAyersMS commented Feb 10, 2019

As far as Count vs Count2 goes, I think what happens is something like the following.

Generic methods are given 4 special dictionary slots that can be used for fast lookup. Here one of these gets used for the type parameter, and the others are used for various types and methods and so on that depend on the type parameter. These are allocated on demand during jit time as the jit runs and calls back into the runtime for various constructs.

At runtime, if a handle request isn't in one of these slots, it is looked up in a global hash table.

In Count there are 6 dependent items -- 3 interface types and the 3 methods they invoke. When we jit we selectively follow control flow in the method in some DFS manner, and in the case of Count the jit happens to import all 3 type tests before any of the associated calls, as the importer tends to follow taken branches in its DFS. Since there are 3 type tests in the method and they are reachable from entry by taken branches, they fill up the remaining 3 special slots, and when the jit circles back to generating code for the called methods, they don't get special slots. So accessing them is slower at runtime.

If we look at Count2 there are only two type tests, and so the first method also gets a preferred slot, and this is the one we end up invoking. So perf is improved.

For isinst; brfalse we might be able to tweak the importer so it looks at the fall-through code first, presuming the earliest encountered test is the most important (though in these cases there are other IL instructions in between these two). Or maybe there is some way to arrange for some generic methods to get more slots...?
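The slot-pressure scenario can be sketched in source form (hypothetical illustration: `IReadOnlyCollection<T>` stands in for the internal IIListProvider, and the lookup counts come from the explanation above, not from anything visible in C#):

```csharp
using System.Collections;
using System.Collections.Generic;

static class SlotPressure
{
    // In a shared generic instantiation, each interface type test and each
    // interface method call below needs a runtime handle lookup: six
    // dependent items competing for the four fast dictionary slots (one of
    // which already holds the type parameter). If the three type tests are
    // imported first, the calls are left to go through the slow hash lookup.
    public static int CountLike<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> c) return c.Count;          // test 1 + call 1
        if (source is IReadOnlyCollection<T> r) return r.Count;  // test 2 + call 2
        if (source is ICollection nc) return nc.Count;           // test 3 + call 3

        int count = 0;
        foreach (T item in source) count++;                      // slow path
        return count;
    }
}
```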

@jkotas
Member

jkotas commented Feb 10, 2019

Generic methods are given 4 special dictionary slots that can be used for fast lookup

This limit came from fragile NGen. With R2R, there should be nothing fundamental preventing us from making the number of dictionary slots for generic methods dynamic - give each generic method the exact number of slots it needs to run efficiently.

@jaredpar
Member

@AndyAyersMS

Looks like this passes /debug- /optimize+ to csc ... is that not sufficient?

That should indeed be sufficient here. I'm not sure why you don't see the dup ...

@AndyAyersMS
Member

I also plan to look into the perf difference between 3.0 preview 2 and the current master.

@AndyAyersMS
Member

AndyAyersMS commented Feb 14, 2019

I have a hypothesis about the 3.0 preview 2 vs current master diffs, related to the above. If correct, this can also help explain the Core vs Desktop perf difference that initiated this issue.

It appears that when we rejit a ready-to-run precompiled method, the dictionary slots used by the ready-to-run code can't be shared by jitted code. So we run out of slots "faster" in a tier1 rejit if we prejit than we would if we hadn't prejitted, and this impacts perf of the tier1 code.

My guess is that in the 3.0 preview 2 SDK we must be prejitting System.Linq and so it suffers from this effect -- the R2R version is slow, and uses up slots, so the Tier1 version is not as fast as it would be if we hadn't prejitted at all.

When I run locally I'm likely not using a prejitted System.Linq and (usually) am not prejitting my test code. So I don't see this extra cost.

We can simulate this with a simple test program.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;

class P
{
    const int N = 100_000_000;
    public IEnumerable<string> _list;

    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static void Main()
    {
        var p = new P();
        p._list = Enumerable.Range(0, 10000).Select(i => i.ToString()).ToList();
        p.Count2();

        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();        
        for (int i = 0; i < N; i++)
        {
            p.Count2();
        }
        stopWatch.Stop();
        
        TimeSpan ts = stopWatch.Elapsed;
        Console.WriteLine($"  Count2: {ts.TotalMilliseconds:F2}ms");
    }

    public bool Count2() => _list.Count2() > 0;
}

static class Extensions
{
    public static int Count2<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
        {
            return 0;
        }
        
        if (source is ICollection<TSource> collectionoft)
        {
            return collectionoft.Count;
        }

        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext())
                {
                    count++;
                }
            }
        }
        
        return count;
    }
}

I get the following perf numbers for local runs via corerun:

Prejitted? | Tiered? | Time (ms) | Notes
---------- | ------- | ---------:| -----
No         | No      |      1034 |
No         | Yes     |      1091 | small bit of tiering overhead
Yes        | No      |      2032 | large bit of R2R overhead
Yes        | Yes     |      1543 | R2R uses up slots and slows down Tier1

One would expect the last row to be closer to 1091.

So if we do work to adjust dictionary slot counts for R2R to make sure we don't run out, we should also look into how R2R shares slots with rejitted Tier1 code.

Desktop doesn't use R2R and doesn't rejit -- it will either be jitting or using NGEN prejitted code, so it doesn't hit this case, and (assuming NGEN has overhead here similar to jitted code) it should see "fast" perf.

Core 2.2 uses R2R and doesn't rejit by default, so it will see the slowest case.
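For anyone wanting to reproduce the four rows above locally, the toggles in this era are runtime environment variables (a sketch assuming Windows cmd syntax and the `COMPlus_` knob names; `bench.dll` is a placeholder):

```shell
:: Jit-only baseline: ignore ReadyToRun precompiled code, no tiering
set COMPlus_ReadyToRun=0
set COMPlus_TieredCompilation=0
corerun bench.dll

:: The problematic row: honor R2R code and rejit at tier1
set COMPlus_ReadyToRun=1
set COMPlus_TieredCompilation=1
corerun bench.dll
```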

cc @kouvel

@jkotas
Member

jkotas commented Feb 14, 2019

If the tiered JIT is on, we may want to modify R2R to never use the fast slots, reserving them for use by the tiered JIT code only. It should be an easy change to make (it is just a change in the runtime; it does not affect the file format or the compiler).

cc @fadimounir

@AndyAyersMS
Member

To sum up, there seem to be a few avenues for follow-up:

  • consider increasing number of fast slots for generic methods
  • look into how prejitted and rejitted share or don't share fast slots
  • consider not allowing R2R to use fast slots
  • if the number of fast slots is limited, try to maximize effective fast slot usage (e.g. dotnet/coreclr#22601)

I assume similar things apply to generic types, though perhaps less often? It might be worth some investigation. Perhaps we could monitor how often we end up checking the generic handle cache.

Since this is mostly work on the runtime side, I'm going to update the issue area.

@AndyAyersMS AndyAyersMS removed their assignment Feb 14, 2019
jkotas referenced this issue in jkotas/coreclr Feb 15, 2019
Disable use of fast dictionary slots for R2R images when tiered JITing is enabled.

Fixes #22400
jkotas referenced this issue in dotnet/coreclr Feb 15, 2019
Disable use of fast dictionary slots for R2R images when tiered JITing is enabled.

Fixes #22400
@AndyAyersMS
Member

@sergiy-k @fadimounir this is the issue I was referring to...

@RealDotNetDave
Author

Wow, I really like all the back and forth on this. Will this be fixed in .NET 3.x?

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
7 participants