Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unecessary Unboxing on Seq functions #1469

Closed
liboz opened this issue Aug 22, 2016 · 1 comment
Closed

Unecessary Unboxing on Seq functions #1469

liboz opened this issue Aug 22, 2016 · 1 comment

Comments

@liboz
Copy link
Contributor

liboz commented Aug 22, 2016

I was looking into why the Seq.map function is so much slower than System.Linq's Select extension method and found that one major reason is that Seq.map has an unnecessary unboxing step (see the unbox.any below).

        IL_0000: nop
        IL_0001: ldarg.0
        IL_0002: ldarg.1
        IL_0003: newobj instance void class Microsoft.FSharp.Collections.IEnumerator/map@111<!!b, !!a>::.ctor(class Microsoft.FSharp.Core.FSharpFunc`2<!1, !0>, class [mscorlib] System.Collections.Generic.IEnumerator`1<!1>)
        IL_0008: unbox.any class [mscorlib]System.Collections.Generic.IEnumerator`1<!!b>
        IL_000d: ret

Method 1:
To see how to remove this step, I poked around and found this line in IlxGen.fs, which seems to generate this extra line of IL. Commenting out that line appears to have a this sort of effect (but cost more allocations):

Method Platform Jit count Median StdDev Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
LinqSelect X64 RyuJit 10 848.7264 ns 34.2269 ns 3,038.81 - - 590.74
oldMap X64 RyuJit 10 1,544.1877 ns 87.8153 ns 2,318.00 - - 451.25
newMap X64 RyuJit 10 1,259.8084 ns 48.7878 ns 2,492.00 - - 484.89
LinqSelect X86 LegacyJit 10 860.3415 ns 31.4182 ns 1,547.38 - - 302.12
oldMap X86 LegacyJit 10 1,975.4831 ns 88.3854 ns 1,178.00 - - 229.05
newMap X86 LegacyJit 10 1,750.1149 ns 47.8851 ns 1,207.03 - - 240.31

I also found an earlier discussion (#871) and a comment by @manofstick about the performance benefits and how that change at least used to pass all the tests. However, there appears to be a related comment that the reason this exists is because of stack merge points. Are these actually a problem because Roslyn also has a similar comment but it doesn't generate the extra casts? And if so, is there a way we could ignore the casts just for those cases?

Some additional benefits of this approach would be that there would be a performance benefit every time an upcast is done (for example every time you use an interface inside F#)

Method 2:

Another way to deal with the extra cast might be to just remove the MapEnumerator type and thus avoid upcasting at all with something like this (for comparison here):

let mapImplementation f (e : IEnumerator<_>) : IEnumerator<_> =
    let mutable curr = Unchecked.defaultof<_>
    let mutable state = NotStarted
    let GetCurrent () = 
        match state with
        |   InProcess -> curr
        |   NotStarted -> notStarted()
        |   Finished -> alreadyFinished()
    { new IEnumerator<'T> with
            member this.Current = GetCurrent()
       interface IEnumerator with
            member this.Current = box(GetCurrent())
            member this.MoveNext () =
                state <- InProcess
                if e.MoveNext() then
                    curr <- f e.Current
                    true
                else
                    state <- Finished
                    false
            member this.Reset() = noReset()
       interface System.IDisposable with
            member this.Dispose() = e.Dispose() }

Although this helps with the performance, it does appear to come at the cost of more allocations (this
test does 4 chained select/map operations with the id function)

Method Platform Jit count Median StdDev Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
LinqSelect X64 RyuJit 10 842.2784 ns 36.2418 ns 2,921.75 - - 563.48
oldMap X64 RyuJit 10 1,545.4245 ns 61.1134 ns 2,358.00 - - 455.39
newMap X64 RyuJit 10 1,190.1807 ns 57.4496 ns 3,058.00 - - 589.40
LinqSelect X86 LegacyJit 10 859.0624 ns 26.6690 ns 1,547.28 - - 299.70
oldMap X86 LegacyJit 10 1,957.1439 ns 70.0879 ns 1,178.00 - - 227.23
newMap X86 LegacyJit 10 1,660.6148 ns 66.1325 ns 1,472.00 - - 289.18

I'm looking for some guidance on which kind of approach would be better before I try submitting a PR.

Thanks

@manofstick
Copy link
Contributor

manofstick commented Oct 3, 2016

Ha, good memory, I had forgotten about that. OK, will that gives us a good reason to totally ignored small collections until after we have a good compete 1st cut off the Seq replacement baseline. Oops, posted in wrong spo (damn small phone display)t, but you know the context!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants