Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: use default memory management within libxml2 #2734

Closed
wants to merge 1 commit into from

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

Related to #2722, performance.

use default memory management within libxml2 instead of ruby's memory management functions (which has been used since 2009).

HTML4 parsing is 14-19% faster:

alloc html4 parse 656:    22371.3 i/s
BASE_ html4 parse 656:    19589.4 i/s - 1.14x  (± 0.03) slower

alloc html4 parse 70095:      461.4 i/s
BASE_ html4 parse 70095:      385.5 i/s - 1.19x  (± 0.11) slower

alloc html4 parse 1929522:       43.7 i/s
BASE_ html4 parse 1929522:       37.2 i/s - 1.18x  (± 0.05) slower

HTML5 parsing is 5-8% faster:

alloc html5 parse 656:    17909.1 i/s
BASE_ html5 parse 656:    16597.7 i/s - 1.08x  (± 0.04) slower

alloc html5 parse 70095:      277.3 i/s
BASE_ html5 parse 70095:      263.4 i/s - same-ish: difference falls within error

alloc html5 parse 1929522:       16.6 i/s
BASE_ html5 parse 1929522:       15.8 i/s - 1.05x  (± 0.03) slower

HTML4 serialization is 10-14% faster:

alloc html4 srlze 656:    69707.3 i/s
BASE_ html4 srlze 656:    63537.4 i/s - 1.10x  (± 0.02) slower

alloc html4 srlze 70095:     1504.5 i/s
BASE_ html4 srlze 70095:     1331.9 i/s - 1.13x  (± 0.02) slower

alloc html4 srlze 1929522:      145.5 i/s
BASE_ html4 srlze 1929522:      128.0 i/s - 1.14x  (± 0.02) slower

HTML5 serialization is not very impacted, as expected.

alloc html5 srlze 656:    42255.7 i/s
BASE_ html5 srlze 656:    41118.8 i/s - 1.03x  (± 0.02) slower

alloc html5 srlze 70095:     1050.6 i/s
BASE_ html5 srlze 70095:     1033.8 i/s - same-ish: difference falls within error

alloc html5 srlze 1929522:      121.5 i/s
BASE_ html5 srlze 1929522:      118.3 i/s - 1.03x  (± 0.02) slower

Have you included adequate test coverage?

N/A

Does this change affect the behavior of either the C or the Java implementations?

N/A

instead of ruby's memory management functions, which has been the case
since 2009.

HTML4 parsing is 14-19% faster:

    alloc html4 parse 656:    22371.3 i/s
    BASE_ html4 parse 656:    19589.4 i/s - 1.14x  (± 0.03) slower

    alloc html4 parse 70095:      461.4 i/s
    BASE_ html4 parse 70095:      385.5 i/s - 1.19x  (± 0.11) slower

    alloc html4 parse 1929522:       43.7 i/s
    BASE_ html4 parse 1929522:       37.2 i/s - 1.18x  (± 0.05) slower

HTML5 parsing is 5-8% faster:

    alloc html5 parse 656:    17909.1 i/s
    BASE_ html5 parse 656:    16597.7 i/s - 1.08x  (± 0.04) slower

    alloc html5 parse 70095:      277.3 i/s
    BASE_ html5 parse 70095:      263.4 i/s - same-ish: difference falls within error

    alloc html5 parse 1929522:       16.6 i/s
    BASE_ html5 parse 1929522:       15.8 i/s - 1.05x  (± 0.03) slower

HTML4 serialization is 10-14% faster:

    alloc html4 srlze 656:    69707.3 i/s
    BASE_ html4 srlze 656:    63537.4 i/s - 1.10x  (± 0.02) slower

    alloc html4 srlze 70095:     1504.5 i/s
    BASE_ html4 srlze 70095:     1331.9 i/s - 1.13x  (± 0.02) slower

    alloc html4 srlze 1929522:      145.5 i/s
    BASE_ html4 srlze 1929522:      128.0 i/s - 1.14x  (± 0.02) slower

HTML5 serialization is not very impacted, as expected.

    alloc html5 srlze 656:    42255.7 i/s
    BASE_ html5 srlze 656:    41118.8 i/s - 1.03x  (± 0.02) slower

    alloc html5 srlze 70095:     1050.6 i/s
    BASE_ html5 srlze 70095:     1033.8 i/s - same-ish: difference falls within error

    alloc html5 srlze 1929522:      121.5 i/s
    BASE_ html5 srlze 1929522:      118.3 i/s - 1.03x  (± 0.02) slower
@tenderlove
Copy link
Member

The performance wins seem nice. IIRC the reason we used custom allocators was so that the GC had more accurate book keeping WRT to actual memory usage (as well as an opportunity to actually run the GC). I'd expect this change to cause average memory usage of any particular app to increase since we won't GC as frequently.

That said, it's probably fine to do this. I just wanted to add some context! 😃

@flavorjones
Copy link
Member Author

@byroot had also asked to re-run the benchmarks with GC disabled, I'll get around to that.

@flavorjones
Copy link
Member Author

flavorjones commented Dec 20, 2022

With GC disabled ...

HTML4 parsing is 8-25% faster:

alloc html4 parse 656:    26619.4 i/s
BASE_ html4 parse 656:    21296.6 i/s - 1.25x  (± 0.07) slower

alloc html4 parse 70095:      573.4 i/s
BASE_ html4 parse 70095:      502.9 i/s - 1.14x  (± 0.02) slower

alloc html4 parse 1929522:       49.1 i/s
BASE_ html4 parse 1929522:       45.2 i/s - 1.08x  (± 0.02) slower

HTML5 parsing is 0-6% faster:

alloc html5 parse 656:    25314.1 i/s
BASE_ html5 parse 656:    23772.3 i/s - 1.06x  (± 0.03) slower

alloc html5 parse 70095:      280.3 i/s
BASE_ html5 parse 70095:      273.6 i/s - same-ish: difference falls within error

alloc html5 parse 1929522:       15.8 i/s
BASE_ html5 parse 1929522:       15.7 i/s - same-ish: difference falls within error

HTML4 serialization is 10-18% faster:

alloc html4 srlze 656:    71962.7 i/s
BASE_ html4 srlze 656:    63734.2 i/s - 1.13x  (± 0.02) slower

alloc html4 srlze 70095:     1519.9 i/s
BASE_ html4 srlze 70095:     1286.8 i/s - 1.18x  (± 0.02) slower

alloc html4 srlze 1929522:      136.5 i/s
BASE_ html4 srlze 1929522:      123.9 i/s - 1.10x  (± 0.03) slower

@byroot
Copy link
Contributor

byroot commented Dec 21, 2022

Interesting. I would have expected most of the perf difference to be caused by ruby_xmalloc triggering GC more frequently, but apparently not.

👍

@larskanis
Copy link
Member

larskanis commented Dec 21, 2022

Not triggering the GC can have a quite impressive memory bloat in some situations. In pg.gem we had this issue for years, so that Sam Saffron wrote a blog post about it. To that time I solved this by adding an estimator for the amount of memory that is used behind the scenes and this value is sent to the ruby GC per rb_gc_adjust_memory_usage(). This solved the memory bloat issue quite well. Later on a new function to libpq was accepted, that tells the exact amount of memory.

Maybe it's possible to use the native allocation functions in libxml2 and instruct the ruby GC about it per rb_gc_adjust_memory_usage()?

@flavorjones
Copy link
Member Author

OK, I'm going to hold off on trying to land this in the upcoming 1.14.0 release, so we can consider a few other solutions.

@flavorjones flavorjones removed this from the v1.14.0 milestone Dec 21, 2022
@casperisfine
Copy link

Based on the discovery in #2807, I'm now in support of this change. The reasoning is:

  • It's hard to know for sure which libxml2 function will allocate or not, so hard to ensure libxml2 won't allocate during GC.
  • Even if we get it right today, upstream may change and break the gem, causing a vm crash, it's too dangerous.
  • GC trigger heurisitc based on malloc are really not the most relevant ones, it's unlikely to cause increased memory usage in real world scenarios.

@flavorjones
Copy link
Member Author

@casperisfine For whatever it's worth, I think there are ways around the issue we discovered in #2807, I've created #2822 so I don't forget to circle back. That doesn't need to be the forcing function here (though it may still be a good idea anyway).

@flavorjones
Copy link
Member Author

Closing in favor of #2843 which keeps the current default but allows opting into using system malloc via environment variable.

@flavorjones flavorjones closed this Apr 7, 2023
@flavorjones flavorjones deleted the flavorjones-libxml-use-default-malloc branch April 7, 2023 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants