Why is this Rust program handling CSV data "only" 3-4 times faster than an equivalent Python program? #341
The reproduction is in this repository: https://github.com/jan24/checklog (see my answer below for the precise steps I followed). This question was originally sourced from reddit.
Okay, here's what I did to get my baseline. First, I cloned and built your program:
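Concretely, that's just the usual clone-and-release-build dance, roughly:

```console
$ git clone https://github.com/jan24/checklog
$ cd checklog
$ cargo build --release
```

(A release build matters here; a debug build would skew the comparison badly.)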
Next I ran the programs to check that they at least appear to behave the same (i.e., have the same output):
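Roughly like so, where the binary, script and data file names are placeholders for whatever the repository actually uses:

```console
$ ./target/release/checklog data.csv > out-rust.txt
$ python3 checklog.py data.csv > out-python.txt
$ diff out-rust.txt out-python.txt
```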
Okay, so far so good. Next, I used Hyperfine to benchmark things:
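The shape of the invocation is something like this (again with placeholder names); hyperfine's `--warmup` flag runs each command a few times before measuring, which keeps cold caches from skewing the first run:

```console
$ hyperfine --warmup 3 \
    './target/release/checklog data.csv' \
    'python3 checklog.py data.csv'
```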
So the Rust program is about 4.5 times faster. That seems pretty good? Your OP says it is "70% faster." Given my measurements, 70% of 218ms is 152.6ms, and so I think 70% faster means the Rust program would be around 65ms. In my case, it was 48ms. So perhaps even faster than 70% faster, but it's in the right ballpark I think. If I actually look at the timings in your README, I see 3.221s for Python and 0.864s for Rust. In that case, the Rust program is 3.7 times faster. So, overall, we're pretty close. At this point, I feel like I have a decent reproduction and I'm pretty sure I'm at least roughly seeing what you're seeing.

Now we can actually sink our teeth into this. The first thing I'm going to do is increase the corpus size. For you, your timings are near a second or so, which I think is a decent target. So this step might not make sense for you. But in my case, the timings are under 100ms, and I usually like to get that up to something a bit higher to reduce the impact of noise. Bigger data also tends to approximate the "worst case" in the real world a little better, for a variety of reasons. So I'm going to increase your CSV data ten-fold and re-run the benchmark:
All right, so that's good. The benchmark scales: the relative difference remains roughly the same, with the gap closing somewhat. Now it's time to profile the Rust program to see where time is actually being spent.
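If you want to follow along, one way to collect a profile like this on Linux is perf (binary and data file names are placeholders again):

```console
$ perf record --call-graph dwarf ./target/release/checklog data10x.csv
$ perf report
```

For readable symbols in a release build, it helps to enable debug info, e.g. `debug = true` under `[profile.release]` in Cargo.toml.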
Indeed, the profile gave a pretty clear picture of where most of the time was going. I then re-built the program and re-profiled it. From looking at the revised profile, I can now see that a big chunk of the time is being spent in a single routine in csv-core (rust-csv/csv-core/src/reader.rs, lines 1248 to 1274, at commit 533d37b). This is actually good, because that routine is an optimization that the CSV parser uses when it knows there aren't any state transitions and just needs to accumulate data. Looking at your actual source code, I see that you're already amortizing allocation in the parsing loop. So to me, the above suggests that the CSV portion of your program is operating pretty smoothly. It is worth pointing out that while I believe this crate's parser is faster than Python's csv module (which is implemented in C), I wouldn't expect it to be other-worldly faster.

So let's look at what the rest of your program is doing. After all, it looks like CSV handling is only about half (or a little less) of the total runtime. From the profile, the remaining time is going to your program's own formatting and writing. Looking at the actual source code (and the profile), nothing really jumps out at me as something worth fixing. You could almost certainly micro-optimize the writing portion to be faster in a variety of ways, but it's not something I feel inclined to do (and it's not related to CSV parsing). I also wouldn't expect it to make a huge difference because, after all, you are doing a fair bit of work for each record. Rust definitely loses a bit of its edge here because there's lots of writing and allocating, and those are things Python is not terrible at. As I said on reddit, one of the key advantages Rust gives you over Python is the ability to amortize allocation (see the sketch at the end of this comment for what I mean). You do that for CSV parsing and that almost certainly helps, but it's harder to do that for your formatting task.

For comparison, I did also profile the Python program. It, too, is spending a good chunk of time in the CSV parser. It's also spending a lot of time producing and building objects.

Given that this CSV parser should be faster but not necessarily other-worldly faster than Python's, and given that a good chunk of your task is just raw CSV parsing and is otherwise just mangling data, I think "3-4 times faster" seems about right. If you were inclined to make this program faster, I would suggest forgetting about micro-optimizing the output formatting (or saving that for last) and trying one of the following:
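Going back to the amortization point above: here is a minimal sketch of what reusing a single record buffer looks like with this crate. It is not the OP's actual code; the input file name and the field index are made up for illustration.

```rust
use std::error::Error;

use csv::{ByteRecord, ReaderBuilder};

fn main() -> Result<(), Box<dyn Error>> {
    // Hypothetical input file; substitute the real one.
    let mut rdr = ReaderBuilder::new()
        .has_headers(true)
        .from_path("data.csv")?;

    // Allocate one record up front and reuse it for every row. This is the
    // "amortize allocation" trick: read_byte_record overwrites the same
    // buffers instead of allocating a fresh record per row.
    let mut record = ByteRecord::new();
    let mut total = 0usize;
    while rdr.read_byte_record(&mut record)? {
        // Per-row work happens against borrowed bytes; field index 0 is an
        // arbitrary example.
        if let Some(field) = record.get(0) {
            total += field.len();
        }
    }
    println!("sum of first-field lengths: {total}");
    Ok(())
}
```

The contrast is with iterating via `records()` and building owned values for every row, which allocates on each iteration.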