Why is this Rust program handling CSV data "only" 3-4 times faster than an equivalent Python program? #341
The reproduction is in this repository: https://github.com/jan24/checklog (see my answer below for the precise steps I followed). This question was originally sourced from reddit.
Okay, here's what I did to get my baseline. First, I cloned and built your program:
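Concretely, that's just the usual clone-and-release-build dance, roughly:

```console
$ git clone https://github.com/jan24/checklog
$ cd checklog
$ cargo build --release
```

(A release build matters here; a debug build would skew the comparison badly.)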
Next I ran the programs to check that they at least appear to behave the same (i.e., have the same output):
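Roughly like so, where the binary, script and data file names are placeholders for whatever the repository actually uses:

```console
$ ./target/release/checklog data.csv > out-rust.txt
$ python3 checklog.py data.csv > out-python.txt
$ diff out-rust.txt out-python.txt
```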
Okay, so far so good. Next, I used Hyperfine to benchmark things:
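The shape of the invocation is something like this (again with placeholder names); hyperfine's `--warmup` flag runs each command a few times before measuring, which keeps cold caches from skewing the first run:

```console
$ hyperfine --warmup 3 \
    './target/release/checklog data.csv' \
    'python3 checklog.py data.csv'
```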
So the Rust program is about 4.5 times faster. That seems pretty good? Your OP says it is "70% faster." Given my measurements, 70% of 218ms is 152.6ms, and so I think 70% faster means the Rust program would be around 65ms. In my case, it was 48ms. So perhaps even faster than 70% faster, but it's in the right ballpark I think. If I actually look at the timings in your README, I see 3.221s for Python and 0.864s for Rust. In that case, the Rust program is 3.7 times faster. So, overall, we're pretty close. At this point, I feel like I have a decent reproduction and I'm pretty sure I'm at least roughly seeing what you're seeing.

Now we can actually sink our teeth into this. The first thing I'm going to do is increase the corpus size. For you, your timings are near a second or so, which I think is a decent target. So this step might not make sense for you. But in my case, the timings are under 100ms, and I usually like to get that up to something a bit higher to reduce the impact of noise. Bigger data also tends to approximate the "worst case" in the real world a little better, for a variety of reasons. So I'm going to increase your CSV data ten-fold and re-run the benchmark:
All right, so that's good. The benchmark scales: the relative difference remains roughly the same, with the gap closing somewhat. Now it's time to profile the Rust program to see where time is actually being spent.
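If you want to follow along, one way to collect a profile like this on Linux is perf (binary and data file names are placeholders again):

```console
$ perf record --call-graph dwarf ./target/release/checklog data10x.csv
$ perf report
```

For readable symbols in a release build, it helps to enable debug info, e.g. `debug = true` under `[profile.release]` in Cargo.toml.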
Indeed, the profile gave a pretty clear picture of where most of the time was going. I then re-built the program and re-profiled it. From looking at the revised profile, I can now see that a big chunk of the time is being spent in a single routine in csv-core (rust-csv/csv-core/src/reader.rs, lines 1248 to 1274, at commit 533d37b). This is actually good, because that routine is an optimization that the CSV parser uses when it knows there aren't any state transitions and just needs to accumulate data. Looking at your actual source code, I see that you're already amortizing allocation in the parsing loop. So to me, the above suggests that the CSV portion of your program is operating pretty smoothly. It is worth pointing out that while I believe this crate's parser is faster than Python's csv module (which is implemented in C), I wouldn't expect it to be other-worldly faster.

So let's look at what the rest of your program is doing. After all, it looks like CSV handling is only about half (or a little less) of the total runtime. From the profile, the remaining time is going to your program's own formatting and writing. Looking at the actual source code (and the profile), nothing really jumps out at me as something worth fixing. You could almost certainly micro-optimize the writing portion to be faster in a variety of ways, but it's not something I feel inclined to do (and it's not related to CSV parsing). I also wouldn't expect it to make a huge difference because, after all, you are doing a fair bit of work for each record. Rust definitely loses a bit of its edge here because there's lots of writing and allocating, and those are things Python is not terrible at. As I said on reddit, one of the key advantages Rust gives you over Python is the ability to amortize allocation (see the sketch at the end of this comment for what I mean). You do that for CSV parsing and that almost certainly helps, but it's harder to do that for your formatting task.

For comparison, I did also profile the Python program. It, too, is spending a good chunk of time in the CSV parser. It's also spending a lot of time producing and building objects.

Given that this CSV parser should be faster but not necessarily other-worldly faster than Python's, and given that a good chunk of your task is just raw CSV parsing and is otherwise just mangling data, I think "3-4 times faster" seems about right. If you were inclined to make this program faster, I would suggest forgetting about micro-optimizing the output formatting (or saving that for last) and trying one of the following:
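Going back to the amortization point above: here is a minimal sketch of what reusing a single record buffer looks like with this crate. It is not the OP's actual code; the input file name and the field index are made up for illustration.

```rust
use std::error::Error;

use csv::{ByteRecord, ReaderBuilder};

fn main() -> Result<(), Box<dyn Error>> {
    // Hypothetical input file; substitute the real one.
    let mut rdr = ReaderBuilder::new()
        .has_headers(true)
        .from_path("data.csv")?;

    // Allocate one record up front and reuse it for every row. This is the
    // "amortize allocation" trick: read_byte_record overwrites the same
    // buffers instead of allocating a fresh record per row.
    let mut record = ByteRecord::new();
    let mut total = 0usize;
    while rdr.read_byte_record(&mut record)? {
        // Per-row work happens against borrowed bytes; field index 0 is an
        // arbitrary example.
        if let Some(field) = record.get(0) {
            total += field.len();
        }
    }
    println!("sum of first-field lengths: {total}");
    Ok(())
}
```

The contrast is with iterating via `records()` and building owned values for every row, which allocates on each iteration.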