This repository has been archived by the owner on Apr 19, 2020. It is now read-only.

Not working correctly on file with too many lines #13

Open
LinjianLi opened this issue Oct 16, 2019 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

LinjianLi commented Oct 16, 2019

Environment

  • Ubuntu 18.04
  • CLion 2019.1.4

Issue

I tested the program on a CSV file, 20k_rows_data.csv.txt, with 20K lines, and the program does not work correctly. (I appended .txt to the filename because GitHub issues do not support uploading .csv files.)

#include <iostream>
#include <string>
#include <csv/reader.hpp>

int main() {
  csv::Reader csv;
  csv.read("../tests/inputs/20k_rows_data.csv.txt");
  auto rows = csv.rows();
  auto cols = csv.cols();
  int row_count = 0;
  // Print each row prefixed with its 1-based row number.
  for (auto row : rows) {
    std::string s = std::to_string(++row_count);
    for (auto col : cols) {
      s += ' ' + (std::string)(row[col]);
    }
    std::cout << s << std::endl;
  }
}

Part of the output looks like this (copied from my console):

5332     
5333     
5334 1 1 1 1 1
5335     
5336     
5337 1 1 1 1 1
5338     
5339     
5340     
5341 1 1 1 1 1
5342     
5343     

Note that the output is not the same each time I run it.

@p-ranav p-ranav added the bug Something isn't working label Oct 16, 2019
@p-ranav p-ranav self-assigned this Oct 16, 2019
@benguela

I also have this issue. Is there a fix planned soon?

@p-ranav p-ranav added the help wanted Extra attention is needed label Feb 3, 2020
nevion commented Mar 5, 2020

@p-ranav perhaps it makes sense to provide a simpler, single-threaded version of the reader implementation and make the threaded one opt-in via flags, at runtime or compile time? This issue kind of discouraged me.

nevion commented Mar 5, 2020

Just fooling around with these changes: the test passes with std::map or std::unordered_map on my computer, but going back to unordered_flat_map causes blank records again, so I'm of the opinion that there are race conditions going on.

diff --git a/include/csv/reader.hpp b/include/csv/reader.hpp
index 56542d7..0793d3b 100644
--- a/include/csv/reader.hpp
+++ b/include/csv/reader.hpp
@@ -46,8 +46,13 @@ SOFTWARE.
 #include <iterator>
 #include <atomic>
 #include <string_view>
+#include <map>
 
 namespace csv {
+    template<typename K, typename V>
+    using map_type = std::map<K, V>;
+    //using map_type = std::unordered_map<K, V>;
+    //using map_type = unordered_flat_map<K, V>;
 
   class Reader {
   public:
@@ -121,16 +126,16 @@ namespace csv {
 
     bool ready() {
       size_t rows = 0;
-      number_of_rows_processed_.try_dequeue(rows);
-      row_iterator_queue_.try_dequeue(ready_index_);
-      bool result = (ready_index_ < expected_number_of_rows_ && ready_index_ < rows);
+      auto firstValid = number_of_rows_processed_.try_dequeue(rows);
+      auto secondValid = row_iterator_queue_.try_dequeue(ready_index_);
+      bool result = firstValid && secondValid && (ready_index_ < expected_number_of_rows_ && ready_index_ < rows);
       return result;
     }
 
-    unordered_flat_map<std::string_view, std::string> next_row() {
+    map_type<std::string_view, std::string> next_row() {
       row_iterator_queue_.enqueue(next_index_);
       next_index_ += 1;
-      unordered_flat_map<std::string_view, std::string> result;
+      map_type<std::string_view, std::string> result;
       rows_.try_dequeue(rows_ctoken_, result);
       return result;
     }
@@ -218,8 +223,8 @@ namespace csv {
       }
     }
 
-    std::vector<unordered_flat_map<std::string_view, std::string>> rows() {
-      std::vector<unordered_flat_map<std::string_view, std::string>> rows;
+    std::vector<map_type<std::string_view, std::string>> rows() {
+      std::vector<map_type<std::string_view, std::string>> rows;
       while (!done()) {
         if (ready()) {
           rows.push_back(next_row());
@@ -448,9 +453,9 @@ namespace csv {
     std::string filename_;
     std::ifstream stream_;
     std::vector<std::string> headers_;
-    unordered_flat_map<std::string_view, std::string> current_row_;
+    map_type<std::string_view, std::string> current_row_;
     std::string current_value_;
-    ConcurrentQueue<unordered_flat_map<std::string_view, std::string>> rows_;
+    ConcurrentQueue<map_type<std::string_view, std::string>> rows_;
     ProducerToken rows_ptoken_;
     ConsumerToken rows_ctoken_;
     ConcurrentQueue<size_t> number_of_rows_processed_;
@@ -473,7 +478,7 @@ namespace csv {
     ProducerToken values_ptoken_;
     ConsumerToken values_ctoken_;
     std::string current_dialect_name_;
-    unordered_flat_map<std::string, Dialect> dialects_;
+    map_type<std::string, Dialect> dialects_;
     Dialect current_dialect_;
     size_t done_index_;
     size_t ready_index_;

I noticed that try_dequeue returns a bool, but the return value is never checked. I'm also not sure what is going on in the next_row code paths: somehow ready() can succeed while next_row() still returns an empty record from the concurrent queue.
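
As a rough illustration of checking that return value, the sketch below reports a failed dequeue to the caller instead of handing back a default-constructed (blank) record. `SimpleQueue` is a mutex-guarded stand-in written only for this example; it is assumed to mimic just the `bool try_dequeue(T&)` shape of the concurrent queue used in reader.hpp. `try_next_row` is a hypothetical name, not the library's API, and the sketch needs C++17 for `std::optional`.

```cpp
#include <deque>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical row type for this sketch.
using Row = std::map<std::string, std::string>;

// Stand-in queue (assumption): only the `bool try_dequeue(T&)` contract
// matters here, not the real queue's lock-free implementation.
template <typename T>
class SimpleQueue {
 public:
  void enqueue(T value) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(std::move(value));
  }

  // Returns false and leaves `out` untouched when the queue is empty.
  bool try_dequeue(T& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::deque<T> items_;
};

// Hypothetical variant of next_row(): an empty optional means "nothing was
// dequeued", so the caller can retry instead of storing a blank record.
std::optional<Row> try_next_row(SimpleQueue<Row>& rows) {
  Row result;
  if (!rows.try_dequeue(result)) return std::nullopt;
  return result;
}
```

A caller would then push a row into its result vector only when the optional holds a value. This does not explain why ready() and next_row() disagree, but it would keep a failed dequeue from silently turning into an empty row.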

wdznak commented Mar 7, 2020

This code also gives the wrong answer: it prints 0 instead of 1.

csv::Writer csvFile("Test.csv");
csvFile.configure_dialect()
    .delimiter(", ")
    .column_names("D", "O", "H", "L", "C", "V", "M");
csvFile.write_row("1", "2", "3", "4", "5", "6", "7");
csvFile.close();

csv::Reader csv;
csv.read("Test.csv");
auto rows = csv.rows();

cout << rows.size() << "\n";

If I write another row, the answer is correct (2).

Edit: I see that the issue is Closed, but the problem still exists, at least in the version provided by vcpkg.

Owner

p-ranav commented Apr 19, 2020

Hello,

I'm working on a second implementation of this library: https://github.com/p-ranav/csv2. The reader is ready for use. Check it out. Hopefully it works better. I'm planning to archive this repo in favor of csv2.

Sorry again for all the issues you've faced with this library.
