Add API for ReadOnlySpan for efficient values handling #1

Open
VitaliyMF opened this issue Nov 28, 2018 · 4 comments
Labels
enhancement New feature or request

Comments

@VitaliyMF
Contributor

In .NET Core 2.1 and the upcoming netstandard 2.1, ReadOnlySpan<char> can be used for zero-allocation handling of string values, without the need to create a 'string' object from the buffer.

To support this, it is enough to add a ProcessValueInBuffer overload, or perhaps a method ReadOnlySpan<char> GetSpanValue(int idx).
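
To illustrate the idea, here is a minimal sketch of reading a numeric field directly from a shared char buffer through a ReadOnlySpan<char>. The class and method names (SpanDemo, ParseIntField) are hypothetical and are not part of NReco.Csv; only standard .NET Core 2.1+ calls are used.

```csharp
using System;
using System.Globalization;

static class SpanDemo
{
    // Parses an integer from buffer[start .. start + length) without
    // materializing an intermediate string for the field.
    public static int ParseIntField(char[] buffer, int start, int length)
    {
        // The span is only a view over the existing array; nothing is copied.
        ReadOnlySpan<char> field = new ReadOnlySpan<char>(buffer, start, length);
        return int.Parse(field, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }
}
```

For example, with the buffer "12,345,6".ToCharArray(), calling SpanDemo.ParseIntField(buffer, 3, 3) parses the middle field "345".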

@VitaliyMF added the enhancement label on Nov 28, 2018
@skyyearxp

skyyearxp commented May 5, 2020

Using ReadOnlySpan can speed things up by about 10% more.

NOTE: this needs to be run in Release mode!

		public ReadOnlySpan<char> GetReadOnlySpan(int idx)
		{
			if (idx < 0 || idx >= fieldsCount) throw new IndexOutOfRangeException();

			var f = fields[idx];

			// Slow path: the field has escaped quotes or runs past the end of the buffer,
			// so the value is built as a string first.
			if ((f.Quoted && f.EscapedQuotesCount > 0) || f.End >= bufferLength)
			{
				// AsSpan() views the string directly and avoids the extra copy made by ToCharArray().
				return f.GetValue(buffer).AsSpan();
			}
			else if (f.Quoted)
			{
				// Skip the surrounding quotes.
				return new ReadOnlySpan<char>(buffer, f.Start + 1, f.Length - 2);
			}
			else
			{
				return new ReadOnlySpan<char>(buffer, f.Start, f.Length);
			}
		}



@VitaliyMF
Contributor Author

@skyyearxp have you performed any performance tests?

@skyyearxp

Yes, but the performance is not good enough. I am processing a simple CSV file, so I am trying to handle the CSV data in the simplest way possible. The file may be 6 GB and contains time-formatted strings, doubles, and ints, all as text. I am trying to parse the ints and doubles myself: read the buffer, scan it, find each ',' and parse an int/double when needed, and start a new line on '\r' '\n'.
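
The scan-and-parse approach described above could look roughly like the following sketch. It is not the poster's code: SimpleCsvScan and SumColumn are hypothetical names, and the sketch assumes a CSV with no quoted fields, where the chosen column always holds a valid number.

```csharp
using System;
using System.Globalization;

static class SimpleCsvScan
{
    // Sums one numeric column of unquoted CSV text by slicing spans,
    // without allocating a string per field.
    public static double SumColumn(ReadOnlySpan<char> data, int column)
    {
        double sum = 0;
        while (!data.IsEmpty)
        {
            // Cut one line off the front; '\n' ends a line, a trailing '\r' is trimmed.
            int nl = data.IndexOf('\n');
            ReadOnlySpan<char> line = nl < 0 ? data : data.Slice(0, nl);
            data = nl < 0 ? ReadOnlySpan<char>.Empty : data.Slice(nl + 1);
            if (!line.IsEmpty && line[line.Length - 1] == '\r')
                line = line.Slice(0, line.Length - 1);
            if (line.IsEmpty) continue;

            // Skip `column` commas to reach the start of the wanted field.
            bool found = true;
            for (int i = 0; i < column; i++)
            {
                int comma = line.IndexOf(',');
                if (comma < 0) { found = false; break; }
                line = line.Slice(comma + 1);
            }
            if (!found) continue; // line has too few fields

            int end = line.IndexOf(',');
            ReadOnlySpan<char> field = end < 0 ? line : line.Slice(0, end);
            sum += double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return sum;
    }
}
```

A call such as SimpleCsvScan.SumColumn(text.AsSpan(), 1) walks the whole text once. A real reader would still have to refill the buffer and deal with lines that straddle two buffer reads, which this sketch leaves out.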

@VitaliyMF
Contributor Author

@skyyearxp NReco.Csv parser efficiency should be close to the maximum performance possible for CSV parsing (handling all valid CSVs) in single-threaded C#. A CSV column value accessor that returns ReadOnlySpan<char> should avoid unnecessary allocations, but most likely the processing time will not change significantly.

Have you tried increasing the buffer size (CsvReader.BufferSize is 32 KB by default)? The performance of the underlying TextReader is also very important. For example, if you know that your CSV doesn't contain non-ASCII characters, use ASCII encoding instead of UTF-8, and try wrapping your input stream in a BufferedStream with a fairly large buffer size.

If the parse speed is still not acceptable, the only option is a multi-threaded implementation.
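
As a rough sketch of the input-side suggestion, the reader could be set up as below. The helper name OpenAsciiReader and the 1 MB buffer size are assumptions chosen for illustration, not measured recommendations; only standard System.IO types are used.

```csharp
using System.IO;
using System.Text;

static class CsvInput
{
    // Opens a file as ASCII text behind a large BufferedStream.
    // bufferSize defaults to 1 MB here purely as an illustrative value.
    public static StreamReader OpenAsciiReader(string path, int bufferSize = 1 << 20)
    {
        var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
        var buffered = new BufferedStream(file, bufferSize);
        // No BOM detection: the input is assumed to be plain ASCII.
        return new StreamReader(buffered, Encoding.ASCII, false, bufferSize);
    }
}
```

The returned reader would then be passed to the CSV reader, with CsvReader.BufferSize raised from its default as suggested. Disposing the StreamReader also closes the underlying streams.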


2 participants