c++ regex_match native Ubuntu vs WSL #4046

Closed
T4lus opened this issue May 17, 2019 · 2 comments

T4lus commented May 17, 2019

I'm seeing behavior that differs between native Ubuntu and Ubuntu on Windows 10 (WSL). Both are Ubuntu 18.04 with GCC, and I'm compiling with these flags: -std=c++11 -g.

My problem is that I have this regular expression: ^[\\t ]*(?:([.A-Za-z0-9_]+[:]))?(?:[\\t ]*([A-Za-z]{2,4})(?:[\\t ]+(@[A-Za-z0-9_]+(?:(?:\\+|-)[0-9]+)?|\".+?\"|\'.+?\'|[.A-Za-z0-9_]+)(?:[\\t ]*[,][\\t ]*(@[A-Za-z0-9_]+(?:(?:\\+|-)[0-9]+)?|\".+?\"|\'.+?\'|[.A-Za-z0-9_]+))?(?:[\\t ]*[,][\\t ]*(@[A-Za-z0-9_]+(?:(?:\\+|-)[0-9]+)?|\".+?\"|\'.+?\'|[.A-Za-z0-9_]+))?)?)?

for matching some assembler-like string (yeah, I know it's long).

I'm using it like this:

    std::ifstream infile(this->inFile.c_str());
    while (std::getline(infile, line)) {
        lineNumber++;
        // Drop everything from the first '#' onward (comment).
        line = line.substr(0, line.find("#"));
        if (line.empty())
            continue;
        line = reduce(line);

        if (regex_match(line, m, op_reg)) {
            std::vector<std::string> ops;
            ops.push_back(std::to_string(lineNumber));
            // Group 0 is the whole match; keep only the non-empty capture groups.
            for (std::size_t i = 1; i < m.size(); i++) {
                if (m[i].length() != 0) {
                    ops.push_back(m[i]);
                }
            }
            this->maps.opMap.push_back(ops);
        }
    }

This piece of code is working well on my native Ubuntu installation, but on the WSL version of Ubuntu regex_match always returns false.

Has anyone else encountered this? If so, how have you managed it?

Author

T4lus commented May 18, 2019

I found the issue is that even in WSL we have \r\n line ending style

Collaborator

therealkenc commented May 19, 2019

> I found the issue is that even in WSL we have \r\n line ending style

Neither the WSL/Real Linux kernel nor the Windows kernel "has" a \r\n line ending style. Your input file has a line ending style. This snippet will behave the same on Real Linux and WSL given the same input.
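To illustrate why the input, not the platform, decides the result, here is a minimal sketch. The pattern is a hypothetical, much-simplified stand-in for the op_reg in the question (one mnemonic plus one operand), and matches_op is a name invented for this example. Because std::regex_match must consume the entire string, a trailing \r left over from a \r\n file is enough to make it return false.

```cpp
#include <regex>
#include <string>

// Hypothetical, simplified stand-in for the question's op_reg:
// optional blanks, a 2-4 letter mnemonic, blanks, one operand.
bool matches_op(const std::string& line) {
    static const std::regex op_reg(
        "^[\\t ]*([A-Za-z]{2,4})[\\t ]+([.A-Za-z0-9_]+)");
    // regex_match succeeds only if the pattern covers the whole line,
    // so any unmatched trailing character (such as '\r') makes it fail.
    return std::regex_match(line, op_reg);
}
```

With this sketch, matches_op("mov r1") is true while matches_op("mov r1\r") is false, on any platform.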

You'll need more code to handle files with differing eol conventions, if that's a design goal. Bonus points for handling all the charset eol conventions, a common one on Windows being UTF-16. [Which is needed if your program is going to consume assembler-like strings containing characters other than 'merican.]

> how have you managed it?

The short answer is to strip the \r chars from the ifstream before handing off to std::getline(ins, line, '\n'). Pulling the whole file into a std::stringstream is a big-hammer way to do that. Boost iostreams are another way to go.
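A minimal sketch of the big-hammer approach, assuming the file is a single-byte encoding (ASCII or similar) and small enough to hold in memory. The helper names read_without_cr and read_lines are made up for this example. Note that it drops every \r, including any that appear in the middle of a line.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read the whole stream into a string and erase every '\r' character.
std::string read_without_cr(std::istream& in) {
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    return text;
}

// Split the cleaned text into lines on '\n'.
std::vector<std::string> read_lines(std::istream& in) {
    std::istringstream cleaned(read_without_cr(in));
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(cleaned, line, '\n')) {
        lines.push_back(line);
    }
    return lines;
}
```

Passing the std::ifstream from the question to read_lines would yield lines with no trailing \r, which can then go through the existing comment stripping and regex_match steps. A lighter alternative, if only trailing carriage returns matter, is to keep the original loop and remove a final '\r' from each line right after std::getline (check that the line is non-empty and line.back() == '\r', then call line.pop_back()).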

The long answer is to use a library like libicu, because you can't even assume a [.A-Za-z0-9_] regex has much meaning, let alone the file's eol convention. [Shorter long answer: you don't want the long answer.]
