Windows line breaks on Linux prevent removal of white space from end of line #256

Closed
bersbersbers opened this issue Mar 16, 2021 · 27 comments
Labels: Help-wanted, implemented

Comments

@bersbersbers

bersbersbers commented Mar 16, 2021

Using latexindent 3.9 from TeX Live on Linux.

original .tex code

printf "1 \n" | latexindent | tr " " "_"
printf "1 \n" | unix2dos | latexindent | tr " " "_"

actual/given output

1
1_

desired or expected output

1
1
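A minimal shell sketch of the likely mechanism. This is an assumption: latexindent's own trailing-whitespace regex is not shown in this thread, and `strip_trailing_blanks` is a hypothetical stand-in. On Linux, a CRLF line reaches a line-anchored regex as `1 \r`, so `$` matches after the `\r` and the space in front of it is no longer "trailing".

```shell
#!/bin/sh
# Hypothetical stand-in for a trailing-whitespace pass (not latexindent's code):
# delete any spaces/tabs immediately before the end of each line.
strip_trailing_blanks() {
  sed 's/[[:blank:]]*$//'
}

# LF input: the space sits directly before the line end, so it is removed.
printf '1 \n' | strip_trailing_blanks | tr ' ' '_'          # prints 1
# CRLF input: sed sees "1 \r"; $ matches after the \r, so the space survives.
# (tr shows the CR as R to make it visible.)
printf '1 \r\n' | strip_trailing_blanks | tr ' \r' '_R'     # prints 1_R
```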
@bersbersbers
Author

bersbersbers commented Mar 16, 2021

In addition, I see different amounts of white space in empty lines when using or not using unix2dos - I am working on reproducing these as well.

@bersbersbers
Author

Here we go:

printf "\\\begin{thebibliography}{00}\n\t\\\bibitem{b1} B1\n\n\t\\\bibitem{b2} B2\n\\\end{thebibliography}\n" | latexindent | tr "\t" "_"
printf "\\\begin{thebibliography}{00}\n\t\\\bibitem{b1} B1\n\n\t\\\bibitem{b2} B2\n\\\end{thebibliography}\n" | unix2dos | latexindent | tr "\t" "_"

Output:

\begin{thebibliography}{00}
_\bibitem{b1} B1

_\bibitem{b2} B2
\end{thebibliography}

\begin{thebibliography}{00}
_\bibitem{b1} B1
_
_\bibitem{b2} B2
\end{thebibliography}

Expected:
The same output in both cases. I don't care much about whether there should be tabs/spaces in empty lines, but I can say that black removes them:

> printf "if True:\n    pass\n    \n    pass\n" | black -q - | tr " " "_"
if_True:
____pass

____pass
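The empty-line symptom fits the same explanation. As an illustrative sketch (the pattern and the helper `count_blank_lines` are assumptions, not latexindent's code), a "whitespace-only line" test anchored at both ends stops matching once the line carries a CR, because `\t\r` contains a character that is neither a space nor a tab:

```shell
#!/bin/sh
# Count lines that are empty or contain only spaces/tabs.
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_blank_lines() {
  grep -c '^[[:blank:]]*$' || true
}

printf 'x\n\t\ny\n' | count_blank_lines        # LF: the tab-only line counts (1)
printf 'x\r\n\t\r\ny\r\n' | count_blank_lines  # CRLF: "\t\r" is not blank-only (0)
```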

@cmhughes
Owner

I'm sorry for the delay.

To be honest, I'm confused by this and don't understand what it's about. I'll try and ask some questions at some point.

@bersbersbers
Author

bersbersbers commented Mar 22, 2021

No problem, this is not an urgent issue at all. I worked around it by converting all my files once with dos2unix. But I thought you might be interested in fixing it for other users, too.

What is this about? It is about a .tex file/template saved on Windows and then worked on under Linux. In that scenario latexindent does not fully work: it leaves white space that it would otherwise remove.

I use unix2dos in the above MWEs solely to reproduce and demonstrate the problem.

Feel free to ask for further clarification at any point in time ;)

@cmhughes
Owner

OK, that's helpful, thanks very much. I'll follow up with you once I've understood this more.

@cmhughes cmhughes added the Help-wanted I'd like assistance with this issue, please! label Mar 27, 2021
@qiancy98
Contributor

qiancy98 commented May 2, 2021

I do not know if this is a stupid idea:
Maybe we could first change \r\n into \n in a Linux environment and then do our replacing?
I don't think it makes any difference to the result whether we use \r\n or \n.
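The suggestion amounts to normalising before the whitespace pass. A minimal sketch with standard tools, assuming a trailing-whitespace step like the one discussed above (`crlf_to_lf` and `strip_trailing_blanks` are hypothetical helpers, not part of latexindent):

```shell
#!/bin/sh
cr=$(printf '\r')   # a literal carriage return, so we don't rely on sed's \r escape

# Step 1: drop the CR at the end of each line, i.e. CRLF -> LF
# (close to dos2unix or s/\r\n/\n/g for ordinary files).
crlf_to_lf() {
  sed "s/${cr}\$//"
}

# Step 2: strip trailing spaces/tabs from the now LF-only lines.
strip_trailing_blanks() {
  sed 's/[[:blank:]]*$//'
}

printf '1 \r\n' | crlf_to_lf | strip_trailing_blanks | tr ' ' '_'   # prints 1
```

Swapping the two steps reproduces the original problem, which is why the order matters.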

@cmhughes
Owner

@bersbersbers any ideas on this? The body is read in by LatexIndent/FileExtension.pm:

https://github.com/cmhughes/latexindent.pl/blob/main/LatexIndent/FileExtension.pm

@bersbersbers
Author

@cmhughes do you mean in terms of the what or the how?

As to the what, I would have no problem if latexindent normalized line endings (such as piping the whole file through dos2unix before reading, basically 's/\r\n/\n/g'), but I can see how people who expect latexindent to preserve line endings would not like that. Files with mixed line endings might also call for different handling.
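Before normalising, a tool could check whether a file mixes conventions at all. A rough shell sketch (`count_line_endings` is a hypothetical helper; it counts only newline-terminated lines, so a final line without a newline is not included in the total):

```shell
#!/bin/sh
cr=$(printf '\r')

# Report how many lines of file $1 end in CRLF and how many in a bare LF.
count_line_endings() {
  total=$(wc -l < "$1" | tr -d ' ')
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  crlf=$(grep -c "${cr}\$" "$1" || true)
  echo "CRLF: $crlf  LF-only: $((total - crlf))"
}

# Usage: count_line_endings issue-256.tex
```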

As to the how and the code snippet: I have no Perl experience at all, so I don't know what these two lines do on different platforms:

open(MAINFILE, $fileName) or die "Could not open input file, $fileName";
push(@lines,$_) while(<MAINFILE>);

Do they strip the line endings or not? Completely or just what is expected on the platform? Etc.

@cmhughes
Owner

Thanks, these are helpful questions. I'll look into them and report back.

@cmhughes
Owner

How about if there is a switch, something like

NormaliseLineBreaks: 1/0

And it would do something like you describe: s/\r\n/\n/g...?

@bersbersbers
Author

Yeah, why not - sounds good on Linux.

Would it do the same on Windows, or would it "normalise" to \r\n there? How about MacOS? Do people still use \n\r there?

@cmhughes
Owner

These are good questions.

Instead of a new setting, for your use case, could you use the replacement switch and the following settings:

replacements:
  -
    substitution: s/\r\n/\n/sg

@bersbersbers
Author

I'm sure I could. Then again, most people only need to do this once, so dos2unix is much easier.

My motivation for opening this issue was rather to have you think about latexindent's default behavior.

@cmhughes
Owner

Would this feature need an option to operate on Linux-based line breaks and turn them into Windows-based line breaks?

@bersbersbers
Author

> Would this feature need an option to operate on Linux-based line breaks and turn them into Windows-based line breaks?

I think that depends on what happens without this option.

  • If latexindent's default behavior, especially on Windows, is to convert everything to Linux-based line breaks, then operate on Linux-based line breaks and write out Linux-based line breaks, then I would say yes.
  • If the default behavior is to convert all foreign line breaks to the OS's native line break, then operate on and write out these native line breaks, I don't think such an option is required.

There may be more ifs involved, but these are the first ones that come to mind.

@cmhughes
Owner

As of 74ee20c I've updated Document.pm to include a switch for this.

Can you copy Document.pm into your installation (or clone the develop branch) and then try your example with

latexindent.pl -y="dos2unixlinebreaks:1" myfile.tex

Does this do as you would like?

@bersbersbers
Author

I'll gladly try (in three weeks, though).

@bersbersbers
Author

Hmm, maybe I am doing something wrong, but I replaced Document.pm with the one from https://raw.githubusercontent.com/cmhughes/latexindent.pl/74ee20c4d6675094623ef2c748bc4e52a9566ae3/LatexIndent/Document.pm and I don't see a change:

$ printf "1 \n" | unix2dos | latexindent -y="dos2unixlinebreaks:1" | tr " " "_"
1_

Same with

$ printf "1 \n" | unix2dos | perl /path/to/latexindent.pl -y="dos2unixlinebreaks:1" | tr " " "_"
1_

Deleting that Document.pm file raises an error, so I am certainly replacing the correct file.

Is the above working for you? It should output 1 if it is working.

I am using TeX Live, if that matters.

@cmhughes
Owner

If I start with

issue-256.tex

\begin{thebibliography}{00}
	\bibitem{b1} B1

	\bibitem{b2} B2
\end{thebibliography}

experiment 1

unix2dos issue-256.tex
latexindent.pl -s issue-256.tex -o +-mod1
file issue-256-mod1.tex
issue-256-mod1.tex: LaTeX document, ASCII text, with CRLF, LF line terminators

gives

issue-256-mod1.tex

\begin{thebibliography}{00}
	\bibitem{b1} B1
	
	\bibitem{b2} B2
\end{thebibliography}

experiment 2

Running

latexindent.pl -s -y="dos2unixlinebreaks:1" issue-256.tex -o +-mod2
file issue-256-mod2.tex
issue-256-mod2.tex: LaTeX document, ASCII text

gives

issue-256-mod2.tex

\begin{thebibliography}{00}
	\bibitem{b1} B1
	
	\bibitem{b2} B2
\end{thebibliography}

Can you verify you receive the same outputs? Are these as expected/desired? Could you let me know how to test your example using the template above?

@bersbersbers
Author

bersbersbers commented Jun 17, 2021

> Can you verify you receive the same outputs?

Yes, I do. But I don't see any difference between the two outputs that you show, so I wonder how this example helps to verify the issue.

> Are these as expected/desired?

I think so, but they have little to do with the issue ("Windows line breaks on Linux prevent removal of white space from end of line").

> Could you let me know how to test your example using the template above?

Add spaces to the ends of lines:

sed -i "s/\s*$/   /" issue-256.tex
unix2dos issue-256.tex
cat issue-256.tex | tr " " "_"

latexindent -s issue-256.tex -o +-mod1
latexindent -s -y="dos2unixlinebreaks:1" issue-256.tex -o +-mod2

file issue-256-mod1.tex
cat issue-256-mod1.tex | tr " " "_"

file issue-256-mod2.tex
cat issue-256-mod2.tex | tr " " "_"

I receive this as an output (twice) - the ___ (triple underscores) indicate spaces at the ends of lines which are not removed:

\begin{thebibliography}{00}
        \bibitem{b1}_B1___

        \bibitem{b2}_B2___
\end{thebibliography}

(The output files do have different line endings as shown by you, so the -y="dos2unixlinebreaks:1" seems to be active, so my exchange of Document.pm seems to have worked.)
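To turn this check into a pass/fail test rather than reading underscores, a small helper along these lines could be run on each output file (`has_trailing_blanks` is hypothetical, not part of latexindent); it also flags blanks that sit just before a CR:

```shell
#!/bin/sh
cr=$(printf '\r')

# Print (with line numbers) every line of file $1 that ends in spaces/tabs,
# optionally followed by CRs; exit status is 0 if any such line exists.
has_trailing_blanks() {
  grep -n "[[:blank:]]${cr}*\$" "$1"
}

# Usage:
#   if has_trailing_blanks issue-256-mod2.tex; then echo "trailing white space left"; fi
```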

@cmhughes
Owner

cmhughes commented Jun 17, 2021 via email

@bersbersbers
Author

I had not, in fact, because my original code in #256 (comment) did not use that, either - but same thing:

sed -i "s/\s*$/   /" issue-256.tex
unix2dos issue-256.tex
cat issue-256.tex | tr " " "_"

latexindent -s -y="removeTrailingWhitespace:beforeProcessing:1;afterProcessing:1" issue-256.tex -o +-mod1
latexindent -s -y="removeTrailingWhitespace:beforeProcessing:1;afterProcessing:1,dos2unixlinebreaks:1" issue-256.tex -o +-mod2

file issue-256-mod1.tex
cat issue-256-mod1.tex | tr " " "_"

file issue-256-mod2.tex
cat issue-256-mod2.tex | tr " " "_"

gives (twice)

\begin{thebibliography}{00}
        \bibitem{b1}_B1___

        \bibitem{b2}_B2___
\end{thebibliography}

@bersbersbers
Author

To stay with my original example:

This works (1):

printf "1 \n" | latexindent | tr " " "_"

This does not (1_):

printf "1 \n" | unix2dos | latexindent -y="dos2unixlinebreaks:1" | tr " " "_"

Tell me whether this should work; if so, I'll check whether my installation is correctly patched. If it does not work for you either, the issue is not fixed.

cmhughes added a commit that referenced this issue Jun 17, 2021
@cmhughes
Owner

Ok, as of 954d88c I think this works as you would like. If you grab Document.pm from the develop branch, then your example below, and others, should work as desired:

printf "1 \n" | unix2dos | latexindent.pl -y="dos2unixlinebreaks:1" | tr " " "_"

produces 1.

Is this as you would like?

@bersbersbers
Author

> Is this as you would like?

Yes! I cannot test the current Document.pm ("store_switches" is not exported by the LatexIndent::Switches module), but I trust that this is working and can test more after the release.

@cmhughes
Owner

OK. Feel free to grab Switches.pm, or to clone develop.

For the moment, I'll label this as resolved. It'll be part of the next release, coming soon hopefully.

@cmhughes cmhughes added the implemented implemented tag means that either an enhancement or feature request has been implemented, or bugfix label Jun 18, 2021
cmhughes added a commit that referenced this issue Jun 18, 2021
@cmhughes cmhughes mentioned this issue Jun 19, 2021
@cmhughes
Owner

Resolved as of #278; upload to CTAN to follow.

Projects
None yet
Development

No branches or pull requests

3 participants