Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fragment files with \r\n line endings get written out as \r\r\n #524

Open
bennyrowland opened this issue Jun 22, 2023 · 2 comments · May be fixed by #525
Open

Fragment files with \r\n line endings get written out as \r\r\n #524

bennyrowland opened this issue Jun 22, 2023 · 2 comments · May be fixed by #525

Comments

@bennyrowland
Copy link

I am using the https://github.com/sphinx-contrib/sphinxcontrib-towncrier Sphinx extension which embeds the latest towncrier draft changelog into the built changelog. This works nicely but there is an odd effect that when a changelog fragment has multiple lines separated by \r\n line endings, the draft changelog gets written with those line endings replaced by a double carriage return \r\r\n then reading that back in Python converts it to \n\n so that each line of the fragment gets its own paragraph in the rendered changelog.

This effect seems to be caused by mixing binary reading of fragments with text output, as alluded to in this issue #420 by @hynek. I should note that this is all on Windows. Python has a rather weird behaviour trying to handle newlines in text because it forces all newlines to be \n internally. So reading a file as text it will convert any \r\n newlines into simply \n. Correspondingly, when it writes out the string as text it will automatically revert the \n to a \r\n string. But if the file is read in binary then it will keep the \r\n form for all newlines, then when it is written out as text the \n is converted to \r\n, leaving the written bytes as \r\r\n.

You can reproduce this with this very simple example code:

with open("example.txt", "w") as fout:
    fout.write("\r\n")
with open("example.txt", "rb") as fin:
    print(fin.read())  # prints "\r\r\n"
with open("example.txt", "r") as fin:
    print(fin.read().encode("utf8"))  # prints "\n\n"

Of course, when the content is written straight to the console then you don't see this issue, so it is normally fine for --draft, but when the content is read back in to Python for further use then it is a problem (on Windows). The simplest solution to this problem would be to encode the draft output before echoing it here:

click.echo(content)

click can handle writing the bytes just as easily as a str, but this preserves the newlines without adding the extra \r and makes the problem go away. I can't see any obvious side effects for any other use case of doing this encode step. Happy to put together a PR if desired, although this is only a 1 line change.

@adiroiban
Copy link
Member

Many thanks for the report and analysis.

This functionality looks very useful :)

I am not familiar with click...but if click can receive bytes and then do whatever it takes to get those bytes printed in a Windows terminal, then this should be ok


At the same time, I think that the draft functionality, was really intended only for human consumption :) and not as an API / inter process protocol

maybe we can have something like draft --out /path/to/some-file and it will write into bytes...

but why do you use draft ? (just asking)

I think that now there is also build --keep as implement here #453
Would that work for you?

@bennyrowland
Copy link
Author

@adiroiban thanks for the feedback. I should point out that I am just a user of sphinxcontrib-towncrier, I was going to open an issue with them but thought I would try and figure out the problem and it lead me back up to towncrier.

I think it would be possible to use the build --keep option in this case, although it would be rather less convenient because rather than accepting the stdout from subprocess.run() calling towncrier --build you would have to figure out where the output file is and what it is called (which I guess can vary depending on user configuration), read it in and then delete it when you are done. I also see that in the conversation on #453 you say "why do this when there is already --draft?" ;-)

I have made the change to click.echo(content.format("utf8")) locally and the standard user experience of calling build --draft is unchanged (I guess unless you were running on classic Mac OS which uses \r as the newline character). I think basically the right thing to do is be consistent when reading/writing in Python: either use text or binary but don't mix between them. I have also found that when writing into the actual news file in non-draft mode, the file is explicitly opened with newline="", presumably because this was previously causing problems there too. That is not an option when the output is being echoed to stdout via click, so I think that echoing the binary data instead is probably the "correct" fix to match the non-draft behaviour. I will put together a PR for you to consider, at least.

@bennyrowland bennyrowland linked a pull request Jun 23, 2023 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants