docx comments on tracked-changes insertions not handled properly #9833
Comments
Here we have:
<w:ins w:id="3" w:author="Daniel Steinbrook" w:date="2024-05-31T19:14:00Z">
<w:r>
<w:t xml:space="preserve">This is some
</w:t>
</w:r>
<w:commentRangeStart w:id="4"/>
<w:r>
<w:t xml:space="preserve">added
</w:t>
</w:r>
</w:ins>
<w:commentRangeEnd w:id="4"/>
As you can see, the w:commentRangeStart appears inside the w:ins element, while the matching w:commentRangeEnd comes after the closing </w:ins> tag. I suspect that's the problem, but I'd have to look more closely at the docx parser. (I'm not familiar with the comment parsing code and the original author is no longer active.)
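To check whether other documents have the same shape, something like the following Python sketch could list the comment ranges that open inside a tracked insertion. It is only an illustration, not pandoc's parser: the function name `comment_starts_inside_insertions` is made up here, and it assumes the standard WordprocessingML namespace and that the caller has already extracted `word/document.xml` from the docx archive.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML main namespace (the "w:" prefix in the snippet above).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def comment_starts_inside_insertions(document_xml):
    """Return the w:id values of w:commentRangeStart elements nested in a w:ins.

    Hypothetical helper for illustration; `document_xml` is the text (or bytes)
    of word/document.xml.
    """
    root = ET.fromstring(document_xml)
    ids = []
    for ins in root.iter(f"{{{W}}}ins"):
        for start in ins.iter(f"{{{W}}}commentRangeStart"):
            comment_id = start.get(f"{{{W}}}id")
            # Guard against double counting if insertions were ever nested.
            if comment_id not in ids:
                ids.append(comment_id)
    return ids
```

The XML can be read with the standard library, for example `zipfile.ZipFile(path).read("word/document.xml")`, and passed straight to the function. For the fragment above, wrapped in a document that declares the `w` namespace, it should report comment id 4.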
Hmm, maybe there's something weird about that particular document, with the comment spanning the insertion boundary. But the broader problem is that comments within the insertion do not show up in the converted output. Here's another example with two comments, where only one comment (the one not in the tracked-changes insertion) is preserved after pandoc conversion. I was expecting that both comments would be included when
Here's some relevant code, I think:
With this diff
I get the following native output from your sample. Can you confirm that this is correct?
That is potentially correct, to the best of my limited knowledge of how the insertion spans should be arranged. It certainly includes the full comment text within the insertion. Thanks for the impressively quick patch! When might this become part of an official pandoc build?
It will be in the next release. Not sure when that will be, but if you're desperate you can compile from source or use a nightly.
When a Word comment applies to text that is an unaccepted track-changes insertion, that comment is not included in the pandoc output.
Here is a minimal example file, with two paragraphs, each with a single comment. One paragraph is normal text while the other paragraph is an insertion using track changes.
Converting to markdown (for example), only the comment from the first paragraph is preserved. Oddly, there are still two comment-end spans.
$ pandoc comments_test.docx -t markdown --track-changes=all
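One rough way to confirm the mismatch described above is to count the span markers in the converted markdown. This Python sketch assumes pandoc writes comments as bracketed spans carrying the classes `comment-start` and `comment-end`; the helper name `count_comment_spans` is hypothetical, and the regular expressions only match a class that comes first in the attribute block.

```python
import re


def count_comment_spans(markdown):
    """Return (number of comment-start spans, number of comment-end spans).

    Hypothetical helper: it matches the literal attribute openers
    "{.comment-start" and "{.comment-end" in pandoc's markdown output.
    """
    starts = len(re.findall(r"\{\.comment-start\b", markdown))
    ends = len(re.findall(r"\{\.comment-end\b", markdown))
    return starts, ends
```

Running it on the output of the command above (for instance, captured with `subprocess.run(..., capture_output=True, text=True).stdout`) should give unequal counts when a comment inside an insertion has been dropped, and equal counts once both comments are preserved.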
When round-tripping from docx to JSON and back again, Word reports errors when opening the resulting docx.
$ pandoc -f json -t docx -o comments_test_roundtrip.docx <(pandoc comments_test.docx -t json --track-changes=all)
"Word found unreadable content in comments_test_roundtrip.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."
If you click Yes, the resulting document has two comments, but the second comment has no content or metadata.
Using pandoc 3.2 on macOS Sonoma 14.5.