Consider table-mwe.docx, which has two tables. Let's convert the .docx to Markdown with the following command:
pandoc --from docx --to markdown+table_captions table-mwe.docx
Observed Output
The caption is connected to the first table instead of the second table.
Lorem ipsum

-----------------------------------------------------------------------
A                                   B
----------------------------------- -----------------------------------
C                                   D
-----------------------------------------------------------------------

: Numbers from 1 to 4

Lorem ipsum

-----------------------------------------------------------------------
1                                   2
----------------------------------- -----------------------------------
3                                   4
-----------------------------------------------------------------------

Lorem ipsum
Expected Output
Lorem ipsum

-----------------------------------------------------------------------
A                                   B
----------------------------------- -----------------------------------
C                                   D
-----------------------------------------------------------------------

Lorem ipsum

-----------------------------------------------------------------------
1                                   2
----------------------------------- -----------------------------------
3                                   4
-----------------------------------------------------------------------

: Numbers from 1 to 4

Lorem ipsum
Environment
pandoc --version returns
pandoc 3.1.11.1
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/raniere/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
OK, it looks like the docx reader does the following:
in bodyToOutput: looks for all the captions among the body paragraphs and puts a list of them in state
in bodyPartToBlocks for Tbl: gets the list of captions from state, takes the first one, and modifies state to contain the rest
So the captions are assigned to tables in the order they occur, no matter their proximity to the tables. Obviously that's giving bad results in this case, but it is a bit tricky to devise better heuristics.
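To make the behaviour described above concrete, here is a small Python model of it. This is not pandoc's code (the reader is written in Haskell); the names Para, Table and assign_captions_in_order are hypothetical. It also assumes that the caption paragraph follows the second table in the .docx, and that caption paragraphs are consumed rather than emitted as text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Para:
    text: str
    is_caption: bool = False  # stands in for a docx paragraph with a caption style


@dataclass
class Table:
    rows: list
    caption: Optional[str] = None


def assign_captions_in_order(body):
    """Simplified model: collect all captions first, then hand them out to tables."""
    # Step 1 (like bodyToOutput): gather every caption in document order.
    pending = [p.text for p in body if isinstance(p, Para) and p.is_caption]
    out = []
    for part in body:
        if isinstance(part, Para) and part.is_caption:
            continue  # assumption: caption paragraphs are not emitted as text
        if isinstance(part, Table):
            # Step 2 (like bodyPartToBlocks for Tbl): take the first remaining
            # caption, regardless of where it sat relative to this table.
            caption = pending.pop(0) if pending else None
            out.append(Table(part.rows, caption))
        else:
            out.append(part)
    return out


# Assumed layout of table-mwe.docx: the caption follows the second table.
body = [
    Para("Lorem ipsum"),
    Table([["A", "B"], ["C", "D"]]),
    Para("Lorem ipsum"),
    Table([["1", "2"], ["3", "4"]]),
    Para("Numbers from 1 to 4", is_caption=True),
    Para("Lorem ipsum"),
]

result = assign_captions_in_order(body)
tables = [b for b in result if isinstance(b, Table)]
for t in tables:
    print(t.rows[0], "->", t.caption)
```

In this model the only caption goes to the first table and the second table gets none, which matches the observed output. A proximity-based rule would have to decide whether a caption belongs to the table before or after it, which is where the heuristics get tricky.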