Currently, the tokenizer internally represents a token as <tei:w> and a punctuation character as <tei:pc>. This is a hard-coded assumption. While it may be conceptually right, it can lead to unexpected behaviour in some edge cases:
- In the current state, we cannot output <tei:w> as a structure.
- We don't want to confuse elements introduced by the tokenizer with ones already present in the markup of the source document (e.g. hy<tei:pc>-</tei:pc><lb break="no"/>phenation).
Solution: introduce <xtx:w> and <xtx:pc> elements. Moving them into the TEI namespace should be a serialization option at the end of the process.
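
The proposed serialization option could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `xtx` namespace URI, the function name `move_tokens_to_tei` and the `enabled` flag are hypothetical and not taken from the project; only the TEI namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Hypothetical URI: the actual xtx namespace is not stated in this issue.
XTX_NS = "http://example.org/ns/xtx"

INTERNAL_TAGS = {f"{{{XTX_NS}}}w", f"{{{XTX_NS}}}pc"}


def move_tokens_to_tei(root, enabled=True):
    """Rename internal xtx:w / xtx:pc elements to tei:w / tei:pc.

    Meant to run as the last step before serialization. With
    enabled=False the internal elements are left untouched.
    """
    if not enabled:
        return root
    for el in root.iter():
        if isinstance(el.tag, str) and el.tag in INTERNAL_TAGS:
            local_name = el.tag.split("}", 1)[1]
            el.tag = f"{{{TEI_NS}}}{local_name}"
    return root


# The source document already contains a tei:pc (the hyphen); the
# tokenizer wraps the word and the full stop in xtx elements.
source = (
    f'<p xmlns="{TEI_NS}" xmlns:xtx="{XTX_NS}">'
    '<xtx:w>hy<pc>-</pc><lb break="no"/>phenation</xtx:w>'
    "<xtx:pc>.</xtx:pc></p>"
)

root = ET.fromstring(source)
# Before the final step, tokenizer output and source markup stay distinguishable.
internal_pc_before = len(root.findall(f".//{{{XTX_NS}}}pc"))
source_pc_before = len(root.findall(f".//{{{TEI_NS}}}pc"))

move_tokens_to_tei(root)
tei_w_after = len(root.findall(f".//{{{TEI_NS}}}w"))
tei_pc_after = len(root.findall(f".//{{{TEI_NS}}}pc"))
```

Keeping the rename as a separate final pass means every earlier processing step can tell tokenizer-generated elements apart from <tei:w> and <tei:pc> elements that were already in the source.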