encoding of "Numeral symbols other than decimal digits" (EGD 4.2.2) #39
Comments
Dear Arlo, As far as I know, numerals in the Khmer corpus are not written with the decimal system, except in dates. Salomé and Chloé may confirm this. Best,
Yes, the above notes conform to our encoding guidelines. |
in that case, @michaelnmmeyer, please wrap in |
This is addressed in e71eaed. There remain a number of occurrences to check and correct manually, to wit:
|
Thanks. I have converted the above into a task list and will take care of it.
@chhomkunthea : I don't understand the cases
|
|
Dear Arlo, In the case of K. 915, I would like to propose below:
And for K. 1017, it should be:
|
@chhomkunthea : thanks. I have implemented your suggestion in K. 915 (or rather cleaned up the file which had some conflicts after you had implemented your suggestions). |
I think I would prefer |
@michaelnmmeyer: in tfc-khmer-epigraphy, there is a massive number of <num> elements whose contents are made up of symbols other than decimal digits and that have not been wrapped in <g type="numeral"> by the responsible encoder(s), as EGD 4.2.2 prescribes. Examples:
<num value="1">I</num> should be <num value="1"><g type="numeral">I</g></num>
<num value="4">IIII</num> should be <num value="4"><g type="numeral">IIII</g></num>
<num value="123">100 20III</num> should be <num value="123"><g type="numeral">100</g><g type="numeral">20</g><g type="numeral">III</g></num>
There are also cases like <num value="80">80</num>, which look like they contain decimal digits but where the transliteration is probably a representation of a non-decimal notation system, and so ought to be <num value="80"><g type="numeral">80</g></num> (as in the 123 example above). But there is no way for a machine to tell that these are not decimal units.
@chhomkunthea: do we ever have numbers noted with the decimal system outside of dates in the Khmer corpus? If we do not, then all such cases can automatically be converted to the encoding with <g>. You seem to have ignored EGD 4.2.2 so far. Please re-read it carefully.
Can you process the XML files and apply <g> wherever an algorithm can determine that the content of <num> is not (exclusively) a series of decimal digits?
@danbalogh: please correct me if I have made any mistake in my representation of our encoding rules.
@chloechollet and @salomepichon: please take note of the above if you weren't aware of the rules yet.
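The automatic part of the request above (wrapping wherever the content of <num> is demonstrably not a plain series of decimal digits) could be sketched roughly as follows. This is only an illustration and not the code of the commit mentioned in the thread: the function name `wrap_numerals`, the helper `local_name`, and the tokenisation rule (one <g> per run of digits or run of non-digit symbols, whitespace dropped) are assumptions inferred from the three examples given. All-digit cases such as `80` are deliberately left untouched, since a machine cannot tell whether they are decimal.

```python
import re
import xml.etree.ElementTree as ET

# Assumed tokenisation: one token per run of ASCII digits or run of
# non-digit, non-space symbols, so "100 20III" -> ["100", "20", "III"].
TOKEN = re.compile(r"[0-9]+|[^0-9\s]+")


def local_name(tag):
    """Return the element name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def wrap_numerals(root):
    """Wrap non-decimal <num> contents in <g type="numeral">.

    Returns the number of <num> elements that were modified.
    """
    changed = 0
    # Copy the element list first, because the tree is modified below.
    for num in list(root.iter()):
        if local_name(num.tag) != "num":
            continue
        text = (num.text or "").strip()
        # Skip <num> with child elements (already wrapped, or mixed
        # content needing manual review), empty <num>, and all-digit
        # <num>, which is ambiguous and left for a human decision.
        if len(num) > 0 or not text or re.fullmatch(r"[0-9]+", text):
            continue
        # Reuse the namespace of <num>, if any, for the new <g> elements.
        prefix = num.tag[: len(num.tag) - len("num")]
        num.text = None
        for token in TOKEN.findall(text):
            g = ET.SubElement(num, prefix + "g", {"type": "numeral"})
            g.text = token
        changed += 1
    return changed


sample = ET.fromstring(
    '<ab><num value="4">IIII</num>'
    '<num value="123">100 20III</num>'
    '<num value="80">80</num></ab>'
)
wrap_numerals(sample)
print(ET.tostring(sample, encoding="unicode"))
```

If it turns out that the decimal system never occurs outside dates, the all-digit check could be relaxed so that cases like `80` are wrapped too, but that depends on the answer to the question above and on a reliable way of excluding dates.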