
Is the DeepL split text correct? #82

Closed
swfsql opened this issue Apr 30, 2023 · 3 comments · Fixed by #83

Comments

@swfsql

swfsql commented Apr 30, 2023

I believe that the DeepL split text should split the text into an array of text lines.

Currently, it appears that the regex doesn't split by line, and the regex usage doesn't change the text variable.

But I'm not sure if I'm confusing the calls to LMT_split_into_sentences vs LMT_split_text (I noticed that, using DeepL in the browser, it appears to use the latter).

I ran a test in the browser, translating "こんにちは世界" and "こんにちは" on two separate lines, and in case it helps, it made the following calls:

An OPTIONS request and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_split_text:

{
    "jsonrpc":"2.0",
    "method" : "LMT_split_text",
    "params":{"texts":["こんにちは世界","こんにちは"],
    "commonJobParams":{"mode":"translate"},
    "lang":{
        "lang_user_selected":"JA",
        "preference":{
            "weight":/*..*/,
            "default":"default"}
        }
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0",
    "id":/*..*/,
    "result":{
        "lang":{"detected":"JA","isConfident":true,"detectedLanguages":{"JA":1.0}},
        "texts":[
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C"}]}]},
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F"}]}]}
        ]
    }
}
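To make the shape of that result concrete, here is a minimal Python sketch, based only on the fields visible in the response above, that flattens each input line back into its list of split sentences:

```python
import json

# The LMT_split_text response body captured above (ids elided).
response = json.loads("""
{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "lang": {"detected": "JA", "isConfident": true, "detectedLanguages": {"JA": 1.0}},
        "texts": [
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは世界"}]}]},
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは"}]}]}
        ]
    }
}
""")

# One entry per input line, each holding that line's split sentences.
split_lines = [
    [sentence["text"]
     for chunk in text["chunks"]
     for sentence in chunk["sentences"]]
    for text in response["result"]["texts"]
]
print(split_lines)  # [['こんにちは世界'], ['こんにちは']]
```

So each element of params.texts comes back as its own entry under result.texts, already broken into sentences.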

Then an OPTIONS request and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_handle_jobs:

{
    "jsonrpc":"2.0",
    "method": "LMT_handle_jobs",
    "params":{
        "jobs":[
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは世界","id":0,"prefix":""}],
                "raw_en_context_before":[],
                "raw_en_context_after":["こんにちは"],
                "preferred_num_beams":1
            },
            {
                "kind":"default",
                "sentences":[ {"text":"こんにちは","id":1, "prefix":""}],
                "raw_en_context_before":["こんにちは世界"],
                "raw_en_context_after":[],
                "preferred_num_beams":1
            }
        ],
        "lang":{"preference":{"weight":{},"default":"default"},"source_lang_computed":"JA","target_lang":"EN"},
        "priority":1,
        "commonJobParams":{"regionalVariant":"en-US","mode":"translate","browserType":1},
        "timestamp":/*..*/
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0","id":/*..*/,
    "result":{"translations":[
        {"beams":[{"sentences":[{"text":"Hello World","ids":[0]}],"num_symbols":3}],"quality":"normal"},
        {"beams":[{"sentences":[{"text":"Hello world","ids":[1]}],"num_symbols":3}],"quality":"normal"}],
    "target_lang":"EN","source_lang":"JA","source_lang_is_confident":false,"detectedLanguages":/*..*/}
}
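The context fields in those jobs appear to follow directly from the line order. A small sketch that reproduces the jobs array captured above (build_jobs is a hypothetical helper for illustration, not translatepy code):

```python
def build_jobs(lines):
    """Assemble LMT_handle_jobs-style jobs from split lines: each job
    carries the preceding lines as context_before and the following
    lines as context_after, mirroring the captured payload."""
    jobs = []
    for i, text in enumerate(lines):
        jobs.append({
            "kind": "default",
            "sentences": [{"text": text, "id": i, "prefix": ""}],
            "raw_en_context_before": lines[:i],
            "raw_en_context_after": lines[i + 1:],
            "preferred_num_beams": 1,
        })
    return jobs

jobs = build_jobs(["こんにちは世界", "こんにちは"])
# jobs[0]: context_before [], context_after ["こんにちは"]
# jobs[1]: context_before ["こんにちは世界"], context_after []
```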

(For some reason it incorrectly translated both lines as "Hello World", but it did return a separate beam for each line.)

I hope this helps in case there is something missing! (I would run more tests, but I'm getting a "Too Many Requests" error, so I guess my IP was banned for sending invalid requests.)

@Animenosekai
Owner

Thanks for reporting this issue!

Seems like #83 closed it. Let me know if you want to reopen 👍

@swfsql
Author

swfsql commented May 2, 2023

@Animenosekai Thanks!

On a new note, I think the regex is not being used, as its return value is not stored in any variable, nor used as the function's return value:

SENTENCES_SPLITTING_REGEX.split(text), None
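A minimal sketch of why this is a no-op (the pattern here is a hypothetical stand-in, not the project's actual SENTENCES_SPLITTING_REGEX): in Python, an expression statement like that one builds the tuple and immediately discards it, so the function falls through and returns None unless the result is explicitly returned.

```python
import re

# Hypothetical stand-in for the module's SENTENCES_SPLITTING_REGEX.
SENTENCES_SPLITTING_REGEX = re.compile(r"(?<=[.!?])\s+")

def split_buggy(text):
    # As in the issue: the tuple is computed but never stored or
    # returned, so this statement has no effect.
    SENTENCES_SPLITTING_REGEX.split(text), None

def split_fixed(text):
    # The result must actually be returned (keeping the original
    # two-value shape with the trailing None).
    return SENTENCES_SPLITTING_REGEX.split(text), None

print(split_buggy("Hello. World"))  # None — result discarded
print(split_fixed("Hello. World"))  # (['Hello.', 'World'], None)
```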

@Animenosekai
Owner

> @Animenosekai Thanks!
>
> On a new note, I think the regex is not being used, as its return value is not stored in any variable, nor used as the function's return value:
>
> SENTENCES_SPLITTING_REGEX.split(text), None

Oh that's right, I should look into it !
