
Is the DeepL split text correct? #82

Closed
swfsql opened this issue Apr 30, 2023 · 3 comments · Fixed by #83

Comments

@swfsql

swfsql commented Apr 30, 2023

I believe that the DeepL split text should split the text into an array of text lines.

Currently, it appears that the regex doesn't split by line, and the regex usage doesn't change the text variable.

But I'm not sure if I'm confusing the calls to LMT_split_into_sentences vs LMT_split_text (I noticed that, using DeepL in the browser, it appears to use the latter).

I ran a test in the browser, translating "こんにちは世界" and "こんにちは" on two separate lines, and in case it helps, it made the following calls:

An OPTIONS request and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_split_text:

{
    "jsonrpc":"2.0",
    "method" : "LMT_split_text",
    "params":{"texts":["こんにちは世界","こんにちは"],
    "commonJobParams":{"mode":"translate"},
    "lang":{
        "lang_user_selected":"JA",
        "preference":{
            "weight":/*..*/,
            "default":"default"}
        }
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0",
    "id":/*..*/,
    "result":{
        "lang":{"detected":"JA","isConfident":true,"detectedLanguages":{"JA":1.0}},
        "texts":[
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C"}]}]},
            {"chunks":[{"sentences":[{"prefix":"","text":"\u3053\u3093\u306B\u3061\u306F"}]}]}
        ]
    }
}
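To make the shape of that result concrete, here is a minimal Python sketch, based only on the fields visible in the response above, that flattens each input line back into its list of split sentences:

```python
import json

# The LMT_split_text response body captured above (ids elided).
response = json.loads("""
{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "lang": {"detected": "JA", "isConfident": true, "detectedLanguages": {"JA": 1.0}},
        "texts": [
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは世界"}]}]},
            {"chunks": [{"sentences": [{"prefix": "", "text": "こんにちは"}]}]}
        ]
    }
}
""")

# One entry per input line, each holding that line's split sentences.
split_lines = [
    [sentence["text"]
     for chunk in text["chunks"]
     for sentence in chunk["sentences"]]
    for text in response["result"]["texts"]
]
print(split_lines)  # [['こんにちは世界'], ['こんにちは']]
```

So each element of params.texts comes back as its own entry under result.texts, already broken into sentences.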

Then an OPTIONS request and then a POST request to https://www2.deepl.com/jsonrpc?method=LMT_handle_jobs:

{
    "jsonrpc":"2.0",
    "method": "LMT_handle_jobs",
    "params":{
        "jobs":[
            {
                "kind":"default",
                "sentences":[{"text":"こんにちは世界","id":0,"prefix":""}],
                "raw_en_context_before":[],
                "raw_en_context_after":["こんにちは"],
                "preferred_num_beams":1
            },
            {
                "kind":"default",
                "sentences":[ {"text":"こんにちは","id":1, "prefix":""}],
                "raw_en_context_before":["こんにちは世界"],
                "raw_en_context_after":[],
                "preferred_num_beams":1
            }
        ],
        "lang":{"preference":{"weight":{},"default":"default"},"source_lang_computed":"JA","target_lang":"EN"},
        "priority":1,
        "commonJobParams":{"regionalVariant":"en-US","mode":"translate","browserType":1},
        "timestamp":/*..*/
    },
    "id":/*..*/
}

With the response:

{
    "jsonrpc":"2.0","id":/*..*/,
    "result":{"translations":[
        {"beams":[{"sentences":[{"text":"Hello World","ids":[0]}],"num_symbols":3}],"quality":"normal"},
        {"beams":[{"sentences":[{"text":"Hello world","ids":[1]}],"num_symbols":3}],"quality":"normal"}],
    "target_lang":"EN","source_lang":"JA","source_lang_is_confident":false,"detectedLanguages":/*..*/}
}
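The context fields in those jobs appear to follow directly from the line order. A small sketch that reproduces the jobs array captured above (build_jobs is a hypothetical helper for illustration, not translatepy code):

```python
def build_jobs(lines):
    """Assemble LMT_handle_jobs-style jobs from split lines: each job
    carries the preceding lines as context_before and the following
    lines as context_after, mirroring the captured payload."""
    jobs = []
    for i, text in enumerate(lines):
        jobs.append({
            "kind": "default",
            "sentences": [{"text": text, "id": i, "prefix": ""}],
            "raw_en_context_before": lines[:i],
            "raw_en_context_after": lines[i + 1:],
            "preferred_num_beams": 1,
        })
    return jobs

jobs = build_jobs(["こんにちは世界", "こんにちは"])
# jobs[0]: context_before [], context_after ["こんにちは"]
# jobs[1]: context_before ["こんにちは世界"], context_after []
```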

(For some reason it incorrectly translated both lines as "Hello World", but it did return a separate beam for each line.)

I hope this helps in case there is something missing! (I would run more tests, but I'm getting a "Too Many Requests" error, so I guess my IP was banned for sending invalid requests.)

@Animenosekai
Owner

Thanks for reporting this issue!

Seems like #83 closed it. Let me know if you want to reopen 👍

@swfsql
Author

swfsql commented May 2, 2023

@Animenosekai Thanks!

On a new note, I think the regex is not being used, as its return value is not stored in any variable, nor used as the function's return value:

SENTENCES_SPLITTING_REGEX.split(text), None
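A minimal sketch of why this is a no-op (the pattern here is a hypothetical stand-in, not the project's actual SENTENCES_SPLITTING_REGEX): in Python, an expression statement like that one builds the tuple and immediately discards it, so the function falls through and returns None unless the result is explicitly returned.

```python
import re

# Hypothetical stand-in for the module's SENTENCES_SPLITTING_REGEX.
SENTENCES_SPLITTING_REGEX = re.compile(r"(?<=[.!?])\s+")

def split_buggy(text):
    # As in the issue: the tuple is computed but never stored or
    # returned, so this statement has no effect.
    SENTENCES_SPLITTING_REGEX.split(text), None

def split_fixed(text):
    # The result must actually be returned (keeping the original
    # two-value shape with the trailing None).
    return SENTENCES_SPLITTING_REGEX.split(text), None

print(split_buggy("Hello. World"))  # None — result discarded
print(split_fixed("Hello. World"))  # (['Hello.', 'World'], None)
```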

@Animenosekai
Owner

> @Animenosekai Thanks!
>
> On a new note, I think the regex is not being used, as its return value is not stored in any variable, nor used as the function's return value:
>
> SENTENCES_SPLITTING_REGEX.split(text), None

Oh that's right, I should look into it !
