Text classification in different languages #3250

the42 · 2023-03-15T16:03:46Z

I followed the helpful tutorial on text classification. It seems to assume English input. What would the YAML configuration look like if my target language is German?

Which languages does Ludwig support in general?

arnavgarg1 · 2023-03-15T17:25:38Z

Collaborator

Hi @the42, the tutorial does assume that the text input is in English, but this is not a requirement!

Ludwig supports at least one multilingual text encoder out of the box: XLMRoberta. Here's a masking example I used that's in Hindi. I also tried an equivalent in Spanish.

Additionally, you can use any pretrained multilingual text encoder from Hugging Face by choosing the AutoTransformer text encoder and specifying the pretrained model name or path.
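As a rough sketch of what the two options might look like, here are the configs written as Python dicts, which mirror the YAML structure one-to-one. Several things here are assumptions rather than anything taken from the tutorial: the column names `text` and `label` are placeholders for your own dataset, `bert-base-german-cased` is just one example of a German checkpoint on the Hugging Face Hub, and the nested `encoder: {type: ...}` layout assumes a recent Ludwig release. Please check the names against the docs for the version you have installed.

```python
def build_config(encoder: dict) -> dict:
    """Build a Ludwig-style config dict for single-label text classification.

    The feature names "text" and "label" are hypothetical column names.
    """
    return {
        "input_features": [
            {"name": "text", "type": "text", "encoder": encoder},
        ],
        "output_features": [
            {"name": "label", "type": "category"},
        ],
    }


# Option 1: the built-in multilingual XLM-RoBERTa encoder.
xlmr_config = build_config({"type": "xlmroberta"})

# Option 2: a Hugging Face checkpoint loaded through the auto_transformer encoder.
# The model name is an example choice, not a recommendation.
german_config = build_config(
    {
        "type": "auto_transformer",
        "pretrained_model_name_or_path": "bert-base-german-cased",
    }
)


def train(config: dict, dataset_path: str):
    """Train with Ludwig's Python API; requires `pip install ludwig`."""
    from ludwig.api import LudwigModel  # imported lazily so the configs can be built without Ludwig

    model = LudwigModel(config)
    return model.train(dataset=dataset_path)
```

Written as YAML, each dict key becomes a YAML key with the same nesting, so `german_config` corresponds to an `input_features` entry whose `encoder` block has `type: auto_transformer` and `pretrained_model_name_or_path: bert-base-german-cased`.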

Let me know if this helps!
