Write and run automation flows in different human languages #68
Hi kensoh, I have been looking at the same thing with my fork of TagUI. I'm in Switzerland, where four languages are mainly used: Swiss German, French, Italian and English. Using the PHP classes, we can use out-of-the-box aliases. Using the tap.class.php file as an example, you have the following.
By simply extending the class_alias, you can have:
Where: Obviously that scales to other languages, and logic can be added to handle the operators for some other steps. But the concept is simple enough.
Thanks Gus for your inputs! In the above implementation, adding a new language would require updating all the individual step files for TagUI, and it can only be done by a developer. I'm looking at another direction, maybe a .csv file for each language or a combined .csv file for all languages.
Probably one file per language is good, in that each language can develop at its own pace, driven by its users. This way, users can easily update their .csv file to correct words for their native language, and they don't have to be developers to do that. But for this implementation, I'll have to write a separate engine to do the translation. That's ok I guess, and shouldn't be difficult. With that engine, a bonus is that users can translate TagUI scripts from one language to another. For example, someone writing an automation flow in Chinese can translate it to Irish and it still works (of course, the supporting language .csv definitions have to be there).
Also tagging the other issue where you shared your major refactoring implementation of separating steps into separate PHP class files, a lot of interesting ideas there - https://github.com/tebelorg/TagUI/issues/51
Another challenge that comes with multiple languages will be language-specific issues. For example, Hebrew reads from right to left, and in some languages the order of noun and verb is the opposite of English and other languages with Germanic roots. I don't think those are easy to handle. But this idea of multi-language flow files is so interesting to me that I'll go ahead with implementing some working cases, and users or I can improve it later on if this use case is meaningful to them.
Can I suggest you look at how .po or .pot files are used, as a possible alternative to .csv? The advantage is that, long term, you can handle ALL text strings (steps/intents, debug messages etc.) in multiple languages within your application, and it is simple for non-devs to tweak the languages. I've worked on projects in 80+ languages, including Arabic, Korean, Hebrew, Russian etc. as well as the standard European ones, and it is not difficult to rejig your parsing logic based on a base language key (using the ISO 639-1 code, e.g. en for English, ar for Arabic) to handle right-to-left. In simple PHP logic, it might look something like this:
Thanks Gus for your suggestion, adding on a link with more info for PO files. Probably I'll still stick with .csv but will look more into this. The .po file type is a standard for developers / translators, but most users' laptops won't have the software to view / edit those files. This adds another layer of friction for a user who wants to modify the language config file. My hunch now is that deploying as .po will increase the time to roll out this new feature, while making it harder for a non-developer / non-translator user to modify. A possible benefit may be existing .po file assets that can be reused to help build the language definitions.
Ideally, if possible, I would want to release v3.0 with the framework for multi-language + machine-learning integration with Yandex CatBoost by next weekend. Don't know if that's possible!
Kensoh, you can use Notepad, TextEdit or vi to edit .po files. Furthermore, for TagUI users who want to use multilingual capabilities, it's highly likely they will (a) be familiar with .po files and (b) probably already have one of the many open source .po editors available. hth
Adding on with a link to the 10 most spoken languages in the world. Because I'm from Singapore (South-East Asia), I would really hope to at least see support for Chinese, Hindi, Japanese, Malay and maybe Russian. For the release, there should at least be Chinese (manually verified), and maybe Google-translated versions of the other languages used in my region. The .csv definition for Chinese is done, so I can start coding the translation engine and look for problems to solve to implement this.
- initial working commit for translation engine
- to translate to and from a different language
- based on language definition .csv spreadsheet
- created simple language file for chinese
- simple logic for this initial skeleton
- probably need to consider restricting accidentally replacing identifiers, and many other considerations
- after refining, integrate into tagui execution flow
Made initial working commit for translation engine with following comments -
At the moment, this skeleton translation engine can translate the example flow below from A. Chinese to B. English, and also from B to A. This opens up the use case where an English flow file can be translated for sharing, updating or automation execution by non-English-speaking users. It also means that, with the language definition files ready, a Chinese flow file can potentially be translated into a working French flow file, for example (by using English as the intermediary reference language).
A. sample chinese flow
B. sample translated flow
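The idea of using English as the intermediary reference language can be sketched roughly as below. This is only an illustration in Python, not the actual engine (which is PHP). The dictionaries, the sample Chinese and French keywords, and the function names are all assumptions made up for this example.

```python
# Hypothetical definitions: english keyword -> native keyword.
# The words below are illustrative only, not taken from TagUI's files.
CHINESE = {"click": "点击", "type": "输入", "as": "为"}
FRENCH = {"click": "cliquer", "type": "taper", "as": "comme"}


def replace_words(line, mapping):
    """Replace each space-separated word that appears in mapping."""
    return " ".join(mapping.get(word, word) for word in line.split(" "))


def translate(line, source=None, target=None):
    """Translate a flow line, using English as the pivot.

    source and target are english->native dicts; None means English.
    """
    if source is not None:
        # Invert the source definition to go from native to English.
        to_english = {native: english for english, native in source.items()}
        line = replace_words(line, to_english)
    if target is not None:
        line = replace_words(line, target)
    return line


print(translate("点击 search-button", source=CHINESE))
print(translate("点击 search-button", source=CHINESE, target=FRENCH))
```

Because every language only needs a definition against English, adding a new language gives translation to and from all the others without any pairwise definitions. Note that this word-by-word replacement has the same weakness as a plain search and replace: an identifier that happens to match a keyword would also be replaced.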
- 2nd iteration reduces mistakes by translating contextually
- 1st iteration is simply search and replace of matches
- this iteration uses TagUI's internal model of TagUI steps and default syntax in english to make translations only when the translation fits the internal model
- for example, translating the first phrase/keyword only if the translated word is part of TagUI steps
- eg translating conditions only if that flow line starts with condition-starting keywords
- eg translating separator keywords only if they are valid separators for that corresponding step
- with this commit, translating to and from different languages can retain accuracy much better
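A rough Python sketch of the contextual rule described in this commit message: translate the first word only if it maps to a known step, and translate a separator only if it is valid for that step. The step table is a small assumed subset, and the Chinese keywords and function name are hypothetical. The real logic lives in the PHP translation engine and may differ.

```python
# Assumed subset of TagUI steps and the separator keyword each one accepts.
STEP_SEPARATORS = {"click": None, "type": "as", "read": "to"}

# Hypothetical native -> english definitions, for illustration only.
NATIVE_TO_ENGLISH = {
    "点击": "click",
    "输入": "type",
    "为": "as",
    "读取": "read",
    "到": "to",
}


def translate_line(line, native_to_english):
    """Translate a flow line to English only where it fits the step model."""
    words = line.split(" ")
    step = native_to_english.get(words[0])
    if step not in STEP_SEPARATORS:
        # First word is not a known step: leave the line untouched.
        return line
    words[0] = step
    separator = STEP_SEPARATORS[step]
    if separator is not None:
        for i in range(1, len(words)):
            # Translate only the first separator valid for this step.
            if native_to_english.get(words[i]) == separator:
                words[i] = separator
                break
    return " ".join(words)


print(translate_line("输入 search-box 为 到", NATIVE_TO_ENGLISH))
```

In this example the trailing 到 is text to be typed, so it is left alone even though it is also a keyword, which is the kind of mistake a plain search and replace would make.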
Committed 2nd iteration with following comments -
Should be ready to start integrating into the tagui / tagui.cmd execution flow, whether by file naming convention, configuration or option, and to decide whether to create an additional translate / translate.cmd to support direct translation of automation flow files from language X to language Y (eg Chinese to Hindi).
3rd iteration, added below commit with following comments -
4th iteration, added another commit below with following notes -
- adding translation logic for helper functions such as title(), url(), text(), timer(), count(), present(), visible()
- these functions can be useful in conditions, check step, variable assignments, or some steps such as echo / dump / write
- with the translation logic, users can call these functions in their native language by configuring the language definition file
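One possible way to translate helper function names without touching ordinary text is to replace a native name only when it is directly followed by an opening parenthesis. The Python sketch below illustrates that idea. The native names and the function name are hypothetical, and this is not necessarily how the PHP engine does it.

```python
import re

# Hypothetical native names for a few helper functions, for illustration only.
HELPERS = {"标题": "title", "网址": "url", "存在": "present"}


def translate_helpers(line, helpers):
    """Replace native helper names only where they are used as function calls."""
    for native, english in helpers.items():
        # The lookahead (?=\() requires "(" right after the name
        # without consuming it, so the parenthesis is kept.
        line = re.sub(re.escape(native) + r"(?=\()", english, line)
    return line


print(translate_helpers("if 存在('search-box')", HELPERS))
print(translate_helpers("echo 标题()", HELPERS))
```

A line such as `echo 标题` (no parentheses) is left unchanged, so the same word can still appear as plain text in echo / dump / write steps.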
- with this commit, supporting components are there to allow taking automation flow in multiple languages and showing execution in other languages
- ported translation engine from tagui_parse.php to tagui_header.js
- flow language display can be controlled by the variable flow_language
- open the way for showing automation execution in different languages
- updated language definitions to windows format .csv
- added english and template definitions that can be used for translations or reference

- move flow_language variable from tagui_header.js to tagui_config.txt for easy configuration
- fix regression on test mode string replacement due to multi-language change in tagui_header.js

- completed translation engine for TagUI, more details here - https://github.com/tebelorg/TagUI/issues/68
- working multi-language flow support for 21 languages (english and chinese language definitions are manually edited, 19 others are automatically created)
- created language build automation flow that 'self-builds' language definition csv files by using google translate
- use tagui_language variable in tagui_config.txt to set the flow file language
- use variable tagui_language instead of flow_language to track execution output language
- parsing engine (tagui_parse.php) to use translation engine (translate.php) during initial parsing
- let translate.php accept internal calls from tagui_parse.php during parsing before execution
- log translation of other languages to english reference language for easier troubleshooting
- various improvements and bug fixes in translation engine
- print step to print output on the next line instead of combining into 1 line, for consistency with other steps' behavior
5th iteration commit with following notes -
Working feature in master branch (can try by overwriting your existing packaged installation). Above comment has more details on the 21 languages (mostly auto-built using google translate).
Below are some of the things that can be done: taking in flow files written in different native languages, running them and displaying the steps in the same or other languages, translating flow files to and from English, etc. Of course, most of the language definitions are automatically self-built, and will contain wrong words because the build does not understand UI interaction context. Native language users can update the language definition csv themselves or submit PRs with the correct words to be used.
original automation flow file in chinese
running with tagui_language = 'chinese' in tagui_config.txt
add tagui_language = 'russian' in 2nd line of flow file and run again
from command prompt, php translate.php flow_file from chinese
from command prompt, php translate.php flow_file_translated to french
Though this multi-language flow support took only a few days to implement, it is one of the most interesting features of TagUI to me. It somewhat shows TagUI as a DSL (domain-specific language) for UI interaction that can be easily translated to and from different languages, for creation, maintenance and execution of automation flows. I really like this possibility of hyper-localization. Now I can switch back to experimenting on AI / ML integration with Yandex CatBoost.
Published TagUI v3.0 - Native Languages Release, keeping issue open for related issues. Usage details in readme - https://github.com/tebelorg/TagUI#native-languages
Note to track new TagUI steps not yet included in language definitions -
(some of these new steps do not need multi-language functionality)
updated - feature implemented (more details at bottom of thread)
The languages are Bengali, Chinese, English, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Tagalog, Tamil, Thai, Vietnamese. A user can easily automate building a new native language definition by using this TagUI automation flow that builds the vocabulary set using Google Translate.
This starting set is chosen partly based on the list of most commonly used languages, partly from the countries around where I'm from (Singapore), and partly from countries with a lot of developers.
This is purely an experimental idea. Since TagUI steps and conditions are in natural-language-like syntax, it might be easily extendable to support 'natural language' in other languages besides English. For example, Chinese (of which I'm a native speaker), and Hindi (for which there are many talents in test automation and robotic process automation). The hard way to do this is to rewrite the code for each step and condition to factor in different languages. But that is tedious -> hard to update or extend to other new languages.
An easy way is to use English as the internal reference and try to minimize changes to existing codebase. Then create a translation engine which takes in configuration files (.csv files for example), that translates from another language into the English reference language before execution. That engine can also be used to translate flow files between different languages, from Chinese to Hindi for example. Or English to Chinese.
If the design is flexible enough, users can easily create their own language configuration csv files for their native language. And write automation flow files directly in their native language. There will be implementation issues no doubt, but this hyper-localization idea is too interesting not to try.
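As a minimal sketch of this idea, the Python snippet below loads a per-language .csv definition and translates a flow line into the English reference language by simple search and replace. The two-column layout (english, native), the sample Chinese words and the function names are assumptions for illustration, not the actual TagUI file format or engine.

```python
import csv
import io

# Hypothetical two-column definition: english keyword, native keyword.
# The column layout and the sample words are assumptions for illustration.
CHINESE_CSV = """english,chinese
click,点击
type,输入
as,为
"""


def load_definitions(csv_text):
    """Return a dict mapping native keyword -> english keyword."""
    reader = csv.DictReader(io.StringIO(csv_text))
    native_column = reader.fieldnames[1]  # second column holds the native words
    return {row[native_column]: row["english"] for row in reader}


def to_english(line, definitions):
    """Naive search and replace, longest native keywords first."""
    for native in sorted(definitions, key=len, reverse=True):
        line = line.replace(native, definitions[native])
    return line


definitions = load_definitions(CHINESE_CSV)
print(to_english("点击 search-button", definitions))
```

Keeping the vocabulary in a plain .csv means a non-developer can correct a word in a spreadsheet editor without touching the step code, which stays English-only internally.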
Below is a quick sanity test on macOS. It looks like there is no apparent technical roadblock to the idea -> comparing and displaying strings in other languages works correctly. The flow files also show text correctly in the other languages.
The integration with machine learning (starting with Yandex CatBoost) is important. But this idea is equally if not more important imo. Probably this can be developed concurrently with ML for TagUI v3.0.