Write and run automation flows in different human languages #68

Closed
5 tasks done
kensoh opened this issue Nov 9, 2017 · 12 comments

@kensoh
Member

kensoh commented Nov 9, 2017

updated - feature implemented (more details at bottom of thread)

  • iteration 1 - simple search and replace matches of language strings
  • iteration 2 - TagUI steps and syntax model to reduce false positives
  • iteration 3 - translation for helper functions title(), text(), present() etc
  • iteration 4 - automation steps execution output in different languages
  • iteration 5 - working engine with 21 languages (mostly automatically built)

The languages are Bengali, Chinese, English, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Tagalog, Tamil, Thai, Vietnamese. A user can easily automate building a new native language definition by using this TagUI automation flow that builds the vocabulary set using Google Translate.

This starting set was chosen partly based on the list of most commonly used languages, partly from the countries around where I'm from (Singapore), and partly from countries with a lot of developers.


This is purely an experimental idea. Since TagUI steps and conditions use a natural-language-like syntax, it might be easily extendable to support 'natural language' in languages other than English. For example, Chinese (of which I'm a native speaker) and Hindi (whose speakers include many talented people in test automation and robotic process automation).

The hard way to do this is to rewrite the code for each step and condition to factor in different languages. But that is tedious, and hard to update or extend to new languages.

An easy way is to use English as the internal reference and try to minimize changes to existing codebase. Then create a translation engine which takes in configuration files (.csv files for example), that translates from another language into the English reference language before execution. That engine can also be used to translate flow files between different languages, from Chinese to Hindi for example. Or English to Chinese.
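A minimal sketch of that idea in JavaScript, assuming a two-column definition (native word, English word) has already been loaded into a Map. The names `toEnglish` and `chineseToEnglish` are hypothetical and not taken from the TagUI codebase; the Chinese keywords are the ones used in the sample flow later in this thread.

```javascript
// Hypothetical vocabulary: native keyword -> English reference keyword.
const chineseToEnglish = new Map([
  ["点击", "click"],
  ["输入", "type"],
  ["为", "as"],
]);

// Naive translation: replace every space-delimited token that has an entry,
// and leave all other tokens (targets, values, URLs) untouched.
function toEnglish(line, vocabulary) {
  return line
    .split(" ")
    .map((token) => (vocabulary.has(token) ? vocabulary.get(token) : token))
    .join(" ");
}

console.log(toEnglish("点击 search-button", chineseToEnglish)); // click search-button
console.log(toEnglish("输入 search-box 为 github", chineseToEnglish)); // type search-box as github
```

Because every matching token is replaced, a target or typed text that happens to equal a keyword would be translated too, so this is only a starting point.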

If the design is flexible enough, users can easily create their own language configuration csv files for their native language. And write automation flow files directly in their native language. There will be implementation issues no doubt, but this hyper-localization idea is too interesting not to try.

Below is a quick sanity test on macOS. There looks to be no apparent technical roadblock to the idea: comparing and displaying strings in other languages works correctly. The flow files also show text correctly in the other languages.

START - automation started - Thu Nov 09 2017 15:16:47 GMT+0800 (+08)
http://tebel.org/index_mobile.php - Tebel.Automation

点击
WORKS!

http://tebel.org/index_mobile.php - Tebel.Automation
FINISH - automation finished - 2.9s
START - automation started - Thu Nov 09 2017 15:19:05 GMT+0800 (+08)
http://tebel.org/index_mobile.php - Tebel.Automation

क्लिक
WORKS!

http://tebel.org/index_mobile.php - Tebel.Automation
FINISH - automation finished - 2.1s

The integration with machine learning (starting with Yandex CatBoost) is important. But this idea is equally if not more important imo. Probably this can be developed concurrently with ML for TagUI v3.0.

@kensoh kensoh added the feature label Nov 9, 2017
@dublindrupaller

dublindrupaller commented Nov 9, 2017

hi kensoh,

I have been looking at the same with my fork of tagui. I'm in Switzerland, where there are mainly 4 languages used: Swiss German, French, Italian and English.

Using the php classes, we can use out-of-the-box aliases. Using the tap.class.php file as an example, you have the following.

`
/**
 * @file
 *
 */

/**
 *  tap class which is a child of step
 *  The class contains three methods:
 *  - public getIntent()
 *  - public parseIntent()
 *  - public getHeaderJs()
 */

class tap extends step {
      
  /**
   * Construct wrapper object   
   */
  public function __construct($intent){    
    $this->intent = $intent;
  }
  /**
   * @return string
   */
  public function getIntent($intent) {    

    if ((substr($intent,0,4)=="tap ") || (substr($intent,0,6)=="click ")) {
      return $this->intent;
    }    
    return FALSE;
  }

  /**
   * @return string 
   *   casperjs code as string
   *
   * @param string $raw_intent
   *   The full written step line for passing directly to the casperjs output or parsing for sikuli
   * @param array $params
   *   Array of params for the given step
   * @param string $twb
   *   Tagui_web_browser token for constructing test header casperjs   
   * @param boolean $sikuli
   *   if input is meant for sikuli visual automation 
   *
   */
  public function parseIntent($intent, $raw_intent, $twb, $sikuli=FALSE) {     

    // Everything after the first space of the raw step line is the target.
    $params = trim(substr($raw_intent." ",1+strpos($raw_intent." "," ")));
    // TODO: $params is passed as an array but sent to casperjs code and sikuli output as a string
    if ($sikuli) {
      $abs_params = abs_file($params); 
      $abs_intent = str_replace($params,$abs_params,$raw_intent);
      $parsed_code =  call_sikuli($abs_intent,$abs_params);
    } else {
      $parsed_code = "{techo('".$raw_intent."');".beg_tx($params).$twb.".click(tx('" . $params . "'));".end_tx($params);       
    }    
    return $parsed_code;
  } 

  public function getHeaderJs() {
    $js = <<<TAGUI
function tap_intent(raw_intent) {
var params = ((raw_intent + ' ').substr(1+(raw_intent + ' ').indexOf(' '))).trim();
if (is_sikuli(params)) {var abs_params = abs_file(params); var abs_intent = raw_intent.replace(params,abs_params);
return call_sikuli(abs_intent,abs_params);} 
if (params == '') return "this.echo('ERROR - target missing for " + raw_intent + "')";
else if (check_tx(params)) return "this.click(tx('" + params + "'))";
else return "this.echo('ERROR - cannot find " + params + "')";}
TAGUI;
    return $js;
  }       
}
class_alias('tap', 'click');
`

By simply adding more class_alias calls (PHP's class_alias() registers one alias per call), you can have:

foreach (['cliquez', 'klicken', 'cliceail', 'clic'] as $alias) class_alias('tap', $alias);

Where:
"cliquez" is the French for click
"klicken" is the German for click
"cliceail" is the Irish (Gaelic) for click - as an aside, I am Irish :0)
and "clic" is the Italian for click

Obviously that is scalable for other languages and logic can be added to handle the operators for some other steps. But the concept is simple enough.

@kensoh
Member Author

kensoh commented Nov 9, 2017

Thanks Gus for your inputs! In the above implementation, adding a new language would require updating all the individual step files for TagUI, and it can only be done by a developer. I'm looking at another direction, maybe a .csv file for each language or a combined .csv file for all languages.

step    french    german    irish      italian
click   cliquez   klicken   cliceail   clic
type
save
load
...

One file per language is probably better, since each language can then be developed by its users at its own pace. This way, users can easily update their .csv file to correct words for their native languages, and they don't have to be developers to do that.
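To make the two layouts concrete, here is a small JavaScript sketch that splits a combined table into one vocabulary per language. The column layout and the name `splitByLanguage` are assumptions for illustration only; the parsing also assumes plain comma-separated fields with no quoting.

```javascript
// Hypothetical combined definition: one row per step, one column per language.
const combinedCsv = [
  "step,french,german,irish,italian",
  "click,cliquez,klicken,cliceail,clic",
  "type",
].join("\n");

// Split the combined table into one Map per language (native word -> English step).
// Assumes no quoted fields or embedded commas.
function splitByLanguage(csvText) {
  const [header, ...rows] = csvText
    .trim()
    .split(/\r?\n/)
    .map((row) => row.split(","));
  const languages = new Map(header.slice(1).map((name) => [name, new Map()]));
  for (const [step, ...words] of rows) {
    words.forEach((word, index) => {
      const vocabulary = languages.get(header[index + 1]);
      // Skip empty cells so untranslated steps simply stay undefined.
      if (word && vocabulary) vocabulary.set(word, step);
    });
  }
  return languages;
}

const perLanguage = splitByLanguage(combinedCsv);
console.log(perLanguage.get("german").get("klicken")); // click
```

Each inner Map corresponds to what a single per-language file would hold, so either layout can feed the same translation logic.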

But for this implementation, I'll have to write a separate engine to do the translation. That's ok I guess, and shouldn't be difficult. With that engine, a bonus is that users can translate TagUI scripts from one language to another. For example, someone writing an automation flow in Chinese can translate it to Irish and it still works (of course, the supporting language .csv definition files have to be there).

Also tagging the other issue, in which you shared your major refactoring implementation of separating steps into separate PHP class files. A lot of interesting ideas there - https://github.com/tebelorg/TagUI/issues/51

Another challenge that comes with multiple languages will be language-specific issues. For example, Hebrew reads from right to left, and in some languages the order of noun and verb is the opposite of English and other languages with Germanic roots. I don't think those are easy to handle. But this idea of multi-language flow files is so interesting to me that I'll go ahead with implementing some working cases, and users or myself can improve it later on if this use case is meaningful to them.

@dublindrupaller

dublindrupaller commented Nov 9, 2017

Can I suggest you look at how .po or .pot files are used, as a possible alternative to .csv?
It's pretty much a standard for storing multilingual strings for applications, and there are lots of open source editors available.

The advantage is that you can, long term, handle ALL text strings (steps/intents, debug messages etc.) in multiple languages within your application, and it is simple for non-devs to tweak the languages.

I've worked on projects in 80+ languages, including Arabic, Korean, Hebrew, Russian etc. as well as the standard European ones, and it is not difficult to rejig your parsing logic based on a base language key (using the ISO 639-1 code), e.g. en for English, ar for Arabic, to handle right-to-left.

in simple PHP logic, it might look something like this:

switch ($base_lang) {
  case "en":
    // load en.po file
    // base language is english, so handle step intent left-to-right
    break;
  case "ar":
    // load ar.po file
    // base language is arabic, so handle intent right-to-left
    break;
}

@kensoh
Member Author

kensoh commented Nov 9, 2017

Thanks Gus for your suggestion, adding on a link with more info for PO files.

Probably I'll still stick with .csv but will look more into this. The .po file type is a standard for developers / translators, but most users' laptops won't have the software to view / edit those files. This adds another layer of friction for a user who wants to modify the language config file.

My hunch now is that deploying as .po will increase the time to roll out this new feature, while making it harder for a non-developer/translator user to modify. A possible benefit is that existing .po file assets could be used to help build the language definitions.

Ideally if possible, I would want to release v3.0 with framework for multi-language + machine-learning integration with Yandex CatBoost by next weekend. Don't know if that's possible!

@dublindrupaller

Kensoh, you can use Notepad, TextEdit or vi to edit .po files. Furthermore, for tagui users who want to use multilingual capabilities, it's highly likely they will (a) be familiar with .po files and (b) probably already have one of the many open source .po editors available.

hth

@kensoh
Member Author

kensoh commented Nov 10, 2017

Adding on with a link to the 10 most spoken languages in the world. Because I'm from Singapore (South-East Asia), I would really hope to at least see support for Chinese, Hindi, Japanese, Malay and maybe Russian.

For the release, there should at least be Chinese (manually verified), and maybe Google-translated versions of the languages used in my region. The .csv definition for Chinese is done, so I can start coding the translation engine and look for problems to solve in implementing this.

kensoh added a commit that referenced this issue Nov 10, 2017
- initial working commit for translation engine
- to translate to and from a different language
- based on language definition .csv spreadsheet
- created simple language file for chinese
- simple logic for this initial skeleton
- probably need to consider restricting accidentally replacing
identifiers, and many other considerations
- after refining, integrate into tagui execution flow
@kensoh
Member Author

kensoh commented Nov 10, 2017

Made initial working commit for translation engine with following comments -

  • to translate to and from a different language
  • based on language definition .csv spreadsheet
  • created simple language file for chinese language
  • simple translation logic used for this initial skeleton
  • needs many other considerations, such as avoiding accidental replacement of identifiers
  • after refining, integrate into tagui execution flow

At the moment, this skeleton translation engine can translate the example flow below from A. Chinese to B. English, and also from B to A. This opens up the use case where an English flow file can be translated for sharing, updating or automation execution by non-English-speaking users.

It also means with the language definition files ready, a Chinese flow file can potentially be translated into a working French flow file for example (by using English as the intermediary reference language).
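The pivot through English can be sketched in JavaScript as two lookups chained together. This is only an illustration under assumed data structures: `invert`, `translateTokens` and `pivot` are hypothetical names, and the two small vocabularies use keywords that appear in the flows in this thread.

```javascript
// Hypothetical definitions: English reference keyword -> native keyword.
const chinese = new Map([["click", "点击"], ["wait", "等待"]]);
const french = new Map([["click", "cliquez"], ["wait", "attendez"]]);

// Invert a definition so it maps native keyword -> English keyword.
function invert(definition) {
  return new Map([...definition].map(([english, native]) => [native, english]));
}

// Replace space-delimited tokens found in the vocabulary, leave the rest.
function translateTokens(line, vocabulary) {
  return line
    .split(" ")
    .map((token) => (vocabulary.has(token) ? vocabulary.get(token) : token))
    .join(" ");
}

// Translate between two languages by going through the English reference.
function pivot(line, fromDefinition, toDefinition) {
  const english = translateTokens(line, invert(fromDefinition));
  return translateTokens(english, toDefinition);
}

console.log(pivot("点击 search-button", chinese, french)); // cliquez search-button
console.log(pivot("等待 6.6", chinese, french)); // attendez 6.6
```

With English as the hub, supporting N languages needs N definition files rather than one file for every language pair.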

A. sample chinese flow

https://ca.yahoo.com
输入 search-box 为 github
显示 search-box
点击 search-button

等待 6.6

number = 1
如果 number 多过或等于 1
快照 page

让 n 从 1 到 10
快照 logo

text = 'abcde'
如果 text 包括 'bcd'
快照 page 到 results.png

快照 logo 到 logo.png

https://duckduckgo.com

输入 search_form_input_homepage 为 The search engine that doesn\'t track you.
快照 page 到 duckduckgo.png

等待 4.4 秒

B. sample translated flow

https://ca.yahoo.com
type search-box as github
show search-box
click search-button

wait 6.6

number = 1
if number more than or equals to 1
snap page

for n from 1 to 10
snap logo

text = 'abcde'
if text contains 'bcd'
snap page to results.png

snap logo to logo.png

https://duckduckgo.com

type search_form_input_homepage as The search engine that doesn\'t track you.
snap page to duckduckgo.png

wait 4.4 seconds

kensoh added a commit that referenced this issue Nov 10, 2017
- 2nd iteration reduces mistakes by translating contextually
- 1st iteration is simply search and replace matches
- this iteration uses TagUI’s internal model of TagUI steps and default
syntax in english to make translations only when the translation fits
the internal model
- for example, translating the first phrase/keyword only if the
translated word is part of TagUI steps
- eg translating conditions only if that flow line starts with
condition-starting keywords
- eg translating separators keywords only if they are valid separators
for that corresponding step
- with this commit translating to and from different languages can
retain accuracy much better
@kensoh
Member Author

kensoh commented Nov 10, 2017

Committed 2nd iteration with following comments -

  • 1st iteration is simply search and replace matches (higher chance of false-positives)
  • this iteration uses an internal model of TagUI steps and default syntax in english to make translations only when the translation fits the internal model (restricts false-positives)
  • for example, translating the first phrase/keyword only if the translated word is part of TagUI steps
  • eg translating conditions only if that flow line starts with condition-starting keywords
  • eg translating separators keywords only if they are valid separators for that corresponding step
  • with this commit translating to and from different languages can retain accuracy much better
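A rough JavaScript sketch of the contextual rule described above, assuming a tiny hand-written model of steps and their separators. `stepModel` and `translateInContext` are hypothetical names, and the model covers only three steps; it is not the engine's actual data or logic.

```javascript
// Hypothetical model: each English step and the separators it accepts.
const stepModel = new Map([
  ["click", []],
  ["type", ["as"]],
  ["snap", ["to"]],
]);

// Hypothetical Chinese definition: native keyword -> English keyword.
const chineseToEnglish = new Map([
  ["点击", "click"],
  ["输入", "type"],
  ["快照", "snap"],
  ["为", "as"],
  ["到", "to"],
]);

function translateInContext(line, vocabulary, model) {
  const tokens = line.split(" ");
  const step = vocabulary.get(tokens[0]);
  // Leave the line untouched unless it starts with a known step.
  if (!model.has(step)) return line;
  tokens[0] = step;
  const separators = model.get(step);
  // Translate only the first token that maps to a separator valid for this step.
  const index = tokens.findIndex(
    (token, i) => i > 0 && separators.includes(vocabulary.get(token))
  );
  if (index !== -1) tokens[index] = vocabulary.get(tokens[index]);
  return tokens.join(" ");
}

console.log(translateInContext("快照 page 到 results.png", chineseToEnglish, stepModel));
console.log(translateInContext("输入 search-box 为 到", chineseToEnglish, stepModel));
```

In the second call the trailing 到 is left alone, because "to" is not a separator of the type step in this model; a plain search and replace would have rewritten the typed text.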

Should be ready to start integrating into the tagui / tagui.cmd execution flow. Still to decide: whether to select the language by file naming convention, configuration or option, and whether to create an additional translate / translate.cmd to support direct translation of automation flow files from language X to language Y (eg Chinese to Hindi).

3rd iteration, added below commit with following comments -

  • translation logic for helper functions - title(), url(), text(), timer(), count(), present(), visible()
  • useful in conditions, check step, variable assignments, or steps such as echo / dump / write
  • the translation logic allows users to call these helper functions in their native language
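One way such helper translation could work is to match a native name only when it is directly followed by an opening bracket. The JavaScript below is a sketch under that assumption; the native names 标题 and 存在 are made up for illustration and are not taken from the real language definition files, and `translateHelpers` is a hypothetical name.

```javascript
// Hypothetical native names for two helper functions (illustrative only).
const helperNames = new Map([
  ["标题", "title"],
  ["存在", "present"],
]);

// Rewrite native helper calls to English, matching only a name
// that is immediately followed by "(".
function translateHelpers(line, names) {
  let result = line;
  for (const [native, english] of names) {
    result = result.split(native + "(").join(english + "(");
  }
  return result;
}

console.log(translateHelpers("echo 标题()", helperNames)); // echo title()
console.log(translateHelpers("if 存在('search-box')", helperNames)); // if present('search-box')
console.log(translateHelpers("echo '标题'", helperNames)); // echo '标题'
```

Requiring the bracket keeps ordinary text untouched, though a quoted string that contains a native name followed by "(" would still be rewritten.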

4th iteration, added another commit below with following notes -

  • with this commit, the supporting components are there to take automation flows based on different languages and show automation execution in the same or other languages
  • ported translation engine from tagui_parse.php to tagui_header.js
  • flow language display can be controlled by the variable flow_language
  • open the way for showing automation execution in different languages
  • updated language definitions to windows format .csv for cross-platform viewing
  • added english and template definitions that can be used for translations or reference

kensoh added a commit that referenced this issue Nov 11, 2017
- adding translation logic for helper functions such as title(), url(),
text(), timer(), count(), present(), visible()
- these functions can be useful in conditions, check step, variable
assignments, or some steps such as echo / dump / write
- with the translation logic, users can call these functions in their
native language by configuring the language definition file
kensoh added a commit that referenced this issue Nov 12, 2017
- with this commit, supporting components are there to allow taking
automation flow in multiple languages and showing execution in other
languages
- ported translation engine from tagui_parse.php to tagui_header.js
- flow language display can be controlled by the variable flow_language
- open the way for showing automation execution in different languages
- updated language definitions to windows format .csv
- added english and template definitions that can be used for
translations or reference
kensoh added a commit that referenced this issue Nov 12, 2017
- move flow_language variable from tagui_header.js to tagui_config.txt
for easy configuration
- fix regression on test mode string replacement due to multi-language
change in tagui_header.js
kensoh added a commit that referenced this issue Nov 14, 2017
- completed translation engine for TagUI, more details here -
https://github.com/tebelorg/TagUI/issues/68

- working multi-language flow support for 21 languages (english and
chinese language definitions are manually edited, 19 others are
automatically created)

- created language build automation flow that ‘self-builds’ language
definition csv files by using google translate

- use tagui_language variable in tagui_config.txt to set the flow file
language

- use variable tagui_language instead of flow_language to track
execution output language

- parsing engine (tagui_parse.php) to use translation engine
(translate.php) during initial parsing

- let translate.php accept internal calls from tagui_parse.php during
parsing before execution

- log translation of other languages to english reference language for
easier troubleshooting

- various improvements and bug fixes in translation engine

- print step to print output on the next line instead of combining into
1 line, for consistency with other steps behavior
@kensoh
Member Author

kensoh commented Nov 14, 2017

5th iteration commit with following notes -

  • working support for multi-language flow for 21 languages (english and chinese language definitions are manually edited, 19 others are automatically created). created language build automation flow that ‘self-builds’ language definition csv files by using google translate

  • use tagui_language variable in tagui_config.txt to set the input flow file language. use variable tagui_language instead of flow_language to track execution output language.

  • parsing engine (tagui_parse.php) to use translation engine (translate.php) during initial parsing. let translate.php accept internal calls from tagui_parse.php during parsing before execution. log translation of other languages to english reference language for easier troubleshooting. various improvements and bug fixes in translation engine

  • print step to print output on next line instead of same line, for consistency with other steps

@kensoh kensoh changed the title from "Multi-language flow files / automation scripts (experimental)" to "Write and run automation flows in different native languages" Nov 14, 2017
@kensoh
Member Author

kensoh commented Nov 14, 2017

Working feature in master branch (can try by overwriting your existing packaged installation). Above comment has more details on the 21 languages (mostly auto-built using google translate).

  1. set your default flow language with tagui_language variable in tagui_config.txt
  2. write automation flow in native language based on language definition .csv files
  3. optionally set tagui_language in flow to any other languages as output language

Below are some of the things that can be done: taking in flow files written in different native languages, running them and displaying the steps in the same or other languages, translating flow files to and from English, etc. Of course, most of the language definitions are automatically self-built, and some words will be wrong because the automatic translation has no understanding of the UI interaction context. Native language users can update the language definition csv themselves or submit PRs with the correct words to be used.

original automation flow file in chinese

https://ca.yahoo.com
输入 search-box 为 github
显示 search-box
点击 search-button

等待 6.6

number = 1
如果 number 多过或等于 1
快照 page

让 n 从 1 到 3
快照 logo

text = 'abcde'
如果 text 包括 'bcd'
快照 page 到 results.png

https://duckduckgo.com

输入 search_form_input_homepage 为 The search engine that doesn\'t track you.
快照 page 到 duckduckgo.png

等待 4.4 秒

running with tagui_language = 'chinese' in tagui_config.txt

START - automation started - Wed Nov 15 2017 03:43:02 GMT+0800 (SGT)
https://ca.yahoo.com - Yahoo

输入 search-box 为 github
显示 search-box
github
点击 search-button
等待 6.6
快照 page
快照 logo
快照 logo
快照 logo
快照 page 到 results.png
https://duckduckgo.com - DuckDuckGo
输入 search_form_input_homepage 为 The search engine that doesn't track you.
快照 page 到 duckduckgo.png
等待 4.4 秒

https://duckduckgo.com/ - DuckDuckGo
FINISH - automation finished - 16.6s

add tagui_language = 'russian' in 2nd line of flow file and run again

START - automation started - Wed Nov 15 2017 03:46:14 GMT+0800 (SGT)
https://ca.yahoo.com - Yahoo

тип search-box в виде github
показать search-box
github
щелчок search-button
подождите 6.6
щелчок page
щелчок logo
щелчок logo
щелчок logo
щелчок page в results.png
https://duckduckgo.com - DuckDuckGo
тип search_form_input_homepage в виде The search engine that doesn't track you.
щелчок page в duckduckgo.png
подождите 4.4 секунд

https://duckduckgo.com/ - DuckDuckGo
FINISH - automation finished - 17.3s

from command prompt, php translate.php flow_file from chinese

https://ca.yahoo.com
tagui_language = 'russian'
type search-box as github
show search-box
click search-button

wait 6.6

number = 1
if number more than or equals to 1
snap page

for n from 1 to 3
snap logo

text = 'abcde'
if text contains 'bcd'
snap page to results.png

https://duckduckgo.com

type search_form_input_homepage as The search engine that doesn\'t track you.
snap page to duckduckgo.png

wait 4.4 seconds

from command prompt, php translate.php flow_file_translated to french

https://ca.yahoo.com
tagui_language = 'russian'
type search-box comme github
montrer search-box
cliquez search-button

attendez 6.6

number = 1
si number plus ou égal à 1
casser page

pour n de 1 à 3
casser logo

text = 'abcde'
si text contient 'bcd'
casser page à results.png

https://duckduckgo.com

type search_form_input_homepage comme The search engine that doesn\'t track you.
casser page à duckduckgo.png

attendez 4.4 secondes

Though this multi-language flow support took only a few days to implement, it is one of the most interesting features of TagUI to me. It somewhat shows TagUI as a DSL (domain-specific language) for UI interaction that can be easily translated to and from different languages, for creation, maintenance and execution of automation flows. I really like this possibility of hyper-localization.

Now can switch back to experimenting on AI / ML integration with Yandex CatBoost.

@kensoh
Member Author

kensoh commented Nov 27, 2017

Published TagUI v3.0 - Native Languages Release, keeping issue open for related issues.

Usage details in readme - https://github.com/tebelorg/TagUI#native-languages

@kensoh kensoh changed the title from "Write and run automation flows in different native languages" to "Write and run automation flows in different human languages" Nov 30, 2017
@kensoh kensoh closed this as completed Dec 16, 2017
@kensoh
Member Author

kensoh commented Apr 16, 2018

Note to track new TagUI steps not yet included in language definitions -

  • ask, rtap, rclick, dtap, dclick, tagui, r, py

(some of these new steps are irrelevant to have multi-language functionality)
