How Brython works
A typical Brython-powered HTML page looks like this:
<html>
<head>
<script src="/path/to/brython.js"></script>
</head>
<body>
<script type="text/python">
...
</script>
</body>
</html>
brython.js is the minified concatenation of individual scripts that handle specific tasks, either at compile time (generation of Javascript code from Python source) or at run time (for instance, implementation of all Python built-in objects, e.g. in py_list.js for list and tuple, py_string.js for str, etc.). Development is done on the individual scripts; brython.js is generated by the script /scripts/make_dist.py.
brython.js exposes 2 names in the global Javascript namespace: brython (the function called on page load) and __BRYTHON__, an object that holds all the internal objects needed to run Python scripts.
Once the page is loaded, the function brython() inspects all the scripts in the page; for those that have the type text/python, it reads the Python source code, translates it to Javascript, and runs the result with a Javascript eval() or with new Function(func_name, source)(module) (the second form avoids memory leaks on some browsers).
If the <script> tag has an attribute src, an Ajax call is performed to get the content of the file at the specified URL, and its source is converted and executed as above. Note that this is possible only if the page is loaded from a web server (protocol http), not with the File/Open browser menu.
These tasks are performed by the following functions:
- function brython() in py2js.js gets the source code of Python scripts and builds a list of tasks to execute. This list is managed by function loop() in loaders.js
- the main tasks are to run the scripts; this is done by function _run_script() in py2js.js
- the translation to Javascript is managed by the script gen_parse.js. This script is generated by a set of scripts adapted from the CPython parser generator
- the source code is split into tokens by function tokenize() in python_tokenizer.js. These tokens are described in the standard Python documentation
- gen_parse.js creates an Abstract Syntax Tree (AST)
- the script symtable.js, adapted from CPython symtable.c, analyses the AST. It detects some syntax errors, and builds a tree of lexical scopes (modules, classes, functions), each with a set of symbols. Cf. the symtable module of the Python standard library
- the AST and the symbol table are passed to function js_from_root() in ast_to_js.js, which is in charge of generating the Javascript code
- at this stage, _run_script() puts the task "execute the Python script with its Javascript translation" on top of the tasks list and calls function loop() in loaders.js
- this function executes the script by calling new Function(script.js)
- if an exception is raised during execution, it is managed by function handle_error() in py_exceptions.js
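Because the tokenizer, the parser and the symbol table are adapted from CPython, the same three stages can be observed under CPython with the standard modules tokenize, ast and symtable. The snippet below is an analogy run in CPython, not Brython code; the sample source is arbitrary.

```python
import ast
import io
import symtable
import tokenize

SOURCE = "def f(x):\n    y = x + 1\n    return y\n"

# Stage 1: split the source into tokens (in Brython: tokenize() in python_tokenizer.js)
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
token_names = [tokenize.tok_name[tok.type] for tok in tokens]

# Stage 2: build the Abstract Syntax Tree (in Brython: gen_parse.js)
tree = ast.parse(SOURCE)

# Stage 3: build the tree of lexical scopes and their symbols (in Brython: symtable.js)
module_scope = symtable.symtable(SOURCE, "<example>", "exec")
function_scope = module_scope.get_children()[0]

print(token_names[:4])                    # ['NAME', 'NAME', 'OP', 'NAME']
print(type(tree.body[0]).__name__)        # FunctionDef
print(function_scope.get_name(), sorted(function_scope.get_locals()))  # f ['x', 'y']
```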
__BRYTHON__ has an attribute builtins that stores all the built-in Python names (classes, functions, exceptions, objects), usually under the same name: for instance, the built-in class int is stored as __BRYTHON__.builtins.int. Only names that conflict with Javascript naming rules must be changed, e.g. super() is implemented as __BRYTHON__.builtins.$$super.
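A minimal sketch of the naming rule, assuming that a name gets the $$ prefix only when it clashes with a Javascript reserved word (only the super case is confirmed above). The reserved-word set and the helper names are hypothetical and deliberately incomplete.

```python
# Hypothetical, incomplete subset of the Javascript reserved words
JS_RESERVED = {"super", "class", "function", "var", "new", "delete",
               "this", "typeof", "void", "switch", "case", "default"}

def builtin_js_name(py_name):
    """Assumed rule: keep the Python name unless it is a Javascript reserved word."""
    return "$$" + py_name if py_name in JS_RESERVED else py_name

def builtin_js_path(py_name):
    """Javascript expression under which the built-in would be stored."""
    return "__BRYTHON__.builtins." + builtin_js_name(py_name)

print(builtin_js_path("int"))    # __BRYTHON__.builtins.int
print(builtin_js_path("super"))  # __BRYTHON__.builtins.$$super
```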
Python strings are implemented as Javascript strings, except those that contain characters outside the Basic Multilingual Plane, such as 🐑, which are implemented with a custom class.
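Javascript strings are sequences of UTF-16 code units: a character such as 🐑 (U+1F411) occupies two code units, so a plain Javascript string would report a different length and different indexes than Python. The snippet below measures this under CPython; the helper names are hypothetical, and the test in needs_custom_class() (any code point above U+FFFF) is an assumption about when the custom class is needed.

```python
def utf16_length(s):
    """Number of UTF-16 code units, i.e. what Javascript's s.length would report."""
    return len(s.encode("utf-16-le")) // 2

def needs_custom_class(s):
    """Assumed criterion: s contains a code point outside the BMP (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in s)

sheep = "🐑"
print(len(sheep), utf16_length(sheep))                    # 1 2
print(needs_custom_class("abc"), needs_custom_class("a🐑"))  # False True
```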
Python lists and tuples are implemented as Javascript arrays.
Python integers are implemented as Javascript numbers if they are in the range of Javascript "safe integers", i.e. [-(2^53-1), 2^53-1]; outside this range they are implemented as objects of the form {__class__: long_int, value: bigint}, where long_int is an internal class with the same methods as int, and bigint is a Javascript BigInt.
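The choice between the two representations can be sketched in Python as follows; to_js_int() is a hypothetical helper and the dict stands in for the Javascript object. The last lines show why the threshold exists: a Javascript number is a 64-bit float, which cannot distinguish 2**53 + 1 from 2**53.

```python
# Same value as Javascript's Number.MAX_SAFE_INTEGER
MAX_SAFE_INTEGER = 2**53 - 1

def to_js_int(n):
    """Sketch of the representation choice; the dict stands in for a Javascript object."""
    if -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER:
        return n                                   # plain Javascript number
    return {"__class__": "long_int", "value": n}   # value would be a Javascript BigInt

# Beyond the safe range, consecutive integers collapse onto the same float
print(float(2**53 + 1) == float(2**53))   # True
print(to_js_int(42), to_js_int(2**60))
```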
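Python floats are implemented as objects of the form {__class__: float, value: num}, where float is a class with all the float methods, and num is the Javascript number for the float. Since Javascript has a single number type, this wrapper is what distinguishes a float such as 1.0 from the integer 1.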
All other Python classes (built-in or user-defined) are implemented as Javascript objects that hold the class attributes and methods.
A minimal class is built by code such as:
$B.make_class = function(name, factory){
    // Builds a basic class object
    var A = {
        __class__: _b_.type,
        __mro__: [object],
        __name__: name,
        $is_class: true
    }
    A.$factory = factory
    return A
}
factory is the function that creates instances of the class. The instances have an attribute __class__ set to the class object.
The class object has an attribute __mro__, a list of the classes used for attribute resolution on instances of the class.
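A Python transcription of make_class, with dicts standing in for Javascript objects, shows how __class__ and __mro__ drive attribute lookup. The separate "attrs" dict, the OBJECT stand-in and get_attribute() are hypothetical simplifications; the lookup order (instance, then class, then each class of __mro__) is an assumption consistent with the fact that __mro__ above does not list the class itself.

```python
OBJECT = {"__name__": "object", "__mro__": [],
          "attrs": {"__repr__": lambda self: "<object>"}}

def make_class(name, factory, attrs=None, bases=(OBJECT,)):
    """Python transcription of make_class; "attrs" is a hypothetical simplification."""
    return {
        "__class__": "type",
        "__mro__": list(bases),      # like [object] above: the class itself is not listed
        "__name__": name,
        "$is_class": True,
        "$factory": factory,
        "attrs": dict(attrs or {}),
    }

def get_attribute(instance, name):
    """Look up name on the instance, then its class, then the classes in __mro__."""
    if name in instance:
        return instance[name]
    cls = instance["__class__"]
    for klass in [cls] + cls["__mro__"]:
        if name in klass["attrs"]:
            return klass["attrs"][name]
    raise AttributeError(name)

def point_factory(x, y):
    return {"__class__": Point, "x": x, "y": y}

Point = make_class("Point", point_factory,
                   {"norm2": lambda self: self["x"] ** 2 + self["y"] ** 2})
p = Point["$factory"](3, 4)
print(get_attribute(p, "norm2")(p))      # 25
print(get_attribute(p, "__repr__")(p))   # <object>
```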
Python functions are implemented as Javascript functions, but there are many differences, both in function definitions and in function calls.
To define a Python function, its parameters can be specified in many ways: named parameters (def f(x):), parameters with default values (def f(x=1):), and holders for additional positional and keyword arguments (def f(*x, **y):).
A Python function can be called with positional arguments (f(2)), keyword arguments (f(y=1)), packed iterables (f(*args)) and packed dictionaries (f(**kw)).
Javascript also has a variety of ways to handle parameters: named parameters (function f(x)), and the arguments object, which can be used inside the function more or less like a list (function f(){var x = arguments[0]}). Arguments are passed by position, either in a direct call (f(x)) or with the methods call and apply.
For function calls, the arguments passed to the Python function are translated this way:
- positional arguments are kept unmodified
- packed iterables are unpacked and added to the positional arguments
- all keyword arguments (including packed dictionaries) are grouped in a single argument put at the end of the argument list. It is a Javascript object with a single key $kw, whose value is a list: an object holding the explicit keyword arguments, followed by the packed dictionaries
For instance, the call
f(1, *t, x=2, **d)
is translated to
f.apply(null, [1].concat(list(t)).concat([{$kw:[{x: 2}, d]}]))
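The translation of a call can be mimicked in Python to see the resulting argument list; js_call_args() is a hypothetical helper, and the layout of the trailing object follows the example above (explicit keywords first, then each packed dictionary). Keeping an empty keywords object first when only **d is passed is an assumption.

```python
def js_call_args(positional, star_iterables=(), keywords=None, star_dicts=()):
    """Build the argument list of the Javascript call (hypothetical helper)."""
    args = list(positional)
    for iterable in star_iterables:   # *t: unpacked into the positional arguments
        args.extend(iterable)
    if keywords or star_dicts:        # x=2 and **d: grouped in one trailing object
        args.append({"$kw": [dict(keywords or {})] + [dict(d) for d in star_dicts]})
    return args

t, d = (3, 4), {"z": 5}
# Python call: f(1, *t, x=2, **d)
print(js_call_args([1], [t], {"x": 2}, [d]))
# [1, 3, 4, {'$kw': [{'x': 2}, {'z': 5}]}]
```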
Python function definitions are translated to a Javascript function definition that takes no significant parameters; the argument values are set at the beginning of the function body, using the arguments object and the function $B.args defined in py_utils.js. This function takes the following parameters, initialised from the Python function parameters:
$B.args = function(fname, argcount, slots, var_names, args, dobj,
extra_pos_args, extra_kw_args)
- fname is the function name
- argcount is the number of named parameters expected by the function, not counting the holders for extra positional or keyword arguments
- slots is a Javascript object indexed by the expected named parameters, with values set to null
- var_names is a list of expected named parameters. It is the equivalent of Object.keys(slots), but for performance reasons the list is written explicitly in the generated code instead of being computed with Object.keys() at each function call
- args is the iterable holding the arguments passed to the function, generally set to the Javascript built-in arguments
- dobj is a Javascript object for the named parameters that take default values; set to {} if no default value is specified
- extra_pos_args is the name of the holder for extra positional arguments, or null
- extra_kw_args is the name of the holder for extra keyword arguments, or null
A few examples:
for def f(x): the Javascript function starts with
var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, null, null)
for def f(x, y=1):
var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
    {y: 1}, null, null)
for def f(x, *t):
var $ns = $B.args("f", 1, {x:null}, ['x'], arguments, {}, "t", null)
for def f(x, y=1, *t, **d):
var $ns = $B.args("f", 2, {x:null, y:null}, ['x', 'y'], arguments,
    {y: 1}, "t", "d")
$B.args checks the arguments passed to the function and raises exceptions if there are missing or unexpected arguments. Otherwise, it returns an object indexed by the names of the parameters and, if specified, the names of the holders for extra arguments. For instance, in the last example above, $ns will have the keys x, y, t and d.
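The checks performed by $B.args can be sketched in Python with the same parameters. This is not Brython's code: dicts stand in for Javascript objects, None plays the role of null, keyword arguments are assumed to arrive in a trailing {"$kw": [...]} object as in the call translation example, and the error messages are modeled on CPython's.

```python
def parse_args(fname, argcount, slots, var_names, args, dobj,
               extra_pos_args, extra_kw_args):
    """Sketch of the checks described for $B.args."""
    args = list(args)
    keywords = {}
    # A trailing {"$kw": [...]} object carries all the keyword arguments
    if args and isinstance(args[-1], dict) and "$kw" in args[-1]:
        for mapping in args.pop()["$kw"]:
            for key, value in mapping.items():
                if key in keywords:
                    raise TypeError(f"{fname}() got multiple values for argument '{key}'")
                keywords[key] = value

    ns = dict(slots)                  # every expected name starts at None (null)
    filled = set()
    for name, value in zip(var_names, args[:argcount]):
        ns[name] = value
        filled.add(name)

    extra = args[argcount:]
    if extra_pos_args is not None:
        ns[extra_pos_args] = tuple(extra)
    elif extra:
        raise TypeError(f"{fname}() takes {argcount} positional arguments "
                        f"but {len(args)} were given")

    extra_kw = {}
    for key, value in keywords.items():
        if key in slots:
            if key in filled:
                raise TypeError(f"{fname}() got multiple values for argument '{key}'")
            ns[key] = value
            filled.add(key)
        elif extra_kw_args is not None:
            extra_kw[key] = value
        else:
            raise TypeError(f"{fname}() got an unexpected keyword argument '{key}'")
    if extra_kw_args is not None:
        ns[extra_kw_args] = extra_kw

    for name in var_names:            # fall back on default values
        if name not in filled:
            if name not in dobj:
                raise TypeError(f"{fname}() missing required argument: '{name}'")
            ns[name] = dobj[name]
    return ns

# def f(x, y=1, *t, **d) called as f(1, 2, 3, z=4)
ns = parse_args("f", 2, {"x": None, "y": None}, ["x", "y"],
                [1, 2, 3, {"$kw": [{"z": 4}]}], {"y": 1}, "t", "d")
print(ns)   # {'x': 1, 'y': 2, 't': (3,), 'd': {'z': 4}}
```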
A Python program is divided into blocks: modules, functions, classes, comprehensions. For each block, Brython defines a Javascript variable that holds all the names bound in the block (we call it the "block names object").
Based on lexical analysis, including the global and nonlocal keywords, it is generally possible to know in which block a name is bound. The name is then translated as the attribute of the same name of that block's names object.
When the name is referenced (e.g. print(x)) and not bound (e.g. x = 1), the translation is actually a call to a function that checks whether the object referenced by the name is undefined and, if so, throws a NameError or an UnboundLocalError for the name. This is needed because a name bound somewhere in a block may not have been bound yet when it is referenced, for instance in examples like:
# example 1: raises NameError
def f():
    a
a = f()

# example 2: raises NameError
class A:
    def __init__(self):
        a
a = A()

# example 3: raises NameError
if False:
    a = 0
a

# example 4: raises UnboundLocalError
def f():
    if False:
        a = 9
    a
f()
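The run-time check can be sketched in Python with a dict as the block names object, a missing key playing the role of Javascript undefined. checked_lookup() and its is_local_block flag are hypothetical: the flag stands for what lexical analysis already knows, namely whether the name belongs to a function block (example 4) or to the module (examples 1 to 3).

```python
def checked_lookup(names, name, is_local_block):
    """Return names[name], or raise the error CPython would raise for an unbound name."""
    if name in names:                 # a missing key plays the role of undefined
        return names[name]
    if is_local_block:                # example 4: the name is local to a function
        raise UnboundLocalError(
            f"cannot access local variable '{name}' where it is not associated with a value")
    raise NameError(f"name '{name}' is not defined")   # examples 1 to 3

module_names = {}                     # "a" is bound later in the module, not yet
try:
    checked_lookup(module_names, "a", is_local_block=False)
except NameError as exc:
    print(exc)                        # name 'a' is not defined
```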
If lexical analysis shows that a referenced name is certainly defined, it is simply translated to X['a'] (where X stands for the block names object): this is the case when the name has been bound in a previous line of the block, at the block level, not at an indented level. For instance in this case:
x = 0
print(x)
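The rule "bound in a previous line, at the block level" can be approximated with CPython's ast module. classify_references() is a hypothetical sketch restricted to module-level code: only unconditional top-level assignments count as certain bindings, nested blocks are not handled, and built-in names such as print are not special-cased, so they come out as "checked".

```python
import ast

def classify_references(source):
    """List (name, line, kind) for names read at module level.

    kind is "direct" when an earlier top-level assignment certainly bound
    the name, "checked" otherwise.
    """
    bound = set()
    references = []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                kind = "direct" if node.id in bound else "checked"
                references.append((node.id, node.lineno, kind))
        # Only an unconditional top-level assignment counts as a certain binding
        if isinstance(stmt, ast.Assign):
            bound.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
    return references

print(classify_references("x = 0\nprint(x)\n"))
# [('print', 2, 'checked'), ('x', 2, 'direct')]
print(classify_references("if False:\n    a = 0\na\n"))
# [('a', 3, 'checked')]
```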
The only case when the block can't be determined is when the program imports names with from some_module import *. In this case:
- it is impossible to know if a name like range referenced in the script is the built-in class range or if it was among the names imported from some_module
- if a name which is not explicitly bound in the script is referenced, lexical analysis can't determine if it should raise a NameError
In this case, the name is translated to a call to a function that selects the value at run time, based on the names actually imported by the module, or raises a NameError.
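The run-time selection can be sketched in Python as follows. resolve_at_runtime() is a hypothetical helper; it assumes that the module's names, including those added by the star import, take precedence over the builtins, as in CPython.

```python
import builtins

def resolve_at_runtime(name, module_names):
    """Module names (including star-imported ones) shadow builtins, else NameError."""
    if name in module_names:
        return module_names[name]
    if hasattr(builtins, name):
        return getattr(builtins, name)
    raise NameError(f"name '{name}' is not defined")

# Simulate "from some_module import *" where some_module happens to define range
star_imported = {"range": lambda n: list(reversed(list(builtins.range(n))))}
module_names = dict(star_imported)

print(resolve_at_runtime("range", module_names)(3))    # [2, 1, 0]
print(resolve_at_runtime("len", module_names)("abc"))  # 3
```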
Brython handles the execution frames in a linked list of frame objects. There is a frame object for the module level, and one for each function, class and a few other objects defined in Python documentation.
The attribute __BRYTHON__.frame_obj is an object of the form {prev, frame, count}:
- prev references the value of __BRYTHON__.frame_obj at the closest upper level
- frame is the frame object for the current module / class / function (a list with information about the global and local environment)
- count is the depth of the frame (incremented by 1 each time a frame is entered)
__BRYTHON__.frame_obj is used notably:
- for the built-in functions globals() and locals()
- for exec() and eval() if no explicit global or local namespace is provided
- to raise a RecursionError when the depth of the frame exceeds a threshold
- to generate the traceback in case of exceptions
- for functions such as sys._getframe()
__BRYTHON__.frame_obj is managed by inserting calls to the internal functions enter_frame() and leave_frame() in the generated Javascript code.
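The linked list can be sketched in Python. FrameStack, RECURSION_LIMIT and the frame layout [name, locals, module name, globals] are hypothetical; the sketch only shows how {prev, frame, count} supports the depth check and the walk used to build a traceback.

```python
RECURSION_LIMIT = 1000   # hypothetical threshold

class FrameStack:
    """Sketch of __BRYTHON__.frame_obj as a linked list of {prev, frame, count}."""

    def __init__(self):
        self.frame_obj = None

    def enter_frame(self, frame):
        count = 1 if self.frame_obj is None else self.frame_obj["count"] + 1
        if count > RECURSION_LIMIT:
            raise RecursionError("maximum recursion depth exceeded")
        self.frame_obj = {"prev": self.frame_obj, "frame": frame, "count": count}

    def leave_frame(self):
        self.frame_obj = self.frame_obj["prev"]

    def traceback_names(self):
        """Follow the prev links to list frame names, outermost first."""
        names, node = [], self.frame_obj
        while node is not None:
            names.append(node["frame"][0])
            node = node["prev"]
        return names[::-1]

stack = FrameStack()
# Hypothetical frame layout: [name, locals, module name, globals]
stack.enter_frame(["__main__", {}, "__main__", {}])
stack.enter_frame(["f", {"x": 1}, "__main__", {}])
print(stack.traceback_names(), stack.frame_obj["count"])   # ['__main__', 'f'] 2
stack.leave_frame()
print(stack.frame_obj["count"])                            # 1
```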
The cache of precompiled standard library modules described below is used under 2 conditions:
- the browser must support the indexedDB database engine (most of them do, including on smartphones)
- the Brython page must use brython_stdlib.js, or the reduced version brython_modules.js generated by the CPython brython module
The main idea is to store the Javascript translation of stdlib modules in an indexedDB database: the translation is done only once for each new version of Brython; the generated Javascript is stored on the client side, not sent over the network, and indexedDB can easily handle a few MB of data.
Unfortunately, indexedDB works asynchronously, while import is blocking. With this code:
import datetime
print(datetime.datetime.now())
using indexedDB at runtime to get the datetime module is not possible, because the code that follows the import statement is not in a callback function that could be called when the indexedDB asynchronous request completes.
The solution is to scan the script at translation time. For each import statement in the source code, the name of the module to import is stored in a list. When the translation is finished, the Brython engine enters an execution loop (defined in function loop() in loaders.js) that uses a tasks stack. The possible tasks are:
- call function inImported() that checks if the module is already in the imported modules. If so, the control returns to loop()
- if not, add a task to the stack: a call to function idb_get() that makes a request to the indexedDB database to see if the Javascript version of the Python module is already stored; when the task is added, control returns to loop()
- in the callback of this request (function idb_load()):
  - if the Javascript version exists in the database, it is stored in a Brython variable (__BRYTHON__.precompiled) and the control returns to loop()
  - otherwise, the Python source for the module (found in brython_stdlib.js) is translated and another task is added to the stack: a request to store the Javascript code in the indexedDB database. The callback of this request adds another task: a new call to idb_get(), which is sure to succeed this time
- the last task on the stack is the execution of the original script
At run time, when a module in the standard library is imported, the Javascript translation stored in __BRYTHON__.precompiled is executed: the Python to Javascript translation has already been made.
Cache update
The indexedDB database is associated with the browser and persists between browser requests, and even after the browser is closed or the PC is restarted. The process described above must therefore define a way to update the Javascript version stored in the database when the Python source code in the stdlib changes, or when the translation engine changes.
To achieve this, cache update relies on a timestamp. Each version of Brython is marked with a timestamp, updated by the script make_dist.py. When a script in the stdlib is precompiled and stored in the indexedDB database, the record in the database has a timestamp field set to this Brython timestamp. If a new version of Brython is used in the HTML page, it has a different timestamp, so idb_load() finds an outdated record and a new translation is performed.
A complementary timestamp is defined if brython_modules.js is used instead of brython_stdlib.js.
Limitations
The detection of the modules to import is made by a static code analysis, relying on import moduleX or from moduleY import foo. It cannot work for imports performed with the built-in function __import__(), or for code passed to exec(). In these cases, the previous solution of on-the-fly compilation at each page load is used.
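The limitation can be reproduced with CPython's ast module: static analysis sees import statements, but not a module name passed to __import__() or an import hidden in a string given to exec(). static_imports() is a hypothetical helper, not Brython's detection code.

```python
import ast

def static_imports(source):
    """Module names found by static analysis of import statements."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.append(node.module)
    return found

source = '''
import datetime
from json import dumps
mod = __import__("random")     # dynamic: invisible to static analysis
exec("import textwrap")        # code in a string: invisible too
'''
print(static_imports(source))   # ['datetime', 'json']
```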
The mechanism is only implemented for modules in the standard library, or those in brython_modules.js. Using it for modules in site-packages or in the application directory is not implemented at the moment.
Pseudo-code
Below is a simplified version of the cache implementation, written in a Python-like pseudo code.
def brython():
    <get Brython scripts in the page>
    for script in scripts:
        # Translate Python script source to Javascript
        root = __BRYTHON__.py2js(script.src)
        js = root.to_js()
        if hasattr(__BRYTHON__, "VFS") and __BRYTHON__.has_indexedDB:
            # If brython_stdlib.js is included in the page, the __BRYTHON__
            # object has an attribute VFS (Virtual File System)
            for module in root.imports:
                tasks.append([inImported, module])
        tasks.append(["execute", js])
    loop()

def inImported(module_name):
    if module_name in imported:
        pass
    elif module_name in stdlib:
        tasks.insert(0, [idb_get, module_name])
    loop()

def idb_get(module_name):
    request = database.get(module_name)
    request.bind("success",
                 lambda evt: idb_load(evt, module_name))

def idb_load(evt, module_name):
    result = evt.target.result
    if result and result.timestamp == __BRYTHON__.timestamp:
        __BRYTHON__.precompiled[module_name] = result.content
        for subimport in result.imports:
            tasks.insert(0, [inImported, subimport])
    else:
        # Not found or outdated: precompile source code found
        # in __BRYTHON__.VFS
        root = __BRYTHON__.py2js(__BRYTHON__.VFS[module_name])
        tasks.insert(0, [store_precompiled, module_name, root.to_js(),
                         root.imports])
    loop()

def store_precompiled(module, js, imports):
    """Store precompiled Javascript in the database, with the current
    Brython timestamp and the list of modules it imports."""
    request = database.put({"content": js, "name": module,
                            "imports": imports,
                            "timestamp": __BRYTHON__.timestamp})

    def restart(evt):
        """When the code is inserted, add a new request to idb_get (this time
        we are sure it will find the precompiled code) and call loop()."""
        tasks.insert(0, [idb_get, module])
        loop()

    request.bind("success", restart)

def loop():
    """Pop the first item in the tasks stack, run the task with its arguments."""
    if not tasks:
        return
    func, *args = tasks.pop(0)
    if func == "execute":
        js_script = args[0]
        <execute js_script>
        loop()  # continue with the next task, if any
    else:
        func(*args)
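To experiment with the mechanism outside a browser, the pseudo-code can be turned into a small runnable simulation. Everything here is hypothetical: FakeIndexedDB replaces the indexedDB API with a queue of callbacks that plays the role of the browser event loop, translate() only detects imports with the ast module instead of producing real Javascript, and the VFS contents and the timestamp are made up.

```python
import ast
from collections import deque

BRYTHON_TIMESTAMP = 2                               # hypothetical Brython timestamp
VFS = {"datetime": "import time\n", "time": ""}     # hypothetical stdlib sources

def translate(source):
    """Stand-in for py2js(): fake Javascript plus the statically detected imports."""
    imports = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.append(node.module)
    return f"/* translation of {len(source)} chars */", imports

class FakeIndexedDB:
    """Asynchronous store: callbacks are queued and run later by the event loop."""
    def __init__(self, events):
        self.records, self.events = {}, events

    def get(self, name, on_success):
        self.events.append(lambda: on_success(self.records.get(name)))

    def put(self, record, on_success):
        def do_put():
            self.records[record["name"]] = record
            on_success()
        self.events.append(do_put)

class Loader:
    def __init__(self, db, events):
        self.db, self.events = db, events
        self.tasks, self.imported, self.precompiled = [], set(), {}
        self.executed, self.translations = [], 0

    def loop(self):
        """Pop the first task and run it; callbacks call loop() again."""
        if not self.tasks:
            return
        func, *args = self.tasks.pop(0)
        if func == "execute":
            self.executed.append(args[0])   # a browser would run the Javascript here
            self.loop()
        else:
            func(*args)

    def in_imported(self, name):
        if name not in self.imported and name not in self.precompiled and name in VFS:
            self.tasks.insert(0, [self.idb_get, name])
        self.loop()

    def idb_get(self, name):
        # No loop() call: control returns to the event loop until the request completes
        self.db.get(name, lambda record: self.idb_load(record, name))

    def idb_load(self, record, name):
        if record and record["timestamp"] == BRYTHON_TIMESTAMP:
            self.precompiled[name] = record["content"]
            for sub in record["imports"]:
                self.tasks.insert(0, [self.in_imported, sub])
        else:                               # not found or outdated: translate, then store
            self.translations += 1
            js, imports = translate(VFS[name])
            self.tasks.insert(0, [self.store_precompiled, name, js, imports])
        self.loop()

    def store_precompiled(self, name, js, imports):
        record = {"name": name, "content": js, "imports": imports,
                  "timestamp": BRYTHON_TIMESTAMP}

        def restart():
            self.tasks.insert(0, [self.idb_get, name])   # sure to succeed this time
            self.loop()
        self.db.put(record, restart)

    def run(self, script):
        js, imports = translate(script)
        for name in imports:
            self.tasks.append([self.in_imported, name])
        self.tasks.append(["execute", js])
        self.loop()
        while self.events:                  # the browser's event loop
            self.events.popleft()()

events = deque()
db = FakeIndexedDB(events)
script = "import datetime\nprint(datetime.datetime.now())\n"
first = Loader(db, events)
first.run(script)
print(first.translations, sorted(first.precompiled))   # 2 ['datetime', 'time']
second = Loader(db, events)        # next page load: the database persists
second.run(script)
print(second.translations)         # 0
```

On the second run no translation is performed, because the records stored during the first run carry the current timestamp; running again after changing BRYTHON_TIMESTAMP would make idb_load() treat every record as outdated.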