Python type hints and migration to Python 3

Migration to Python 3

See these Google Slides for our plan overview for migrating to Python 3 and adding mypy type hints to help catch problems in the migration and elsewhere (esp. with the string to unicode vs. bytes change).

Overview: Making things compatible with both Python 2 and 3

from __future__ import absolute_import, division, print_function in every non-empty source file
Syntax, e.g. raise ValueError('RNAP protein counts must be positive.')
String types, string.encode('utf-8'), bytestring.decode('utf-8')
Dict accessors, range(), zip()
I/O
Moved & combined packages, six.moves.cPickle, six.moves.zip
def __next__(self), __hash__, __matmul__
The time package has better clock functions replacing time.clock(); see wholecell.utils.py3
Numpy structured fields of type 'a' or 'S' contain 8-bit NUL-terminated bytes; switch to 'U'
and more...
The Python-Modernize tool made many of these changes automatically, but reject some of its edits, such as wrapping an iterable in list() just to iterate over it.
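As an illustration of the string-types item above, here is a minimal sketch of the encode/decode round trip, assuming it runs under Python 3. The helper names to_bytes and to_text are hypothetical, not part of wholecell.

```python
def to_bytes(text, encoding='utf-8'):
    # type: (str, str) -> bytes
    """Encode a text string, e.g. before writing it to a binary file."""
    return text.encode(encoding)


def to_text(data, encoding='utf-8'):
    # type: (bytes, str) -> str
    """Decode bytes read from a binary source into a text string."""
    return data.decode(encoding)


label = 'RNAP \u03c3 factor'      # text containing a non-ASCII character (sigma)
raw = to_bytes(label)

assert isinstance(raw, bytes)
assert to_text(raw) == label       # the round trip preserves the text
assert len(raw) == len(label) + 1  # sigma takes 2 bytes in UTF-8
```

Text and bytes have different lengths as soon as non-ASCII characters appear, which is one reason to keep the two types apart.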

Test that the changes didn't break anything

pytest
mypy type checks (now done in CI)
PyCharm inspections (set to check for compatibility with 2.7 and 3.8)
Compare parca outputs and sim outputs
Review analysis plots

Recommended: Add assert statements, pytest cases, and type hints.
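A sketch of what that recommendation might look like in practice, reusing the error message from the syntax example above. The function validate_rnap_counts and its test are hypothetical names for illustration only.

```python
from typing import List


def validate_rnap_counts(counts):
    # type: (List[int]) -> List[int]
    """Return counts unchanged, or raise if any count is not positive."""
    if any(c <= 0 for c in counts):
        raise ValueError('RNAP protein counts must be positive.')
    return counts


def test_validate_rnap_counts():
    """A pytest-style case; pytest collects functions named test_*."""
    assert validate_rnap_counts([3, 1, 4]) == [3, 1, 4]
    try:
        validate_rnap_counts([3, 0])
    except ValueError as e:
        assert 'positive' in str(e)
    else:
        raise AssertionError('expected a ValueError')
```

The type comment works in both Python 2 and 3, so mypy can check callers during the migration.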

String types

	Python 2	Python 2+3	Python 3
basestring	basestring	--	--
unicode	unicode	--	--
typing.Text	unicode	typing.Text	str
typing.AnyStr # any type of string but not mixed	typing.AnyStr	typing.AnyStr	typing.AnyStr
six.string_types # for isinstance()	(basestring,)	six.string_types	(str,)
six.text_type	unicode	six.text_type	str
String = Union[str, Text] # type alias	unicode or str [or bytes]	a text string	str

See Six: Python 2 and 3 Compatibility Library.
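A minimal sketch of the typing.Text and typing.AnyStr hints from the table, assuming Python 3 (where Text is an alias for str). The functions shout and concat are hypothetical examples.

```python
from typing import AnyStr, Text


def shout(message):
    # type: (Text) -> Text
    """Accepts and returns a text string (unicode in Py2, str in Py3)."""
    return message.upper() + '!'


def concat(a, b):
    # type: (AnyStr, AnyStr) -> AnyStr
    """Both arguments must be the same kind of string: all text or all bytes."""
    return a + b


assert shout('hi') == 'HI!'
assert concat('ab', 'cd') == 'abcd'        # text + text
assert concat(b'ab', b'cd') == b'abcd'     # bytes + bytes
```

A type checker should flag a mixed call like concat('ab', b'cd'); in Python 3 that call also fails at runtime with a TypeError.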

Dict access

	Python 2	Python 2+3	Python 3
test a key	key in d; d.has_key(key)	key in d	key in d

snapshot as a list	list(d); d.keys(); list(d.keys()) # extra list copy	list(d)	list(d); list(d.keys())
	d.values(); list(d.values()) # extra list copy	list(six.viewvalues(d)); list(d.values())	list(d.values())
	d.items(); list(d.items()) # extra list copy	list(six.viewitems(d)); list(d.items())	list(d.items())

iterable view	d.viewkeys()	six.viewkeys(d)	d.keys(); d # if the context will call iter() on it
	d.viewvalues()	six.viewvalues(d)	d.values()
	d.viewitems()	six.viewitems(d)	d.items()

iterator	for key in d: ...; iter(d); d.iterkeys()	for key in d: ...; iter(d) # six.iterkeys(d)	for key in d: ...; iter(d); iter(d.keys())
	d.itervalues()	# six.itervalues(d)	iter(d.values())
	d.iteritems()	# six.iteritems(d)	iter(d.items())

See PEP 469 -- Migration of dict iteration code to Python 3.
"Snapshot as a list" takes more RAM but isn't always slower and it lets you modify the dict while iterating through the snapshot. Even when it is slower, that might not matter in a unit test or a development utility.
A dictionary "View" supports in, iter, and (since Python 3.8) reversed iter; the keys and items views also support set operations. It's Iterable, which means it can construct an iterator on demand, and in that sense it can be iterated multiple times although each iterator is one-shot. While iterating, it can handle dict value changes but not size changes.
Stop using the d.iterxyz() methods. They aren't in Python 3 since the View methods fill that role and more. Most code (like for...in) that needs an Iterator will accept an Iterable. If you really need to pass an Iterator to a function, then call e.g. iter(a_view).
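The view and snapshot behavior described in these notes can be sketched as follows, assuming Python 3.7+ (for insertion-ordered dicts). The dict contents are made up for illustration.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# A keys view supports set operations and membership tests.
keys = d.keys()
common = keys & {'b', 'c', 'z'}

# The view is live: it reflects later changes to the dict.
d['d'] = 4
assert 'd' in keys

# Each iter() call on a view makes a fresh one-shot iterator.
first = next(iter(keys))

# Changing values while iterating is fine.
for k in d:
    d[k] *= 10

# Changing the dict's size while iterating raises RuntimeError.
scratch = dict(d)
try:
    for k in scratch:
        del scratch[k]
    resized_ok = True
except RuntimeError:
    resized_ok = False

# Iterating over a list snapshot makes deletion safe.
for k in list(d):
    if d[k] > 20:
        del d[k]
```

The snapshot costs one extra list of keys, which is usually negligible next to the risk of a RuntimeError mid-loop.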

Main References

style-guide.md#type-hints -- our notes on using Python type hints including numpy stubs. [Note: The *.md docs need updating about the type hint np.ndarray (no [element_type]) and about building your python runtime environment.]
Porting Python 2 Code to Python 3.
What's New in Python.
The "six" compatibility library.
The mypy type checker.
The Python-Modernize conversion tool.

FYI: Additional References

Why Python 3 exists
- Text vs. binary data in Python 2 is error prone. Mixing encoded and unencoded text is unreliable and confusing, e.g. both the str and unicode types have .encode() and .decode() methods. Python predates the Unicode standard, and its unicode handling is inconsistent, e.g. in a script vs. the interactive interpreter, and in open().read().
- Python 3 fixes that problem by making unicode text and binary bytes separate types. The new approach is known as the "Unicode sandwich": "use bytes in I/O; unicode in all the app code in between." However, the Python team threw in a lot of other incompatible changes. Big mistake!
The Unicode HOWTO.
The Story of Python 2 and 3.
Ned Batchelder’s Pragmatic Unicode talk/essay
Supporting Python 3: An in-depth guide.
differences.
@twouters’s old TransitionToPython3 wiki.
What's New in Python for really comprehensive and exhaustive documentation about all language changes since Python 2.7.
Conservative Python 3 Porting Guide.
From Dropbox, Incrementally migrating over one million lines of code from Python 2 to Python 3
- Don't use unicode literals.
- Use mypy type checks, unit tests, and running Python 2 with the -3 option in CI to check for backsliding, esp. on unicode/bytes types.
Python 3 for Scientists.
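The "Unicode sandwich" mentioned above can be sketched like this, assuming Python 3 and UTF-8 encoded input. The function count_words is a hypothetical example, not project code.

```python
def count_words(raw):
    # type: (bytes) -> bytes
    """Decode on input, work with text, encode on output."""
    text = raw.decode('utf-8')           # bytes -> text at the input edge
    n = len(text.split())                # app code works only with text
    report = '{} words'.format(n)
    return report.encode('utf-8')        # text -> bytes at the output edge


result = count_words('na\u00efve caf\u00e9'.encode('utf-8'))
assert result == b'2 words'
```

Keeping the decode and encode calls at the edges means the code in between never has to guess which kind of string it holds.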

Strategy

Set PyCharm's inspector to check compatibility with Python 2.7 and 3.8.
- Preferences/Editor/Inspections/Python/Code compatibility inspection/Options: Check for compatibility with 2.7, 3.8, 3.9
Adopt all the __future__ imports.
- Division is the challenging one. It's mostly in use already, with the big exception that wholecell/utils/units.py has truediv turned off for its callers due to Issue #433.
Adopt Python 3 compatible libraries.
- Finish adopting subprocess32 in place of subprocess. It's a back-port of the Python 3 subprocess with improvements and bug fixes in process launching.
Incrementally convert to Python 3 compatible syntax and semantics. Use a tool like "python-modernize" to do much of the conversion. As we ratchet up the Python 3 compatibility, let everyone know and update the checker tool configuration.
- Use a checker tool in CI to catch backsliding on Python 3 compatibility changes.
Add type hints, esp. for the str, bytes, unicode, and basestring types and the AnyStr type hint.
- Add a type checker in CI (see below).
Drop support for Python 2.
Phase out use of the "six" compatibility library.

Type hints

Type hints look like this:

def emphasize(message):
  # type: (str) -> str
  """Construct an emphatic message."""
  return message + '!'

A few type hints -- esp. one per function definition -- can go a long way to catching problems and documenting types.
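For comparison, once Python 2 support is dropped, the same hint could be written with Python 3 annotation syntax. This is a sketch of the equivalent form; typing.get_type_hints is used only to show that the hints are recorded on the function.

```python
from typing import get_type_hints


def emphasize(message: str) -> str:
    """Construct an emphatic message."""
    return message + '!'


# Annotations are stored on the function; Python does not enforce them at runtime.
hints = get_type_hints(emphasize)
assert hints == {'message': str, 'return': str}
assert emphasize('Go') == 'Go!'
```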

PyCharm checks types interactively while you edit. You don't need any other tools to check types. See Python Type Checking (Guide).

Batch programs mypy and pytype are other ways to check types, particularly in Continuous Integration builds (CI). (pytype does not yet support Python 3.7 or 3.8.)

Typeshed is a repository for "stub" files that associate type definitions with existing libraries. It's bundled with PyCharm, mypy, and pytype. It does not have types for Numpy.

Types for Numpy

There are experimental type stubs in the numpy repo numpy-stubs that define types for dtype and ndarray. It's not fancy but it does catch some mistakes and it improves PyCharm autocompletion. The numpy team might improve these stubs but numpy, scipy, and matplotlib use types more flexibly than type checker tools can handle.

With this stub file, you can write type hints like np.ndarray ~~and np.ndarray[int]~~. It has no way to express the element type or array shape so use docstrings for that.

import numpy as np

def f(a):
    # type: (np.ndarray) -> np.ndarray
    return np.asarray(a, dtype=int)

The wcEcoli project includes numpy-stubs.

To install more stub files:

Copy them into the stubs/ directory in the project.
~~Mark the stubs/ directory as a source root in PyCharm.~~

A short list of Nifty Python 3 features

class C: # simpler than class C(object):
super().__init__() # simpler than super(C, self).__init__()
class C(metaclass=abc.ABCMeta):
matrix_A @ matrix_B # matrix multiply
f'{name=} {len(name)=}'
{**x, **y}, [*l, '=', *l] # unlimited * and ** unpacking
dictionaries are kept in insertion order
nonlocal x
breakpoint()
UTF-8 source code and Unicode identifiers
math.inf, math.nan, math.isclose(), ...
Regex match object mo['name'] ≡ mo.group('name')
New IO model, fast os.scandir()
multiprocessing improvements
__iter__ = None # declare that a special method is unavailable
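A few of the features listed above, combined in one short sketch. The classes Vec and Vec3 and the function make_counter are hypothetical examples.

```python
import math


class Vec:                                # no explicit (object) base needed
    def __init__(self, *xs):
        self.xs = xs

    def __matmul__(self, other):          # lets v @ w compute a dot product
        return sum(a * b for a, b in zip(self.xs, other.xs))


class Vec3(Vec):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)         # no need for super(Vec3, self)


def make_counter():
    count = 0

    def increment():
        nonlocal count                    # rebind the enclosing function's variable
        count += 1
        return count

    return increment


dot = Vec3(1, 2, 3) @ Vec3(4, 5, 6)
merged = {**{'a': 1, 'b': 2}, **{'b': 3}}    # later entries win
names = ['x', 'y']
row = [*names, '=', *names]
close = math.isclose(0.1 + 0.2, 0.3)         # tolerant float comparison
counter = make_counter()
counter()
second = counter()
```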

Python 3.7 features

dataclasses.
context variables, which are like thread-local variables that support asynchronous code correctly.
nanosecond resolution time functions.
datetime.fromisoformat().
Process.close() method.
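A brief sketch of three of these Python 3.7 features. The SimRun class and its fields are hypothetical, chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SimRun:
    """__init__, __repr__, and __eq__ are generated from the fields."""
    name: str
    seed: int = 0
    tags: List[str] = field(default_factory=list)   # a fresh list per instance


run = SimRun('wildtype', seed=7)
started = datetime.fromisoformat('2020-03-01T12:30:00')

t0 = time.perf_counter_ns()                # integer nanoseconds, no float rounding
elapsed_ns = time.perf_counter_ns() - t0
```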

Python 3.8 features

:= assignment expression: if (n := len(a)) > 10: ...
Positional-only parameters: def f(a, b, /, c, d): ...
"=" format in f-strings: print(f'{variable=}') for quick debugging printouts.

Also see Python changes since 3.8 (summary).
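The three Python 3.8 features listed above can be seen together in a short sketch. The function scale and the sample data are hypothetical.

```python
def scale(x, factor, /, offset=0):
    """x and factor are positional-only; offset may be passed by keyword."""
    return x * factor + offset


data = list(range(12))
if (n := len(data)) > 10:                 # assign and test in one expression
    message = f'{n=} exceeds the limit'   # "=" shows both the name and the value
else:
    message = 'ok'

try:
    scale(x=2, factor=3)                  # keywords are rejected for x and factor
    accepted_keywords = True
except TypeError:
    accepted_keywords = False
```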
