Run on Pyodide #803

Open
tomwhite opened this issue Jan 17, 2022 · 1 comment

Comments

@tomwhite
Collaborator

Pyodide uses WebAssembly to run Python in the browser. It has support for a lot of the PyData stack, so I wondered how easy it would be to get sgkit running on it. It would be a nice way to share demos and notebooks (see JupyterLite). (This is work I did last year but didn't get round to sharing.)

The following libraries are not supported yet:

  • Dask distributed. There is some discussion in dask/dask#7764 ("Install with pyodide"). The synchronous scheduler does work though, with a small workaround.
  • Numba. Ideally numba decorators would be ignored, but for the demo below I just commented them out. (There is a problem with doing this for guvectorize since it generates code with a new signature, so anything that uses these functions won't work.)
  • IO libraries. In principle these could be submitted to Pyodide as a new package.
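
One way the "ignore numba decorators" idea could be sketched is a fallback decorator that is only used when numba cannot be imported. This is a minimal illustration, not what the branch does (there the decorators were commented out); `noop_jit` and `count_nonzero` are hypothetical names. It does not help with guvectorize, since that decorator changes the call signature.

```python
def noop_jit(*args, **kwargs):
    """Stand-in for numba.njit that leaves the function as plain Python."""
    # Bare usage: @noop_jit
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]

    # Parameterised usage: @noop_jit(nogil=True, cache=True)
    def decorator(func):
        return func

    return decorator


try:
    from numba import njit
except ImportError:  # assumed to be the case on Pyodide, where numba is unavailable
    njit = noop_jit


@njit(cache=False)
def count_nonzero(values):
    # Simple loop that numba could compile, and that also runs as plain Python.
    total = 0
    for v in values:
        if v != 0:
            total += 1
    return total
```

With the fallback in place the function body runs uncompiled, so it would be slower but should give the same results.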

I created a branch with the above changes (and a few others), then built a wheel and uploaded it to Google Cloud Storage (GCS) so that it could be loaded with micropip. Using the console at https://pyodide.org/en/latest/console.html, I managed to create an sgkit Dataset:

Welcome to the Pyodide terminal emulator 🐍
Python 3.9.5 (default, Jan 17 2022 04:07:25) on WebAssembly VM
Type "help", "copyright", "credits" or "license" for more information.
>>> import micropip
>>> import zarr # not sure why this is needed before installing sgkit
>>> import sklearn # needed since sgkit doesn't explicitly declare it as a dependency (need to fix)
>>> await micropip.install("https://storage.googleapis.com/tomwhite_test/sgkit-0.3.1.dev5%2Bg59736c0-py3-none-any.whl")
>>> # needed to import dask, see https://github.com/pyodide/pyodide/issues/1603
>>> import sys
>>> sys.modules['_multiprocessing'] = object
>>> 
>>> import dask
>>> dask.config.set(scheduler='synchronous')
<dask.config.set object at 0x347de48>
>>> import sgkit as sg
/lib/python3.9/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1000, n_sample=250, n_contig=23, missing_pct=.1)
>>> ds
<xarray.Dataset>
Dimensions:             (variants: 1000, alleles: 2, samples: 250, ploidy: 2)
Dimensions without coordinates: variants, alleles, samples, ploidy
Data variables:
    variant_contig      (variants) int32 0 0 0 0 0 0 0 ... 22 22 22 22 22 22 22
    variant_position    (variants) int32 0 1 2 3 4 5 6 ... 36 37 38 39 40 41 42
    variant_allele      (variants, alleles) |S1 b'G' b'A' b'T' ... b'A' b'T'
    sample_id           (samples) <U4 'S0' 'S1' 'S2' ... 'S247' 'S248' 'S249'
    call_genotype       (variants, samples, ploidy) int8 0 0 1 0 1 ... 0 0 0 0 1
    call_genotype_mask  (variants, samples, ploidy) bool False False ... False
Attributes:
    contigs:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
>>> 
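
The `_multiprocessing` workaround in the session above could be wrapped in a small helper. This is a sketch only: it assumes an empty placeholder module is an acceptable substitute for `object`, which has not been verified against dask on Pyodide, and `stub_missing_module` is a hypothetical name.

```python
import importlib
import sys
import types


def stub_missing_module(name):
    """Return module `name`, registering an empty placeholder if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Register a bare module object so later `import name` statements succeed.
        placeholder = types.ModuleType(name)
        placeholder.__doc__ = "Empty placeholder registered by stub_missing_module."
        sys.modules[name] = placeholder
        return placeholder


# On Pyodide this would be called before `import dask`.
stub_missing_module("_multiprocessing")
```

On a regular CPython install the real `_multiprocessing` module is returned, so the same code can run unchanged outside the browser.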
 
@tomwhite
Collaborator Author

I updated this to use the latest code, and stubbed out some of the numba calls: https://github.com/tomwhite/sgkit/tree/pyodide-latest. This simplifies its usage a bit:

Welcome to the Pyodide terminal emulator 🐍
Python 3.9.5 (default, Jan 17 2022 04:07:25) on WebAssembly VM
Type "help", "copyright", "credits" or "license" for more information.
>>> import micropip
>>> await micropip.install("https://storage.googleapis.com/tomwhite_test/sgkit-0.4.1.dev20%2Bg839eb9a9-py3-none-any.whl")
>>> import sys
>>> sys.modules['_multiprocessing'] = object
>>> import dask
>>> dask.config.set(scheduler='synchronous')
<dask.config.set object at 0x2071538>
>>> import sgkit as sg
/lib/python3.9/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
  warnings.warn(msg)
>>> sg.simulate_genotype_call_dataset(n_variant=1000, n_sample=250, n_contig=23, missing_pct=.1)
<xarray.Dataset>
Dimensions:             (variants: 1000, alleles: 2, samples: 250, ploidy: 2)
Dimensions without coordinates: variants, alleles, samples, ploidy
Data variables:
    variant_contig      (variants) int32 0 0 0 0 0 0 0 ... 22 22 22 22 22 22 22
    variant_position    (variants) int32 0 1 2 3 4 5 6 ... 36 37 38 39 40 41 42
    variant_allele      (variants, alleles) |S1 b'G' b'A' b'T' ... b'A' b'T'
    sample_id           (samples) <U4 'S0' 'S1' 'S2' ... 'S247' 'S248' 'S249'
    call_genotype       (variants, samples, ploidy) int8 0 0 1 0 1 ... 0 0 0 0 1
    call_genotype_mask  (variants, samples, ploidy) bool False False ... False
Attributes:
    contigs:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
    source:   sgkit-0.4.1.dev20+g839eb9a9
>>> 
 

I built the wheel with

python setup.py bdist_wheel

And I used GCS since it makes it easy to set CORS:

gsutil cors set cors.json gs://tomwhite_test

Where cors.json is

[
    {
      "origin": ["https://pyodide.org/"],
      "method": ["GET"],
      "responseHeader": ["Content-Type"],
      "maxAgeSeconds": 3600
    }
]
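
If the wheel needs to be loadable from more than one site, the same policy could be generated rather than edited by hand. A small sketch; `build_cors_policy` is a hypothetical helper, and its fields simply mirror the cors.json above.

```python
import json


def build_cors_policy(origins, max_age_seconds=3600):
    """Build a read-only (GET) CORS policy for the given origins."""
    return [
        {
            "origin": list(origins),
            "method": ["GET"],
            "responseHeader": ["Content-Type"],
            "maxAgeSeconds": max_age_seconds,
        }
    ]


# Reproduce the single-origin policy and write it where gsutil expects it.
policy = build_cors_policy(["https://pyodide.org/"])
with open("cors.json", "w") as f:
    json.dump(policy, f, indent=2)
```

The resulting file can then be passed to the `gsutil cors set` command shown earlier.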

One day it might be possible to use the standard sgkit wheel from PyPI, in which case there would be no need to worry about CORS.
