GSoC 2024 projects

Osvaldo A Martin edited this page Feb 20, 2024 · 6 revisions

ArviZ

ArviZ is a project dedicated to promoting and building tools for exploratory analysis of Bayesian models. It currently has a Python and a Julia interface.

ArviZ aims to integrate seamlessly with established probabilistic programming languages like PyStan, PyMC, Turing, Soss, emcee, Pyro... and to be easily integrated with novel or bespoke Bayesian analyses. Where the aim of the probabilistic programming languages is to make it easy to build and solve Bayesian models, the aim of the ArviZ libraries is to make it easy to process and analyze the results from those Bayesian models.

Timeline

The timeline of the GSoC internships is available on the GSoC website.

Projects

Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on Gitter (we won't accept proposals on topics outside this idea list from people who haven't contacted us beforehand). Keep in mind that these are only ideas and that some of them can't be solved entirely in a single GSoC project. When writing your proposal, choose some specific tasks and make sure your proposal is adequate for the GSoC time commitment. We expect all projects to be 350h projects; if you'd like to be considered for a 175h project, you must reach out to us on Gitter. We will not accept 175h applications from people who haven't discussed their time commitment with us before submitting the application.

Each project also lists the specific requirements needed to complete it successfully; general requirements are listed below.

Note that these requirements can be learned while writing the proposal and during the community bonding period. You should feel confident working on any project whose requirements interest you and that you would like to learn about; they are not skills that you are expected to have before writing your proposal. We aim for GSoC to provide a win-win scenario where you benefit from an inclusive and thriving environment in which to learn and the library benefits from your contributions.

All projects require being comfortable using ArviZ and understanding the relations between its 3 main modules: plots, stats, and data. However, unless specified otherwise, no specific knowledge of inference libraries or about the internals of from_xyz converter functions is needed.

Python

Students working on Python projects should be familiar with Python, numpy, and scipy and have basic xarray/InferenceData knowledge. They should also be able to write unit tests for the added functionality using pytest, follow the project's development conventions, and use black, pylint, and pydocstyle for code style and linting.

Julia

Students working on Julia projects should be familiar with Julia, Statistics/StatsBase, Tables, and the basics of DimensionalData. They should also be able to write unit tests for the added functionality using Test.

Expected benefits of working on ArviZ

Students who work on ArviZ can expect their skillset to grow in

  • Bayesian Inference libraries
  • Bayesian modeling workflow and model criticism
  • Matplotlib and/or bokeh usage (depending on the project)
  • Xarray usage (depending on the project)
  • Numba or Dask optimization (depending on the project)

ArviZ Dashboards (Python)

The objective is to construct dashboards featuring interconnected plots to facilitate the inspection of various dimensions. The dashboards should have the capability to dynamically switch between plot types, and allow manual selection and saving of data. We will use the library Panel.

Required skills

People working on this project will need to be familiar with Panel and with the ArviZ plotting and stats modules.

Expected outcome

The expected outcome of this project is one or two dashboards, with accompanying documentation that demonstrates how they can be effectively integrated into a Bayesian workflow.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential Mentors: Ari Hartikainen, Osvaldo Martin, Andy Maloney

Plotting refactoring (Python)

We are brainstorming ideas to refactor the plotting module that would allow better composability and extensibility of ArviZ plotting. We have some prototypes at https://xrtist.readthedocs.io/en/latest/, and also some older brainstorming docs in other wiki pages: ArviZ 1.0 ideas and Plot hierarchy.

This idea has two possible subprojects: data organization and multiple backend support.

Data organization

Expected output

The expected output is API improvement suggestions, extensive testing, and documentation of PlotCollection and PlotMuseum classes.

Required skills

People working on this project should be familiar with plot faceting and the grammar of graphics, and be comfortable with xarray (our plan is to use its latest features). Only basic familiarity with plotting libraries like matplotlib and bokeh is needed.

Multiple backend support

Expected output

The expected output is the implementation of several functions in the xrtist.backend module, API suggestions (particularly on which aesthetics should be part of the common interface layer) and, depending on how advanced the project is by the GSoC coding period, contributions to packaging, publishing docs... for alpha releases of the library.

Required skills

People working on this project should be familiar with visual encoding of data, and be comfortable working with both matplotlib and bokeh. Only basic familiarity with xarray and ArviZ itself is needed.

Info

  • Expected size: 350h
  • Difficulty rating:
    • Data Organization: hard
    • Multiple backend support: medium
  • Potential mentors: Oriol Abril, Osvaldo Martin

Prior elicitation (Python)

PreliZ currently supports elicitation on the parameter space (unidimensional) and a few experimental functions on the observed space (predictive elicitation). The objective is to expand these features and make them more robust. For instance, for predictive elicitation, models need to be defined in PreliZ, with only limited support for models written in other PPLs like Bambi and PyMC. An alternative route for feature expansion is to provide elicitation for multivariate distributions, mainly Dirichlet and MVNormal.
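To make the unidimensional case concrete, here is a minimal sketch of what eliciting a one-dimensional prior from an interval statement can look like. It does not use PreliZ or reproduce its API; the function name `normal_from_interval` and its parameters are hypothetical, and it assumes the user wants a Normal prior centered on the interval, using only the Python standard library.

```python
from statistics import NormalDist


def normal_from_interval(lower, upper, mass=0.9):
    """Return a Normal prior whose central interval [lower, upper] holds `mass`.

    Hypothetical helper for illustration only; this is not part of PreliZ.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if not 0 < mass < 1:
        raise ValueError("mass must be strictly between 0 and 1")
    # Center the distribution on the interval.
    mu = (lower + upper) / 2
    # Standard-normal quantile that leaves (1 - mass) / 2 in each tail.
    z = NormalDist().inv_cdf((1 + mass) / 2)
    # Scale so that mu +/- z * sigma matches the interval bounds.
    sigma = (upper - lower) / (2 * z)
    return NormalDist(mu, sigma)


# "I am 90% sure the parameter lies between -1 and 1."
prior = normal_from_interval(-1.0, 1.0, mass=0.9)
covered = prior.cdf(1.0) - prior.cdf(-1.0)
```

Here `covered` recovers the requested mass, which is the kind of consistency check that elicitation tools can expose to the user.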

Required skills

People working on this project will need to be familiar with PreliZ and possibly also ipywidgets.

Expected outcome

The expected outcome of this project will be the implementation of new features and accompanying documentation that demonstrates how they can be effectively integrated into a Bayesian workflow.

Info

  • Expected size: 350h
  • Difficulty rating: medium
  • Potential Mentors: Osvaldo Martin

reloo for Turing models (Julia)

Sometimes approximate Leave-One-Out cross-validation (LOO-CV) gives poor estimates, and explicit cross-validation holding out a few data points is necessary for model comparison. In this project, you will design a PPL-agnostic API for performing cross-validation with resampling, implement the API for Turing models, and use it in a new reloo function.
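The core idea can be sketched in a few lines. The example below is written in Python for brevity and is not the proposed Julia API: `reloo_sketch` and `refit_and_score` are hypothetical names, the toy "refit" is a plug-in Normal fit rather than a Bayesian posterior, and the pointwise values and Pareto k diagnostics are made-up illustrative inputs. It assumes the commonly used threshold of 0.7 on the Pareto k diagnostic to decide which observations need an exact refit.

```python
import math
from statistics import NormalDist, mean, stdev


def reloo_sketch(pointwise_elpd, pareto_k, refit_and_score, k_threshold=0.7):
    """Replace unreliable approximate LOO terms with exact leave-one-out values.

    `refit_and_score(i)` is a user-supplied callback (hypothetical interface)
    that refits the model without observation i and returns the log predictive
    density of observation i under that refit.
    """
    corrected = list(pointwise_elpd)
    refitted = []
    for i, k in enumerate(pareto_k):
        if k > k_threshold:  # approximation flagged as unreliable
            corrected[i] = refit_and_score(i)
            refitted.append(i)
    return corrected, refitted


# Toy stand-in for a model: fit a Normal to the remaining data points.
data = [0.1, -0.3, 0.2, 5.0, 0.0]


def refit_and_score(i):
    rest = data[:i] + data[i + 1:]
    fitted = NormalDist(mean(rest), stdev(rest))
    return math.log(fitted.pdf(data[i]))


# Illustrative inputs only: approximate pointwise values and k diagnostics.
approx_elpd = [-1.0, -1.1, -1.0, -3.0, -0.9]
pareto_k = [0.1, 0.2, 0.1, 0.9, 0.3]

corrected, refitted = reloo_sketch(approx_elpd, pareto_k, refit_and_score)
total_elpd = sum(corrected)
```

Only the observation with a k value above the threshold triggers a refit, so the cost stays close to that of approximate LOO-CV when few observations are problematic. Passing the refit step as a callback is one way to keep the routine independent of any particular PPL.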

Requirements

Familiarity with ArviZ.jl and Turing.jl.

Expected outcome

  • A PPL-agnostic API for model resampling, added to PosteriorStats.jl
  • An implementation of reloo using this API, added to PosteriorStats.jl
  • An implementation of the API for Turing models.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential Mentors: Seth Axen

Model diagnostics plots (Julia)

The objective of this project is to port several model checking, model comparison, or inference diagnostics plots from Python ArviZ to ArviZ.jl using Makie.jl.

Requirements

Familiarity with ArviZ.jl and Makie.jl.

Expected outcome

  • Implementations of several plotting recipes of the applicant’s choice in a new package ArviZMakie.jl.

Info

  • Expected size: 350h
  • Difficulty rating: medium
  • Potential Mentors: Seth Axen

Model summary plots (Julia)

The objective of this project is to port several distribution plots or mixed plots from Python ArviZ to ArviZ.jl using Makie.jl.

Requirements

Familiarity with ArviZ.jl and Makie.jl.

Expected outcome

  • Implementations of several plotting recipes of the applicant’s choice in a new package ArviZMakie.jl.

Info

  • Expected size: 350h
  • Difficulty rating: medium
  • Potential Mentors: Seth Axen