Bayes Factor Estimation #665

Closed
cameron-sql opened this issue Apr 7, 2023 · 5 comments


@cameron-sql

Hi,

I've been a PyMC user for a while who has been getting into Bambi a bit more (and loving it, really great stuff!). In PyMC at the moment, if I want to calculate the Bayes factor between two models, I would use the:

trace = pymc.smc.sample_smc

function and access the marginal likelihoods like:

trace.report.marginal_likelihood.

Is there similar functionality in Bambi, or does this rely on the PyMC Sequential Monte Carlo sampler? Given the results of two Bambi fit() calls, can the resulting ArviZ inference objects even be used to calculate marginal likelihoods / Bayes factors?
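For context, once a log marginal likelihood is available for each model, the Bayes factor is the exponentiated difference between the two. A minimal sketch, assuming the two values have already been extracted from the sampler output; the function name and the numbers below are hypothetical placeholders, not output from a real model:

```python
import math


def bayes_factor(log_ml_1: float, log_ml_0: float) -> float:
    """Return BF_10 = p(y | M1) / p(y | M0) from two log marginal likelihoods."""
    # Subtract in log space first, then exponentiate, to avoid underflow
    # from exponentiating two very negative numbers separately.
    return math.exp(log_ml_1 - log_ml_0)


# Hypothetical log marginal likelihoods (placeholders for values that an
# SMC sampler would report for each model).
log_ml_m1 = -104.2
log_ml_m0 = -106.5

bf_10 = bayes_factor(log_ml_m1, log_ml_m0)
print(f"BF_10 = {bf_10:.2f}")
```

Values above 1 favour M1 and values below 1 favour M0.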

Thank you so much for your consideration and time.

@ColCarroll
Collaborator

Hey! I don't know much about Bayes factors, but I might point you towards this work from @karink520 (building off some of @junpenglao's work). It has been held up for a while trying to figure out the right library to live in and the abstractions that library would need to support, but it should mostly work with just posterior samples (and access to the transformations that were used).

@cameron-sql
Author

Hey @ColCarroll !

Thanks so much for sending this my way, I was having a look for something similar but hadn't had much luck. I am going to give it a go this afternoon, I really appreciate it. :)

@tomicapretto
Collaborator

@cameron-sql I'm not very familiar with Bayes factors either. If you can share an example of what you do in PyMC, I could give more help.

However, if you know how to get what you need from a PyMC model, you can do the same with a Bambi model. A Bambi model always holds an instance of a PyMC model in model.backend.model, where the leading model is the object you got with model = bmb.Model(...).

So if you want to use SMC (which I'm not familiar with), you can access the underlying PyMC model with model.backend.model.

@aloctavodia
Collaborator

What kind of model do you have in mind?

ArviZ supports computing Bayes factors, see https://python.arviz.org/en/stable/api/generated/arviz.plot_bf.html. From the docs:

Approximated Bayes Factor for comparing hypothesis of two nested models.
The Bayes factor is estimated by comparing a model (H1) against a model in which the parameter of interest has been restricted to be a point-null (H0). This computation assumes the models are nested and thus H0 is a special case of H1.
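To make the point-null idea concrete, here is a minimal sketch of the density-ratio calculation (often called the Savage-Dickey ratio) for a conjugate Beta-Binomial model, where the prior and posterior densities are available in closed form. This is not ArviZ's implementation, which works from posterior samples; the function names and the example data are assumptions chosen for illustration.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def savage_dickey_bf01(successes, trials, null_value=0.5, prior_a=1.0, prior_b=1.0):
    """BF_01 = posterior density at the null / prior density at the null."""
    # Conjugate update: a Beta prior with Binomial data gives a Beta posterior.
    post_a = prior_a + successes
    post_b = prior_b + trials - successes
    posterior_at_null = beta_pdf(null_value, post_a, post_b)
    prior_at_null = beta_pdf(null_value, prior_a, prior_b)
    return posterior_at_null / prior_at_null


# Illustrative data: 5 successes in 10 trials, testing theta = 0.5 under a
# uniform Beta(1, 1) prior.
bf_01 = savage_dickey_bf01(successes=5, trials=10)
print(f"BF_01 = {bf_01:.3f}")
```

For this toy case the result agrees with the direct ratio of marginal likelihoods, C(10, 5) * 0.5^10 under H0 against 1/11 under H1, which is what the nesting assumption guarantees.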

The main motivation for the SMC implementation in PyMC was dealing with multimodal posteriors and with models for which gradients are not available, which rules out NUTS. SMC uses an Independent Metropolis-Hastings kernel. While it is usually much better than the plain MH sampler, it still inherits some of its limitations, and compared to NUTS it can have a harder time fitting complex geometries like the ones we usually observe in hierarchical models. To some extent, this can be alleviated by increasing the number of draws/particles. We are working on bringing an SMC with a Hamiltonian Monte Carlo kernel to PyMC, which will make SMC more robust.

@cameron-sql
Author

Hey @tomicapretto and @aloctavodia, thanks for getting back to me -- sorry, I don't check my account as much as I should!

I didn't know the underlying PyMC model could be accessed through model.backend.model, thank you so much for bringing that to my attention. That really opens up quite a lot of functionality! I am continually impressed by how much Bambi has built in. Transitioning my work from R to Python has been really great because of it.

Similarly, thank you so much for the reference to the ArviZ documentation, that is actually exactly what I was looking for! In terms of the model / data, it is nothing very high level, we just have many separate samples of financial data that we are looking to compare to a larger reference population. We have been exploring a couple of different approaches and this was brought up as a potential option.

Thank you all again!
