This issue was moved to a discussion.


Compute pointwise log-likelihood for each observation #1300

Closed
DavAug opened this issue Feb 21, 2021 · 7 comments

@DavAug
Member

DavAug commented Feb 21, 2021

ArviZ provides a simple API to compute the LOO (leave-one-out cross-validation) estimate or the WAIC (widely applicable information criterion) to assess model performance, see https://arviz-devs.github.io/arviz/api/generated/arviz.waic.html.

What this requires, however, is the pointwise log-likelihood of each parameter sample in a chain for each observation. So for N observations, M iterations and K chains, we would need to store N x M x K log-pdf values.
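To make the storage requirement concrete, the sketch below assumes the pointwise values are held in a NumPy array of shape (K, M, N), i.e. (chains, iterations, observations), which matches the chain/draw/observation layout ArviZ expects in its log-likelihood group. It computes WAIC directly from the standard formulas (log pointwise predictive density minus the summed per-observation posterior variances). The function name `waic_from_pointwise` is hypothetical; in practice one would wrap the array in an ArviZ InferenceData object and call `arviz.waic`.

```python
import numpy as np


def waic_from_pointwise(log_lik):
    """Sketch: WAIC from pointwise log-likelihoods of shape (K, M, N).

    K = chains, M = iterations, N = observations. Returns (elpd_waic, p_waic);
    on the deviance scale, WAIC = -2 * elpd_waic.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    k, m, n = log_lik.shape
    samples = log_lik.reshape(k * m, n)  # pool chains: S = K * M samples
    s = samples.shape[0]

    # lppd_n = log((1/S) * sum_s exp(ll_sn)), via a numerically stable log-sum-exp
    col_max = samples.max(axis=0)
    lppd = col_max + np.log(np.exp(samples - col_max).sum(axis=0)) - np.log(s)

    # p_waic_n = sample variance of ll_sn over the posterior draws
    p_waic = samples.var(axis=0, ddof=1)

    return float(np.sum(lppd - p_waic)), float(np.sum(p_waic))


# Illustrative sizes: K=4 chains, M=1000 iterations, N=50 observations
rng = np.random.default_rng(1)
log_lik = rng.normal(-1.0, 0.1, size=(4, 1000, 50))
elpd, p = waic_from_pointwise(log_lik)
print(log_lik.size, elpd, p)
```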

The computationally most efficient way to generate the pointwise log-likelihoods would probably be to store them while running the chain, before they are summed across observations. That would, however, require some changes to pints.LogPDF, pints.LogPosterior and pints.MCMCSampler / pints.MCMCController.

Alternatively, we could implement a routine that takes the LogPDF of a problem and the chains, and then recomputes the log-pdfs for each observation. This would still require an additional method on the LogPDFs that returns the pointwise log-pdfs.
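The recompute-afterwards routine could look roughly like this sketch. It assumes a hypothetical callable `pointwise_loglik(theta)` that returns the N per-observation log-likelihoods for one parameter vector (standing in for the extra LogPDF method), and chains stored as an array of shape (K, M, P) with P parameters. Neither name is part of the current PINTS API.

```python
import numpy as np


def recompute_pointwise(pointwise_loglik, chains, thin=1):
    """Re-evaluate pointwise log-likelihoods for stored MCMC samples.

    pointwise_loglik: hypothetical callable, theta -> array of shape (N,)
    chains: array of shape (K, M, P) (chains, iterations, parameters)
    thin: keep every `thin`-th iteration to save time and memory
    Returns an array of shape (K, M_kept, N).
    """
    chains = np.asarray(chains, dtype=float)[:, ::thin, :]
    k, m, _ = chains.shape
    n = np.asarray(pointwise_loglik(chains[0, 0])).size  # number of observations
    out = np.empty((k, m, n))
    for i in range(k):
        for j in range(m):
            out[i, j] = pointwise_loglik(chains[i, j])
    return out


# Toy model: y_n ~ Normal(mu, sigma), with theta = [mu, sigma]
y = np.array([0.1, -0.4, 0.3])


def gaussian_pointwise(theta):
    mu, sigma = theta
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


chains = np.stack([np.column_stack([np.zeros(10), np.ones(10)])] * 2)  # (2, 10, 2)
ll = recompute_pointwise(gaussian_pointwise, chains, thin=2)
print(ll.shape)  # (2, 5, 3)
```

Because only the kept samples are evaluated, thinning here directly reduces both the recomputation cost and the memory needed.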

@DavAug DavAug added the feature label Feb 21, 2021
@MichaelClerx
Member

Discussed in meeting today:

  • Pointwise log-likelihood = log likelihood of every point in a ProblemLogLikelihood, before summing
  • The logical place to add this would be somewhere in ProblemLogLikelihood?
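For the common case of independent Gaussian noise with standard deviation σ_o per output (an assumption here; other ProblemLogLikelihoods have different terms), the pointwise value for time point t and output o, and its relation to the summed log-likelihood, would be:

```latex
\ell_{t,o}(\theta, \sigma) = -\frac{1}{2}\log\left(2\pi\sigma_o^2\right)
  - \frac{\left(y_{t,o} - f_o(t; \theta)\right)^2}{2\sigma_o^2},
\qquad
\log L(\theta, \sigma) = \sum_{t=1}^{n_t} \sum_{o=1}^{n_o} \ell_{t,o}(\theta, \sigma)
```

Here f_o(t; θ) is model output o at time t and y_{t,o} the matching observation; the n_t x n_o array of ℓ_{t,o} values is what the final summation currently discards.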

@ben18785
Collaborator

I've actually realised that Stan doesn't save a pointwise log-likelihood as it runs. Instead, it computes it afterwards using each posterior sample. I think, however, that we should probably try to improve on this, since our models are generally more expensive to run.

@Rebecca-Rumney

Rebecca-Rumney commented Mar 5, 2021

I've been looking into how to do this, and this is my idea.
It requires changing the __call__ method of each ProblemLogLikelihood and adding two new methods, so that:

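```python
import numpy as np

def __call__(self, x):
    # Evaluate every observation, cache the result, then sum
    pointwise = self.create_pointwise_loglikelihoods(x)
    self._last_pointwise_loglikelihoods = pointwise
    return np.sum(pointwise)

def create_pointwise_loglikelihoods(self, parameters):
    """
    Returns an array of shape (n_times, n_outputs) containing the log-likelihood
    of each observation at each time point, for the given parameters.
    """
    # Subclass-specific (e.g. Gaussian noise); left abstract in this proposal
    raise NotImplementedError

def get_last_pointwise_loglikelihoods(self):
    # Pointwise values cached by the most recent __call__
    return self._last_pointwise_loglikelihoods
```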

This means little change to code that is already written. If you want the pointwise log-likelihoods while using the ask-and-tell interface, you can call get_last_pointwise_loglikelihoods at each step without repeating the calculations. I believe this will also work when using the LogPosterior or similar to evaluate the points you tell the sampler. You can also do it the Stan way if you need to, using create_pointwise_loglikelihoods.
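A minimal, self-contained sketch of this proposal follows. `ToyGaussianLogLikelihood` is a hypothetical stand-in for a single-output ProblemLogLikelihood subclass (constant model y_t ≈ mu with known sigma, no PINTS dependency), and the loop is a plain random-walk Metropolis sampler with a flat prior written in an ask-and-tell style, not one of the PINTS samplers. One subtlety it shows: the cache holds the values for the last point *evaluated*, so when a proposal is rejected the sampler must keep the pointwise values of the current state instead of reading the cache.

```python
import numpy as np


class ToyGaussianLogLikelihood:
    """Hypothetical stand-in for a ProblemLogLikelihood: y_t ~ Normal(mu, sigma)."""

    def __init__(self, values, sigma=1.0):
        self._values = np.asarray(values, dtype=float)  # shape (n_times,)
        self._sigma = float(sigma)
        self._last_pointwise_loglikelihoods = None

    def create_pointwise_loglikelihoods(self, parameters):
        mu = parameters[0]
        s2 = self._sigma**2
        return -0.5 * np.log(2 * np.pi * s2) - (self._values - mu) ** 2 / (2 * s2)

    def get_last_pointwise_loglikelihoods(self):
        return self._last_pointwise_loglikelihoods

    def __call__(self, x):
        pointwise = self.create_pointwise_loglikelihoods(x)
        self._last_pointwise_loglikelihoods = pointwise
        return np.sum(pointwise)


def metropolis_with_pointwise(log_lik, x0, n_iter, step=0.5, seed=0):
    """Random-walk Metropolis (flat prior) recording pointwise values per iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = log_lik(x)
    px = log_lik.get_last_pointwise_loglikelihoods()
    chain = np.empty((n_iter, x.size))
    pointwise = np.empty((n_iter, px.size))
    for i in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)  # "ask"
        f_new = log_lik(proposal)  # evaluate (also fills the cache)
        if np.log(rng.uniform()) < f_new - fx:  # "tell": accept or reject
            x, fx = proposal, f_new
            px = log_lik.get_last_pointwise_loglikelihoods()  # reuse, no recompute
        # On rejection, px still holds the current state's pointwise values
        chain[i] = x
        pointwise[i] = px
    return chain, pointwise


log_lik = ToyGaussianLogLikelihood([0.2, -0.1, 0.4, 0.0], sigma=0.5)
chain, pointwise = metropolis_with_pointwise(log_lik, [0.0], n_iter=2000)
print(chain.shape, pointwise.shape)  # (2000, 1) (2000, 4)
```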

@DavAug
Member Author

DavAug commented Mar 5, 2021

I think this looks really good and fits very nicely into the pints interface, @Rebecca-Rumney!

Slightly unrelated to the API: I wonder whether it is actually a good idea to always store the pointwise log-pdfs. With large autocorrelations we may want to throw out the majority of the samples, and for larger datasets the memory requirements can be considerable (so we might not actually save much, as the energy saved on computation is spent on storage instead). So it's probably good to be able to switch off storing the pointwise log-pdfs. But I guess that would be a switch in the MCMCController?
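A quick back-of-the-envelope check of the memory concern, assuming each value is stored as a 64-bit float (8 bytes) and that thinning keeps every `thin`-th iteration. The helper name is hypothetical and the sizes are illustrative, not taken from a real PINTS run.

```python
def pointwise_storage_bytes(n_obs, n_iter, n_chains, thin=1, bytes_per_value=8):
    """Bytes needed to keep N x M x K pointwise log-likelihoods (float64 by default)."""
    kept_iter = -(-n_iter // thin)  # ceiling division: iterations kept after thinning
    return n_obs * kept_iter * n_chains * bytes_per_value


# Illustrative sizes: 1,000 observations, 10,000 iterations, 4 chains
full = pointwise_storage_bytes(1_000, 10_000, 4)
thinned = pointwise_storage_bytes(1_000, 10_000, 4, thin=10)
print(full / 1e6, "MB;", thinned / 1e6, "MB after thinning by 10")
# 320.0 MB; 32.0 MB after thinning by 10
```

If most samples are discarded anyway, storing only the kept ones (or recomputing them afterwards) scales the memory down by the thinning factor.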

@Rebecca-Rumney

Rebecca-Rumney commented Mar 5, 2021

@DavAug That's a good point. What I've written there only saves the last step's log-likelihoods (an array of size N) rather than the whole N x M x K array, and it is up to the user to store them somewhere. I'm personally not sure how large N is likely to get. If we make it an option that has to be turned on, the values may be harder to access when we only call the posterior: we would then have to alter LogPosterior, and anything else that calls the log posterior, to have an option of saving the pointwise likelihoods.

@DavAug
Copy link
Member Author

DavAug commented Mar 6, 2021 via email

@MichaelClerx
Member

Good start! But it would probably be more efficient to have __call__ just assume you don't want to save anything, and to have some alternative method, like evaluateS1, that can be called if you really want to store the pointwise values for each sample?
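One reading of this suggestion, sketched under assumptions: __call__ returns only the summed value and caches nothing, while a separate method (given the hypothetical name `evaluate_pointwise` here, by analogy with evaluateS1, which returns the log-likelihood together with its derivatives) returns both the total and the per-observation values. The class is a self-contained toy, not a PINTS class.

```python
import numpy as np


class OptInPointwiseLogLikelihood:
    """Toy Gaussian log-likelihood: y_t ~ Normal(mu, sigma), with known sigma."""

    def __init__(self, values, sigma=1.0):
        self._values = np.asarray(values, dtype=float)
        self._sigma = float(sigma)

    def _pointwise(self, x):
        s2 = self._sigma**2
        return -0.5 * np.log(2 * np.pi * s2) - (self._values - x[0]) ** 2 / (2 * s2)

    def __call__(self, x):
        # Default path: nothing is stored, only the sum is returned
        return float(np.sum(self._pointwise(x)))

    def evaluate_pointwise(self, x):
        # Opt-in path: returns (value, pointwise), similar in shape to evaluateS1
        pointwise = self._pointwise(x)
        return float(np.sum(pointwise)), pointwise


log_lik = OptInPointwiseLogLikelihood([0.2, -0.1, 0.4], sigma=0.5)
total, pointwise = log_lik.evaluate_pointwise([0.1])
print(total, pointwise)
```

A sampler or controller would then call the opt-in method only when storage is switched on, so the default path pays no extra cost.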

(I imagine there's some loss of performance if we do this by default, but we might want to benchmark that)
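A rough way to run that benchmark with the standard-library `timeit` module, comparing a sum-only evaluation against one that also copies each sample's pointwise values into a preallocated storage array. The toy Gaussian functions and sizes are hypothetical; timings will vary by machine and N, and for ODE-based models the forward simulation is likely to dominate both variants.

```python
import timeit

import numpy as np

N, CALLS = 10_000, 200
rng = np.random.default_rng(0)
values = rng.normal(size=N)  # toy observations
store = np.empty((CALLS, N))  # preallocated M x N storage for one chain
counter = [0]  # number of stored samples so far


def pointwise(mu, sigma=0.5):
    s2 = sigma**2
    return -0.5 * np.log(2 * np.pi * s2) - (values - mu) ** 2 / (2 * s2)


def sum_only(mu):
    # Current behaviour: compute and sum, keep nothing
    return np.sum(pointwise(mu))


def sum_and_store(mu):
    # Opt-in behaviour: also copy this sample's pointwise values into storage
    p = pointwise(mu)
    store[counter[0] % CALLS] = p
    counter[0] += 1
    return np.sum(p)


t_plain = min(timeit.repeat(lambda: sum_only(0.1), number=CALLS, repeat=5))
t_store = min(timeit.repeat(lambda: sum_and_store(0.1), number=CALLS, repeat=5))
print(f"sum only: {t_plain:.4f} s, sum + store: {t_store:.4f} s ({CALLS} calls)")
```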

@pints-team pints-team locked and limited conversation to collaborators Jun 23, 2024
@MichaelClerx MichaelClerx converted this issue into discussion #1672 Jun 23, 2024
