
weights in multinomial #142

Open · nilsbecker opened this issue Feb 5, 2016 · 8 comments

@nilsbecker

a nitpick: the weights vector is summed up and tested for summing to 1. if you're going to sum it up anyway, why not allow arbitrary positive weights and normalize the weight vector instead?

oml/src/lib/stats/sampling.ml line 64
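
for illustration, a sketch of what i mean (hypothetical helper, not actual oml code):

```ocaml
(* accept any positive weights and normalize them in the same pass
   that currently only checks the sum *)
let normalize weights =
  let sum = Array.fold_left (+.) 0.0 weights in
  Array.map (fun w -> w /. sum) weights
```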

@rleonid
Copy link
Owner

rleonid commented Feb 6, 2016

I think you're raising a good point, but in this case the weight array represents the probabilities of a multinomial distribution, so I think the correct 'contract' for this method to enforce is to check that the weights are between 0 and 1 (and sum to 1). For comparison, softmax allows arbitrary weights because that method is one way of transforming an arbitrary set of values into a probability distribution.

I've been thinking a lot recently about making sure that the type interface is as accurate as possible to the underlying mathematical representation (without going "full" Haskell on the library). How can we make this clearer?

  • Could we define a probability type, so that multinomial becomes val multinomial : ?seed:int array -> probability array -> int generator?
  • Maybe rename "weights" to "probabilities" in the documentation?

@nilsbecker (Author)

hmm, i do see your point of enforcing the input to be the type it naturally should be. i was thinking more about efficiency: as it stands, the client has to generate the weights and, presumably, normalize them to 1 to comply with the interface; then the function does the same computation again. waste of effort.

i guess one could invent a weight-vector type which, when constructed, normalizes itself. sounds reasonable, and it would obviate the need to check again in the function. but this could not be a plain float array, since there would be no way to enforce the normalization (right?). at the same time, one would probably also want to do some arithmetic with such a normalized weight vector; for that, maybe an array_of_weightvector conversion would be enough. or a private type? (i don't have experience with those)
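
a sketch of the private-type idea (module and function names hypothetical, not oml code): construction normalizes once, and reading the floats back out is a cast rather than a copy.

```ocaml
module Weights : sig
  type t = private float array
  val of_array : float array -> t
end = struct
  type t = float array
  let of_array ws =
    let sum = Array.fold_left (+.) 0.0 ws in
    Array.map (fun w -> w /. sum) ws
end

(* clients cannot build a Weights.t from an arbitrary array, but can
   coerce one back to a float array for arithmetic *)
let total (w : Weights.t) = Array.fold_left (+.) 0.0 (w :> float array)
```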


rleonid commented Feb 7, 2016

Those are good points, let me ruminate on a solution. I think that a private float array might be what we're looking for.


rleonid commented Feb 7, 2016

@nilsbecker Check out the prob_module branch:

Specifically, there is now a function Probability.normalize to wrap the weights, so that multinomial now takes a Probability.s.

There is probably a bit more work to do adding functions to the main Probability module:

  • all the uses in Classification should go here
  • convenience functions like map2 and fold2 for using these weights.

But in general, I think a private float array should work here and allow users who want more flexibility in how they define these weights to access them via casting.
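
A sketch of how that interface might be used (hypothetical; names follow this thread rather than the branch itself, and it assumes an 'a generator is effectively unit -> 'a):

```ocaml
(* normalize raw weights once, then sample repeatedly *)
let () =
  let probs = Probability.normalize [| 3.0; 1.0; 1.0 |] in
  let draw = Sampling.multinomial probs in
  Printf.printf "sampled category: %d\n" (draw ())
```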

@struktured Any thoughts?

@struktured (Contributor)

I have thoughts, but they're only somewhat cohesive; maybe you'll find some of them useful. I like the idea of a probability module. I've implemented a few over the years in other languages, although I'd hesitate to say I've ever seen a perfect version of one in code. My favorite representation is typically:

  • polymorphic
  • immutable
  • sorted by likelihood
  • preserves a scale value

I understand why making it polymorphic could be troublesome for consuming APIs, but I really do enjoy a strong coupling between datatypes and probabilities. It's also very human-readable: I'd rather see a Coin.t Probability.t than some opaque indexed structure.
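
A minimal sketch of the polymorphic idea (all names hypothetical, and an association list stands in for whatever structure oml would actually use):

```ocaml
module Probability : sig
  type 'a t
  val of_weights : ('a * float) list -> 'a t  (* normalizes on construction *)
  val prob : 'a t -> 'a -> float
end = struct
  type 'a t = ('a * float) list
  let of_weights ws =
    let sum = List.fold_left (fun acc (_, w) -> acc +. w) 0.0 ws in
    List.map (fun (x, w) -> (x, w /. sum)) ws
  let prob t x = try List.assoc x t with Not_found -> 0.0
end

module Coin = struct type t = Heads | Tails end

(* the datatype stays visible in the distribution's type *)
let fair : Coin.t Probability.t =
  Probability.of_weights [ (Coin.Heads, 1.0); (Coin.Tails, 1.0) ]
```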

Sorting by likelihood is probably too expensive for large arrays, but maybe a convenience function to copy/transform the existing weight vector into a sorted structure when the user needs it would be helpful.

Another thing to consider as I write this: what you've defined is a discrete multinomial probability vector, but a probability itself could be some sort of density function, if we want to be pedantic about it. Maybe probability should be an abstract sig, of which multinomial is one implementation?

Preserving the scale value means keeping the weights always normalized, but maintaining one extra constant to rescale the weight vector into frequency counts. The pro is that the vector is always normalized and thus cheap to query. The con is that updating the distribution is more expensive (e.g. rescale to frequencies, add the new frequency counts, normalize again) and prone to numerical instability.

And do we actually want to preserve the scale in a probability at all? Personally, I find it more convenient to have some way to update the distribution when a new observation arrives, and for that to be possible the weights must either stay unnormalized or the scale factor must be preserved, although that information doesn't have to be bound to the weight vector itself.
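
A sketch of the scale-preserving trade-off (names hypothetical): the weights stay normalized and cheap to query, while the scale constant makes updates possible, at the cost of rescaling on every update.

```ocaml
type scaled = { scale : float; probs : float array }

let of_counts counts =
  let scale = Array.fold_left (+.) 0.0 counts in
  { scale; probs = Array.map (fun c -> c /. scale) counts }

(* the expensive path: rescale to frequencies, add the new
   observations, normalize again *)
let observe { scale; probs } i n =
  let counts = Array.map (fun p -> p *. scale) probs in
  counts.(i) <- counts.(i) +. n;
  of_counts counts
```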

And this leads to a final thought: should we have different "views" of a distribution, with types to indicate this? One representation lends itself well to mutation: imagine a distribution being repeatedly updated every time some event fires, but you want to take a snapshot of it at some time slice and normalize it. That is, you query a "distribution" for a (normalized) "probability": the distribution is the unnormalized entity and the probability is a normalized view.

Regarding map2/fold2: I also think filter and filter_map are quite useful for probability vectors (in the normalized version, filter should of course maintain a sum of 1.0...).
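
A sketch of a renormalizing filter (hypothetical, over the association-list representation from above): drop the entries that fail the predicate, then rescale the survivors so they sum to 1.0 again.

```ocaml
let filter pred probs =
  let kept = List.filter (fun (x, _) -> pred x) probs in
  let sum = List.fold_left (fun acc (_, p) -> acc +. p) 0.0 kept in
  (* note: if nothing survives, sum is 0.0 and this yields nan;
     a real version should handle the empty case explicitly *)
  List.map (fun (x, p) -> (x, p /. sum)) kept
```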

Side note: @rleonid Great talk at compose on thursday!


rleonid commented Feb 9, 2016

We might be talking about two different modules. I am proposing that:

  • Probability have two types: t (for one probability) and s (for many); see the sketch at the end of this comment.
  • a Distribution module would encapsulate probability distributions and return the appropriate types from the Probability module.

Some of the implications this raises:

  1. It makes the signatures for some of the methods obvious.
  2. It gives us one place to contain the float -> Probability.t transformation, leading to hopefully (a) more efficient and (b) more correct code.
  3. It forces Oml users to either use the provided functions, or
  4. access the underlying float or float array via casting from private types.
  5. It will have knock-on effects in things like Distribution or the Probability logic found in Classification.
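
A sketch of what that could look like (hypothetical; the actual module on the branch may differ): both types private, so the floats are readable by casting but constructible only through the module.

```ocaml
module Probability : sig
  type t = private float
  type s = private float array
  val of_float : float -> t          (* enforces 0.0 <= p <= 1.0 *)
  val normalize : float array -> s   (* the one float array -> s path *)
end = struct
  type t = float
  type s = float array
  let of_float p =
    if p < 0.0 || p > 1.0 then invalid_arg "Probability.of_float" else p
  let normalize ws =
    let sum = Array.fold_left (+.) 0.0 ws in
    Array.map (fun w -> w /. sum) ws
end
```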

@struktured (Contributor)

So one of my points was: do you want a probability to be (scale * float array) where sum(float array) = 1, an unnormalized float array, or just a normalized float array? What is the value of having it unnormalized by default? What types of algorithms will benefit from this? In order to sample, you would at least need a scale factor to avoid doing a full pass over it.

As I noted before, one advantage of keeping it unnormalized is that it's easier to update the distribution when new observations arrive. But if that's not needed here, it makes for a stronger case to normalize it.


rleonid commented Feb 9, 2016

I'm imagining just a normalized float array. By default we should not provide a mechanism to update this distribution (so map or filter wouldn't make sense, as they might break the normalization). We would keep the type private, so if someone really wanted to, they could change the values back to a regular float array with casting. Scanning through the code, I think we'd change the signatures of:

  • Sampling.multinomial (already present in the prob_module branch).
  • Distributions.dirichlet_pdf
  • Distributions.ln_dirichlet_pdf

We would have an analogous type Probability.t, and all of the _cdf functions in Distributions would return that.
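
Assuming the private types sketched earlier in the thread, escaping back to plain floats is a zero-cost coercion rather than a copy, e.g. (hypothetical):

```ocaml
let as_floats (ps : Probability.s) = (ps :> float array)
let as_float (p : Probability.t) = (p :> float)
```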

Lastly, we should add a "non-parametric" distribution data structure to accommodate polymorphic "pdf"s.
