Use the AbstractWeights system #2

nalimilan · 2017-11-28T09:54:02Z

I've just discovered this package. Very interesting!

I have a suggestion regarding the handling of weights: it would make sense to use the AbstractWeights types defined in StatsBase rather than the custom normalize argument. That would help make the ecosystem consistent and make the definition of weights clearer. We've used the same terminology as Stata so that people can more easily find references about them.

lbittarello · 2017-11-30T16:58:12Z

Great idea!

I've been thinking about the best way to handle missing weight data. Perhaps Microdata should accept special keywords aweights, fweights and pweights, each of which takes a column name as a string and creates the corresponding weight vector from the non-missing data in the DataFrame. Does that sound reasonable?

Would it be a good idea to make weight types a parameter? All estimation commands currently check if the Microdata has weights, like so:

r2(obj::Micromodel) = (checkweight(obj) ? _r2(obj, getvector(obj, :weight)) : _r2(obj))

Parametrizing Microdata would facilitate dispatch. I'm not sure which approach is more efficient.
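To make the dispatch idea concrete, here is a minimal sketch. The names `NoWeights`, `Sample` and `wmean` are hypothetical stand-ins, not the package's actual Microdata or Micromodel definitions. The weight type becomes a type parameter, and each case gets its own method instead of a runtime check.

```julia
# Hypothetical stand-ins for illustration; the real Microdata type has other fields.
struct NoWeights end

struct Sample{W}
    y::Vector{Float64}
    weights::W
end

# Unweighted method: selected when the type parameter is NoWeights.
wmean(s::Sample{NoWeights}) = sum(s.y) / length(s.y)

# Weighted method: selected when the weights are stored as any AbstractVector.
wmean(s::Sample{<:AbstractVector}) = sum(s.weights .* s.y) / sum(s.weights)

unweighted = Sample([1.0, 2.0, 3.0], NoWeights())
weighted   = Sample([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])

wmean(unweighted)  # 2.0
wmean(weighted)    # 2.25
```

Because Julia specializes methods on concrete argument types, the choice between the two methods is normally resolved at compile time when the type of the data object is known, so this should not be slower than the explicit check.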

nalimilan · 2017-11-30T17:04:31Z

It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get. That way, people can choose the type once and for all when creating a dataset, and don't need to repeat it.
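As a rough sketch of what a single weights argument could look like, the snippet below dispatches on the weight types exported by StatsBase. The helper `weightkind` is hypothetical and only reports which kind of weights it received; an estimation routine would branch in the same way.

```julia
using StatsBase

# Hypothetical helper: one method per kind of weights, chosen by dispatch.
weightkind(::FrequencyWeights)   = "frequency"
weightkind(::AnalyticWeights)    = "analytic"
weightkind(::ProbabilityWeights) = "probability"
weightkind(::AbstractWeights)    = "generic"      # fallback, e.g. for weights(...)

w = fweights([2, 1, 3])   # the user picks the kind once, when building the vector

weightkind(w)                      # "frequency"
weightkind(pweights([0.5, 0.5]))   # "probability"
```

The user states the interpretation once, through the constructor (`fweights`, `aweights` or `pweights`), and downstream code never needs a separate keyword for it.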

Regarding dispatch, a possible trick is to use a UnitWeights pseudo-vector type internally, which would return 1 for all observations (see JuliaStats/StatsBase.jl#135). That way you can handle the unweighted case just like the weighted cases, without any special code.
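A minimal sketch of such a pseudo-vector follows. `OnesVector` is a hypothetical name used for illustration and is not the implementation discussed in the linked StatsBase issue. It stores only its length and returns 1.0 at every index, so the weighted code path also covers the unweighted case.

```julia
# Hypothetical type for illustration: a vector of ones that stores only its length.
struct OnesVector <: AbstractVector{Float64}
    n::Int
end

# The two methods a read-only AbstractVector needs.
Base.size(v::OnesVector) = (v.n,)

function Base.getindex(v::OnesVector, i::Int)
    @boundscheck checkbounds(v, i)
    return 1.0
end

# One weighted routine serves both cases; no branch on "has weights".
wmean(y::AbstractVector, w::AbstractVector) = sum(w .* y) / sum(w)

y = [1.0, 2.0, 3.0]

wmean(y, OnesVector(length(y)))   # unweighted mean: 2.0
wmean(y, [1.0, 1.0, 2.0])         # weighted mean: 2.25
```

The trade-off is a few redundant multiplications by one in the unweighted case, in exchange for a single code path to maintain.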

lbittarello · 2017-11-30T17:26:54Z

> It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get.

So a user would pass weights = fweights(DF[:weight])? What if there are missing observations in DF[:weight]?

nalimilan · 2017-11-30T17:32:32Z

IIRC, missing values are not allowed in weight vectors; they should be set to 0 instead. Do you know cases where it's legitimate to have missing weights?

lbittarello · 2017-11-30T17:42:22Z

As far as I understand, we should drop observations with missing weight data (or, equivalently, give them zero weight).

My point is: If the user must pass weights = fweights(DF[:weight]), they will have to check DF[:weight] and replace missing weights before creating the Microdata. If the user must pass fweights = "weight" or fweights = :weight, we can internally check DF[:weight] for them and drop offending observations before creating the ModelFrame.
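For illustration, the internal check could look roughly like the sketch below. It uses plain vectors rather than a DataFrame, and the helper name `dropmissingweights` is hypothetical. It shows both options raised in the thread: dropping observations with a missing weight, or keeping them with zero weight.

```julia
# Hypothetical helper: keep only the observations whose weight is present.
function dropmissingweights(y::AbstractVector, w::AbstractVector)
    keep = .!ismissing.(w)                 # true where the weight is observed
    return y[keep], collect(skipmissing(w))
end

y = [1.0, 2.0, 3.0, 4.0]
w = [2, missing, 1, missing]

# Option 1: drop the offending observations.
yk, wk = dropmissingweights(y, w)   # yk == [1.0, 3.0], wk == [2, 1]

# Option 2: keep every observation but give it zero weight.
wz = coalesce.(w, 0)                # [2, 0, 1, 0]
```

Either result contains no missing values, so it can then be wrapped in a weight vector without further checks.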

nalimilan · 2017-11-30T20:13:44Z

If in your experience missing weights are so common, I guess we could allow them with *weights functions. Do you have cases in mind? In the kind of databases I use, weights are never missing.

lbittarello · 2017-12-05T04:09:01Z

I don't think that missing weights are common. We can leave it as it is.

I've updated the package with improved weight management based on StatsBase. I'll soon update the documentation.

nalimilan · 2017-12-05T22:27:18Z

Cool!

lbittarello added the enhancement label Nov 30, 2017

lbittarello closed this as completed Dec 5, 2017
