Use the AbstractWeights system #2

Closed
nalimilan opened this issue Nov 28, 2017 · 8 comments

Comments

@nalimilan

I've just discovered this package. Very interesting!

I have a suggestion regarding the handling of weights: it would make sense to use the AbstractWeights types defined in StatsBase rather than the custom normalize argument. That would help make the ecosystem consistent and make the definition of weights clearer. We've used the same terminology as Stata so that people can more easily find references about them.

@lbittarello
Owner

Great idea!

I've been thinking about the best way to handle missing weight data. Microdata should maybe accept special keywords aweights, fweights and pweights, which take a string and create the corresponding weight vector from non-missing data in the DataFrame. Does it sound reasonable?

Would it be a good idea to make weight types a parameter? All estimation commands currently check if the Microdata has weights, like so:

r2(obj::Micromodel) = (checkweight(obj) ? _r2(obj, getvector(obj, :weight)) : _r2(obj))

Parametrizing Microdata would facilitate dispatch. I'm not sure which approach is more efficient.
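
To illustrate the parametrized alternative, here is a minimal sketch. `ToyData` and `total` are hypothetical stand-ins, not the package's actual `Microdata` type or any of its estimators; the only point is that the weight type becomes a type parameter, so dispatch replaces the runtime `checkweight` test.

```julia
# Hypothetical container: W is either Nothing (unweighted) or a weight vector type.
struct ToyData{W<:Union{Nothing, AbstractVector{<:Real}}}
    y::Vector{Float64}
    weights::W
end

# The method is selected from the type parameter; no runtime check is needed.
total(d::ToyData{Nothing}) = sum(d.y)
total(d::ToyData{<:AbstractVector}) = sum(d.y .* d.weights)

unweighted = total(ToyData([1.0, 2.0, 3.0], nothing))
weighted   = total(ToyData([1.0, 2.0, 3.0], [2.0, 1.0, 1.0]))
```

Since the concrete type is known at compile time, the compiler can specialize each method, but whether that is measurably faster than the ternary check would need benchmarking.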

@nalimilan
Author

It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get. That way, people can choose the type once and for all when creating a dataset, and don't need to repeat it.

Regarding dispatch, a possible trick is to use a UnitWeights pseudo-vector type internally, which would return 1 for all observations (see JuliaStats/StatsBase.jl#135). That way you can handle the unweighted case just like the weighted cases, without any special code.
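
A rough sketch of that pseudo-vector idea, under stated assumptions: `OnesWeights` and `wmean` are hypothetical names chosen for illustration and are not the type proposed in the linked StatsBase issue. The vector stores only its length and returns 1 for every index, so the weighted code path also serves the unweighted case.

```julia
# Hypothetical pseudo-vector: every element equals one(T); only the length is stored.
struct OnesWeights{T<:Real} <: AbstractVector{T}
    len::Int
end

Base.size(w::OnesWeights) = (w.len,)
Base.IndexStyle(::Type{<:OnesWeights}) = IndexLinear()
function Base.getindex(w::OnesWeights{T}, i::Int) where {T}
    @boundscheck checkbounds(w, i)
    return one(T)
end

# A single code path for weighted and unweighted means.
wmean(x::AbstractVector, w::AbstractVector) = sum(x .* w) / sum(w)

x = [1.0, 2.0, 6.0]
plain    = wmean(x, OnesWeights{Float64}(3))  # behaves like an unweighted mean
weighted = wmean(x, [1.0, 1.0, 2.0])
```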

@lbittarello
Owner

lbittarello commented Nov 30, 2017

> It would probably be better to have a single weights argument, and choose the kind of weights depending on the kind of AbstractVector subtype you get.

Such that a user would pass weights = fweights(DF[:weight])? What if there are missing observations in DF[:weight]?

@nalimilan
Author

IIRC, missing values are not allowed in weight vectors, they should be set to 0 instead. Do you know cases where it's legitimate to have missing weights?

@lbittarello
Owner

As far as I understand, we should drop observations with missing weight data (or, equivalently, give them zero weight).

My point is: If the user must pass weights = fweights(DF[:weight]), they will have to check DF[:weight] and replace missing weights before creating the Microdata. If the user must pass fweights = "weight" or fweights = :weight, we can internally check DF[:weight] for them and drop offending observations before creating the ModelFrame.
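
As a sketch of the internal check described above, using plain vectors instead of a DataFrame: `drop_missing_weights` is a hypothetical helper, not a function of the package. It drops observations whose weight is missing, and the last lines show that giving those observations zero weight yields the same weighted sum.

```julia
# Hypothetical helper: keep only the observations whose weight is not missing.
function drop_missing_weights(y::AbstractVector, w::AbstractVector)
    keep = .!ismissing.(w)
    # skipmissing also narrows the element type, e.g. Union{Missing, Int} to Int
    return y[keep], collect(skipmissing(w))
end

y = [1.0, 2.0, 3.0, 4.0]
w = [2, missing, 1, 3]

y_clean, w_clean = drop_missing_weights(y, w)

# Equivalent alternative: keep every row but set missing weights to zero.
sum_dropped = sum(y_clean .* w_clean)
sum_zeroed  = sum(y .* coalesce.(w, 0))
```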

@nalimilan
Author

If in your experience missing weights are so common, I guess we could allow them with *weights functions. Do you have cases in mind? In the kind of databases I use, weights are never missing.

@lbittarello
Owner

I don't think that missing weights are common. We can leave it as it is.

I've updated the package with improved weight management based on StatsBase. I'll soon update the documentation.

@nalimilan
Author

Cool!
