This repository has been archived by the owner on Aug 9, 2024. It is now read-only.
Right now, feature extraction is limited to a single language. For example, `revscoring.features.diff.badwords_added` depends on the language utility `languages.is_badword`. As a result, a feature list can only count "badwords_added" as identified by one "language". This leaves us with a lot of mixing across languages in our badwords sets, and we're not poised to support multilingual wikis like Commons and Wikidata.
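To make the limitation concrete, here is a minimal sketch of the context-based design described above. It is not revscoring's actual code: `english_is_badword`, `badwords_added(added_words, is_badword)` and `extract(...)` are hypothetical stand-ins, and the word list is a toy. The point is only that the language is passed once per extraction call, so every feature in the list sees the same single language.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a language utility like languages.is_badword.
BadwordFn = Callable[[str], bool]


def english_is_badword(word: str) -> bool:
    # Toy word list, for illustration only.
    return word.lower() in {"stupid", "dumb"}


def badwords_added(added_words: List[str], is_badword: BadwordFn) -> int:
    """Count added words flagged by whichever language is in context."""
    return sum(1 for word in added_words if is_badword(word))


def extract(feature_names: List[str], added_words: List[str],
            is_badword: BadwordFn) -> Dict[str, int]:
    # The language is a context argument for the whole call, so there is
    # no way to count badwords for two languages in one feature list.
    features = {"badwords_added": badwords_added}
    return {name: features[name](added_words, is_badword)
            for name in feature_names}


values = extract(["badwords_added"], ["You", "are", "stupid", "idiota"],
                 english_is_badword)
print(values)  # {'badwords_added': 1}
```

The Portuguese "idiota" goes uncounted because only the English context is active for this run.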
I propose that we convert the concept of a language from a context (in which feature extraction happens) into a feature set with the necessary context baked in. This would let us use features from multiple languages in parallel. E.g.
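One way to picture the proposal is the sketch below, which assumes hypothetical `Feature` and `LanguageFeatures` classes (not the revscoring API). Each language object builds features that close over its own word list, so extraction needs no language argument and features from different languages can sit side by side in one list.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Feature:
    """A named feature whose language context is baked in at construction."""
    name: str
    func: Callable[[List[str]], int]

    def __call__(self, added_words: List[str]) -> int:
        return self.func(added_words)


class LanguageFeatures:
    """Hypothetical per-language feature set (illustrative, not revscoring's)."""

    def __init__(self, name: str, badwords: FrozenSet[str]):
        self.name = name
        self.badwords = badwords
        # The feature closes over this language's word list, so no
        # extraction-time language context is required.
        self.badwords_added = Feature(
            f"{name}.badwords_added",
            lambda words: sum(1 for w in words if w.lower() in self.badwords),
        )


def extract(features: List[Feature], added_words: List[str]) -> List[int]:
    # No language parameter: each feature already carries its own context.
    return [feature(added_words) for feature in features]


english = LanguageFeatures("english", frozenset({"stupid", "dumb"}))
portuguese = LanguageFeatures("portuguese", frozenset({"idiota", "burro"}))

feature_list = [english.badwords_added, portuguese.badwords_added]
print([f.name for f in feature_list])
# ['english.badwords_added', 'portuguese.badwords_added']
print(extract(feature_list, ["You", "are", "stupid", "idiota"]))  # [1, 1]
```

Under this design, a model would only need to keep `feature_list` to be re-scored later; which languages were involved is implied by the features themselves.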
This would also mean that we wouldn't need to associate a `revscoring.languages.Language` with a model, just the set of features that were used to build it. That would substantially reduce the complexity of generating and using model files, along with the potential for mistakes.