[FEA] Distributed TF-IDF Transformer #2598

Closed
VibhuJawa opened this issue Jul 23, 2020 · 0 comments · Fixed by #2698
Labels
? - Needs Triage (Need team to review and classify), feature request (New feature or request)

Comments

@VibhuJawa
Member

Is your feature request related to a problem? Please describe.

We should have a distributed TF-IDF Transformer. Coupled with our hashing vectorizer, it would be a useful feature for distributed text pre-processing.

Additional context
We will probably need to get the function below working for a distributed sparse vector; everything else can stay the same as the existing implementation.

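import cupy as cp


def _sparse_document_frequency(X):
    """Count the number of non-zero values for each feature in sparse X."""
    if cp.sparse.isspmatrix_csr(X):
        # CSR: X.indices holds column ids, so counting them gives per-feature counts.
        return cp.bincount(X.indices, minlength=X.shape[1])
    else:
        # Otherwise X is assumed to be CSC: indptr differences give entries per column.
        return cp.diff(X.indptr)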
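One possible way to distribute this, sketched here as an assumption rather than as the planned implementation: if the matrix is split by rows into CSR partitions, each partition can compute its own per-feature counts and the results can be summed element-wise. The sketch below uses plain Python lists as a CPU stand-in for the GPU arrays; `CsrPartition`, `partition_document_frequency`, `distributed_document_frequency`, and `smooth_idf` are hypothetical names, and the IDF formula is the smoothed variant `ln((1 + n) / (1 + df)) + 1`. It also assumes each partition stores no duplicate entries and no explicit zeros.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class CsrPartition:
    """Hypothetical stand-in for one row-partition of a distributed CSR matrix."""
    indptr: List[int]    # row pointers, length n_rows + 1
    indices: List[int]   # column index of each stored non-zero
    n_features: int      # number of columns, identical across partitions

    @property
    def n_rows(self) -> int:
        return len(self.indptr) - 1


def partition_document_frequency(part: CsrPartition) -> List[int]:
    """Count stored non-zeros per column, like bincount(indices, minlength=n_features)."""
    counts = Counter(part.indices)
    return [counts.get(j, 0) for j in range(part.n_features)]


def distributed_document_frequency(parts: List[CsrPartition]) -> List[int]:
    """Reduce step: element-wise sum of the per-partition counts."""
    total = [0] * parts[0].n_features
    for part in parts:
        for j, count in enumerate(partition_document_frequency(part)):
            total[j] += count
    return total


def smooth_idf(df: List[int], n_docs: int) -> List[float]:
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]


# Two partitions holding 2 and 1 documents over 3 features.
parts = [
    CsrPartition(indptr=[0, 2, 3], indices=[0, 2, 2], n_features=3),
    CsrPartition(indptr=[0, 2], indices=[1, 2], n_features=3),
]
df = distributed_document_frequency(parts)   # [1, 1, 3]
n_docs = sum(p.n_rows for p in parts)        # 3
idf = smooth_idf(df, n_docs)
```

Because the per-partition counts are small dense vectors of length `n_features`, only those vectors need to move between workers; the sparse data itself stays where it is. With a hashing vectorizer the feature count is fixed up front, so every partition produces a vector of the same length.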

@VibhuJawa added the "? - Needs Triage" and "feature request" labels on Jul 23, 2020