Conversation
python/mxnet/text/__init__.py
Outdated
"""Text utilities.""" | ||
|
||
from . import text | ||
from .text import * |
utils?
I just followed the images/images convention. Which one is better?
I think these are utility functions, so a namespace like mx.text.utils seems to be a good fit.
Completely agree. Maybe images.images needs to change to images.utils
python/mxnet/text/text.py
Outdated
def count_tokens_from_str(tokens, token_delim=" ", seq_delim="\n",
                          to_lower=False):
consider adding the counter as an optional argument, with the default value being an empty counter. this way, the same function can be used to either create new counter or update existing counter, which effectively removes the assumption of having to store a whole corpus in memory.
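A minimal sketch of that suggestion (the parameter names and tokenization details here are assumptions, not the PR's final code):

from collections import Counter

def count_tokens_from_str(source_str, token_delim=' ', seq_delim='\n',
                          to_lower=False, counter=None):
    """Counts tokens, updating `counter` in place if one is given."""
    if counter is None:
        counter = Counter()
    if to_lower:
        source_str = source_str.lower()
    # Sequence delimiters also separate tokens, so normalize them first.
    tokens = source_str.replace(seq_delim, token_delim).split(token_delim)
    counter.update(token for token in tokens if token)
    return counter

With this, counter = count_tokens_from_str(chunk) starts a new counter and count_tokens_from_str(next_chunk, counter=counter) keeps updating it, so the whole corpus never has to sit in memory at once.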
resolved.
These look too ad hoc to be put into the repo. I'm considering making a text preprocessing package similar to parts of NLTK and PyTorch's text package.
Aston is already doing it.
tests/python/unittest/test_text.py
Outdated
seqs = _get_test_str_of_tokens(token_delim, seq_delim)

with open(os.path.join(path, '1.txt'), 'w') as fout:
    fout.write(seqs)
Please try to mock all file operations in unit tests. See https://docs.python.org/3/library/unittest.mock.html#mock-open.
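For reference, a minimal sketch of the mock-based approach (the test name and file contents are illustrative):

from unittest import mock

def test_read_tokens_without_real_file():
    fake_contents = 'Life is great!\nlife is good.\n'
    # Patch the built-in open so the test never touches the filesystem.
    # (On Python 2 the patch target would be '__builtin__.open'.)
    with mock.patch('builtins.open', mock.mock_open(read_data=fake_contents)):
        with open('1.txt') as fin:  # no real '1.txt' is needed
            assert fin.read() == fake_contents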
I tried to be consistent with our existing unit tests:
https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_image.py#L108
tests/python/unittest/test_text.py
Outdated
from mxnet.test_utils import *
from mxnet.text import utils as tu
from mxnet.text import glossary as glos
Are there some missing tests for the Glossary class?
Yes, work in progress.
Hello @astonzhang, please rebase your PR.
python/mxnet/text/__init__.py
Outdated
from . import utils
from .utils import *
from . import glossary
from .glossary import *
Remove `from .glossary import *` unless the classes are intended to exist in the mx.text namespace.
Thanks. I probably will import it because mxnet.text.Embedding and mxnet.text.Glossary look fine and are consistently used in the documentation.
If so, then I think the utils and glossary namespaces are unnecessary. There’s no point in keeping both.
resolved.
@astonzhang Again, please rebase your PR as it creates invalid CI requests.
@marcoabreu resolved.
python/mxnet/text/glossary.py
Outdated
top_k_freq : None or int, default None
    The number of top frequent tokens in the keys of `counter` that will be
    indexed. If None, all the tokens in the keys of `counter` will be
    indexed.
What’s the behavior when counter size is smaller than k?
resolved.
python/mxnet/text/glossary.py
Outdated
----------
counter : collections.Counter
    Counts text and special token frequencies in the text data, where
    special token frequency is cleared to zero.
Given that this came from the user, it’s probably not necessary to expose the counter as a property. Otherwise we need to define how this property should change based on topk or other constructor arguments, and what mutating this property means.
resolved.
python/mxnet/text/glossary.py
Outdated
assert self.idx_to_vec is not None, \
    'mxnet.text.Glossary._idx_to_vec has not been initialized. Use ' \
    'mxnet.text.Glossary.__init__() or ' \
    'mxnet.text.Glossary.set_idx_to_embed() to initialize it.'
The assertion message includes internal implementation details. Consider removing such references and changing it to something like “Glossary has not been initialized. Do X...”
Resolved.
python/mxnet/text/glossary.py
Outdated
'token, please specify it explicitly as the '
'unknown special token %s in tokens. This is '
'to avoid unintended updates.' %
(token, self.idx_to_token[Glossary.unk_idx()]))
It’s common to use one set (i.e. training set) for generating the vocabulary and reuse the same vocabulary on another set for indexing. Returning the index for unknown and warning the user would likely make this interface easier to use.
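A sketch of the suggested behavior (the attribute names here are hypothetical, not the PR's actual fields):

import warnings

def to_index(self, token):
    """Maps a token to its index, falling back to the unknown index."""
    if token in self._token_to_idx:
        return self._token_to_idx[token]
    # Tokens unseen at vocabulary-build time map to the unknown token,
    # with a warning rather than an error.
    warnings.warn('"%s" is out of vocabulary; returning the unknown-token '
                  'index.' % token)
    return self._token_to_idx[self._unknown_token]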
Thanks. Since update_idx_to_vec is a member function of a specific Glossary instance, it is fair to assume that the input tokens match the token indices of this specific Glossary instance.
OK
python/mxnet/text/glossary.py
Outdated
self._idx_to_vec[nd.array(indices)] = new_vectors


class Embedding(object):
Given that Embedding is already the name of a layer in Gluon, using the same name could result in unintended name collisions, especially when doing wildcard imports. Consider using another name.
Resolved.
python/mxnet/text/glossary.py
Outdated
# data format is changed.
assert check_sha1(download_file_path, expected_download_hash), \
    'The downloaded file %s does not match its expected SHA-1 ' \
    'hash. This is caused by the changes at the externally ' \
“This is caused” -> “This is likely caused” since failed/partial/interrupted download can also cause this.
Thanks. The immediately preceding if-block makes sure that it can only be caused by external changes.
python/mxnet/text/glossary.py
Outdated
vector for every special token, such as an unknown token and a padding
token.
"""
with open(pretrain_file_path, 'r', encoding='utf8') as f:
Make sure code is tested on python2
Thanks. Will test on Py2 when adding test cases.
python/mxnet/text/glossary.py
Outdated
vec_len = None
all_elems = []
idx_to_token = []
for line in tqdm(lines, total=len(lines)):
Let’s avoid adding dependencies such as tqdm.
Thanks. Since https://github.com/apache/incubator-mxnet/blob/master/example/gluon/tree_lstm/dataset.py uses tqdm, I assumed the tqdm dependency was already accepted. Let me know if you still prefer removing it.
Resolved. (tqdm is removed)
python/mxnet/text/glossary.py
Outdated
             for i in elems[1:]]

if len(elems) == 1:
    logging.warning('WARNING: Token %s with 1-dimensional vector '
Use `warnings.warn`.
resolved.
Can we move this into gluon.data and make use of Dataset?
Thanks. Embedding vectors are a little different from text data (the training corpus). This package is meant to host all text-related utilities, such as indexers and embeddings.
python/mxnet/text/embedding.py
Outdated
if reserved_tokens is not None:
    for reserved_token in reserved_tokens:
        assert reserved_token != unknown_token, \
            '`reserved_token` cannot contain `unknown_token`.'
`assert unknown_token not in reserved_tokens`.
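That is, a single membership test in place of the loop (sketch):

if reserved_tokens is not None:
    assert unknown_token not in reserved_tokens, \
        '`reserved_tokens` cannot contain `unknown_token`.'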
resolved.
python/mxnet/text/embedding.py
Outdated
self._reserved_tokens = None
else:
    # Python 2 does not support list.copy().
    self._reserved_tokens = reserved_tokens[:]
Remove the comment. Comments should be about what's in the code instead of what's absent.
resolved.
python/mxnet/text/embedding.py
Outdated
`counter` that can be indexed. Note that this argument does not count
any token from `reserved_tokens`. If this argument is None or larger
than its largest possible value restricted by `counter` and
`reserved_tokens`, this argument becomes positive infinity.
this argument becomes positive infinity -> it has no effect.
So the intention of having this argument is to put a maximum size limit on the index. This would incur some complexity, especially when there is a tie in the counter. For example, suppose you want to limit it to 3 in the case where `reserved_tokens = []; counter = {'a': 5, 'b': 5, 'c': 3, 'd': 3}`: you would need to further clarify whether 'c' or 'd' is kept. The secondary alphabetic ordering should be documented.
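For example, sorting with the token itself as a secondary key makes the cut-off deterministic (a sketch, assuming alphabetic tie-breaking):

from collections import Counter

counter = Counter({'a': 5, 'b': 5, 'c': 3, 'd': 3})
# Primary key: descending frequency. Secondary key: the token itself,
# so ties break alphabetically and the top-k cut-off is deterministic.
token_freqs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
print([token for token, _ in token_freqs[:3]])  # ['a', 'b', 'c']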
resolved.
python/mxnet/text/embedding.py
Outdated
    # 1 is the unknown token count.
    token_cap = 1 + len(reserved_tokens) + len(counter)
else:
    token_cap = 1 + len(reserved_tokens) + most_freq_count
token_cap = 1 + len(reserved_tokens) + (most_freq_count if most_freq_count else len(counter))
resolved.
python/mxnet/text/embedding.py
Outdated
token_cap = 1 + len(reserved_tokens) + most_freq_count

for token, freq in token_freqs:
    if freq < min_freq or len(self._idx_to_token) == token_cap:
You can use `for i in range(token_cap)` in the loop to avoid evaluating the second condition token_cap times.
Thanks. Due to the `if token not in reserved_tokens:` condition, where the `self._idx_to_token` length may not always self-increment by 1, we probably cannot use `for i in range(token_cap)` here.
python/mxnet/text/embedding.py
Outdated
    'The length of new_vectors must be equal to the number of tokens.'
assert new_vectors.shape[1] == self.vec_len, \
    'The width of new_vectors must be equal to the dimension of ' \
    'embeddings of the glossary.'
assert new_vectors.shape == (len(tokens), self.vec_len)
resolved
else:
    raise ValueError('Token %s is unknown. To update the embedding '
                     'vector for an unknown token, please specify '
                     'it explicitly as the `unknown_token` %s in '
How can a user add a new token in embedding? Should there be a separate method for that?
Thanks. The Embedding class always loads from pre-trained files. Users can add/set new tokens via the glossary rather than via the embedding.
from mxnet import ndarray as nd

y = nd.array([[ 0.,  1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.,  9.],
              [10., 11., 12., 13., 14.],
              [15., 16., 17., 18., 19.]])
x = nd.array([1, 3])
%timeit y[x]
196 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit nd.Embedding(x, y, 4, y.shape[1])
83.5 µs ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
`y[x]` is advanced indexing, which is usually slower than other indexing operations for two reasons:
- Overhead of sanity checking and preprocessing advanced indices before calling backend ops.
- In this case, the backend op used is `gather_nd`, which is expected for retrieving scattered elements from an ndarray. For a regular shape index like `[1, 3]` indexing the first dimension, operators such as `take` or `slice` are much more efficient than `gather_nd`.
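For a lookup along the first axis, a sketch of the cheaper paths (producing the same rows as y[x] above):

from mxnet import ndarray as nd

y = nd.arange(20).reshape((4, 5))
x = nd.array([1, 3])
rows_take = nd.take(y, x)              # `take` along axis 0
rows_embed = nd.Embedding(x, y, 4, 5)  # same lookup via the Embedding op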
python/mxnet/text/embedding.py
Outdated
self._idx_to_vec[nd.array(indices)] = new_vectors

@staticmethod
def register(embed_cls):
Looks like this is following what was done in the optimizer module. I think both embedding and optimizer should reuse `mx.registry` here instead of creating a new one. See the example in `mx.initializer`.
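A sketch of the mx.initializer pattern applied here (the base-class name and nickname are illustrative):

from mxnet import registry

class TextEmbedding(object):
    """Base class for pre-trained token embeddings (illustrative)."""

# Module-level helpers generated from the shared registry, as in
# mx.initializer, instead of hand-rolled @staticmethods.
register = registry.get_register_func(TextEmbedding, 'token embedding')
create = registry.get_create_func(TextEmbedding, 'token embedding')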
resolved
python/mxnet/text/embedding.py
Outdated
return embed_cls

@staticmethod
def create(embed_name, **kwargs):
Looks like this is following what was done in the optimizer module. I think both embedding and optimizer should reuse `mx.registry` here instead of creating a new one. See the example in `mx.initializer`.
resolved
python/mxnet/text/embedding.py
Outdated
    ', '.join(embed_cls.pretrain_file_sha1.keys())))

@staticmethod
def get_embed_names_and_pretrain_files():
list_pretrained_embeddings?
Thanks. It returns a string rather than printing it, so I guess "get_" is better than "list_".
@astonzhang great contribution. This should help greatly reduce the commonly repeated effort of text indexing and embedding. Thanks for going through these many iterations and keeping pushing for higher quality.
* Add text utils * Leftovers * revise * before load embeddings * glossary done * Add/revise text utils, revise test cases * Add docstrings * clean package init * remove play * Resolve issues and complete docstrings * disable pylint * Remove tqdm dependency * Add encoding utf8 utf utf utf * remove non-ascii * fix textcase * remove decode in glossary * py2 unicode * Fix py2 error * add tests * Test all embds * test some embeds * Add getter for glossary * remove util from path, revise interfaces of glossary * skip some test, before major revise * Add TextIndexer, only TextEmbed needs revised before major revise * before major revise * minor update * Revise TextIndexer with test * lint * lint * Revise TextEmbed, FastText, Glove, CustmonEmbed with test * Revision done except for docstr * Add unit tests for utils * almost no pylint disable, yeah * doc minor updates * re-run * re-run * except for register * except for register * Revise register/create, add get_registry * revise * More readability * py2 compatibility * Update doc * Revise based on feedbacks from NLP team * add init * Support indexing for any hashable and comparable token * Add test cases * remove type cmp * Fix doc error and add API descriptions * Fix api doc error * add members explicitly * re-order modules in text.md * url in one line * add property desc for all inherited classes for rst parsing * escape \n * update glossary example * escape \n * add use case * Make doc more user-friendly * proper imports, gluon.nn.Embedding use case * fix links * re-org link level * tokens_to_indices * to_indices, to_tokens
@szha Thank you very much for repeatedly going through this PR with me!
This reverts commit 6c1f4f7.
Description
Add mxnet.text APIs (new features). This is intended to be used in natural language processing applications.
- Text processing utilities.
- Text indexer class. It can be used by instances of `mxnet.text.embedding.TextEmbedding`, such as instances of `mxnet.text.glossary.Glossary`.
- Text pre-trained embedding class. A pre-trained embedding is created via `TextEmbedding.create(embedding_name, pretrained_file_name)`. To get all the available `embedding_name` and `pretrained_file_name`, use `TextEmbedding.get_embedding_and_pretrained_file_names()`. User-provided embeddings are loaded via `mxnet.text.embedding.CustomEmbedding`, a subclass of `mxnet.text.embedding.TextEmbedding`. Supported embeddings:
  - GloVe pre-trained text embedding
  - The fastText pre-trained text embedding
  - Custom pre-trained text embedding
- Text glossary class. It indexes text tokens and associates them with embedding vectors from instances of `mxnet.text.embedding.TextEmbedding`.

Checklist

Essentials
- Passed code style checking (`make lint`)

Changes
- `get_registry(base_class)` added in mxnet.registry with API doc. Tested in the test cases of Text embeddings.

Comments