Following the Zipf distribution, the frequency of the
Therefore, given
Given
Then, finding the index of the words which are drawn the same number of times
as less than
Let’s note
This is a second-degree polynomial whose roots are:
Therefore, given the initial approximation, the index of the first word (ranked
by frequency) that belongs to a set of at least
For