Sorted containers -> new iteration protocol

On branch newiterationsortedcontainers
Changes to be committed:
    modified: ../docs/src/sorted_containers.md
    modified: DataStructures.jl
    modified: balanced_tree.jl
    modified: container_loops.jl
    modified: ../test/test_sorted_containers.jl

This commit updates container_loops.jl to use the new iteration protocol (introduced in 0.7.0-DEV). It should be backwards compatible with 0.6.2.

In addition, it fixes a bug in container_loops.jl in which the length() function when applied to subranges (i.e., inclusive(a,b,c) or exclusive(a,b,c)) returned the length of the whole container instead of the length of the subrange. (There should be no value returned for the length of the subrange since the data structure does not support an O(1) algorithm or even O(log n) algorithm to compute the length.)

Some other smaller changes in this commit are as follows.
- IntSet was changed to BitSet (name change in 0.7.0-DEV)
- Small updates to documentation
- Some assert statements in balanced_tree.jl that were present during development are deleted.

Add more tests for new length and eltype methods
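
For readers unfamiliar with the protocol change described above, here is a minimal sketch of the single-function `Base.iterate` interface combined with `SizeUnknown`, which is the standard way to tell Julia that an iterator has no cheap `length`. It is not the code in container_loops.jl: `SubrangeView` and its fields `lo` and `hi` are hypothetical names, and a plain array stands in for the balanced tree (an array could report its length cheaply; the tree subranges cannot). It assumes Julia 0.7 or later.

```julia
# Hypothetical stand-in for a subrange of a sorted container.
# Not part of DataStructures.jl.
struct SubrangeView
    data::Vector{Int}   # sorted values
    lo::Int             # index of the first item in the subrange
    hi::Int             # index of the last item in the subrange
end

# New protocol: one function replaces start/next/done.
# It returns (item, next_state), or nothing when iteration is finished.
function Base.iterate(v::SubrangeView, state::Int = v.lo)
    state > v.hi && return nothing
    return (v.data[state], state + 1)
end

# Declare that the length is not available, so generic code such as
# collect grows its result instead of calling length on the view.
Base.IteratorSize(::Type{SubrangeView}) = Base.SizeUnknown()
Base.eltype(::Type{SubrangeView}) = Int

sub = SubrangeView([10, 20, 30, 40, 50], 2, 4)
collected = collect(sub)   # iterates positions 2 through 4
```

With `SizeUnknown` declared, no `length` method needs to be defined for the type, which matches the commit message's point that a subrange should not report a length at all.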
JuliaCollections · May 25, 2018 · 43425a7
1 parent a8d9477
commit 43425a7

Showing 6 changed files with 423 additions and 180 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,6 @@
 doc/build
 docs/build/
 docs/site/
+.gitignore
+*~
+
diff --git a/docs/src/sorted_containers.md b/docs/src/sorted_containers.md
@@ -71,7 +71,7 @@ then there may be a loss of performance compared to:
 k,v = deref((sc,st))
 ```
 
-because the former needs an extra heap allocation step for `tok`.
+because the former may need an extra heap allocation step for `tok`.
 
 The notion of token is similar to the concept of iterators used by C++
 standard containers. Tokens can be explicitly advanced or regressed
@@ -406,8 +406,8 @@ past-end token. Time: O(1)
 ## Iteration Over Sorted Containers
 
 As is standard in Julia, iteration over the containers is implemented
-via calls to three functions, `start`, `next` and `done`. It is usual
-practice, however, to call these functions implicitly with a for-loop
+via calls to the function `Base.iterate`. It is usual
+practice, however, to call this function implicitly with a for-loop
 rather than explicitly, so they are presented here in for-loop notation.
 Internally, all of these iterations are implemented with semitokens that
 are advanced via the `advance` operation. Each iteration of these loops
@@ -454,11 +454,13 @@ end
 ```
 
 Here, `st1` and `st2` are semitokens that refer to the container `sc`.
+Token `(sc,st1)` may not be the before-start token and
+token `(sc,st2)` may not be the past-end token.  
 It is acceptable for `(sc,st1)` to be the past-end token or `(sc,st2)`
-to be the before-start token (in these cases, the body is not executed).
+to be the before-start token or both (in these cases, the body is not executed).
 If `compare(sc,st1,st2)==1` then the body is not executed. A second
-calling format for `inclusive` is `inclusive(sc,(st1,st2))`. One purpose
-for second format is so that the return value of `searchequalrange` may
+calling format for `inclusive` is `inclusive(sc,(st1,st2))`. With
+the second format, the return value of `searchequalrange` may
 be used directly as the second argument to `inclusive`.
 
 One can also define a loop that excludes the final item:
@@ -771,10 +773,10 @@ Lt((x,y) -> isless(lowercase(x),lowercase(y)))
 The ordering object is indicated in the above list of constructors in
 the `o` position (see above for constructor syntax).
 
-This approach suffers from a performance hit (10%-50% depending on the
-container) because the compiler cannot inline or compute the correct
-dispatch for the function in parentheses, so the dispatch takes place at
-run-time. A more complicated but higher-performance method to implement
+This approach suffers may suffer from a performance hit because
+higher performance may be possibility if equality is available
+as well as less-than.
+A more complicated but higher-performance method to implement
 a custom ordering is as follows. First, the user creates a singleton
 type that is a subtype of `Ordering` as follows:
 
@@ -798,7 +800,7 @@ container also needs an equal-to function; the default is:
 eq(o::Ordering, a, b) = !lt(o, a, b) && !lt(o, b, a)
 ```
 
-For a further slight performance boost, the user can also customize this
+The user can also customize this
 function with a more efficient implementation. In the above example, an
 appropriate customization would be:
 

diff --git a/src/DataStructures.jl b/src/DataStructures.jl
@@ -16,6 +16,12 @@ module DataStructures
                  union, intersect, symdiff, setdiff, issubset,
                  searchsortedfirst, searchsortedlast, in
 
+    if VERSION >= v"0.7.0-DEV.5126"
+        import Base: iterate, IteratorSize, HasLength, SizeUnknown,
+                   IteratorEltype, HasEltype
+    end
+
+
     using Compat
     using Compat.InteractiveUtils # for methodswith
     import Compat: lastindex, pushfirst!, popfirst!

diff --git a/src/balanced_tree.jl b/src/balanced_tree.jl
@@ -17,7 +17,7 @@
 ##  d: the data of the node
 ##  parent: the tree leaf that is the parent of this
 ##    node.  Parent pointers are needed in order
-##    to implement indices.
+##    to implement tokens.
 ##  There are two constructors, the standard one (first)
 ##  and the incomplete one (second).  The incomplete constructor
 ##  is needed because when the data structure is first created,
@@ -80,14 +80,14 @@ end
 
 
 ## Type BalancedTree23{K,D,Ord} is 'base class' for
-## SortedDict.
+## SortedDict, SortedMultiDict and SortedSet.
 ## K = key type, D = data type
 ## Key type must support an ordering operation defined by Ordering
 ## object Ord.
 ## The default is Forward which implies that the ordering function
 ## is isless (see ordering.jl)
 ## The fields are as follows.
-## ord:: The ordering object.  Often the ordering type
+## ord: The ordering object.  Often the ordering type
 ##   is a singleton type, so this field is empty, but it
 ##   is still necessary to direct the multiple dispatch.
 ## data: the (key,data) pairs of the tree.
@@ -104,10 +104,10 @@ end
 ##    tree array (locations are freed due to deletion)
 ## freedatainds: Array of indices of free locations in the
 ##    data array (locations are freed due to deletion)
-## useddatacells: IntSet (i.e., bit vector) showing which
+## useddatacells: BitSet (i.e., bit vector) showing which
 ##    data cells are taken.  The complementary positions are
 ##    exactly those stored in freedatainds.  This array is
-##    used only for error checking (only present at debug level 1 and 2)
+##    used only for error checking.
 ## deletionchild and deletionleftkey are two work-arrays
 ## for the delete function.
 
@@ -119,7 +119,7 @@ mutable struct BalancedTree23{K, D, Ord <: Ordering}
     depth::Int
     freetreeinds::Array{Int,1}
     freedatainds::Array{Int,1}
-    useddatacells::IntSet
+    useddatacells::BitSet
     # The next two arrays are used as a workspace by the delete!
     # function.
     deletionchild::Array{Int,1}
@@ -129,7 +129,7 @@ mutable struct BalancedTree23{K, D, Ord <: Ordering}
         initializeTree!(tree1)
         data1 = Vector{KDRec{K,D}}(undef, 2)
         initializeData!(data1)
-        u1 = IntSet()
+        u1 = BitSet()
         push!(u1, 1, 2)
         new{K,D,Ord}(ord1, data1, tree1, 1, 1, Vector{Int}(), Vector{Int}(),
                      u1,
@@ -631,30 +631,30 @@ function compareInd(t::BalancedTree23, i1::Int, i2::Int)
     i2a = i2
     p1 = t.data[i1].parent
     p2 = t.data[i2].parent
-    curdepth = t.depth
+    # curdepth = t.depth
     while true
-        @assert(curdepth > 0)
+        # @assert(curdepth > 0)
         if p1 == p2
             if i1a == t.tree[p1].child1
-                @assert(t.tree[p1].child2 == i2a || t.tree[p1].child3 == i2a)
+                # @assert(t.tree[p1].child2 == i2a || t.tree[p1].child3 == i2a)
                 return -1
             end
             if i1a == t.tree[p1].child2
                 if (t.tree[p1].child1 == i2a)
                     return 1
                 end
-                @assert(t.tree[p1].child3 == i2a)
+                # @assert(t.tree[p1].child3 == i2a)
                 return -1
             end
-            @assert(i1a == t.tree[p1].child3)
-            @assert(t.tree[p1].child1 == i2a || t.tree[p1].child2 == i2a)
+            # @assert(i1a == t.tree[p1].child3)
+            # @assert(t.tree[p1].child1 == i2a || t.tree[p1].child2 == i2a)
             return 1
         end
         i1a = p1
         i2a = p2
         p1 = t.tree[i1a].parent
         p2 = t.tree[i2a].parent
-        curdepth -= 1
+        # curdepth -= 1
     end
 end