Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RealSet: Faster operations by scan-line (merging) techniques #32181

Closed
mkoeppe opened this issue Jul 11, 2021 · 73 comments
Closed

RealSet: Faster operations by scan-line (merging) techniques #32181

mkoeppe opened this issue Jul 11, 2021 · 73 comments

Comments

@mkoeppe
Copy link
Contributor

mkoeppe commented Jul 11, 2021

Adapt from:

  1. Extend Union and Intersection from pairwise computation to multiple realsets with scan-line algorithm. In order to compare the performance of the scan-line and orginial methods, we use different dataset to test the CPU time the random generator and timing test code can find in this page:
Function Dataset Scan-line original
union 1000 closed bounded disjoint interval 0.005281 0.00749
union 1000 open bounded disjoint interval 0.004339 0.006646
union 1000 mixed bounded disjoint interval 0.005025 0.006646
intersection 300 closed bounded disjoint interval 0.001182 4.376048
intersection 300 open bounded disjoint interval 0.001225 4.359314
intersection 300 mixed bounded interval 0.001537 4.407478
  1. We used scan-line methods to replace the original normalize and also make normalize optionally. After replacing it, we test it with 500,000 random intervals, the original methods take 151 second to construct it; scan-line methods only took 18 seconds.
  2. Replace difference,symmetric_difference and are_pairwise_disjoint with scan-line methods. We use 1000 realsets each realsets containing 1000 random intervals as the testing dataset. The original methods found the difference between two real sets took 11 seconds on average, and the new methods only took 0.021 seconds on average; symmetric_differnce also has a similar result.
  3. Implement convex_hull and is_connected in RealSet.

Depends on #34252

CC: @yuan-zhou

Component: geometry

Author: Yueqi Li, Yuan Zhou

Branch/Commit: 1ef7dd1

Reviewer: Matthias Koeppe

Issue created by migration from https://trac.sagemath.org/ticket/32181

@mkoeppe mkoeppe added this to the sage-9.4 milestone Jul 11, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.4, sage-9.5 Jul 19, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.5, sage-9.6 Dec 14, 2021
@mkoeppe mkoeppe modified the milestones: sage-9.6, sage-9.7 Mar 5, 2022
@NicoleYueqiLi
Copy link
Mannequin

NicoleYueqiLi mannequin commented May 19, 2022

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 18, 2022

Commit: 625ac58

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 18, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

@NicoleYueqiLi

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 26, 2022

Changed commit from 625ac58 to 33385c8

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 26, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

31613391. fixed pycode test fail
33385c8Adding examples for helper funtion `_scan_line_union`, `_scan_line_intersection`, `_scan_line_difference`, `_scan_line_symmetric_difference`

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 27, 2022

comment:8

If you merge the most current beta (9.7.beta6), you can use https://trac.sagemath.org/wiki/ReleaseTours/sage-9.7#Validatingdocstringwith.sage-tox-erst

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 28, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

61862a9Change scan-line methods to tree structure, improve inner function's documentation

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 28, 2022

Changed commit from 33385c8 to 61862a9

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 28, 2022

comment:10

tox -e relint will tell you that EXAMPLE:: should be EXAMPLES:: according to our style guide (whether there is a single or there are multiple examples)

@yuan-zhou
Copy link

comment:11

The current code does not pass the doc-tests.
Also, please follow https://doc.sagemath.org/html/en/developer/coding_basics.html. For example, in the text part, convex_hull should be fully-referenced so that it is clickable after build, i.e. :meth:~sage.sets.real_set.RealSet.convex_hull

@yuan-zhou
Copy link

comment:12

I took a look through the code, but didn't run the code.
I have some first comments.

  • The patchbot currently shows 31 doctests failures and 4 lint errors. To be fixed!
  • In the docstrings of _scan_left_endpoint and _scan_right_endpoint: What does an event mean. Reformat the text after output. Describe the value of delta. Include example(s) where tag is given.
  • RealSet constructor with option normalized=True. What does this assume? Describe it in the docstring of __classcall__ or that of RealSet. Mention that is does not apply to the manifold case. Include doctests/tests where normalized=True is given, and examples where arguments are of type InternalRealInterval.
  • I have doubts about where if not normalized is treated. (But of course, this depends on how you define normalized=True). I thought if normalized=True and it's not of the manifold case (lines 1267-1283), one can just return UniqueRepresentation.__classcall__(cls, *intervals) before line 1199 intervals = [], which skips all treatments on args.
  • When normalized is not True, may simplify and insert the code in def normalize here, then remove the normalize function as it is only used once here. Alternatively, keep def normalize and move all treatment of args from __classcall__ to normalize.
  • RealSet.__classcall__: Line 1188-1190 change to just: normalized = kwds.pop('normalized', False). L1194: "... take no keyword arguments other than 'normalized'.""
  • RealSet._prep: L1710 elif. L1717 elif
  • RealSet doesn't need methods _scan_left_endpoint and _scan_right_endpoint. These are methods of the class InternalRealInterval.
  • RealSet._scan_interval Rename the function, to something like _scan, or _scan_intervals, or _generate_events. Rewrite the function: for i in self._intervals: yield i._scan_left_endpoint(tag); yield i._scan_right_endpoint(tag). No more merge/sorting needed.
  • Ideally, all scan methods are generators. Currently, some are generators, some return lists.
  • In _scan_line_union, lines 2104-2122, the two while loops look very suspicious to me. I think they do not belong here. Moreover, the lines 2118-2119 in the second while loop might be wrong. You claimed that these two while loops take care of the special cases RealSet(x=-infinity) and RealSet(x<=-infinity), but the input scan to this function should never be in such form. Such empty (nonsense) special cases should already been eliminated in around line 1209 in __classcall__
  • In union_of_realsets and other similar functions: Lines 2175 to 2178 can be simplified to scan = merge(RealSet(s)._scan() for s in real_set_collection)
  • _scan_line_intersection. Add more special examples to doctest to cover the cases where intersection is just one point, or just empty. For example: [1,2] ∩ [2,3], [2,3] ∩ (3,4), (3,4) ∩ [4,5), [4,5) ∩ (5,6). Lines 2252-2261 (and other similar places) can be simplified to
if now_on:
  (on_x, on_epsilon) = (x, epsilon) 
else: 
  if was_on and (on_x, on_epsilon) < (x, epsilon): 
    yield InternalRealInterval(...)
  (on_x, on_epsilon) = (None, None)
  • Questions about 'tag'. It is said to be a real number in the docstrings of many function, but is used as True or False for the difference function only. I think tag can be eliminated throughout the code (and do some special treatment for the difference.) Also, why is it called index at some places? That's not used either, so can be removed as well. On a related note, given a collection of real sets, after generating and merging the their scan, you can no longer match the list of events back to the original real sets, because index was not passed to tag in the events. I do not like the current design.
  • _scan_line_union, _scan_line_intersection, _scan_line_symmetric_difference (and part of are_pairwise_disjoint) all look redundant. They all can be replaced by a new method RealSet._scan_to_intervals(scan, indicator) that takes scan as a generator of sorted events and indicator as a number 1 or n (or perhaps as a lambda function for are_pairwise_disjoint), and outputs a tuple intervals of type InternalRealInterval.
  • According to your ticket description, the old union method is slower than the new one. Is this true for the union of two real sets? If so, why is the old one kept in the code? If not, please update the description of the ticket, and include comments in the code. In general, are there examples where the original method is faster? Please specify the examples, and change the code to choose strategically between the two versions according to the input.
  • def convex_hull: correct but wasteful. One does not to sort all intervals, only the left most and right most intervals of each real set matter.
  • def is_connected docstring: Return whether self is a connected set. OUTPUT: Boolean. True if self is a single interval.
  • def are_pairwise_disjoint: Revise.
  • Throughout the code, (for example in _scan_line_union and _scan_line_intersection and many other functions,) the docstrings need significant improvements! The current ones make little sense. They are often copy-pasted from other places which do not correspond to the function. Input and output descriptions are not exact, variable undefined, variable mistaken (eg. realsets v.s real_set_collection) type not specified, code-style incorrect. Often bad and incomplete examples. EXAMPLES:: plural. At the beginning of the file, "Yueqi Li (2022-06-27): ..." Rephrase to one or two lines. Some places in the docstrings ":meth:~sage.sets.real_set.RealSet.convex_hull"
  • Please make sure that your docstrings follow https://doc.sagemath.org/html/en/developer/coding_basics.html.
  • Throughout the code: clean up the commented-out code.
  • Throughout the code: :class:RealInterval should be :class:InternalRealInterval
  • New method: relative closure in a superset?

@yuan-zhou
Copy link

comment:14

Merge-in SageMath version to 9.7.beta6.


New commits:

ed5c560Merge branch 'develop' into t/32181/realset__faster_operations_by_scan_line__merging__techniques

@yuan-zhou
Copy link

Changed commit from 61862a9 to ed5c560

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 31, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

cd9cc4ctrivial typo fix
f6b334fremove outdated patch for the broken comparison between infinity and RLF (fixed in ticket 11506)
58aa1efadd bug example regarding empty InternalRealInterval. Will not fix for this internal class
7a65f08fix bug regarding empty real sets

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 31, 2022

Changed commit from ed5c560 to 7a65f08

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 31, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

1bbb04facknowledge authors, reference

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 31, 2022

Changed commit from 7a65f08 to 1bbb04f

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 31, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

dda8f6brewrite implementation of scan-line algo for real set methods, by adapting the code from https://github.com/mkoeppe/cutgeneratingfunctionology

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 4, 2022

Changed commit from 523b49a to 3f6a0ac

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:41

Replying to @yuan-zhou:

Do you mean bool(i==1)? I don't know if it's always needed, but I thought there is nothing wrong to add it.

i is just an integer, right? Converting to bool is only needed in comparison when you are comparing SR elements, which have overloaded comparison operators for forming relational expressions.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:42

If you want, you can add a deprecation for the normalize method

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 4, 2022

Changed commit from 3f6a0ac to 79f2014

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 4, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

79f2014revise the docstring of RealSet.is_connected

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 4, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

1ef7dd1remove unnecessary bool()

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 4, 2022

Changed commit from 79f2014 to 1ef7dd1

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:45

Here's a microbenchmark:

Before:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
82.9 µs ± 3.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
210 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

With this ticket:

sage: from itertools import count
sage: c = count()
sage: %timeit RealSet((0, next(c)))
86.2 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sage: %timeit RealSet((0, next(c))).union([next(c), next(c)])
257 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@yuan-zhou
Copy link

comment:46

Does the overhead look acceptable on the above examples?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:47
     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

@NicoleYueqiLi

This comment has been minimized.

@yuan-zhou
Copy link

comment:49

Somewhere in the original code (mul ?), the empty interval is set to (0, 0)
Replying to @mkoeppe:

     def __init__(self, lower, lower_closed, upper, upper_closed, check=True):
         """
         Initialize ``self``.
[...]
             if (upper_closed and upper == infinity):
                 raise ValueError('interval cannot be closed at +oo')
+            # TODO: take care of the empty set case.

What do you have in mind there?

@NicoleYueqiLi

This comment has been minimized.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:51

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:

@yuan-zhou
Copy link

comment:52

I didn't like it either, since the new code spends a lot of effect to take care of that. I wanted to eliminate that. However, the original code allowed for it (contained doctests of that kind).
Replying to @mkoeppe:

Is this syntax self.intersection(0,1) new? Is it really needed?

+        elif len(real_set_collection) == 2:
+            a, b = real_set_collection
+            # allow self.intersection(0,1) syntax
+            try:
+                a.n()
+                b.n()
+                sets.append(RealSet(a, b))
+            except (AttributeError, ValueError, TypeError):
+                sets.append(RealSet(a))
+                sets.append(RealSet(b))
+        else:

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:53

OK, you are right, the old code did allow it. So we have to keep it.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:54
-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """

I think you can simplify it further by making it

     def intersection(*real_set_collection)

When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:55

(Just a suggestion, it's fine as it)

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

comment:56

Replying to @yuan-zhou:

Does the overhead look acceptable on the above examples?

Yes, I think it's fine.

My guess is that it could be improved by a custom version of heapq with a fast path for near-trivial cases, or even cythonized.

Let's keep this for a possible follow-up ticket.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 4, 2022

Reviewer: Matthias Koeppe

@yuan-zhou
Copy link

comment:58

That's a good suggestion, but I incline to leave it as is on this ticket. The reason is the treatments for special cases such as RealSet.union(RealSet(0,1)), RealSet.union((0,1)), RealSet.union(0, 1), and RealSet(0,1).union(1, 3)==RealSet.union(RealSet(0,1), 1, 3) could be too complicated at the moment. Will discuss the deprecation of the use of RealSet(0,1).union(1, 3) on another ticket later.
Replying to @mkoeppe:

-    def intersection(self, *other):
+    def intersection(self, *real_set_collection):
         """

I think you can simplify it further by making it

     def intersection(*real_set_collection)

When used as a bound method, RealSet(1, 3).intersection(RealSet(2, 4)), the self would just be the first element of real_set_collection.

But it could also be used as an unbound method RealSet.intersection(RealSet(1, 3), RealSet(2,4)), and you could also handle the case RealSet.intersection(), which should give the real line as the intersection over an empty family

@yuan-zhou
Copy link

comment:59

Thank you! Replying to @mkoeppe:

@yuan-zhou
Copy link

comment:60

Should def normalize(intervals) be @staticmethod

@yuan-zhou
Copy link

comment:61

Never mind. Without @staticmethod, the following can apply.

sage: r = RealSet(RealSet(2, 6)[0], RealSet(4, 10)[0], normalized=True)
sage: r
(2, 6) ∪ (4, 10)
sage: r.normalize()
((2, 10),)

Replying to @yuan-zhou:

Should def normalize(intervals) be @staticmethod

@vbraun
Copy link
Member

vbraun commented Aug 6, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants