Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path.Order() #796

Merged
merged 11 commits into from
Oct 8, 2019
Merged

path.Order() #796

merged 11 commits into from
Oct 8, 2019

Conversation

iddan
Copy link
Collaborator

@iddan iddan commented Jul 8, 2019

  • Basic implementation
  • Tests and bug fixes

This change is Reviewable

@iddan iddan mentioned this pull request Jul 8, 2019
@iddan iddan changed the title Path Order path.Order() Jul 8, 2019
@iddan
Copy link
Collaborator Author

iddan commented Jul 8, 2019

Works:

g.V().Order().All()
{
	"result": [
		{
			"id": "smart_person"
		},
		{
			"id": "<alice>"
		},
		{
			"id": "<are>"
		},
		{
			"id": "<bob>"
		},
		{
			"id": "<charlie>"
		},
		{
			"id": "<dani>"
		},
		{
			"id": "<emily>"
		},
		{
			"id": "<follows>"
		},
		{
			"id": "<fred>"
		},
		{
			"id": "<greg>"
		},
		{
			"id": "<predicates>"
		},
		{
			"id": "<smart_graph>"
		},
		{
			"id": "<status>"
		}
	]
}

@iddan
Copy link
Collaborator Author

iddan commented Jul 8, 2019

g.V("<bob>").Out().Order().All()

Retruns:

{
	"result": [
                {
			"id": "cool_person"
		},
		{
			"id": "<fred>"
		}
	]
}

@iddan
Copy link
Collaborator Author

iddan commented Jul 8, 2019

Works:

g.V().Tag("entity").Out("<status>").Order().All()
{
	"result": [
		{
			"entity": "<bob>",
			"id": "cool_person"
		},
		{
			"entity": "<dani>",
			"id": "cool_person"
		},
		{
			"entity": "<greg>",
			"id": "cool_person"
		},
		{
			"entity": "<emily>",
			"id": "smart_person"
		},
		{
			"entity": "<greg>",
			"id": "smart_person"
		}
	]
}

Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good, I don't see any obvious flaws in it. Well done! :)

As for the tests, I propose to add them to Gizmo first. Just add a few cases to the test table and we should be good.

The By() implementation can be in a separate PR, by the way, as well as any backend-specific optimizations.

Reviewed 6 of 7 files at r1, 1 of 1 files at r3.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @iddan)


graph/iterator/order.go, line 32 at r3 (raw file):

}

func getOrderedValues(it *Order) *values {

Maybe accept a (qs QuadStore, it Iterator) and pass (it.qs, it.subIt) when calling it? This may be useful for other iterators later.


graph/iterator/order.go, line 34 at r3 (raw file):

func getOrderedValues(it *Order) *values {
	var qs, subIt = it.qs, it.subIt
	var results = make([]result, 0)

Nit: var results []result


graph/iterator/order.go, line 42 at r3 (raw file):

		var tags = make(map[string]graph.Value)
		subIt.TagResults(tags)
		vals.results = append(vals.results, result{id, tags})

I think it would be better to also call qs.NameOf(id) here and store it in the structure similar to the result. This way you won't need to call it while sorting which will be much faster. Maybe even call .String() as well to reduce allocations.

Also, this will allow removing qs from values.


graph/iterator/order.go, line 44 at r3 (raw file):

		vals.results = append(vals.results, result{id, tags})
	}

You should also check the error from the sub-iterator here:

err := subIt.Err()

and store it in Order.err.

So the function should return an error as well.


graph/iterator/order.go, line 49 at r3 (raw file):

	subIt.Reset()

	return &vals

Assuming my suggestion above, the function can be changed to return a []result instead of *values.


graph/iterator/order.go, line 52 at r3 (raw file):

}

func NewOrder(qs graph.QuadStore, subIt graph.Iterator) *Order {

It seems like you could minimize the interface from graph.QuadStore to graph.Namer. This will make it easier to test.


graph/iterator/order.go, line 79 at r3 (raw file):

}

// SubIterators returns a slice of the sub iterators. The first iterator is the

I believe the second sentence is not relevant to this iterator.


graph/iterator/order.go, line 82 at r3 (raw file):

// primary iterator, for which the complement is generated.
func (it *Order) SubIterators() []graph.Iterator {
	return nil

The function should return a it.subIt. I noticed you removed it in one of the commits. I wonder why.


graph/iterator/order.go, line 88 at r3 (raw file):

// has not previously seen.
func (it *Order) Next(ctx context.Context) bool {
	it.runstats.Next += 1

++, same in other places


graph/iterator/order.go, line 139 at r3 (raw file):

	subStats := it.subIt.Stats()
	return graph.IteratorStats{
		NextCost:     1,

subStats.NextCost


graph/iterator/order.go, line 141 at r3 (raw file):

		NextCost:     1,
		ContainsCost: subStats.ContainsCost,
		Size:         int64(len(it.ordered.results)),

It may not be populated yet (Next not called).


graph/iterator/order.go, line 155 at r3 (raw file):

func (it *Order) String() string {
	return "Order"

"Order{"+it.subIt.String()+"}"


graph/kv/quad_iterator.go, line 223 at r3 (raw file):

}

func (it *QuadIterator) Ordered() bool { return true }

I would actually prefer to name it Sorted.


graph/shape/shape.go, line 1427 at r3 (raw file):

}

type Order struct {

Can we name it Sort? It works exactly like stdlib's sort.Sort, so makes sense to name it the same way internally.

The Path.Order and Gizmo's path.Order can be named as-is, because those need to match Gremlin.

@iddan
Copy link
Collaborator Author

iddan commented Sep 11, 2019

I am going to fix all the issues once this feature gets priority in my company's projects. Probably in less than a month.

@iddan iddan self-assigned this Oct 4, 2019
@iddan iddan marked this pull request as ready for review October 4, 2019 19:56
@iddan
Copy link
Collaborator Author

iddan commented Oct 5, 2019

Updated the code to the new Iterator interface. Ready for rereview

Copy link
Member

@dennwc dennwc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 9 files at r4, 2 of 2 files at r5.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @iddan)


graph/iterator/sort.go, line 26 at r5 (raw file):

func (v values) Swap(i, j int) { v[i], v[j] = v[j], v[i] }

type sortNext struct {

Minor, but can you please order things the same way as in other iterators? E.g. first, all the exported methods/types, so a public API will be more visible, than the sortIt (v2 implementation), than sortNext and sortContains (specific implementations of the iterator depending on the query plan).


graph/iterator/sort.go, line 29 at r5 (raw file):

	namer   graph.Namer
	subIt   graph.Scanner
	ordered *values

You can safely remove pointer here. Slices in Go can be nil as well.


graph/iterator/sort.go, line 44 at r5 (raw file):

func (it *sortNext) TagResults(dst map[string]graph.Value) {
	if it.subIt != nil {
		it.subIt.TagResults(dst)

This won't work. You drain it.subIt in getSortedValues, so TagResults will return tags related to the last result in the iterator. You should recall tags from value.result.tags.


graph/iterator/sort.go, line 57 at r5 (raw file):

func (it *sortNext) Next(ctx context.Context) bool {
	if it.ordered == nil {

This should be after the if it.err != nil. If getSortedValues fails, v may be nil, so calling Next multiple times will try fetching values again, probably leading to the same error. Instead we can first check the saved error and return earlier without touching the datastore.


graph/iterator/sort.go, line 75 at r5 (raw file):

func (it *sortNext) NextPath(ctx context.Context) bool {
	return false

I think this is incorrect. NextPath is not implemented on Sort itself (which is correct), but it may be implemented on the subIt. So you should call it in getSortedValues as well (since we don't know if it will be called by the user or not).


graph/iterator/sort.go, line 109 at r5 (raw file):

func (it *sortIt) Lookup() graph.Index {
	return newSortContains(it.subIt.Lookup())

It's may not be obvious, but:

// we can ignore the order when doing Contains, so we return a subiterator directly
return it.subIt.Lookup()

And we can remove sortContains completely. One of the benefits of a new iterator interface.


graph/iterator/sort.go, line 123 at r5 (raw file):

	subStats, err := it.subIt.Stats(ctx)
	return graph.IteratorCosts{
		NextCost:     subStats.NextCost,

subStats.NextCost*2, // TODO(dennwc): better cost calculation

The real cost is still subStats.NextCost, but we need to somehow make the cost appear larger, since we will pull all values into memory which may be expensive for large datasets.


graph/iterator/sort.go, line 147 at r5 (raw file):

}

// NewSort creates a new Sort iterator
// NewSort creates a new Sort iterator.
// TODO(dennwc): This iterator must not be used inside And: in this cases it may be moved to a Contains branch and won't do anything. We should make And account for this.

graph/iterator/sort.go, line 161 at r5 (raw file):

func getSortedValues(namer graph.Namer, it graph.Scanner) (*values, error) {
	var v values
	var ctx = context.TODO()

Please don't fake the context. From the code, I see that you can pass a real one it in Next.


query/gizmo/gizmo_test.go, line 656 at r5 (raw file):

			"smart_person",
		},
	},

Another test case worth considering:

	{
		message: "Use order tags",
		query: `
			g.V().Tag("target").order().all()
		`,
		tag: "target",
		expect: []string{
			"<alice>",
			"<are>",
			"<bob>",
			"<charlie>",
			"<dani>",
			"<emily>",
			"<follows>",
			"<fred>",
			"<greg>",
			"<predicates>",
			"<smart_graph>",
			"<status>",
			"cool_person",
			"smart_person",
		},
	},

It checks if the iterator can correctly propagate tags.

@iddan
Copy link
Collaborator Author

iddan commented Oct 6, 2019

Done everything except:

graph/iterator/sort.go, line 75 at r5 (raw file):

func (it *sortNext) NextPath(ctx context.Context) bool {
return false
I think this is incorrect. NextPath is not implemented on Sort itself (which is correct), but it may be implemented on the subIt. So you should call it in getSortedValues as well (since we don't know if it will be called by the user or not).
Which I don't understand

@iddan
Copy link
Collaborator Author

iddan commented Oct 7, 2019

@dennwc I noticed a bug that after calling order() back() isn't working as expected. Does it has to do with NextPath()?

@dennwc
Copy link
Member

dennwc commented Oct 7, 2019

It might be. You can test it with CAYLEY_DEBUG_OPTIMIZER=true, and if you see it inside Quads and then NodesFrom at some point, than it is the case. As a rule of thumb, right now it should always be at the top level.

@iddan
Copy link
Collaborator Author

iddan commented Oct 7, 2019

Then I'll check if the bug persists after we fix NextPath()

@dennwc
Copy link
Member

dennwc commented Oct 7, 2019

@iddan I rebased the branch and implemented NextPath in it. Please test if it solves the issue you mentioned.

Note that ordering may dissappear in some circumstances when using Paths. So changes to the optimizer must be made to ensure that it works in all cases.

@iddan
Copy link
Collaborator Author

iddan commented Oct 8, 2019

Looks like the issue I reported was solved.

@iddan iddan merged commit f449116 into cayleygraph:master Oct 8, 2019
@iddan iddan deleted the order branch October 8, 2019 13:27
@dennwc dennwc added this to the v0.7.6 milestone Oct 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants