Go statement query slows down when the number of outgoing edges increases #2515

Closed

jordandandan opened this issue Aug 12, 2021 · 6 comments

Labels: community (Source: who proposed the issue), priority/med-pri (Priority: medium), type/feature req (Type: feature request)
Milestone: v3.0.0
@jordandandan

Question:
Statement: go from 0 over belongs yield $$.user.deviceId as deviceId | limit startOffset, size
Each query fetches a page of 10,000 rows (size = 10000).
With 1 million user vertices each query completes within seconds, but after growing to 10 million it takes 1–2 minutes. Is this latency normal, and how can it be optimized?
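
For concreteness, a single page under this scheme would look like the following (the offset and page size here are illustrative values, not taken from the report):

go from 0 over belongs yield $$.user.deviceId as deviceId | limit 20000, 10000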

@jordandandan (Author) commented Aug 12, 2021

Also, memory usage spikes during the query and is not released after it finishes. Is there a way to paginate without first fetching all of the data?

@jordandandan (Author)

[image attachment]

@yixinglu (Contributor)

> Question:
> Statement: go from 0 over belongs yield $$.user.deviceId as deviceId | limit startOffset, size
> Each query fetches a page of 10,000 rows (size = 10000).
> With 1 million user vertices each query completes within seconds, but after growing to 10 million it takes 1–2 minutes. Is this latency normal, and how can it be optimized?

Looking at this query: it starts from a single user vertex and fetches devices, so the statement should be unrelated to the overall data size. The vertices and edges each query touches depend only on the start vertex, not on the total number of user vertices.

Memory management is handled by the jemalloc library, and not all memory that is no longer in use is returned to the operating system immediately; some of it is kept so the next query can reuse it without requesting new memory from the system.

Could you achieve what you need by filtering on deviceId with a WHERE clause? For example:

go from 0 over belongs WHERE belongs._dst >= 100 AND belongs._dst < 200 yield $$.user.deviceId as deviceId
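
Pagination would then advance the destination-ID window from page to page instead of growing a LIMIT offset, e.g. (ranges illustrative):

go from 0 over belongs WHERE belongs._dst >= 200 AND belongs._dst < 300 yield $$.user.deviceId as deviceId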

@jordandandan (Author)

This query satisfies the requirement, but it still takes minutes. You mentioned earlier that "the vertices and edges each query touches depend only on the start vertex, not on the total number of user vertices". In this test the start vertex has one edge to each of 10 million destination vertices, i.e. 10 million outgoing edges. I also timed the two conditions belongs._dst >= 0 AND belongs._dst < 100 and belongs._dst >= 0 AND belongs._dst < 10000: fetching 100 destinations and fetching 10,000 destinations both take about 1 minute, so the latency seems to depend on the total number of edges.

@yixinglu (Contributor)

> This query satisfies the requirement, but it still takes minutes. You mentioned earlier that "the vertices and edges each query touches depend only on the start vertex, not on the total number of user vertices". In this test the start vertex has one edge to each of 10 million destination vertices, i.e. 10 million outgoing edges. I also timed the two conditions belongs._dst >= 0 AND belongs._dst < 100 and belongs._dst >= 0 AND belongs._dst < 10000: fetching 100 destinations and fetching 10,000 destinations both take about 1 minute, so the latency seems to depend on the total number of edges.

I just checked the execution plan of a similar statement. At the moment the filter in the WHERE clause is not pushed down to the storage layer, so storage still returns all of the edges and graphd filters them afterwards; that is why the 100 and 10,000 cases show little difference. An optimization for this is under development.
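
One way to see where the filter actually runs is to print the plan with the PROFILE (or EXPLAIN) prefix; exact operator names vary by version, and the value range below is illustrative:

PROFILE GO FROM 0 OVER belongs WHERE belongs._dst >= 0 AND belongs._dst < 100 YIELD $$.user.deviceId AS deviceId

If a standalone Filter operator sits above GetNeighbors in the graphd part of the plan, the predicate has not been pushed into the storage layer.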

After PR vesoft-inc/nebula-graph#1263 is merged, @czpmango, please take a look at whether an OptRule can be provided to solve this pushdown problem.

@Sophie-Xie Sophie-Xie added the type/feature req Type: feature request label Aug 26, 2021
@Sophie-Xie Sophie-Xie changed the title go语句查询当出边数量增长时查询速度变慢 Go statement query slows down when the number of outgoing edges increases Sep 6, 2021
@Sophie-Xie Sophie-Xie added the community Source: who proposed the issue label Sep 13, 2021
@Sophie-Xie Sophie-Xie added need info Solution: need more information (ex. can't reproduce) priority/med-pri Priority: medium and removed need info Solution: need more information (ex. can't reproduce) labels Sep 14, 2021
@Sophie-Xie Sophie-Xie added this to the v3.0.0 milestone Oct 15, 2021
@Sophie-Xie Sophie-Xie moved this to Backlog in Nebula Graph v3.0.0 Oct 28, 2021
@CPWstatic CPWstatic moved this from Backlog to Coding in Nebula Graph v3.0.0 Nov 19, 2021
@Sophie-Xie Sophie-Xie moved this from Coding to Reviewing in Nebula Graph v3.0.0 Dec 25, 2021
@CPWstatic (Contributor)

This case could be written with MATCH. We will work on MATCH performance in the next roadmap.
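
A MATCH rewrite of the original query might look like the following sketch (the property access syntax and the SKIP/LIMIT values are assumptions that depend on the NebulaGraph version; in 3.x the property would be u.user.deviceId):

MATCH (v)-[:belongs]->(u:user) WHERE id(v) == 0 RETURN u.deviceId AS deviceId SKIP 20000 LIMIT 10000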

Repository owner moved this from Reviewing to Done in Nebula Graph v3.0.0 Dec 30, 2021
yixinglu pushed a commit to yixinglu/nebula that referenced this issue Sep 14, 2023