-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLE-8861] feat(vector): built-in TG function for pairwise vector embedding; #175
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😛 [sv4] WIP#58069 on tg_4.2.0_dev(gle#5691 engine#2772 product#279 er#1219 regress#1181 gsql-graph-algorithms#175) FAIL
Jenkins Job: Check jenkins job
Result: Job failed! Error: Tests Failed with no_fail option enabled
Instance Type: cluster
Tested OS: ubuntu18,centos7,ubuntu20,centos8
Requestor: jue.yuan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:beach: [sv4] build_job#58883 of WIP#58069 on tg_4.2.0_dev(gle#5691 engine#2772 product#279 er#1219 regress#1181 gsql-graph-algorithms#175) at 2024/12/03 16:55:36 (PST) BUILT
Reason: Build successful
Jenkins Job: Check jenkins job
Requestor: jue.yuan
masterUrl: http://192.168.99.101:30080/job/wip_test/58069
Comment: Your binary is ready in download site, please click here or use curl/wget to download it, also on nfs
gds/vector/cosine_distance.gsql
Outdated
@@ -0,0 +1,56 @@ | |||
CREATE FUNCTION gds.vector.cosine_distance(list<double> list1, list<double> list2) RETURNS(float) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "/gds" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be short for graph database system
, Neo4j used this as their built-in library.
https://neo4j.com/docs/graph-data-science/current/algorithms/similarity-functions/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GDS means Graph Data Science.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would returning double be better since float has lower precision? Same applies below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will introduce some type checking error since they are mainly used for float types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xuanleilin please review these vector functions
FOREACH i IN RANGE [0, @@myList1.size() - 1 ] DO | ||
@@sqrSum += (@@myList1.get(i) - @@myList2.get(i)) * (@@myList1.get(i) - @@myList2.get(i)); | ||
END; | ||
@@myResult = sqrt(@@sqrSum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested the cases where the sizes of myList1 and myList2 are 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be good since the @@sqrSum
should always be not less than 0.
PR Summary
PR Checklist
What is the behavior change?
Other information:
Related PRs: