Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLE-8861] feat(vector): built-in TG function for pairwise vector embedding; #175

Merged
merged 5 commits into from
Dec 10, 2024

Conversation

jue-yuan
Copy link
Collaborator

@jue-yuan jue-yuan commented Dec 2, 2024

PR Summary

  • GLE-8861
    • Built-in GSQL functions for vector embedding.

PR Checklist

  • Added comments for more comprehensible code.
  • Added unit and/or regression tests.
  • (Optional) Notified Doc/Prod team, if this PR contains any new features.

What is the behavior change?

Other information:

Related PRs:

Copy link
Collaborator

@qe-tigergraph qe-tigergraph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😛  [sv4] WIP#58069 on tg_4.2.0_dev(gle#5691 engine#2772 product#279 er#1219 regress#1181 gsql-graph-algorithms#175) FAIL
Jenkins Job: Check jenkins job
Result: Job failed! Error: Tests Failed with no_fail option enabled
Instance Type: cluster
Tested OS: ubuntu18,centos7,ubuntu20,centos8
Requestor: jue.yuan

Copy link
Collaborator

@qe-tigergraph qe-tigergraph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:beach:  [sv4] build_job#58883 of WIP#58069 on tg_4.2.0_dev(gle#5691 engine#2772 product#279 er#1219 regress#1181 gsql-graph-algorithms#175) at 2024/12/03 16:55:36 (PST) BUILT
Reason: Build successful
Jenkins Job: Check jenkins job
Requestor: jue.yuan
masterUrl: http://192.168.99.101:30080/job/wip_test/58069
Comment: Your binary is ready in download site, please click here or use curl/wget to download it, also on nfs

@qe-tigergraph qe-tigergraph dismissed their stale review December 2, 2024 21:43

Information updated

@jue-yuan jue-yuan marked this pull request as ready for review December 3, 2024 18:44
@@ -0,0 +1,56 @@
CREATE FUNCTION gds.vector.cosine_distance(list<double> list1, list<double> list2) RETURNS(float) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "/gds" mean here?

Copy link
Collaborator Author

@jue-yuan jue-yuan Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be short for graph database system, Neo4j used this as their built-in library.
https://neo4j.com/docs/graph-data-science/current/algorithms/similarity-functions/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GDS means Graph Data Science.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would returning double be better since float has lower precision? Same applies below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will introduce some type checking error since they are mainly used for float types.

Copy link

@dadongwang-tg dadongwang-tg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xuanleilin please review these vector functions

dadongwang-tg
dadongwang-tg previously approved these changes Dec 4, 2024
@dadongwang-tg dadongwang-tg dismissed their stale review December 4, 2024 09:25

commented above

gds/vector/distance.gsql Outdated Show resolved Hide resolved
FOREACH i IN RANGE [0, @@myList1.size() - 1 ] DO
@@sqrSum += (@@myList1.get(i) - @@myList2.get(i)) * (@@myList1.get(i) - @@myList2.get(i));
END;
@@myResult = sqrt(@@sqrSum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested the cases where the sizes of myList1 and myList2 are 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be good since the @@sqrSum should always be not less than 0.

gds/vector/norm.gsql Outdated Show resolved Hide resolved
@MichaelZengTG MichaelZengTG changed the base branch from tg_4.2.0_dev to CORE-4047 December 10, 2024 18:14
@MichaelZengTG MichaelZengTG merged commit d89c11c into CORE-4047 Dec 10, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants