Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join #72

Closed
julianhyde opened this issue Oct 5, 2021 · 3 comments
Closed

Join #72

julianhyde opened this issue Oct 5, 2021 · 3 comments

Comments

@julianhyde
Copy link
Collaborator

Add clauses to from to support inner and outer joins. Allow allow cross joins (comma syntax) at any position in the from clause, not just at the start.

We currently support cross join via comma syntax:

from e in emps,
    d in depts
  where e.deptno = d.deptno
  andalso e.job = "CLERK"

Technically, comma join is a cross join (cartesian join) because it does not allow a condition. We add several new types of join:

  • join is an inner join
  • left join is a left outer join, with optional rows on the right
  • right join is a right outer join, with optional rows on the left
  • full join is a full outer join, with optional rows on the left and right

All of these are followed by on and a boolean condition. For example,

from e in emps
  join d in depts on e.deptno = d.deptno
  where e.job = "CLERK"

In the first example, note that the two 'scans' occur before other clauses (where). After this change we will still allow comma syntax, but also after other clauses:

from e in emps
  where e.job = "CLERK",
  d in depts
  where e.deptno = d.deptno

where and , have the same precedence and are left-associative, so the second where applies to from e in emps where e.job = "CLERK", d in depts, not just depts.

If you want to filter just depts, use a parenthesized from expression:

from e in emps
  where e.job = "CLERK"
  left join d in (from d2 in depts
      where d2.location = "San Francisco")
    on e.deptno = d.deptno

These precedence and associativity are different from SQL. In SQL, , has lower precedence than JOIN, so SELECT ... FROM a CROSS JOIN b, c CROSS JOIN d becomes the expression tree "(a . b) . (c. d)". But in Morel, from a in as, b in bs join c in cs on true, d in ds becomes the expression tree "((a . b) . c) . d".

In SQL, the various kinds of outer join invent 'outer' rows whose column values are all null. Morel does not have null, so instead wraps rows in Option:

- from e in scott.emp
=   right join d in scott.dept
=   on e.deptno = d.deptno andalso e.job = "CLERK";
val it = [
   {d={deptno=20,dname="RESEARCH",loc="DALLAS"},
    e=SOME {comm=0.0,deptno=20,empno=7369,ename="SMITH",hiredate="1980-12-16", job="CLERK",mgr=7902,sal=800.0}},
   {d={deptno=20,dname="RESEARCH",loc="DALLAS"},
    e=SOME {comm=0.0,deptno=20,empno=7876,ename="ADAMS",hiredate="1987-05-22",job="CLERK",mgr=7788,sal=1100.0}},
   {d={deptno=30,dname="SALES",loc="CHICAGO"},
    e=SOME {comm=0.0,deptno=30,empno=7900,ename="JAMES",hiredate="1981-12-02",job="CLERK",mgr=7698,sal=950.0}},
   {d={deptno=10,dname="ACCOUNTING",loc="NEW YORK"},
    e=SOME {comm=0.0,deptno=10,empno=7934,ename="MILLER",hiredate="1982-01-22",job="CLERK",mgr=7782,sal=1300.0}},
   {d={deptno=40,dname="OPERATIONS",loc="BOSTON"},
    e=NONE}]
  : {d:{deptno:int, dname:string, loc:string}, e:{comm:real, deptno:int, empno:int, ename:string, hiredate:string, job:string, mgr:int, sal:real} option} list

This query is right join, and therefore the rows on the left side (e) become optional. In left join, rows on the right become optional, and in full join rows on both sides become optional.

Note that Morel prints d before e, because fields are in alphabetical order. But e is considered to be the left input to the join, because it occurs earlier in the from expression, and d is the right input. The variable name is irrelevant.

There is no equivalent of SQL's USING clause.

There is no syntax for left, right or full cartesian join, but you can achieve the same effect by writing on true:

(*) Return each employee matched with their department, or NONE if
(*) the department number is invalid.
from e in scott.emp
  left join d in (from d2 in scott.dept
      where d2.deptno = e.deptno) on true
@julianhyde
Copy link
Collaborator Author

Oops. We can't have , d in depts after group because the group and compute clauses are comma-separated. The following are rather too close for the parser to tell apart:

from e in emps group e.deptno, d in depts;
from e in emps group e.deptno, e.job, d in depts;

I now think we should allow , variable in list and , variable = expression only before the first where, group, order, yield. After that, you will need to write cross join. Examples:

(*) Valid
from e in emps, d in depts where e.deptno = d.deptno;
from e in emps cross join d in depts where e.deptno = d.deptno;
from e in emps join d in depts on e.deptno = d.deptno;
from e in emps where e.name = "Shaggy" join d in depts on e.deptno = d.deptno;
from e in emps where e.name = "Shaggy" cross join d in depts where e.deptno = d.deptno;
from e in emps group e.deptno, e.job join d in depts on deptno = d.deptno;
from e in emps group e.deptno, e.job cross join d in depts where deptno = d.deptno;
from e in emps order e.deptno, e.job cross join d in depts where e.deptno = d.deptno;
from e in emps order e.deptno, e.job cross join fullName = e.first ^ e.last where e.deptno = d.deptno;

(*) Invalid
from e in emps where e.name = "Shaggy", d in depts where e.deptno = d.deptno;
from e in emps group e.deptno, e.job, d in depts where e.deptno = d.deptno;
from e in emps order e.deptno, e.job, d in depts where e.deptno = d.deptno;

@julianhyde
Copy link
Collaborator Author

Iterating again. Let's make on optional in join, left join, right join, full join. Then we don't need a cross join.

And the syntax for = changes from cross join v = expr to join v = expr, which is more concise. The revised examples are as follows:

(*) Valid
from e in emps, d in depts where e.deptno = d.deptno;
from e in emps join d in depts where e.deptno = d.deptno;
from e in emps join d in depts on e.deptno = d.deptno;
from e in emps where e.name = "Shaggy" join d in depts on e.deptno = d.deptno;
from e in emps where e.name = "Shaggy" join d in depts where e.deptno = d.deptno;
from e in emps group e.deptno, e.job join d in depts on deptno = d.deptno;
from e in emps group e.deptno, e.job join d in depts where deptno = d.deptno;
from e in emps order e.deptno, e.job join d in depts where e.deptno = d.deptno;
from e in emps order e.deptno, e.job join fullName = e.first ^ " " ^ e.last;

(*) Invalid
from e in emps where e.name = "Shaggy", d in depts where e.deptno = d.deptno;
from e in emps group e.deptno, e.job, d in depts where e.deptno = d.deptno;
from e in emps order e.deptno, e.job, d in depts where e.deptno = d.deptno;

@julianhyde
Copy link
Collaborator Author

Fixed in ab10217.

This time we support inner join (the join and on keywords); outer join (left, right and full keywords) will come later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant