Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add operator name canonicalization #54

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Add operator name canonicalization #54

wants to merge 15 commits into from

Conversation

GeorgeR227
Copy link
Contributor

This is meant to address #34, to allow for better support of using Unicode and Ascii operator names. The idea here is to allow the user to choose from a range of supported operator names but have the underlying codebase only work on the canon name.

An example would be to have type inference rules carry a canon name, instead of an array of supported names, and have inference rules simply check that their name matches the converted user name.

These canon names should be carefully chosen to be easy to work with and parse. Some rules are included but are liable to change.

These tables are for both canon op1 operator names and their formless counterparts
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.96%. Comparing base (12edc3a) to head (df5d2c5).
Report is 3 commits behind head on main.

Files Patch % Lines
src/deca/deca_op_names.jl 75.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
+ Coverage   84.68%   85.96%   +1.28%     
==========================================
  Files          12       14       +2     
  Lines         764      905     +141     
==========================================
+ Hits          647      778     +131     
- Misses        117      127      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Works so far but more testing needed. Also believe that the Halfar test in language is broken.
The HeatXfer test is confirmed broken. The only reason why it failed now and not before is because negation inference broke slightly and is fixed now.
The problem now is that operators will be always be overloaded with our unicode names.
Before the code would remove any form information from an operator name. However, this would mean that if a user mistyped an operator, type inference might've silently worked if another overloading of that operator was valid. This now prevents that by respecting the given type of the operator.
Problem right now is that operators that get overloaded resolve into our unicode operators. However, already typed operators that aren't in the unicode stay untouched. This is intended behavior but means downstream packages need to use canon names to work with all supported names.
This needs to be tested first
If the generic type is unicode then use unicode name, if ascii then use ascii. Some special cases still apply for consistency with previous versions but should be changed. Also moved canon into its own file.
Also update rest of resolution rules
These describe how to add proper inference and resolution rules for new DEC operators
@GeorgeR227 GeorgeR227 marked this pull request as ready for review July 1, 2024 15:20
@GeorgeR227 GeorgeR227 requested a review from lukem12345 July 1, 2024 15:25
end

@testset "Typed Function Observance" begin
let # Typing respects typed exterior derivative
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test supposed to show?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first implemented canon name typing, if a user typed in A == d_0(B) but said B::Form1, which is wrong cause the operator doesn't match with the input, infer would see d_0 as d, the canon generic name, and type A as a Form2.

When I realized that, I changed the code so if the operator is already typed it can only use its own specific rule and not the generic rules. These tests enforce this behavior because it felt like what was occurring above is a silent failing, leading to bad errors down the line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. We should have a comment that explains this situation using e.g. that above summary. The "respects" feels a little overloaded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to test this behavior for all the other operators as well, or does this single test suffice?


@testset "Canon Inference and Overloading" begin
function check_canontyping(control::SummationDecapode, test::SummationDecapode)
infer_test = infer_types!(deepcopy(test))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use infer_types!(deepcopy(test)) == infer_types!(deepcopy(control))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the scope of this test is just to checks types and that names don't change. I guess we could change it to that but I want to limit the scope of the test to test what we actually want to check.

@test get_canon_name(over_test[:op1]) == get_canon_name(over_control[:op1])
end

setup_basecase(d::SummationDecapode) = resolve_overloads!(infer_types!(deepcopy(d)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Setup" is too generic of a function name here, and it looks like deepcopy, infer, resolve is used in the function above this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can change the name. I'm not too attached to it since they're only valid within the test anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should get renamed to theory.jl. Later, we can silo all of the information that would normally incorporate @theory or "Model" information. The reason we won't use @theory outright for the time being is that we want to have support for multiple names referring to the same operator, etc.

@lukem12345
Copy link
Member

This PR just uses the global names Dict for the type inference and function overloading, but it should be used throughout this codebase and Decapodes.jl wherever explicit symbols appear. Can you spot check the rest of this codebase for such updates? Where it is too tedious (e.g. unicode!), we can make an issue to refactor.

const DUALDERIV_0 = :dual_d_0
const DUALDERIV_1 = :dual_d_1

const HODGE_0 = :hdg_0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ASCII-Unicode equivalents that we have now use star and star_inv instead of hdg.

ascii_to_unicode_op1 = Pair{Symbol, Any}[


CANON_NAMES = Dict{Symbol, Symbol}(
# Partial time derivative
:∂ₜ => PARTIAL_T,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not given a const UNICODE_PARTIALT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the unicode consts are only needed for overloading and this should never be overloaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, unicode names should just be aliases for ascii names but I have the unicode consts for nicer function naming in overloading, so for example op is ∧ => ∧₀₁ but if is wdg => wdg_01.

@GeorgeR227
Copy link
Contributor Author

I've skimmed through the rest of the code and it seems like really the only other function that should use canon names, in DiagrammaticEquations, would be the find_chains code. So in this case, we can just check that the canon name of an operator matches the canon name of an operator in a black/whitelist.

Everything else seems to just be editing names or copying over raw names and so don't need this.

const UNICODE_AVG_01 = :avg₀₁

# Op2 names
const UNICODE_WEDGE_00 = :∧₀₀
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would group these blocks of operators such that all forms of wedge product (i.e.g including ASCII) are grouped together.

@lukem12345
Copy link
Member

AlgebraicJulia/Decapodes.jl#142 is a spiritual predecessor of this feature. The main distinctions are that the old PR "canonicalized" on Unicode, and this PR ASCII, and the old PR added a subroutine that edits the ACSet to use the canonicalized names.

@lukem12345
Copy link
Member

We note that incident(decapode, :L_1, :op1) needs special treatment if it is to be inter-operable with canonical representations. Of course, features such as incident(decapode, :L, :op1) (the intention being to get all Lie derivatives, typed or untyped, Unicode or ASCII) require more engineering anyway.

@lukem12345
Copy link
Member

lukem12345 commented Jul 1, 2024

As discussed off-line, we should:

  • Have a way of getting the Unicode of representation of an operator. (Currently partially handled by the Dict from Decapodes.jl PR 142)
  • '' ASCII ''. Currently handled by this PR.
  • We want to get the set of all aliases (for e.g. using incident in a more pain-free way). This information is stored implicitly now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants