-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace some wrapper funcs in the deephaven.time module with Java wrapped method calls when used in formulas #4095
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty nice!
engine/table/src/main/java/io/deephaven/engine/table/impl/lang/QueryLanguageParser.java
Outdated
Show resolved
Hide resolved
engine/table/src/main/java/io/deephaven/engine/table/impl/lang/QueryLanguageParser.java
Outdated
Show resolved
Hide resolved
engine/table/src/main/java/io/deephaven/engine/table/impl/lang/QueryLanguageParser.java
Outdated
Show resolved
Hide resolved
This script finishes in 3s running locally on MBP x86_64 with the changes. Running the same in the demo system takes 40s or so.
|
66ceb82
to
68c765e
Compare
b43036f
to
776f097
Compare
@@ -334,6 +334,26 @@ public static Instant now() { | |||
return currentClock().instantNanos(); | |||
} | |||
|
|||
@ScriptApi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably we could take this out of the API in Python, instead of adding a new Java method. Presumably if we thought it was really necessary, we would already have a Java method like this. TimeUnit
is a much better way to express this, rather than a String.
@ScriptApi | ||
@Nullable | ||
public static Instant lowerBin(@Nullable final Instant instant, Object intervalNanos, Object offset) { | ||
long interval = getDurationNanos(intervalNanos); | ||
long off = getDurationNanos(offset); | ||
|
||
return lowerBin(instant, interval, off); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have big concerns about this:
- Are we going to have to implement every primitive as an object? This is not good.
- I have concerns that these object methods will break the query language because something ends up being ambiguous to the java compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new overload is to support Union
type in the Python API, it will not conflict with the original overload that takes primitive type parameters. The other approach would be to create multiple overload methods for all the possible combinations of parameter types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An overload per type is the Java way. This current way is not usable from Java. If we want it, at least hide it in some other class that we never tell Java programmers about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the answer looks something like applying a conversion to the input and calling the existing overload? That said, if the input is from a column, for example, that doesn't really work.
*/ | ||
@ScriptApi | ||
@Nullable | ||
public static ZonedDateTime lowerBin(@Nullable final ZonedDateTime dateTime, Object intervalNanos, Object offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concerns here.
*/ | ||
@ScriptApi | ||
@Nullable | ||
public static Instant upperBin(@Nullable final Instant instant, Object intervalNanos, Object offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
*/ | ||
@ScriptApi | ||
@Nullable | ||
public static ZonedDateTime upperBin(@Nullable final ZonedDateTime dateTime, Object intervalNanos, Object offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
try: | ||
if resolution == "ns": | ||
if system: | ||
return _JDateTimeUtils.nowSystem() | ||
else: | ||
return _JDateTimeUtils.now() | ||
elif resolution == "ms": | ||
if system: | ||
return _JDateTimeUtils.nowSystemMillisResolution() | ||
else: | ||
return _JDateTimeUtils.nowMillisResolution() | ||
else: | ||
raise ValueError("Unsupported time resolution: " + resolution) | ||
return _JDateTimeUtils.now(system, resolution); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this rewrite everything as a single method call is going to be a fail as we get into more complex parts of the API. I know that we have a lot of things that do branching, possibly in ways where this is not nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is really difficult to describe the branching logic in the Python code and then use the description to generate equivalent Java code in the QLP, that's why we want to move these branching logic to the Java API.
IMO, replacing Python wrapper functions with Java ones should only be for those wrapper functions that are expected to be used in query formulas.
if isinstance(interval, str): | ||
interval = parse_duration_nanos(interval) | ||
|
||
if isinstance(offset, str): | ||
offset = parse_duration_nanos(offset) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we've traded branching for rewriting the whole API with Object
. This is a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about the branching problem, I think a big part of it is a dispatch problem. We aren't using the type information from the columns or input variables to execute the correct function. Instead, it is all getting forced into this object casting stuff, which I have big concerns about.
I very strongly feel that this should not be merged without corresponding passing tests in TestQueryLanguageParser.java. |
Unfortunately, TestQueryLanguageParser cannot use jpy, so none of this functionality can be accessed. It should however be possible to add a test in At this time however, all actual tests in that project are disabled (see #734), but the tests do build and run, successfully logging that everything was ignored, so new tests could be added in this way. ./gradlew :python-engine-test:check |
https://github.com/niloc132/deephaven-core/pull/new/revive-python-engine-test fixes the broken test env, and adds a sample test that clearly works. Note that an exec context is at least required to do anything interesting, haven't experimented beyond that. |
@@ -1140,6 +1205,16 @@ public Class<?> visit(NameExpr n, VisitArgs printer) { | |||
* throw them to 'findClass()'. Many details are not relevant here. For example, field access is handled by a | |||
* different method: visit(FieldAccessExpr, StringBuilder). | |||
*/ | |||
Map<String, String> pyConstantsMap = Map.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be very late in the method, so that we don't try to provide a type for an absent variable in a non-python language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be conditional on the expression coming from Python, I think.
@@ -465,6 +470,66 @@ private Class<?> findNestedClass(Class<?> enclosingClass, String nestedClassName | |||
return m.get(nestedClassName); | |||
} | |||
|
|||
private boolean pyToJavaReplaced(final Class<?> scope, final MethodCallExpr n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a JavaDoc, and clarify that it is mutating n
.
if (pyToJavaReplaced(scope, n)) { | ||
if (scope != null) { | ||
scope = null; | ||
innerPrinter.reset(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it looks to me like this means we can only replace things that are statically imported. Should change the scope in n
and re-order this to be before the block above? That way, we wouldn't be setting null
scope, or resetting our printer, and we might be able to handle more classes of expression.
@@ -1913,6 +1994,7 @@ public Class<?> visit(MethodCallExpr n, VisitArgs printer) { | |||
} | |||
} else { // Groovy or Java method call | |||
printer.append(innerPrinter); | |||
n.setName(method.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we doing all of our n
mutation in one place?
@@ -1865,6 +1940,13 @@ public Class<?> visit(MethodCallExpr n, VisitArgs printer) { | |||
return result; | |||
}).orElse(null); | |||
|
|||
if (pyToJavaReplaced(scope, n)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename method, maybeReplace
?
@@ -2538,6 +2620,12 @@ public boolean hasStringBuilder() { | |||
return builder != null; | |||
} | |||
|
|||
public void reset() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully my proposed change enables us to delete this.
@ScriptApi | ||
@Nullable | ||
public static Instant lowerBin(@Nullable final Instant instant, Object intervalNanos, Object offset) { | ||
long interval = getDurationNanos(intervalNanos); | ||
long off = getDurationNanos(offset); | ||
|
||
return lowerBin(instant, interval, off); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the answer looks something like applying a conversion to the input and calling the existing overload? That said, if the input is from a column, for example, that doesn't really work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class didn't need to move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should find a way to not move this. How about we have an interface in engine.util
PythonMethodExpressionResolver
or something more appropriate, that just takes a pair of strings?
public interface PythonMethodExpressionResolver {
PyObject resolve(@NotNull final String scope, @NotNull final String name);
}
Then, just have the PythonDeephavenScriptSession
implement that interface.
|
||
def test_epoch_micros_to_zdt(self): | ||
tz = time_zone("ET") | ||
nanos = 12345678987654321 | ||
micros = nanos // 10**3 | ||
dt = dtypes.Instant.j_type.ofEpochSecond(0, micros * 10**3).atZone(tz) | ||
micros = nanos // 10 ** 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When can we start using black or some other code beautifier?
@@ -95,7 +95,7 @@ private boolean isValid(String name) { | |||
} | |||
|
|||
private static final Set<String> QUERY_LANG_RESERVED_VARIABLE_NAMES = | |||
Stream.of("in", "not", "i", "ii", "k").collect( | |||
Stream.of("in", "not", "i", "ii", "k", "True", "False", "None").collect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're leaking your Python into my Java....
Just to elaborate on this a bit further — the point of TestQueryLanguageParser is to test the behavior of the parser without the added overhead and complexity of actually evaluating formulas. This should apply to Python as well — we should be able to verify the parser's handling of Python variables, transformation of Python-related formulas (including/especially vectorization and Java function substitution), and return types of formulas using Python methods/variables without actually running a Python environment. Anything that actually requires crossing over to Python should be mocked or refactored to provide interfaces that support no-Python-required implementations we can use for testing from Java. We should be able to know from Java that, given a hypothetical Python function and hypothetical Python variables defined in a test case, the parser would transform a formula the way we expect and find the return type we expect. The more we evolve the parser to improve Python support, the more important this becomes — otherwise, every Python change decreases the overall coverage we get from TestQueryLanguageParser, which increases the odds of introducing issues that can only be caught from Python tests. This gets expensive because the Python tests take way longer to run, are way harder to debug, etc. |
Made unnecessary by #4388 |
Fixes #2303
Partially fixes #2306