Replace some wrapper funcs in the deephaven.time module with Java wrapped method calls when used in formulas #4095

Closed

Conversation

jmao-denver
Contributor

@jmao-denver jmao-denver commented Jun 28, 2023

Fixes #2303
Partially fixes #2306

Member

@niloc132 niloc132 left a comment

This is pretty nice!

@jmao-denver jmao-denver changed the title Replace Python wrapper function calls with Java wrapped method calls in formulas Swap Python wrapper function calls with Java wrapped method calls in formulas Jun 30, 2023
@jmao-denver
Contributor Author

jmao-denver commented Jul 3, 2023

With these changes, this script finishes in 3s when run locally on an x86_64 MBP. Running the same script on the demo system takes 40s or so.

from deephaven import empty_table
import deephaven.time
import deephaven.time as dhtime
from deephaven.time import now as dhnow
from deephaven.time import now, MINUTE, lower_bin, upper_bin

from deephaven import garbage_collect
system_arg = True
resolution_arg = "ns"
nrow = 1_000_000
formulas = [
    # "deephaven.time.now()", 
    # "dhtime.now()", 
    # "dhnow()", 
    # "now()",
    # "now(system_arg)",
    # "now(system_arg, resolution_arg)",
    # Python built-in constants are now recognized.
    "now(True, `ns`)",
    # "now(true, `ns`)",
    # The QLP converts it to system==True/true
    # "now(system=True)",
    ]
for f in formulas:
    t = empty_table(nrow).update(f"TS = {f}")
    t1 = t.update("LowerBinned = lower_bin(TS, `PT00:00:00.001000000`, MINUTE)")
    t1 = t.update("LowerBinned = lower_bin(TS, 1000000000)")
    t1 = t.update("UpperBinned = upper_bin(TS, `PT00:00:00.001000000`, MINUTE)")
    t1 = t.update("UpperBinned = upper_bin(TS, 1000000000)")

@jmao-denver jmao-denver changed the title Swap Python wrapper function calls with Java wrapped method calls in formulas Replace Python wrapper function calls with Java wrapped method calls in formulas Jul 6, 2023
@jmao-denver jmao-denver force-pushed the 2306-pycall-to-jmtdcall branch 3 times, most recently from 66ceb82 to 68c765e Compare July 10, 2023 17:39
@jmao-denver jmao-denver requested a review from niloc132 July 10, 2023 18:59
@jmao-denver jmao-denver changed the title Replace Python wrapper function calls with Java wrapped method calls in formulas Replace some wrapper funcs in deephaven.time module with Java wrapped method calls when used in formulas Jul 10, 2023
@jmao-denver jmao-denver marked this pull request as ready for review July 10, 2023 19:01
@jmao-denver jmao-denver force-pushed the 2306-pycall-to-jmtdcall branch from b43036f to 776f097 Compare July 10, 2023 19:10
@jmao-denver jmao-denver changed the title Replace some wrapper funcs in deephaven.time module with Java wrapped method calls when used in formulas Replace some wrapper funcs in the deephaven.time module with Java wrapped method calls when used in formulas Jul 10, 2023
@@ -334,6 +334,26 @@ public static Instant now() {
return currentClock().instantNanos();
}

@ScriptApi
Member

no docs

Member

Arguably we could take this out of the API in Python, instead of adding a new Java method. Presumably if we thought it was really necessary, we would already have a Java method like this. TimeUnit is a much better way to express this, rather than a String.

Comment on lines +2830 to +2837
@ScriptApi
@Nullable
public static Instant lowerBin(@Nullable final Instant instant, Object intervalNanos, Object offset) {
    long interval = getDurationNanos(intervalNanos);
    long off = getDurationNanos(offset);

    return lowerBin(instant, interval, off);
}
Member

I have big concerns about this:

  1. Are we going to have to implement every primitive as an object? This is not good.
  2. I have concerns that these Object methods will break the query language because something ends up being ambiguous to the Java compiler.

Contributor Author

This new overload supports Union types in the Python API; it will not conflict with the original overload that takes primitive-type parameters. The alternative would be to create a separate overload for every possible combination of parameter types.

Member

An overload per type is the Java way. This current way is not usable from Java. If we want it, at least hide it in some other class that we never tell Java programmers about.

Member

Maybe the answer looks something like applying a conversion to the input and calling the existing overload? That said, if the input is from a column, for example, that doesn't really work.
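
To make the "convert the input, then call the existing overload" idea concrete, here is a minimal Python sketch. It is not the PR's code: `to_nanos`, `lower_bin_nanos`, and `lower_bin_any` are hypothetical names, the duration parser only understands a tiny `PT<n>H|M|S` subset, and the binning arithmetic (floor to a multiple of the interval, shifted by the offset) is an assumption about what `lowerBin` does.

```python
import re
from typing import Union

# Hypothetical unit table covering only a tiny subset of ISO-8601 durations.
_UNIT_NANOS = {"H": 3_600_000_000_000, "M": 60_000_000_000, "S": 1_000_000_000}


def to_nanos(value: Union[int, str]) -> int:
    """Normalize an int (already nanoseconds) or a 'PT<n>[HMS]' string to nanoseconds."""
    if isinstance(value, bool):
        raise TypeError("bool is not a duration")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = re.fullmatch(r"PT(\d+)([HMS])", value)
        if match is None:
            raise ValueError(f"Unsupported duration: {value!r}")
        return int(match.group(1)) * _UNIT_NANOS[match.group(2)]
    raise TypeError(f"Unsupported duration type: {type(value).__name__}")


def lower_bin_nanos(ts: int, interval: int, offset: int = 0) -> int:
    """Primitive-only version: floor ts to a multiple of interval, shifted by offset (assumed semantics)."""
    return (ts - offset) // interval * interval + offset


def lower_bin_any(ts: int, interval: Union[int, str], offset: Union[int, str] = 0) -> int:
    """Convert the flexible arguments once, then delegate to the primitive-only version."""
    return lower_bin_nanos(ts, to_nanos(interval), to_nanos(offset))
```

This works when the interval and offset are constants in the formula. As the comment above notes, it does not help when they come from a column, because the conversion would then have to run per row.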

*/
@ScriptApi
@Nullable
public static ZonedDateTime lowerBin(@Nullable final ZonedDateTime dateTime, Object intervalNanos, Object offset) {
Member

Same concerns here.

*/
@ScriptApi
@Nullable
public static Instant upperBin(@Nullable final Instant instant, Object intervalNanos, Object offset) {
Member

and here

*/
@ScriptApi
@Nullable
public static ZonedDateTime upperBin(@Nullable final ZonedDateTime dateTime, Object intervalNanos, Object offset) {
Member

and here

Comment on lines 89 to +90
 try:
-    if resolution == "ns":
-        if system:
-            return _JDateTimeUtils.nowSystem()
-        else:
-            return _JDateTimeUtils.now()
-    elif resolution == "ms":
-        if system:
-            return _JDateTimeUtils.nowSystemMillisResolution()
-        else:
-            return _JDateTimeUtils.nowMillisResolution()
-    else:
-        raise ValueError("Unsupported time resolution: " + resolution)
+    return _JDateTimeUtils.now(system, resolution);
Member

I'm concerned that rewriting everything as a single method call is going to fail as we get into more complex parts of the API. I know that we have a lot of things that do branching, possibly in ways where this is not nice.

Contributor Author

@jmao-denver jmao-denver Jul 12, 2023

It is really difficult to describe the branching logic in the Python code and then use that description to generate equivalent Java code in the QLP. That's why we want to move this branching logic to the Java API.
IMO, only wrapper functions that are expected to be used in query formulas should be replaced with Java ones.

Comment on lines -1712 to -1717
if isinstance(interval, str):
    interval = parse_duration_nanos(interval)

if isinstance(offset, str):
    offset = parse_duration_nanos(offset)

Member

Here we've traded branching for rewriting the whole API with Object. This is a problem.

Member

Thinking more about the branching problem, I think a big part of it is a dispatch problem. We aren't using the type information from the columns or input variables to execute the correct function. Instead, it is all getting forced into this object casting stuff, which I have big concerns about.

py/server/tests/test_time.py Show resolved Hide resolved
@jmao-denver jmao-denver requested a review from chipkent July 13, 2023 18:41
@rbasralian
Contributor

rbasralian commented Jul 13, 2023

I very strongly feel that this should not be merged without corresponding passing tests in TestQueryLanguageParser.java.

#3267

@niloc132
Member

I very strongly feel that this should not be merged without corresponding passing tests in TestQueryLanguageParser.java.

Unfortunately, TestQueryLanguageParser cannot use jpy, so none of this functionality can be accessed. It should however be possible to add a JUnit test in :python-engine-test, so that we can run certain tests from an environment that has jpy and the deephaven wheel installed.

At this time however, all actual tests in that project are disabled (see #734), but the tests do build and run, successfully logging that everything was ignored, so new tests could be added in this way.

./gradlew :python-engine-test:check

@niloc132
Member

https://github.com/niloc132/deephaven-core/pull/new/revive-python-engine-test fixes the broken test env, and adds a sample test that clearly works. Note that an exec context is the minimum required to do anything interesting; I haven't experimented beyond that.

@jmao-denver jmao-denver requested a review from rbasralian July 14, 2023 17:19
@@ -1140,6 +1205,16 @@ public Class<?> visit(NameExpr n, VisitArgs printer) {
* throw them to 'findClass()'. Many details are not relevant here. For example, field access is handled by a
* different method: visit(FieldAccessExpr, StringBuilder).
*/
Map<String, String> pyConstantsMap = Map.of(
Member

This should probably be very late in the method, so that we don't try to provide a type for an absent variable in a non-python language.

Member

Should be conditional on the expression coming from Python, I think.
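
The two suggestions above (do the constant lookup late, and only for Python formulas) can be sketched in a few lines. This is a hedged illustration in Python, not the parser's code: `resolve_name`, its parameters, and the mapping of `True`/`False`/`None` to `true`/`false`/`null` are assumptions made for the example.

```python
from typing import Set

# Assumed mapping from Python built-in constants to Java literals.
PY_CONSTANTS = {"True": "true", "False": "false", "None": "null"}


def resolve_name(name: str, variables: Set[str], from_python: bool) -> str:
    """Resolve a bare name in a formula.

    Real variables win first; the Python-constant fallback runs last and only
    when the formula came from a Python session.
    """
    if name in variables:
        return name
    if from_python and name in PY_CONSTANTS:
        return PY_CONSTANTS[name]
    raise NameError(f"Cannot find variable or class {name}")
```

With this ordering, a Groovy formula that mentions an undefined `True` still fails as an unknown name instead of silently receiving a boolean type.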

@@ -465,6 +470,66 @@ private Class<?> findNestedClass(Class<?> enclosingClass, String nestedClassName
return m.get(nestedClassName);
}

private boolean pyToJavaReplaced(final Class<?> scope, final MethodCallExpr n) {
Member

This should have a JavaDoc, and clarify that it is mutating n.

Comment on lines +1943 to +1948
if (pyToJavaReplaced(scope, n)) {
    if (scope != null) {
        scope = null;
        innerPrinter.reset();
    }
}
Member

So, it looks to me like this means we can only replace things that are statically imported. Should we change the scope in n and reorder this to come before the block above? That way, we wouldn't be setting a null scope or resetting our printer, and we might be able to handle more classes of expression.
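
For readers unfamiliar with this kind of call substitution, the following Python sketch shows the general technique on a Python-syntax expression using the standard `ast` module (`ast.unparse` needs Python 3.9+). It is an analogy only: the real work happens in the Java QueryLanguageParser, and the name mapping and the set of recognized module scopes here are hypothetical. It handles both bare calls and scoped calls such as `dhtime.upper_bin(...)`, which is the case the comment above is concerned with.

```python
import ast

# Hypothetical mapping from Python wrapper names to Java method names.
PY_TO_JAVA = {"lower_bin": "lowerBin", "upper_bin": "upperBin"}
# Hypothetical scopes known to refer to the deephaven.time module.
TIME_SCOPES = {"dhtime", "deephaven.time"}


class _CallRewriter(ast.NodeTransformer):
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if isinstance(func, ast.Name):
            py_name = func.id  # statically imported: lower_bin(...)
        elif isinstance(func, ast.Attribute) and ast.unparse(func.value) in TIME_SCOPES:
            py_name = func.attr  # scoped: dhtime.lower_bin(...)
        else:
            return node
        java_name = PY_TO_JAVA.get(py_name)
        if java_name is not None:
            # Replace the callee and drop the scope in one step.
            node.func = ast.copy_location(ast.Name(id=java_name, ctx=ast.Load()), func)
        return node


def rewrite_formula(expr: str) -> str:
    """Return expr with known wrapper calls replaced by their Java method names."""
    tree = ast.parse(expr, mode="eval")
    return ast.unparse(_CallRewriter().visit(tree))
```

Calls on an unrecognized scope, for example `other.lower_bin(TS, 5)`, are left untouched.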

@@ -1913,6 +1994,7 @@ public Class<?> visit(MethodCallExpr n, VisitArgs printer) {
}
} else { // Groovy or Java method call
printer.append(innerPrinter);
n.setName(method.getName());
Member

Why aren't we doing all of our n mutation in one place?

@@ -1865,6 +1940,13 @@ public Class<?> visit(MethodCallExpr n, VisitArgs printer) {
return result;
}).orElse(null);

if (pyToJavaReplaced(scope, n)) {
Member

Maybe rename the method to maybeReplace?

@@ -2538,6 +2620,12 @@ public boolean hasStringBuilder() {
return builder != null;
}

public void reset() {
Member

Hopefully my proposed change enables us to delete this.

Member

This class didn't need to move.

Member

We should find a way to not move this. How about we have an interface in engine.util, PythonMethodExpressionResolver or something more appropriate, that just takes a pair of strings?

public interface PythonMethodExpressionResolver {
     PyObject resolve(@NotNull final String scope, @NotNull final String name);
}

Then, just have the PythonDeephavenScriptSession implement that interface.
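
A small interface like this also makes the lookup easy to fake in tests. The sketch below shows the same shape in Python using `typing.Protocol`; every name in it (`MethodExpressionResolver`, `DictResolver`, `describe_call`) is hypothetical, and the dictionary-backed resolver simply stands in for a real script session.

```python
from typing import Dict, Optional, Protocol, Tuple


class MethodExpressionResolver(Protocol):
    """Anything that can look up a callable by (scope, name)."""

    def resolve(self, scope: str, name: str) -> Optional[object]:
        ...


class DictResolver:
    """Test double: answers lookups from a fixed table, no live Python session needed."""

    def __init__(self, table: Dict[Tuple[str, str], object]) -> None:
        self._table = dict(table)

    def resolve(self, scope: str, name: str) -> Optional[object]:
        return self._table.get((scope, name))


def describe_call(resolver: MethodExpressionResolver, scope: str, name: str) -> str:
    """Stand-in for parser logic that depends only on the interface."""
    target = resolver.resolve(scope, name)
    return "unresolved" if target is None else f"resolved:{scope}.{name}"


# Usage: the consumer never learns whether a real session or a fake is behind the interface.
fake = DictResolver({("dhtime", "lower_bin"): object()})
```

Because the consumer depends only on `resolve`, the class that owns the real lookup does not have to move into the consumer's package.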


def test_epoch_micros_to_zdt(self):
tz = time_zone("ET")
nanos = 12345678987654321
micros = nanos // 10**3
dt = dtypes.Instant.j_type.ofEpochSecond(0, micros * 10**3).atZone(tz)
micros = nanos // 10 ** 3
Member

When can we start using black or some other code beautifier?

@@ -95,7 +95,7 @@ private boolean isValid(String name) {
}

     private static final Set<String> QUERY_LANG_RESERVED_VARIABLE_NAMES =
-            Stream.of("in", "not", "i", "ii", "k").collect(
+            Stream.of("in", "not", "i", "ii", "k", "True", "False", "None").collect(
Member

You're leaking your Python into my Java....

@rbasralian
Contributor

I very strongly feel that this should not be merged without corresponding passing tests in TestQueryLanguageParser.java.

#3267

Just to elaborate on this a bit further — the point of TestQueryLanguageParser is to test the behavior of the parser without the added overhead and complexity of actually evaluating formulas. This should apply to Python as well — we should be able to verify the parser's handling of Python variables, transformation of Python-related formulas (including/especially vectorization and Java function substitution), and return types of formulas using Python methods/variables without actually running a Python environment.

Anything that actually requires crossing over to Python should be mocked or refactored to provide interfaces that support no-Python-required implementations we can use for testing from Java. We should be able to know from Java that, given a hypothetical Python function and hypothetical Python variables defined in a test case, the parser would transform a formula the way we expect and find the return type we expect.

The more we evolve the parser to improve Python support, the more important this becomes — otherwise, every Python change decreases the overall coverage we get from TestQueryLanguageParser, which increases the odds of introducing issues that can only be caught from Python tests. This gets expensive because the Python tests take way longer to run, are way harder to debug, etc.

@pete-petey pete-petey modified the milestones: June 2023, August 2023 Jul 31, 2023
@jmao-denver
Contributor Author

Made unnecessary by #4388

@jmao-denver jmao-denver closed this Sep 5, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2023
@jmao-denver jmao-denver deleted the 2306-pycall-to-jmtdcall branch September 5, 2023 14:51
Development

Successfully merging this pull request may close these issues.

Python Wrapper Properties include Java static method lower_bin should return Instant type
6 participants