[BUG] Unimplemented PPL Sort Syntax #3180

currantw · 2024-12-02T20:13:12Z

What is the bug?

PPL supports syntax for sorting numerically, lexicographically, or by IP address (e.g. sort num(field_name), sort str(field_name), and sort ip(field_name), respectively) -- but this syntax has no effect on the resulting sort order. It should be removed.

This syntax appears to have been replaced by the addition of data types, and sorting is always done according to the data type (i.e. numerical data types are sorted numerically, strings are sorted lexicographically), rather than as specified by str, num, or ip (i.e. these keywords have no effect on the result). Moreover, this syntax is not mentioned in the supporting user documentation for either OpenSearch SQL or Spark, and there is no actual implementation of the functionality in either code base - the specified "sort type" is simply ignored.

How can one reproduce the bug?

Steps to reproduce the behaviour:

1. Create a data set with a numerical field
2. Sort using the str keyword (i.e. sort str(field_name))
3. Observed: the syntax is valid, but sorting is still numerical

What is the expected behaviour?

As a user, you would likely expect the result to be sorted as specified (i.e. numerically, lexicographically, or by IP address), but the actual behaviour is to always sort by the field's data type.
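To illustrate why the requested sort type matters, here is a small Python sketch (not PPL, and using made-up sample addresses rather than data from this issue) showing that the same values come out in a different order when compared as strings versus as IP addresses:

```python
import ipaddress

# Hypothetical sample values, chosen only for illustration.
addresses = ["10.0.0.2", "9.0.0.1", "10.0.0.10"]

# Lexicographic order: values are compared character by character.
as_strings = sorted(addresses)

# IP order: values are parsed and compared as addresses.
as_ips = sorted(addresses, key=ipaddress.ip_address)

print(as_strings)  # ['10.0.0.10', '10.0.0.2', '9.0.0.1']
print(as_ips)      # ['9.0.0.1', '10.0.0.2', '10.0.0.10']
```

A user writing sort ip(field_name) on a string field would presumably expect the second ordering, but would currently get the first.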

If a user still wants to sort a numerical field lexicographically (for example) once this syntax is removed, they can still do so by first casting the field to a string data type before sorting it.

What is your host/environment?

N/A

Do you have any screenshots?

None

Do you have any additional context?

Related to #3145 (Add IP data type) and opensearch-project/opensearch-spark#963 (same issue for Spark).


Swiddis · 2024-12-17T18:53:02Z

Thanks for the bug report!

Where exactly do we explain this syntax in the docs? I see from the Spark PR that there's some documentation examples without explanations, I guess those are vestigial examples that imply it's a leftover grammar quirk.

If it's a grammar quirk, I'm mostly concerned with the possibility of a breaking change when people previously used these queries fine (and perhaps expected them to work) after copying the examples. I think the "safe" route is to mark these functions as deprecated no-ops in the documentation/as part of the response for such queries, clean out other doc references to them, and then only remove them from the grammar next major release.

Swiddis · 2024-12-17T18:56:30Z

In line with the above, I think this is more a documentation issue than a bug, since there's a clear workaround for the faulty behavior and the old sort operations haven't been supported for a very long time (or perhaps ever?). I'm not sure there's significant harm in letting the no-ops exist in peace.

anasalkouz · 2024-12-17T19:07:33Z

I agree with @Swiddis. I think we can remove it from our documentation safely. Removing them from the grammar might cause a breaking change for customers, we can remove those changes as part of our next major release.

currantw · 2024-12-17T23:49:22Z

For clarity, here is a test case that illustrates the bug:

Data

Dataset: addresses

street_number (string)	street_name (string)
725	Main Street
1127	Arbutus Street
43	Victoria Drive

Query

search source=addresses | sort num(street_number) | fields street_number, street_name

Expected Output

street_number	street_name
43	Victoria Drive
725	Main Street
1127	Arbutus Street

Actual Output

street_number	street_name
1127	Arbutus Street
43	Victoria Drive
725	Main Street

currantw · 2024-12-17T23:57:22Z

> I agree with @Swiddis. I think we can remove it from our documentation safely. Removing them from the grammar might cause a breaking change for customers, we can remove those changes as part of our next major release.

Thanks @anasalkouz and @Swiddis. Agreed, makes more sense to avoid introducing a breaking change.

The existing syntax seems to be modelled on Splunk (Splunk sort documentation). If it is desirable to keep this syntax, for ease of use for those familiar with Splunk, we could alternatively fix it so that it does work. This would be relatively easy to implement, since these sort field keywords could simply work as a "shorthand" for casting (see below).

With Sort Field Keyword	With Cast Function
`sort auto(field_name)`	`sort field_name`
`sort num(field_name)`	`sort cast(field_name as double)`
`sort str(field_name)`	`sort cast(field_name as string)`
`sort ip(field_name)`	`sort cast(field_name as ip)`
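As a rough sketch of the shorthand idea, the mapping in the table can be expressed as a simple text rewrite. This is illustrative Python only: the function name `expand_sort_keywords` is hypothetical, it assumes field names are plain identifiers, and a real implementation would presumably do this in the PPL parser or AST builder rather than with a regular expression.

```python
import re

# Cast target types taken from the table above; auto() maps to the bare field.
CAST_TYPES = {"num": "double", "str": "string", "ip": "ip"}

# Matches keyword(field), assuming the field is a simple identifier.
_KEYWORD_CALL = re.compile(
    r"\b(auto|num|str|ip)\(\s*([A-Za-z_][A-Za-z0-9_.]*)\s*\)"
)


def expand_sort_keywords(sort_clause):
    """Rewrite sort field keywords into the equivalent cast expressions."""

    def replace(match):
        keyword, field = match.group(1), match.group(2)
        if keyword == "auto":
            return field
        return f"cast({field} as {CAST_TYPES[keyword]})"

    return _KEYWORD_CALL.sub(replace, sort_clause)


print(expand_sort_keywords("sort num(street_number)"))
# sort cast(street_number as double)
```

Clauses that do not use a sort field keyword pass through unchanged, so existing queries would keep their current behaviour.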

Please let me know what you think!

currantw · 2024-12-18T00:20:15Z

> Thanks for the bug report!
>
> Where exactly do we explain this syntax in the docs? I see from the Spark PR that there's some documentation examples without explanations, I guess those are vestigial examples that imply it's a leftover grammar quirk.
>
> If it's a grammar quirk, I'm mostly concerned with the possibility of a breaking change when people previously used these queries fine (and perhaps expected them to work) after copying the examples. I think the "safe" route is to mark these functions as deprecated no-ops in the documentation/as part of the response for such queries, clean out other doc references to them, and then only remove them from the grammar next major release.

@Swiddis I don't think that this syntax is actually referenced anywhere in the current documentation for the SQL project, which makes it less likely that customers depend on it or have unwittingly copied it. But I don't know whether it was part of the documentation in the past, so it's probably best to, as you suggest, wait until the next major release to remove it (if we actually do want to remove it - see my comment above).

Swiddis · 2024-12-19T17:09:15Z

I like the idea of implementing it as a shorthand, type(var) == cast(var as type) is exactly how I had read the syntax in isolation anyways (as opposed to any extra semantics that may exist with "sort type"). I'll mark the issue as an enhancement, not sure what kind of timeline we're looking at to prioritize it.

I'm willing to mark this as help-wanted from external contributors since it shouldn't be very complex, if we add some links to the relevant code pieces we can also consider marking this as a good first issue.

currantw added bug Something isn't working untriaged labels Dec 2, 2024

This was referenced Dec 2, 2024

[BUG] Unimplemented PPL Sort Syntax opensearch-project/opensearch-spark#963

Open

[FEATURE] Add Support for IP Data Type #3145

Closed

currantw mentioned this issue Dec 16, 2024

#963 Deprecate Unimplemented PPL Sort Syntax opensearch-project/opensearch-spark#994

Open

5 tasks

acarbonetto mentioned this issue Dec 17, 2024

#3145 Add IP Address Data Type #3175

Merged

7 tasks

anasalkouz removed the untriaged label Dec 17, 2024

Swiddis added documentation Improvements or additions to documentation and removed bug Something isn't working labels Dec 17, 2024

currantw changed the title ~~[BUG] Outdated PPL Sorting Syntax~~ [BUG] Unimplemented PPL Sort Syntax Dec 19, 2024

Swiddis added enhancement New feature or request help wanted Extra attention is needed and removed documentation Improvements or additions to documentation labels Dec 19, 2024
