Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DECIMAL type #122

Open
alamb opened this issue Apr 26, 2021 · 16 comments
Open

Implement DECIMAL type #122

alamb opened this issue Apr 26, 2021 · 16 comments
Assignees
Labels
datafusion Changes in the datafusion crate

Comments

@alamb
Copy link
Contributor

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10818

The TPC-H benchmarks correctly specify that all MONEY columns are DECIMAL type (precision and scale are not specified). We currently use DataType::Float64 which is much lighter than a true Decimal type.

To be a valid benchmark we need to ensure we support the same precision as the reference implementation.

@alamb alamb added the datafusion Changes in the datafusion crate label Apr 26, 2021
@j-a-m-l
Copy link
Contributor

j-a-m-l commented Oct 30, 2021

@alamb any idea about how to start this?

@xudong963
Copy link
Member

Hi @j-a-m-l, I see arrow-rs has supported Decimal, maybe we can directly change here https://github.com/apache/arrow-datafusion/blob/22a7d74e9dece44adb790b5ea3fca27d26a2fe51/datafusion/src/sql/planner.rs#L312 cc @alamb

@alamb
Copy link
Contributor Author

alamb commented Nov 1, 2021

@j-a-m-l I think a @xudong963 has a good point that Arrow already includes some support for DecimalArrays.

A good place to start then be to write some tests in sql.rs that create a TableProvider with a column of DecimalArray type and try to run some basic sql

The typical order I would try is something like

// projection
SELECT col + 4 ... FROM table_with_decimal_col

// filter
SELECT col  FROM table_with_decimal_col where col < 5

// sort
SELECT col  FROM table_with_decimal_col order by col

// grouping / aggregatie
SELECT sum(col)  FROM table_with_decimal_col;
SELECT count(other_col)  FROM table_with_decimal_col group by col

Writing those tests is probably a good way to figure out how to plumb through the support in DataFusion -- I would also recommend doing it in that order (projection, filter, sort, grouping / aggregation)

@j-a-m-l
Copy link
Contributor

j-a-m-l commented Nov 1, 2021

Thanks @xudong963 and @alamb

@liukun4515
Copy link
Contributor

Is there any plan for implementation Decimal Type for datafusion?
@alamb

@xudong963
Copy link
Member

Hi @j-a-m-l, do you have time to work on this?

@liukun4515
Copy link
Contributor

Is there any plan for implementation Decimal Type for datafusion? @alamb

I want to take some tasks to support the decimal data type in the datafusion.

@alamb
Copy link
Contributor Author

alamb commented Nov 8, 2021

Is there any plan for implementation Decimal Type for datafusion?

I don't know of any plans other than what is on this ticket. As I mentioned in #122 (comment) I think figuring out what subtasks are needed (by writing some tests and trying out what works / doesn't work) is a good first step

@j-a-m-l
Copy link
Contributor

j-a-m-l commented Nov 9, 2021

No, @xudong963. Currently I don't have enough time to do that, and I'll probably focus first on fixing other issues that I've found, such as #1173.

@alamb
Copy link
Contributor Author

alamb commented Nov 9, 2021

@liukun4515
Copy link
Contributor

Thanks for the feedback.
I will begin to support DECIMAL datatype.
Please assign this issue to me @alamb

@liukun4515
Copy link
Contributor

liukun4515 commented Nov 22, 2021

utils:

@liukun4515
Copy link
Contributor

In order to implement decimal in datafusion, we should ref some data type rules, like spark data type and type coercion.
Now, the numerical data type just support some native type and the type coercion rules are simple. if we add the decimal data type, the type checking, type cast, type promotion and precision checking are all need to be considered.

@alamb
Copy link
Contributor Author

alamb commented Nov 23, 2021 via email

@liukun4515
Copy link
Contributor

liukun4515 commented Nov 24, 2021

I know of in DataFusion today that do this type coercion

yep, Now I just try to support some simple demo code for decimal. For example function(min,max) and just Projection the column with decimal data type (without any operation and functions)

The rule of conversion in the above two files should be refactored, If we want to align the behavior with PG database exactly.

In the PG database, decimal and other numerical data type can be convert to each other, and in some case the varchar also can be convert to a decimal data type. In my opinion, we can support some simple and right rules first, and support other complex features later. For example, In the add mathematics operation for the decimal datatype, if left or right is the decimal datatype, we must require that two size are all decimal type. It is means that we don't support the cast or try cast to or from decimal data type implicitly.

What do you think about? @alamb

@alamb
Copy link
Contributor Author

alamb commented Nov 24, 2021

In my opinion, we can support some simple and right rules first, and support other complex features later

I think this is a great idea -- taking stock / consolidating the existing coercion logic might be a good place to start (before you try to extend it to support Decimal)

alamb added a commit that referenced this issue Dec 18, 2021
Thanks to the work of @rdettai @xudong963  and others, we are making great progress here

Also added #122 as @liukun4515  is actively working on it

cc @hntd187
Dandandan pushed a commit that referenced this issue Dec 18, 2021
Thanks to the work of @rdettai @xudong963  and others, we are making great progress here

Also added #122 as @liukun4515  is actively working on it

cc @hntd187
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

No branches or pull requests

4 participants