SAP HANA DATABASE Source support improvement #1959

Merged 2 commits on Dec 17, 2024

Conversation

wwsheng009
Contributor

Pull Request Template

Description

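SAP HANA is the database behind SAP ERP and holds a very large number of tables, so using supersonic to mine and explore that data is an attractive option. Testing so far has turned up the following incompatibilities between SAP HANA and ordinary databases:

  • In SQL statements, field and table names that are not wrapped in double quotes are automatically converted to uppercase, yet SAP HANA also allows fields and tables to be created with lowercase names.
  • SAP HANA has more keywords than standard SQL. When Calcite is used for SQL optimization, it wraps these extra keywords in double quotes, which causes exceptions.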
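
The first incompatibility can be illustrated with a toy model of how an identifier written in SQL is resolved. This is a sketch only, not supersonic or SAP HANA code: the class HanaIdentifierFolding and its methods are hypothetical names, and escaped quotes inside quoted identifiers are ignored.

```java
import java.util.Locale;
import java.util.Set;

/** Toy model of identifier resolution; hypothetical, for illustration only. */
final class HanaIdentifierFolding {

    /** Returns the catalog name that an identifier, as written in SQL, resolves to. */
    static String resolve(String written) {
        boolean quoted = written.length() >= 2
                && written.startsWith("\"")
                && written.endsWith("\"");
        if (quoted) {
            // Quoted identifiers keep their exact case.
            return written.substring(1, written.length() - 1);
        }
        // Unquoted identifiers are folded to uppercase before lookup.
        return written.toUpperCase(Locale.ROOT);
    }

    /** True if the identifier, as written, matches a name stored in the catalog. */
    static boolean exists(Set<String> catalog, String written) {
        return catalog.contains(resolve(written));
    }
}
```

Under this model, a column created with the lowercase name order_id is only reachable when the query writes the name in double quotes; the bare form is looked up as ORDER_ID and not found.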

Adjustments in This Submission:

  • After SqlQueryConverter-convertNameToBizName has run, SAP HANA gets special handling: fields and alias fields that are not all uppercase are wrapped in double quotes. Doing this earlier, in the parse stage, makes model field translation fail.
  • For the second incompatibility (extra keywords quoted by Calcite), string replacement in the SQL rewrite stage restores compatibility.
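
The two adjustments can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: HanaSqlCompat, quoteIfNotUppercase, and unquoteKeywords are hypothetical names, and the keyword list is supplied by the caller because the set of words the PR actually handles is not listed here.

```java
import java.util.Collection;
import java.util.Locale;

/** Hypothetical helper sketching both adjustments; not the merged implementation. */
final class HanaSqlCompat {

    /** Wraps a field name in double quotes unless it is already quoted or all uppercase. */
    static String quoteIfNotUppercase(String name) {
        boolean alreadyQuoted = name.length() >= 2
                && name.startsWith("\"")
                && name.endsWith("\"");
        if (alreadyQuoted || name.equals(name.toUpperCase(Locale.ROOT))) {
            return name;
        }
        // Double any embedded quote character, then wrap the whole name.
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    /** Removes the double quotes placed around the given keywords, by plain string replacement. */
    static String unquoteKeywords(String sql, Collection<String> keywords) {
        String result = sql;
        for (String keyword : keywords) {
            result = result.replace("\"" + keyword + "\"", keyword);
        }
        return result;
    }
}
```

Plain string replacement is simple but also rewrites matching text inside string literals, so a sketch like this only suits SQL where the quoted keyword cannot appear as data.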

Additional Modification:

A default 60-second timeout is added for Zhipu model embedding. Without it, the program reports an error because the timeout parameter is missing.

Databases Supporting Lowercase Field Names:

  • With this change, databases that use lowercase field names are supported.

Support for Complex SAP HANA SQL:

Complex SQL is also supported, such as:

  • Nested subqueries
  • Common Table Expressions (CTEs)
  • Window functions
  • Recursive queries
  • Advanced joins (e.g., full outer joins, cross joins)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Test A
    Added the method SqlReplaceHelper->replaceAliasFieldName, which replaces the field names that follow AS in SQL.
  • Test B
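
The idea behind replacing the field names that follow AS can be sketched with a regular expression. This is a stand-in for illustration only: the real signature and implementation of SqlReplaceHelper->replaceAliasFieldName are not shown in this description, and AliasRewriteSketch and rewriteAliases are hypothetical names. The regex treats every unquoted name after AS as an alias, including table aliases and the type in CAST(x AS INT), which a parser-based approach would distinguish.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based sketch of rewriting aliases that follow AS. */
final class AliasRewriteSketch {

    // Group 1: the AS keyword plus whitespace; group 2: the unquoted alias name.
    private static final Pattern AS_ALIAS = Pattern.compile(
            "\\b(AS\\s+)([A-Za-z_][A-Za-z0-9_]*)", Pattern.CASE_INSENSITIVE);

    /** Applies rename to each unquoted alias that follows AS and returns the new SQL. */
    static String rewriteAliases(String sql, UnaryOperator<String> rename) {
        Matcher m = AS_ALIAS.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + rename.apply(m.group(2));
            // quoteReplacement keeps '$' and '\' in the new alias literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A caller could pass a function that wraps non-uppercase names in double quotes, so that an alias such as total_amt is emitted as "total_amt" while aliases that are already quoted are left alone.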

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Additional information

Any additional information, configuration or data that might be necessary to reproduce the issue.

Handling of complex HANA statements:
image

Embedding test using the Zhipu model:
image

Databases that use lowercase field names are supported, as is complex SQL such as the following:

SELECT
  SCHEMA_NAME,
  TABLE_NAME,
  TOTAL,
  TOTAL_MOD,
  INSERTS,
  UPDATES,
  DELETES,
  REPLACES,
  SELECTS,
  TO_VARCHAR(LAST_MODIFY_TIME, 'YYYY/MM/DD HH24:MI:SS') LAST_MODIFY_TIME
FROM
( SELECT
    TS.SCHEMA_NAME,
    TS.TABLE_NAME,
    TS.INSERT_COUNT + TS.DELETE_COUNT + TS.UPDATE_COUNT + TS.REPLACE_COUNT TOTAL_MOD,
    TS.INSERT_COUNT + TS.DELETE_COUNT + TS.UPDATE_COUNT + TS.REPLACE_COUNT + TS.SELECT_COUNT TOTAL,
    TS.INSERT_COUNT INSERTS,
    TS.UPDATE_COUNT UPDATES,
    TS.DELETE_COUNT DELETES,
    TS.REPLACE_COUNT REPLACES,
    TS.SELECT_COUNT SELECTS,
    BI.RESULT_ROWS,
    CASE BI.TIMEZONE WHEN 'UTC' THEN ADD_SECONDS(TS.LAST_MODIFY_TIME, SECONDS_BETWEEN(CURRENT_TIMESTAMP, CURRENT_UTCTIMESTAMP)) ELSE TS.LAST_MODIFY_TIME END LAST_MODIFY_TIME,
    ROW_NUMBER () OVER 
    ( ORDER BY
        MAP(BI.ORDER_BY, 'TABLE', TS.SCHEMA_NAME, ''),
        MAP(BI.ORDER_BY, 'TABLE', TS.TABLE_NAME, ''),
        MAP(BI.ORDER_BY, 
          'TOTAL', TS.INSERT_COUNT + TS.DELETE_COUNT + TS.UPDATE_COUNT + TS.REPLACE_COUNT + TS.SELECT_COUNT,
          'TOTAL_MOD', TS.INSERT_COUNT + TS.DELETE_COUNT + TS.UPDATE_COUNT + TS.REPLACE_COUNT,
          'INSERT', TS.INSERT_COUNT,
          'UPDATE', TS.UPDATE_COUNT,
          'DELETE', TS.DELETE_COUNT,
          'REPLACE', TS.REPLACE_COUNT,
          'SELECT', TS.SELECT_COUNT) DESC
    ) ROW_NUM
  FROM
  ( SELECT                /* Modification section */
      'SERVER' TIMEZONE,                              /* SERVER, UTC */
      '%' SCHEMA_NAME,
      '%' TABLE_NAME,
      'TOTAL' ORDER_BY,        /* TABLE, TOTAL, TOTAL_MOD, INSERT, UPDATE, DELETE, REPLACE, SELECT */
      50 RESULT_ROWS
    FROM
      DUMMY
  ) BI,
    M_TABLE_STATISTICS TS
  WHERE
    TS.SCHEMA_NAME LIKE BI.SCHEMA_NAME AND
    TS.TABLE_NAME LIKE BI.TABLE_NAME
  ORDER BY
    MAP(BI.ORDER_BY, 'TABLE', TS.SCHEMA_NAME, ''),
    MAP(BI.ORDER_BY, 'TABLE', TS.TABLE_NAME, ''),
    MAP(BI.ORDER_BY, 
      'TOTAL', TS.INSERT_COUNT + TS.DELETE_COUNT + TS.UPDATE_COUNT + TS.REPLACE_COUNT,
      'INSERT', TS.INSERT_COUNT,
      'UPDATE', TS.UPDATE_COUNT,
      'DELETE', TS.DELETE_COUNT,
      'REPLACE', TS.REPLACE_COUNT) DESC
)
WHERE
  ( RESULT_ROWS = -1 OR ROW_NUM <= RESULT_ROWS )

@jerryjzhang jerryjzhang merged commit b57eed4 into tencentmusic:master Dec 17, 2024
4 checks passed