Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Canner enterprise connector #172

Merged
merged 3 commits into from
Jun 6, 2023

Conversation

onlyjackfrost
Copy link
Contributor

Description

What this PR does:

  • Support Canner Enterprise as a data source in VulcanSQL
  • Caching feature in VulcanSQL is also supported

We use PostgreSQL Wire Protocol to connect to canner enterprise and execute the sql in VulcanSQL.
Check the doc for more connecting info

The export flow:

  • Use Canner Enterprise API to create an async-query that will execute the sql statement and turn the query result into parquet files and stored in S3
  • wait the async-query to finished
  • Get the presigned URLs of this async-query and download.
    For the canner API info, please check here

Issue ticket number

issue #169

Additional Context

  • The Canner async-query APIs have stored the query result as parquet files in S3, so we don't need to do it by ourselves.

@vercel
Copy link

vercel bot commented May 24, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
vulcan-sql-document ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 5, 2023 10:29am

@onlyjackfrost onlyjackfrost changed the title [DRAFT] Feature/canner enterprise connector [DRAFT] Feature: Canner enterprise connector May 24, 2023
@onlyjackfrost onlyjackfrost force-pushed the feature/canner-enterprise-connector branch from e51a8fa to 81e75e2 Compare May 25, 2023 06:21
@codecov-commenter
Copy link

codecov-commenter commented May 25, 2023

Codecov Report

Patch coverage: 84.74% and project coverage change: -0.22 ⚠️

Comparison is base (885720c) 91.13% compared to head (0b0de36) 90.91%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #172      +/-   ##
===========================================
- Coverage    91.13%   90.91%   -0.22%     
===========================================
  Files          314      321       +7     
  Lines         5006     5183     +177     
  Branches       668      695      +27     
===========================================
+ Hits          4562     4712     +150     
- Misses         316      337      +21     
- Partials       128      134       +6     
Flag Coverage Δ
extension-driver-canner 84.74% <84.74%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...s/extension-driver-canner/src/lib/cannerAdapter.ts 69.04% <69.04%> (ø)
...kages/extension-driver-canner/test/cannerServer.ts 80.00% <80.00%> (ø)
...xtension-driver-canner/src/lib/cannerDataSource.ts 85.05% <85.05%> (ø)
packages/extension-driver-canner/src/index.ts 100.00% <100.00%> (ø)
packages/extension-driver-canner/src/lib/config.ts 100.00% <100.00%> (ø)
...ages/extension-driver-canner/src/lib/sqlBuilder.ts 100.00% <100.00%> (ø)
...ages/extension-driver-canner/src/lib/typeMapper.ts 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@kokokuo kokokuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides some typo, comment and handle error suggestions, others LGTM 👍 👍

for (const profile of profiles) {
// try to connect by pg wire protocol and make request to api server
this.logger.debug(
`Initializing profile: ${profile.name} using pg driver`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming the pg driver to canner driver or canner driver by pg wire-protocol to recognize it when log?

@@ -0,0 +1,13 @@
export interface IEnvConfig {
// indicates whether the extension is running in k8s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: running on k8s, not running in k8s.

The environment variable IS_INTERNAL is true or false may not be clear to know running on k8s or not, because the env does not talk about k8s, so we may need to make our comment describe more ( or you could rename the variable if you don't mind ), otherwise, our member may not know why we need the env to indicate running on k8s especially, so we should also expose more message, like below:

When it's true means the extension is running on k8s, and the canner data source will send requests by HTTP protocol to make the request successful under the k8s cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename this variable to "isOnKubernetes"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest still adding the comment to your environment, to help members or other users know why we need to use the env and why set the value is true, we will use the HTTP protocol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delivered in 8ed2dfd

this.ssl = ssl;
}

public async createAsyncQueryResultUrls(sql: string): Promise<string[]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding the comment on the createAsyncQueryResultUrls method to describe what the methods could do?
Our partner or new member may not have the context to know why we need to get the result URLs. so may suggest you describe what the result urls could do :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some comment to explain it, thanks.

});
const { restfulApiBaseEndpoint } = response.data;
if (!restfulApiBaseEndpoint) {
throw new Error(`restfulApiBaseEndpoint is not found`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restfulApiBaseEndpoint is a certain field from the /cluster-info api data, however, not all members know the fields of the /cluster-info API except they open the canner enterprise to dig in.

Suggest using a more usual or common description to show, e.g:

The restful API base endpoint is not found, please check "restfulApiBaseEndpoint" field from "/cluster-info" endpoint of Canner Enterprise

Comment on lines 105 to 108
response = await this.workspaceRequest(
'get',
`/v2/async-queries/${requestId}`
);
Copy link
Contributor

@kokokuo kokokuo May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to handle the status, but seems that you use twice to same code to call the requestId and get the current response status, maybe we could use a function expression under the method to reuse ( because the behavior is small, it's ok ):

const getResponse = async (requestId: string) => await this.workspaceRequest('get',`/v2/async-queries/${requestId}`)

let response = await getResponse(requestId)
....
while (!['FINISHED', 'FAILED'].includes(status)) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  response = await getResponse(requestId);
}

'get',
`/v2/async-queries/${requestId}/result/urls`
);
return data.urls || [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the /v2/async-queries/<requestId>/result/urls will throw an error from Canner Enterprise and generate an error message, maybe we should check if exist any errors and throw it.

Tips: we could handle the error in the WorkspaceRequest directly, then you don't need to handle it in different places

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use status to check and handle the error message from the canner server

this.ssl = ssl;
}

public async createAsyncQueryResultUrls(sql: string): Promise<string[]> {
Copy link
Contributor

@kokokuo kokokuo May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using the logger.debug ( from our getLogger put in the core/utils ) to record at least steps:

  • send the async query to the Canner Enterprise
  • get the query result parquet files from URLs.

It could help the developer and QA to know the current status, working phase, and approximate location on the K8S when an error happened and the error is not caused by what we defined especially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you miss the suggestion or ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delivered in 8ed2dfd

type: 'canner',
connection: {
host: process.env['CANNER_HOST'],
port: process.env['CANNER_PORT'] || 7432,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why the default is 7432, 7432 port is PG wire-protocol default port ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this introduction, the port of the PG wire protocol is 7432, so I set the default port to 7432

const response = await axios.get(url, {
responseType: 'stream',
});
const fileName = url.split('/').pop()?.split('?')[0] || `part${index}`;
Copy link
Contributor

@kokokuo kokokuo May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you split / and pop last slice, then split(?) and get the first value, because we may not know the url` pattern, could you comment to give a sample and explain to make others know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the pattern of a presigned-url from AWS, it's based on my observation and this StackOverflow.
The file name will be a substring that is after the last "/" and followed by the "?" and the query string

return urls;
}

private async workspaceRequest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to define a method to reuse for calling the workspace request 👍

        - can access data sources from canner through this extension
        - support cache feature in data source "canner enterprise"
@onlyjackfrost onlyjackfrost force-pushed the feature/canner-enterprise-connector branch from 0b0de36 to a65e56c Compare June 1, 2023 03:14
@kokokuo kokokuo linked an issue Jun 1, 2023 that may be closed by this pull request
Copy link
Contributor

@kokokuo kokokuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the comment, here is still 4 suggestions ( including missed and incorrect handle solution, and comment suggestions.), Others LGTM 👍

this.ssl = ssl;
}

public async createAsyncQueryResultUrls(sql: string): Promise<string[]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you miss the suggestion or ?

Comment on lines 70 to 76
if (response.status !== 200) {
throw new Error(
`Failed to get workspace request "${urlPath}" data, status: ${
response.status
}, data: ${JSON.stringify(response.data)}`
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you survey the error handling of the axios ?

The axios will throw erro when the status is not 200, so according to your logistic, it won' t trigger the if-else, please:
https://www.axios-http.cn/docs/handling_errors
https://stackoverflow.com/questions/49967779/axios-handling-errors

Otherwise, you should also check the Canner Enterprise hot to respond the error and it code, please see how we handle the different errors and wrap to HTTP status code to 500 when happened the errors, the reason you could see the comment https://github.com/Canner/canner-ui/blob/5ea2e698ebad8ed953daabe362e773461701bd09/server/errors/formatUnifiedError.ts#L61.

Btw, we also use the axios in Canner to handle the restful API from SQL Engine, and they also will throw different error codes, so that why we use the try-catch to handle it: https://github.com/Canner/canner-ui/blob/5ea2e698ebad8ed953daabe362e773461701bd09/server/adapters/externalQueryTable.ts#L45

And calling the async-query API, it also may throw the domain error and cause the 500 status code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delivered in 8ed2dfd

it('Data source should throw when fail to export data', async () => {
// Arrange
sinon.default
.stub(CannerAdapter.prototype, 'createAsyncQueryResultUrls')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding the comment to describe what reason so need to use prototype to stub the method, not usual solution :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delivered in 8ed2dfd

@@ -0,0 +1,13 @@
export interface IEnvConfig {
// indicates whether the extension is running in k8s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest still adding the comment to your environment, to help members or other users know why we need to use the env and why set the value is true, we will use the HTTP protocol.

Copy link
Contributor

@kokokuo kokokuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing, LGTM 👍

@kokokuo kokokuo merged commit a0711f0 into develop Jun 6, 2023
@kokokuo kokokuo deleted the feature/canner-enterprise-connector branch June 6, 2023 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Canner Enterprise DataSource
3 participants