Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix][Connector-V2] Fixed clickhouse connectors cannot stop under multiple parallelism #7921

Merged
merged 16 commits into from
Oct 29, 2024

Conversation

YOMO-Lee
Copy link
Contributor

Fixed the bug where clickhouse connectors cannot stop properly under multiple parallelism

YOMO-Lee and others added 13 commits October 22, 2024 17:28
Supplement and optimize the description of the LocalFile connector on filtering files
[(#7887)](#7887)
1、When the ClickHouse connector is set to multi parallelism, the task extraction is completed but cannot be stopped normally
[(#7897)](#7897)

2、Added E2E test cases for this issue [(#7897)](#7897)

3、Local developers want to observe **Job Progress Information** in a timely manner,  Need to modify the following configuration.The configuration in config is invalid
```
seatunnel engine/seatunnel-engineer-common/src/main/resources/seatunnely.yaml
```
1、When the ClickHouse connector is set to multi parallelism, the task extraction is completed but cannot be stopped normally
[(#7897)](#7897)

2、Added E2E test cases for this issue [(#7897)](#7897)

3、Local developers want to observe **Job Progress Information** in a timely manner,  Need to modify the following configuration.The configuration in config is invalid
```
seatunnel engine/seatunnel-engineer-common/src/main/resources/seatunnely.yaml
```
1、When the ClickHouse connector is set to multi parallelism, the task extraction is completed but cannot be stopped normally
[(#7897)](#7897)

2、Added E2E test cases for this issue [(#7897)](#7897)

3、Local developers want to observe **Job Progress Information** in a timely manner, Need to modify the following configuration.The configuration in config is invalid
```
seatunnel engine/seatunnel-engineer-common/src/main/resources/seatunnely.yaml
```
Continue to optimize the document about filtering files and add some examples
[(#7887)](#7887)
@@ -20,6 +20,7 @@ seatunnel:
backup-count: 1
queue-type: blockingqueue
print-execution-info-interval: 60
print-job-metrics-info-interval: 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value of print-job-metrics-info-interval is 60. This is valid without change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the default value is also used if it is not changed here, but here is a parameter to let you know where you can change it.
Otherwise, if you run an example in the project and want to change the parameters, the results will not work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value of print-job-metrics-info-interval is 60. This is valid without change

I think this parameter can be added here, at least adding the default value will not cause any other bad effects
But it worked for me, at first I wanted to change the configuration, I didn't know where to change, I searched a lot of places in the project, and it didn't take effect after the change, and then it was someone else who pointed out that I knew to change here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can configure it in your local environment. But do not commit it. Please revert it.

@Hisoka-X Hisoka-X changed the title [Fix] ClickHouse Connector BUG (#7897) [Fix][Connector-V2] Fixed clickhouse connectors cannot stop under multiple parallelism Oct 28, 2024
@Hisoka-X Hisoka-X added the First-time contributor First-time contributor label Oct 28, 2024
@@ -101,6 +101,13 @@ public void testClickhouse(TestContainer container) throws Exception {
clearSinkTable();
}

@TestTemplate
public void testSourceParallelism(TestContainer container) throws Exception {
LOG.info("=========Multi parallelism testing begins===========");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.info("=========Multi parallelism testing begins===========");

@@ -20,6 +20,7 @@ seatunnel:
backup-count: 1
queue-type: blockingqueue
print-execution-info-interval: 60
print-job-metrics-info-interval: 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can configure it in your local environment. But do not commit it. Please revert it.

Comment on lines 77 to 82
if (assigned < 0) {
assigned = subtaskId;
context.assignSplit(subtaskId, new ClickhouseSourceSplit());
} else {
context.signalNoMoreSplits(subtaskId);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (assigned < 0) {
assigned = subtaskId;
context.assignSplit(subtaskId, new ClickhouseSourceSplit());
} else {
context.signalNoMoreSplits(subtaskId);
}
if (assigned < 0) {
assigned = subtaskId;
context.assignSplit(subtaskId, new ClickhouseSourceSplit());
context.signalNoMoreSplits(subtaskId);
} else {
context.signalNoMoreSplits(subtaskId);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write it like this?


        if (assigned < 0) {
            assigned = subtaskId;
            context.assignSplit(subtaskId, new ClickhouseSourceSplit());
        }
        context.signalNoMoreSplits(subtaskId);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write it like this?


        if (assigned < 0) {
            assigned = subtaskId;
            context.assignSplit(subtaskId, new ClickhouseSourceSplit());
        }
        context.signalNoMoreSplits(subtaskId);

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I didn't notice this

Comment on lines 102 to 110
this.readerContext.signalNoMoreElement();
this.splits.clear();
} else if (noMoreSplit
&& splits.isEmpty()
&& Boundedness.BOUNDED.equals(readerContext.getBoundedness())) {
log.info("Closed the bounded ClickHouse source");
this.readerContext.signalNoMoreElement();
this.splits.clear();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.readerContext.signalNoMoreElement();
this.splits.clear();
} else if (noMoreSplit
&& splits.isEmpty()
&& Boundedness.BOUNDED.equals(readerContext.getBoundedness())) {
log.info("Closed the bounded ClickHouse source");
this.readerContext.signalNoMoreElement();
this.splits.clear();
}
this.splits.clear();
}
if (noMoreSplit
&& splits.isEmpty()
&& Boundedness.BOUNDED.equals(readerContext.getBoundedness())) {
log.info("Closed the bounded ClickHouse source");
this.readerContext.signalNoMoreElement();
this.splits.clear();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boundedness.BOUNDED.equals(readerContext.getBoundedness())
This can be removed I think because clickhouse only supports batch mode reading
cc @Hisoka-X

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it for now. The purpose of keeping it is to unify the processing logic of most connectors, to facilitate reference for other developers and reduce the chance of bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it for now. The purpose of keeping it is to unify the processing logic of most connectors, to facilitate reference for other developers and reduce the chance of bugs.

What if the user uses processing mode to write STREAMING? It's unnecessary code and I think it's better to get rid of it Developers adding features are certainly familiar with their read patterns

Copy link
Member

@Hisoka-X Hisoka-X Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user uses processing mode to write STREAMING?

Sorry, I don get it. Could you share more details? You meaning set job.mode to STREAMING?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user uses processing mode to write STREAMING?

Sorry, I don get it. Could you share more details? You meaning set job.mode to STREAMING?


env {
  job.mode = "STREAMING"
  parallelism = 2
}
source {
  # This is a example source plugin **only for test and demonstrate the feature source plugin**
  Clickhouse {
    host = "127.0.0.1:8123"
    database = "default"
    sql = "select * from students"
    username = "default"
    password = ""
  }
  # If you would like to get more information about how to configure seatunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/connector-v2/source/ClickhouseSource
}


sink {
  Console {
  }
  # If you would like to get more information about how to configure seatunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/connector-v2/sink
}

So for example in this case the user will never exit the program, essentially no matter how you configure the read mode clickhouse only supports batch mode, right

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If users set to STREAMING, but source only supported BATCH mode. The job will execute as BATCH mode. It depends on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but an extra judgment is a redundant operation

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add checkpoint lock when read spilt. Please refer

@Hisoka-X Hisoka-X linked an issue Oct 28, 2024 that may be closed by this pull request
3 tasks
@Hisoka-X
Copy link
Member

cc @zhilinli123 as well.

@YOMO-Lee
Copy link
Contributor Author

cc @zhilinli123 as well.
1.Thanks for ping me, I think pom needs to be added


   <dependency>
            <groupId>org.apache.seatunnel</groupId>
            <artifactId>seatunnel-api</artifactId>
            <version>${project.version}</version>
        </dependency>

I don't quite understand where this should be added and where it needs to be used. Currently, I haven't found any places where it needs to be used

@zhilinli123
Copy link
Contributor

Waiting for the start of the current pr merger modify this code: #7529

…tiple parallelism

Adjust the bug regarding the Clickhouse connector not being able to stop properly
}
this.readerContext.signalNoMoreElement();
this.splits.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract clickhouse Close the task method

                this.readerContext.signalNoMoreElement();
                this.splits.clear();

}
if (noMoreSplit && splits.isEmpty()) {
log.info("Closed the bounded ClickHouse source");
this.readerContext.signalNoMoreElement();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -77,6 +77,9 @@ public void registerReader(int subtaskId) {
if (assigned < 0) {
assigned = subtaskId;
context.assignSplit(subtaskId, new ClickhouseSourceSplit());
context.signalNoMoreSplits(subtaskId);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


        if (assigned < 0) {
            assigned = subtaskId;
            context.assignSplit(subtaskId, new ClickhouseSourceSplit());
        }
        context.signalNoMoreSplits(subtaskId);

@zhilinli123
Copy link
Contributor

Wait for CI pass

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if ci passes. Thanks @YOMO-Lee and @zhilinli123 @corgy-w for review!

@zhilinli123
Copy link
Contributor

LGTM if ci passes. Thanks @YOMO-Lee and @zhilinli123 @corgy-w for review!

+1

@hailin0
Copy link
Member

hailin0 commented Oct 29, 2024

waiting for ci passed

@hailin0 hailin0 merged commit 8d9c6a3 into apache:dev Oct 29, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] [clickhouse] CK连接器任务抽取完毕后无法正常停止
5 participants