- DryRun enhacement for topics creation/update #221

schnaker85 · 2021-03-03T15:52:47Z

Implements #220 and fixes #217.

I touched some places where I did replace logger output and added documentation where it was missing (and helpful).
Additionally, on windows :-( the tests were not running as already pointed out at #217 which I changed as well due to testability on my local.

The change is backwards compatible and the default value is making sure that the new feature #220 will not be enabled. The only noticeable change is the output of the dryrun which is more detailed than before.

I did not update the changelog.md as I have the feeling it is outdated and not used currently or only updated when something get's released.

I was really having a hardtime to keep on with the formatting as there was no guideline or .editorconfig file for the rules. I hope I did not change the style to much.

- Topic update option without overriding everything (incremental update) - FileBackend uses FileWriter instead of RandomAccessFile due to a bug on JDK for windows

purbon · 2021-03-03T16:55:48Z

Nice, thanks a lot for your contribution, I will take a look at it as soon as possible.

purbon

If I follow this code properly, your core proposal here is to only update the config that is really necessary, right? This is a fair proposal, however, I would like to share a few thoughts on this here:

I see a benefit on error handling, so basically improving the dry run operation to show what might change or not.
In Kafka, there is no special cost of overwritten config values to topics, it is a fairly cheap operation.
The change you propose, should be transparent to users, they should not need to configure anything.

What do you think?

src/main/java/com/purbon/kafka/topology/Configuration.java

src/main/java/com/purbon/kafka/topology/ExecutionPlan.java

src/main/java/com/purbon/kafka/topology/actions/topics/SyncTopicAction.java

src/main/java/com/purbon/kafka/topology/api/adminclient/TopologyBuilderAdminClient.java

purbon · 2021-03-04T12:09:28Z

@schnaker85 can you please separate on another PR only the windows fix? that would be very nice of you and help others very much. I mean it so we don't have to wait to finish talk about this PR to get the windows fix 😄

purbon · 2021-03-04T12:10:15Z

I was really having a hardtime to keep on with the formatting as there was no guideline or .editorconfig file for the rules. I hope I did not change the style to much.

feel free to contribute one? I actually work mostly following the guide by google and use maven check to remember the formatting ;-)

schnaker85 · 2021-03-04T12:48:45Z

If I follow this code properly, your core proposal here is to only update the config that is really necessary, right? This is a fair proposal, however, I would like to share a few thoughts on this here:

I see a benefit on error handling, so basically improving the dry run operation to show what might change or not.

In Kafka, there is no special cost of overwritten config values to topics, it is a fairly cheap operation.

The change you propose, should be transparent to users, they should not need to configure anything.

What do you think?

I can kind of agree with all your points :-) for us the dryrun output is most important about it :) 👍 and if it changes the behaviour withouth beeing able to keep on with the previous behaviour, then we can also remove the configuration possiblity.

- using config object as parameter for the dryrun instead of hardcoded value - ExectionPlans dryruns check for empty string removed

- added javadoc for TopologyBuilderAdminClient#updateTopicConfig - aligned documentation for dryRun output for topics

purbon · 2021-03-07T10:36:32Z

Hi @schnaker85, I hope this message find you good. I have been thinking a bit about an idea I would love to pass by you here.

Currently, the DryRun is build-out of the output of actions, so in case an action generates an update, as you do here in your PR, you need to return this data from the client to the action and then print it out somehow there.

I was thinking about these topics:

Having a metadata component in the execution plan. This metadata would be the responsibility of having "the view".
Actions, when run in DryRun mode, would then be executed and operate only on this metadata view, to be able to show the diff. Ala what terraform is doing somehow.

However, I am having one interesting question:

Does the dry-run mode should need a connection to the cluster?

From one side, topologies have "the view" and if we include these metadata holders, we can enrich it with all defaults/other configs.

From the other side, people could have changed the config in the cluster using another methodology, the same thing could happen for terraform.

So I guess, I am trying to think about a generic solution for all the actions.

A dry-run process would be:

(Optional, as require connection) update cluster metadata
apply actions to the metadata.
output the result as text.

Does this makes sense to you?

schnaker85 · 2021-03-09T07:23:53Z

Hi @schnaker85, I hope this message find you good. I have been thinking a bit about an idea I would love to pass by you here.

Currently, the DryRun is build-out of the output of actions, so in case an action generates an update, as you do here in your PR, you need to return this data from the client to the action and then print it out somehow there.

That's is true and is and was the current situation how julie is handling the dryrun. Before my PR it was even kind of detached as the source of the dryrun was not running the "same code parts" as the not dryrun. (not really a dryrun) I changed it to the run the same methods and just not execution the final action to change.

I was thinking about these topics:

Having a metadata component in the execution plan. This metadata would be the responsibility of having "the view".

Actions, when run in DryRun mode, would then be executed and operate only on this metadata view, to be able to show the diff. Ala what terraform is doing somehow.

However, I am having one interesting question:

Does the dry-run mode should need a connection to the cluster?

Well, I would say it depends on the cofiguration of julie, you already have a configuration flag topology.state.topics.cluster.enabled to control this.

From one side, topologies have "the view" and if we include these metadata holders, we can enrich it with all defaults/other configs.

From the other side, people could have changed the config in the cluster using another methodology, the same thing could happen for terraform.

So I guess, I am trying to think about a generic solution for all the actions.

A dry-run process would be:

(Optional, as require connection) update cluster metadata

apply actions to the metadata.

output the result as text.

Does this makes sense to you?

That makes sense to me and sounds like a proper solution for the new "workflow", I know how terraform works and I like their declarative approach. I think this would give the user a better view of what will change but should also be implemented for the other resources like ACLs and so on.

From my side I won't be able to contribute to the implementation of the new workflow much but this could be something for a future release, do you know what I mean?

purbon · 2021-03-13T13:09:34Z

Hi @schnaker85, thanks a lot for your contribution here, I appreciate it very much, sorry I could not act on it earlier, it's been complicated weeks lately.

I am wondering something, your thoughts will be of very much helpful for me, I try to summarize it here.

You change request (PR) tackles an important improvement, knowing the diff of changes using the dry-run mode.
The proposed solution is tied to the topic and its configuration.

During the discussion/chat here we notice:

The design could be made more generic and tackle as well other components such as schemas and others by using the idea of metadata and calculate the diff there.

As you said the cycles you have does not allow you to contribute the general solution, which is very much fair. Does this summarize the thread here?

So here comes my questions.

Does your usage depends on me merging this change here upstream? If yes, can you please elaborate?

I am thinking to spare some time to build the generic solution as it would be awesome to track in the diff changes in topics, but as well ACLs, schemas or any upcoming module.

If I take the general solution approach, would you like to be a reviewer?

Thanks a lot for your time and thoughts @schnaker85

schnaker85 · 2021-03-14T18:21:49Z

During the discussion/chat here we notice:

The design could be made more generic and tackle as well other components such as schemas and others by using the idea of metadata and calculate the diff there.

As you said the cycles you have does not allow you to contribute the general solution, which is very much fair. Does this summarize the thread here?

Yes, that summarizes it.

So here comes my questions.

Does your usage depends on me merging this change here upstream? If yes, can you please elaborate?

Well, not anymore. We did create our own fork/ and image build to use as long as we have nothing similar in your build (which is fine for now for us).

I am thinking to spare some time to build the generic solution as it would be awesome to track in the diff changes in topics, but as well ACLs, schemas or any upcoming module.

If I take the general solution approach, would you like to be a reviewer?
Yes, I think so, as we also want to go back to your build and not always have to manually merge changes in hours and keep a separate docker build.

purbon · 2021-03-15T14:21:23Z

Ok. thank you very much for all your contribution and help. I understand. So in such a case, I will try to sketch a general solution. If you don't mind, I will close this PR for now. Thanks a lot for your time.

If you don't mind, I will tag you as a reviewer for the general solution. Your thoughts will very much appreciate it.

- DryRun enhacement for topics creation/update

8eb115e

- Topic update option without overriding everything (incremental update) - FileBackend uses FileWriter instead of RandomAccessFile due to a bug on JDK for windows

purbon self-requested a review March 3, 2021 16:55

purbon requested changes Mar 4, 2021

View reviewed changes

Dani Regli and others added 3 commits March 4, 2021 15:32

- changed logger from warn to info for the syncTopic action

4b1304a

- using config object as parameter for the dryrun instead of hardcoded value - ExectionPlans dryruns check for empty string removed

fixed failing tests due to change in ExecutionPlan

cf7cd96

- removed configuration option for sync.overwrite

e5539ca

- added javadoc for TopologyBuilderAdminClient#updateTopicConfig - aligned documentation for dryRun output for topics

schnaker85 requested a review from purbon March 5, 2021 08:01

purbon added the 2.x label Mar 6, 2021

purbon closed this Mar 15, 2021

akselh mentioned this pull request Jun 16, 2021

Improve output for topic changes on dryRun and apply #295

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

- DryRun enhacement for topics creation/update #221

- DryRun enhacement for topics creation/update #221

schnaker85 commented Mar 3, 2021

purbon commented Mar 3, 2021

purbon left a comment

purbon commented Mar 4, 2021

purbon commented Mar 4, 2021

schnaker85 commented Mar 4, 2021

purbon commented Mar 7, 2021

schnaker85 commented Mar 9, 2021

purbon commented Mar 13, 2021

schnaker85 commented Mar 14, 2021

purbon commented Mar 15, 2021

- DryRun enhacement for topics creation/update #221

- DryRun enhacement for topics creation/update #221

Conversation

schnaker85 commented Mar 3, 2021

purbon commented Mar 3, 2021

purbon left a comment

Choose a reason for hiding this comment

purbon commented Mar 4, 2021

purbon commented Mar 4, 2021

schnaker85 commented Mar 4, 2021

purbon commented Mar 7, 2021

schnaker85 commented Mar 9, 2021

purbon commented Mar 13, 2021

schnaker85 commented Mar 14, 2021

purbon commented Mar 15, 2021