Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- DryRun enhacement for topics creation/update #221

Closed

Conversation

schnaker85
Copy link

Implements #220 and fixes #217.

I touched some places where I did replace logger output and added documentation where it was missing (and helpful).
Additionally, on windows :-( the tests were not running as already pointed out at #217 which I changed as well due to testability on my local.

The change is backwards compatible and the default value is making sure that the new feature #220 will not be enabled. The only noticeable change is the output of the dryrun which is more detailed than before.

I did not update the changelog.md as I have the feeling it is outdated and not used currently or only updated when something get's released.

I was really having a hardtime to keep on with the formatting as there was no guideline or .editorconfig file for the rules. I hope I did not change the style to much.

- Topic update option without overriding everything (incremental update)
- FileBackend uses FileWriter instead of RandomAccessFile due to a bug on JDK for windows
@purbon purbon self-requested a review March 3, 2021 16:55
@purbon
Copy link
Collaborator

purbon commented Mar 3, 2021

Nice, thanks a lot for your contribution, I will take a look at it as soon as possible.

Copy link
Collaborator

@purbon purbon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I follow this code properly, your core proposal here is to only update the config that is really necessary, right? This is a fair proposal, however, I would like to share a few thoughts on this here:

  • I see a benefit on error handling, so basically improving the dry run operation to show what might change or not.
  • In Kafka, there is no special cost of overwritten config values to topics, it is a fairly cheap operation.
  • The change you propose, should be transparent to users, they should not need to configure anything.

What do you think?

@purbon
Copy link
Collaborator

purbon commented Mar 4, 2021

@schnaker85 can you please separate on another PR only the windows fix? that would be very nice of you and help others very much. I mean it so we don't have to wait to finish talk about this PR to get the windows fix 😄

@purbon
Copy link
Collaborator

purbon commented Mar 4, 2021

I was really having a hardtime to keep on with the formatting as there was no guideline or .editorconfig file for the rules. I hope I did not change the style to much.

feel free to contribute one? I actually work mostly following the guide by google and use maven check to remember the formatting ;-)

@schnaker85
Copy link
Author

If I follow this code properly, your core proposal here is to only update the config that is really necessary, right? This is a fair proposal, however, I would like to share a few thoughts on this here:

  • I see a benefit on error handling, so basically improving the dry run operation to show what might change or not.
  • In Kafka, there is no special cost of overwritten config values to topics, it is a fairly cheap operation.
  • The change you propose, should be transparent to users, they should not need to configure anything.

What do you think?

I can kind of agree with all your points :-) for us the dryrun output is most important about it :) 👍 and if it changes the behaviour withouth beeing able to keep on with the previous behaviour, then we can also remove the configuration possiblity.

Dani Regli and others added 3 commits March 4, 2021 15:32
- using config object as parameter for the dryrun instead of hardcoded value
- ExectionPlans dryruns check for empty string removed
- added javadoc for TopologyBuilderAdminClient#updateTopicConfig
- aligned documentation for dryRun output for topics
@schnaker85 schnaker85 requested a review from purbon March 5, 2021 08:01
@purbon purbon added the 2.x label Mar 6, 2021
@purbon
Copy link
Collaborator

purbon commented Mar 7, 2021

Hi @schnaker85, I hope this message find you good. I have been thinking a bit about an idea I would love to pass by you here.

Currently, the DryRun is build-out of the output of actions, so in case an action generates an update, as you do here in your PR, you need to return this data from the client to the action and then print it out somehow there.

I was thinking about these topics:

  • Having a metadata component in the execution plan. This metadata would be the responsibility of having "the view".
  • Actions, when run in DryRun mode, would then be executed and operate only on this metadata view, to be able to show the diff. Ala what terraform is doing somehow.

However, I am having one interesting question:

  • Does the dry-run mode should need a connection to the cluster?

From one side, topologies have "the view" and if we include these metadata holders, we can enrich it with all defaults/other configs.

From the other side, people could have changed the config in the cluster using another methodology, the same thing could happen for terraform.

So I guess, I am trying to think about a generic solution for all the actions.

A dry-run process would be:

  • (Optional, as require connection) update cluster metadata
  • apply actions to the metadata.
  • output the result as text.

Does this makes sense to you?

@schnaker85
Copy link
Author

Hi @schnaker85, I hope this message find you good. I have been thinking a bit about an idea I would love to pass by you here.

Currently, the DryRun is build-out of the output of actions, so in case an action generates an update, as you do here in your PR, you need to return this data from the client to the action and then print it out somehow there.

That's is true and is and was the current situation how julie is handling the dryrun. Before my PR it was even kind of detached as the source of the dryrun was not running the "same code parts" as the not dryrun. (not really a dryrun) I changed it to the run the same methods and just not execution the final action to change.

I was thinking about these topics:

  • Having a metadata component in the execution plan. This metadata would be the responsibility of having "the view".
  • Actions, when run in DryRun mode, would then be executed and operate only on this metadata view, to be able to show the diff. Ala what terraform is doing somehow.

However, I am having one interesting question:

  • Does the dry-run mode should need a connection to the cluster?

Well, I would say it depends on the cofiguration of julie, you already have a configuration flag topology.state.topics.cluster.enabled to control this.

From one side, topologies have "the view" and if we include these metadata holders, we can enrich it with all defaults/other configs.

From the other side, people could have changed the config in the cluster using another methodology, the same thing could happen for terraform.

So I guess, I am trying to think about a generic solution for all the actions.

A dry-run process would be:

  • (Optional, as require connection) update cluster metadata
  • apply actions to the metadata.
  • output the result as text.

Does this makes sense to you?

That makes sense to me and sounds like a proper solution for the new "workflow", I know how terraform works and I like their declarative approach. I think this would give the user a better view of what will change but should also be implemented for the other resources like ACLs and so on.

From my side I won't be able to contribute to the implementation of the new workflow much but this could be something for a future release, do you know what I mean?

@purbon
Copy link
Collaborator

purbon commented Mar 13, 2021

Hi @schnaker85, thanks a lot for your contribution here, I appreciate it very much, sorry I could not act on it earlier, it's been complicated weeks lately.

I am wondering something, your thoughts will be of very much helpful for me, I try to summarize it here.

  • You change request (PR) tackles an important improvement, knowing the diff of changes using the dry-run mode.
  • The proposed solution is tied to the topic and its configuration.

During the discussion/chat here we notice:

  • The design could be made more generic and tackle as well other components such as schemas and others by using the idea of metadata and calculate the diff there.

As you said the cycles you have does not allow you to contribute the general solution, which is very much fair. Does this summarize the thread here?

So here comes my questions.

  • Does your usage depends on me merging this change here upstream? If yes, can you please elaborate?

I am thinking to spare some time to build the generic solution as it would be awesome to track in the diff changes in topics, but as well ACLs, schemas or any upcoming module.

If I take the general solution approach, would you like to be a reviewer?

Thanks a lot for your time and thoughts @schnaker85

@schnaker85
Copy link
Author

During the discussion/chat here we notice:

  • The design could be made more generic and tackle as well other components such as schemas and others by using the idea of metadata and calculate the diff there.

As you said the cycles you have does not allow you to contribute the general solution, which is very much fair. Does this summarize the thread here?

Yes, that summarizes it.

So here comes my questions.

  • Does your usage depends on me merging this change here upstream? If yes, can you please elaborate?

Well, not anymore. We did create our own fork/ and image build to use as long as we have nothing similar in your build (which is fine for now for us).

I am thinking to spare some time to build the generic solution as it would be awesome to track in the diff changes in topics, but as well ACLs, schemas or any upcoming module.

If I take the general solution approach, would you like to be a reviewer?
Yes, I think so, as we also want to go back to your build and not always have to manually merge changes in hours and keep a separate docker build.

@purbon
Copy link
Collaborator

purbon commented Mar 15, 2021

Ok. thank you very much for all your contribution and help. I understand. So in such a case, I will try to sketch a general solution. If you don't mind, I will close this PR for now. Thanks a lot for your time.

If you don't mind, I will tag you as a reviewer for the general solution. Your thoughts will very much appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.cluster-state file cannot not be deleted when runnnig the tests on a Windows machine
2 participants