[BUG] Remove kafka dependency from spark_expectations #39

Closed
amaldevknike opened this issue Sep 26, 2023 · 1 comment
Labels
bug Something isn't working

Comments

amaldevknike commented Sep 26, 2023

Describe the bug
Getting a timeout error from Kafka while executing the spark_expectations library.

To Reproduce
I was trying to run the spark_expectations library using the sample rules provided on GitHub. I haven't configured a Kafka topic, since Kafka is not a requirement for us. However, the library does not appear to offer a way to disable the Kafka section: it always validates the response from the Kafka stats write, which times out. This blocks me from trying out spark_expectations in my local environment.
Steps to reproduce the behavior:

  1. Install spark_expectations library
  2. Set up Metadata tables (DQ rules table and stats table)
  3. Assign the rules
  4. Set up alerts (create a config file named dq_spark_expectations_config.ini)
  5. Run the spark expectations.

Expected behavior
After the library runs, stats should be written to the stats table regardless of whether a Kafka topic is configured. I plan to use this library extensively in my project because of its in-flight capability and wider stats, but I need the flexibility to disable the Kafka section.

Screenshots
[Screenshot: Timeout_kafka]

Desktop (please complete the following information):
Tried in my Azure environment.

  • OS: Windows
  • Browser: Microsoft Edge
  • Version: 117.0.2045.31

Additional context
This issue can be resolved by removing the hard Kafka dependency from the spark_expectations library, so that it doesn't always expect a response from Kafka stats. Currently the library is limited to environments where a Kafka topic is configured. It could be further improved by providing an option to disable the Kafka section when it is not in scope.
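
To illustrate the requested behavior, here is a minimal sketch. It is not the library's actual code: `write_stats`, `write_to_table`, `publish_to_kafka`, and `enable_kafka` are hypothetical names chosen for this example. The idea is that the table write always happens, and Kafka is only contacted when explicitly enabled.

```python
from typing import Callable, Dict, List, Optional

Stats = Dict[str, object]


def write_stats(
    stats: Stats,
    write_to_table: Callable[[Stats], None],
    publish_to_kafka: Optional[Callable[[Stats], None]] = None,
    enable_kafka: bool = False,
) -> List[str]:
    """Write stats to the table; publish to Kafka only when enabled.

    All names here are hypothetical and only illustrate the request.
    Returns the list of sinks that were written to.
    """
    sinks_written: List[str] = []

    # The table write is unconditional, so stats are never lost
    # just because no Kafka topic is configured.
    write_to_table(stats)
    sinks_written.append("table")

    # Kafka is strictly opt-in: with enable_kafka=False the publisher
    # is never called, so no broker timeout can occur.
    if enable_kafka:
        if publish_to_kafka is None:
            raise ValueError(
                "enable_kafka=True requires a publish_to_kafka callable"
            )
        publish_to_kafka(stats)
        sinks_written.append("kafka")

    return sinks_written
```

With a flag like this defaulting to off, a local run without a Kafka topic would only touch the stats table.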

amaldevknike added the bug (Something isn't working) label Sep 26, 2023
amaldevknike changed the title from "[BUG] Please add your bug title here" to "[BUG] Remove kafka dependency from spark_expectations" Sep 26, 2023
asingamaneni (Collaborator) commented

Kafka is now optional, and you can disable it.
