Failed to initialize SparkIntegration #3161

Closed
seyoon-lim opened this issue Jun 12, 2024 · 1 comment

@seyoon-lim
Contributor

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.5.1

Steps to Reproduce

Hello,

I've encountered an issue when using SparkIntegration with my PySpark application. While following the Spark Driver Integration Documentation, I hit the following AttributeError:

sc._jsc.sc().addSparkListener(listener)
E   AttributeError: 'SparkContext' object has no attribute '_jsc'

Upon investigating, the issue seems to stem from the code at sentry-python/spark_driver.py#L50. The sc._jsc attribute is only assigned during SparkContext initialization, as seen in apache/spark/pyspark/context.py#L296, so it does not exist yet when the integration tries to register its listener.

Consequently, _start_sentry_listener and _set_app_properties referenced at spark_driver.py#L62-L63 should ideally be invoked after spark_context_init is executed.
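To illustrate the ordering problem without needing a Spark installation, here is a minimal sketch. It assumes a stand-in class (`FakeSparkContext`) and hypothetical helpers (`attach_listener`, `patch_init`, `try_init`) that are not part of sentry-python or PySpark; they only mimic the fact that `_jsc` appears once `_do_init` has run.

```python
import functools


class FakeSparkContext:
    """Stand-in for pyspark.SparkContext (assumption: only models when _jsc appears)."""

    def _do_init(self):
        # Mimics PySpark: _jsc is assigned while _do_init runs.
        self._jsc = object()


def attach_listener(sc):
    # Hypothetical stand-in for _start_sentry_listener: it needs sc._jsc.
    return sc._jsc


def patch_init(cls, hooks_first):
    """Wrap cls._do_init, running the hook before or after the original."""
    original = cls._do_init

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        if hooks_first:
            attach_listener(self)  # _jsc does not exist yet
            return original(self, *args, **kwargs)
        result = original(self, *args, **kwargs)
        attach_listener(self)  # _jsc exists now
        return result

    cls._do_init = patched
    return original


def try_init(hooks_first):
    """Return "ok" or the AttributeError message for the chosen ordering."""
    original = patch_init(FakeSparkContext, hooks_first)
    try:
        FakeSparkContext()._do_init()
        return "ok"
    except AttributeError as exc:
        return f"AttributeError: {exc}"
    finally:
        FakeSparkContext._do_init = original  # undo the patch
```

Calling `try_init(True)` reproduces the shape of the failure reported here, while `try_init(False)` corresponds to the proposed ordering, where the hooks run only after the original initializer has finished.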

I have tested this modification using both local and yarn Spark masters, and the fixed version in my repo appears to work correctly.

This is my test code:

from pyspark import SparkContext
from sentry_sdk.integrations.spark import SparkIntegration


def test_initialize_spark_integration(sentry_init):
    # sentry_init is the pytest fixture from the sentry-python test suite.
    # Fails with the code: https://github.com/getsentry/sentry-python/blob/2.5.1/sentry_sdk/integrations/spark/spark_driver.py#L53
    # Succeeds with the code: https://github.com/seyoon-lim/sentry-python/blob/fix-spark-driver-integration/sentry_sdk/integrations/spark/spark_driver.py#L53
    sentry_init(integrations=[SparkIntegration()])
    SparkContext.getOrCreate()

Looking forward to your feedback and suggestions for addressing this issue.

Thank you!

Expected Result

from pyspark.sql import SparkSession
import sentry_sdk
from sentry_sdk.integrations.spark import SparkIntegration


if __name__ == "__main__":
    sentry_sdk.init(
        dsn=matrix_dsn,
        integrations=[SparkIntegration()],
    )

    spark = SparkSession.builder.getOrCreate()
    ...

Actual Result

Traceback (most recent call last):
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/entrypoint.py", line 17, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 514, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/pyspark/context.py", line 201, in __init__
    self._do_init(
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/utils.py", line 1710, in runner
    return sentry_patched_function(*args, **kwargs)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 69, in _sentry_patched_spark_context_init
    _start_sentry_listener(self)
  File "/Users/kakao/Desktop/shaun/workplace/my-repos/du-batch/venv/lib/python3.9/site-packages/sentry_sdk/integrations/spark/spark_driver.py", line 55, in _start_sentry_listener
    sc._jsc.sc().addSparkListener(listener)
AttributeError: 'SparkContext' object has no attribute '_jsc'
@sentrivana
Contributor

Thanks for all the research you put into this @seyoon-lim and for the PR! We will take a look.
