Adding new __repr__ for pyspark StructField such that the error logs explicitly show metadata differences #77

Open
henrytomsf wants to merge 1 commit into base: main

henrytomsf commented Oct 11, 2023

Description

Helps address #76. Adds a new class, StructFieldPrettyPrint, that gives a fuller representation of the StructField type, showing the name, data type, nullability, and metadata. Currently pyspark's __repr__ method (docs) only returns:

return "StructField(%s,%s,%s)" % (self.name, self.dataType,
                                  str(self.nullable).lower())

This is not ideal when users want to compare every attribute, including metadata, because the metadata never shows up in the error message.

The new __repr__ in StructFieldPrettyPrint replaces the output of pyspark StructField's __repr__ method with something more explicit:

return "StructField(%s, %s, %s, %s)" % (
    f"'{self.structfield.name}'",
    self.structfield.dataType,
    str(self.structfield.nullable).lower(),
    str(self.structfield.metadata)
)
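
For illustration, here is a minimal sketch of how such a wrapper might behave. It does not import pyspark: `FakeStructField` is a hypothetical stand-in for `pyspark.sql.types.StructField`, and the wrapper's constructor is an assumption, since only the `__repr__` body is quoted above. Data types are plain strings here purely to keep the sketch self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStructField:
    """Hypothetical stand-in for pyspark.sql.types.StructField (not the real class)."""
    name: str
    dataType: str
    nullable: bool = True
    metadata: dict = field(default_factory=dict)


class StructFieldPrettyPrint:
    """Wraps a field object and renders all four attributes in its repr."""

    def __init__(self, structfield):
        # Assumed constructor: the PR description only shows the __repr__ body.
        self.structfield = structfield

    def __repr__(self):
        return "StructField(%s, %s, %s, %s)" % (
            f"'{self.structfield.name}'",
            self.structfield.dataType,
            str(self.structfield.nullable).lower(),
            str(self.structfield.metadata),
        )


# Two fields that differ only in metadata.
plain = FakeStructField("test_name", "StringType()")
described = FakeStructField(
    "test_name", "StringType()", True, {"description": "test description"}
)

plain_repr = repr(StructFieldPrettyPrint(plain))
described_repr = repr(StructFieldPrettyPrint(described))
print(plain_repr)
print(described_repr)
```

With a three-argument format like the current one, these two fields would render identically. With the metadata included, the difference is visible in the printed strings.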

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this been tested?

  • Passes existing testing suite (pytest tests/)

…clearly show the name, type, nullability, and metadata.
@henrytomsf

henrytomsf commented Oct 11, 2023

I'm not sure how you want to handle the image assets in the README that need to change for the documentation. I assume there's some styling we should adhere to, so I left that update out.

but I see this now:

E           chispa.schema_comparer.SchemasNotEqualError: 
E           +--------------------------------------------------+-----------------------------------------------------------------------------------+
E           |                     schema1                      |                                      schema2                                      |
E           +--------------------------------------------------+-----------------------------------------------------------------------------------+
E           |  StructField('test_age', LongType(), True, {})   |                   StructField('test_age', LongType(), True, {})                   |
E           | StructField('test_name', StringType(), true, {}) | StructField('test_name', StringType(), true, {'description': 'test description'}) |
E           +--------------------------------------------------+-----------------------------------------------------------------------------------+
