Refactor endianness, byte swaps, and indexers to be more standardized… · TGSAI/segy@8532d91

Commit

Refactor endianness, byte swaps, and indexers to be more standardized. (#57)

* Add validate_assignment to model_config

The model_config object within the base.py file was updated. A new parameter, validate_assignment, was added to the configuration. This feature allows for checks and validation of the parameters being assigned to the model automatically.

* Update variable name in config.py

Renamed 'endian' variable to 'endianness' in src/segy/config.py. This change improves code readability and better reflects what the variable stands for in the context of the file parsing settings.

* Refactor endianness handling in data types

The main change in this commit is the restructuring of how endianness handling is done in data type definitions. The responsibility for setting the endianness is moved out from the ScalarType and DataTypeDescriptor and into the StructuredFieldDescriptor. This change simplifies the datatype handling in ScalarType and DataTypeDescriptor, and delegates the endianness handling to a higher level in the data structure hierarchy.
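As a rough illustration of the idea only, and not the library's actual classes, the sketch below keeps individual field definitions free of byte order and lets the enclosing structure apply it once. It relies solely on the standard library `struct` module; `Endianness`, `Field`, `StructDescriptor`, and the field names are hypothetical stand-ins.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass
from enum import Enum


class Endianness(str, Enum):
    # Hypothetical enum; the values mirror the 'big'/'little' options in the commit.
    BIG = "big"
    LITTLE = "little"


@dataclass(frozen=True)
class Field:
    # A field knows only its name and struct format code, not its byte order.
    name: str
    code: str  # e.g. "i" = int32, "h" = int16, "f" = float32


@dataclass
class StructDescriptor:
    # Byte order is stored once here and applies to every field.
    fields: list[Field]
    endianness: Endianness = Endianness.BIG

    @property
    def format(self) -> str:
        prefix = ">" if self.endianness is Endianness.BIG else "<"
        return prefix + "".join(f.code for f in self.fields)

    def unpack(self, buffer: bytes) -> dict[str, int | float]:
        values = struct.unpack(self.format, buffer)
        return {f.name: v for f, v in zip(self.fields, values)}


# Usage: two big-endian integers decoded through the structure-level setting.
header = StructDescriptor([Field("inline", "i"), Field("sample_interval", "h")])
raw = (10).to_bytes(4, "big") + (4000).to_bytes(2, "big")
decoded = header.unpack(raw)
```

Switching `header.endianness` changes the format for all fields at once, which is the simplification the refactor is after.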

* Add function to extract detailed dtype information

The new function `_extract_nested_dtype` is introduced. Besides this, significant modifications were made to the existing `_modify_dtype_field` function. These changes allow for more detailed dtype extraction and improved handling of field size changes, particularly when dealing with nested dtypes.

* Add documentation to _extract_nested_dtype function

A docstring has been added to the _extract_nested_dtype function in transforms.py file within the segy module. This update enhances code readability by explaining what the function is intended to do.

* Add HeaderArrayTransform to transforms.py

The HeaderArrayTransform class has been added to src/segy/transforms.py, which casts a struct array to a header array. Also, this class was incorporated into TransformFactory, providing an additional transformation strategy to the factory's methods.

* Move test_transforms.py to tests directory

The test_transforms.py file was previously located in the tests/main directory. To ensure consistency and ease of access, it has been relocated to the main tests directory.

* Refactor endianness handling in trace schema

The changes involve removing the endianness field from the Trace Data descriptor class and adding it to the Trace descriptor class. This allows endianness handling at the level of the entire trace, including headers and data, instead of just at the data samples level. The dtype property computations are also adjusted accordingly to consider the new endianness placement.

* Update endianness handling in SEG-Y schema

The endianness setting has been moved from individual field descriptors to the top level of schema models. This change simplifies the setting of endianness and improves the auto-update mechanism for associated submodels. This is achieved by adding a model validator in the Trace schema.

* Add endianness field to SegyDescriptor class

The 'endianness' field has been added to the SegyDescriptor class in the SEGY schema. This field represents the endianness of a SEG-Y file and adds more detail to the descriptor. The endianness can either be 'big' or 'little' and defaults to 'None' if not specified.

* Add model_validator to update endianness in submodels

A `pydantic` model_validator has been added to the Segy schema. It ensures that the endianness of the binary file header and the trace is updated whenever the endianness of the main model changes, which keeps the data consistent across submodels.

* Add model validator to synchronise endianness

A new model validator, `update_submodel_endianness`, has been added to the `TraceDescriptor` class in `src/segy/schema/trace.py`. The function ensures that the endianness of the header and extended_header descriptors matches the trace endianness.
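The commit implements this synchronisation with a `pydantic` model validator. The sketch below is a standard-library stand-in that mimics the same behaviour with a `__setattr__` hook, so it is not the library's code; the simplified `HeaderDescriptor` and `TraceDescriptor` shapes and the private helper name are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeaderDescriptor:
    # Simplified, hypothetical stand-in for a header descriptor.
    endianness: str | None = None


@dataclass
class TraceDescriptor:
    # Simplified, hypothetical stand-in for the trace descriptor.
    header: HeaderDescriptor = field(default_factory=HeaderDescriptor)
    extended_header: HeaderDescriptor | None = None
    endianness: str | None = None

    def __setattr__(self, name: str, value: object) -> None:
        super().__setattr__(name, value)
        # Re-sync the children whenever the parent or a child is (re)assigned.
        if name in {"endianness", "header", "extended_header"}:
            self._update_submodel_endianness()

    def _update_submodel_endianness(self) -> None:
        endianness = getattr(self, "endianness", None)
        for child_name in ("header", "extended_header"):
            child = getattr(self, child_name, None)
            if child is not None:
                child.endianness = endianness


# Usage: the children follow the trace-level setting.
trace = TraceDescriptor(endianness="big", extended_header=HeaderDescriptor())
before = (trace.header.endianness, trace.extended_header.endianness)
trace.endianness = "little"
after = (trace.header.endianness, trace.extended_header.endianness)
```

The point of the design is that callers set endianness in one place and never have to touch the nested descriptors directly.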

* Fix endianness property reference in factory.py

The attribute 'endianness' was incorrectly accessed from 'data_descriptor' under 'trace_descriptor'. The commit corrects this mistake by directly accessing 'endianness' from 'trace_descriptor'. This eliminates a potential source of error and improves the readability of the code.

* Refactor code to improve byte swapping and transforms

The code was refactored to better organize how byte swapping and transforms are applied to trace data and headers. The updates improve the manipulation of SEGY file data and remove the previous methods for byte swapping and transforms. As a result, byte swapping and transform application are more standardized and easily applied to trace data, headers, and other data types.
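For readers unfamiliar with the byte swap step itself, here is a minimal standard-library sketch. It does not reproduce the library's transforms (the rest of this commit indicates those operate on NumPy arrays); `decode_float32` is a hypothetical helper that swaps bytes only when the stored order differs from the machine's native order.

```python
import struct
import sys
from array import array


def decode_float32(buffer: bytes, byteorder: str) -> array:
    """Decode 4-byte floats stored in `byteorder` ('big' or 'little')."""
    values = array("f")
    values.frombytes(buffer)  # interprets the bytes in native order
    if byteorder != sys.byteorder:
        values.byteswap()  # in-place swap so the values read correctly
    return values


# Usage: three big-endian samples, as they would sit in a SEG-Y trace.
raw = struct.pack(">3f", 1.0, 2.5, -3.0)
samples = decode_float32(raw, "big")
```

Keeping the "swap only if needed" decision in one function is what allows the same step to be reused for samples, headers, and other data types.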

* Refactor to new endianness handling in tests

The primary change in these diffs revolves around reworking how endianness is handled within various tests. Updates were made to directly import and utilize the Endianness enum from the segy.schema module. Additionally, unused import statements were removed and the order of component copying was adjusted for increased efficiency.

* Update data type descriptors in docstrings

The commit revises examples in docstrings of DataTypeDescriptor and StructuredFieldDescriptor classes. Big endian formats are replaced with standard formats to improve clarity. The new examples primarily use 'float32', 'uint16', and similar standard data types instead of endian-specific ones.

* Remove HeaderArrayTransform class

The HeaderArrayTransform class was removed from the 'transforms.py' file as it was unused. The reference to this class was also removed from the TransformFactory class instantiation, and the import statement for the HeaderNDArray has been deleted.

* Rename 'BaseNDArray' to 'SegyArray'

The 'BaseNDArray' class in the 'arrays.py' file has been renamed to 'SegyArray'. Likewise, the 'HeaderNDArray' class is now called 'HeaderArray'. The change is to make the class names more contextually specific to their use within the SEGY module.

* Add TraceArray class to arrays.py

A new class named TraceArray has been added to the arrays module. This class inherits from the SegyArray class and provides convenient mechanisms to access the headers and data of traces.

* Update return types in decoding methods

Updated the return types of decoding methods in the 'indexing.py' module. Instead of returning np.array, the 'decode' methods in 'AbstractIndexer', 'HeaderIndexer' and 'DataIndexer' classes now return 'TraceArray' and 'HeaderArray' data types. This improves the data handling and readability by using more explicit return types.

* Refactor tests to directly access attributes

The tests have been refactored to access `header` and `data` attributes directly, instead of using bracket notation. This provides a more straightforward approach and increases readability of the code. Also, the assertion calls have been switched to use the `assert_array_equal` function, enhancing the clarity and precision of the tests.

* Update binary_header return type in segy file

The return type of the binary_header function in the segy file has been changed. Previously it was returning an NDArray, but now it has been updated to return a HeaderArray.

* Refactor code to rename 'TraceDataDescriptor' to 'TraceSampleDescriptor'

The commits include the renaming of 'TraceDataDescriptor' to 'TraceSampleDescriptor' across multiple test files, classes, and functions. This change is for clarity, as the descriptor deals specifically with trace samples rather than generic data. All associated variable and function names, comments and documentation strings are updated accordingly to reflect this change.

* Refactor dtype extraction and update imports in segy module

The _extract_nested_dtype method in the transforms.py module was refactored to improve its usability and readability, leveraging DTypeLike from numpy for return types. Additionally, references and imports in schema/segy.py and schema/data_type.py were updated to better reflect changes in the segy module. Minor changes were also made to enhance type checking and exception handling.

* Refactor type imports and adjust function parameters in test_header.py

Moved the numpy.typing imports within the TYPE_CHECKING conditional to improve readability and potential execution efficiency. Simplified get_dt_info function by removing unnecessary parameters and adjusting how the dtype parameter is passed. The dtype information extraction has also been streamlined for clearer comprehension.
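The `TYPE_CHECKING` pattern mentioned here looks roughly like the sketch below. In the commit the guarded import is `numpy.typing`; a standard-library module stands in for it here so the example runs anywhere, and `describe` is a hypothetical function.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime.
    from decimal import Decimal


def describe(value: Decimal) -> str:
    # The annotation stays a string at runtime, so Decimal is never imported.
    return f"{type(value).__name__}: {value}"


# Usage: the function runs even though Decimal was not imported at runtime.
result = describe(3)
```

The postponed evaluation of annotations (`from __future__ import annotations`) is what makes the unimported name safe to use in signatures.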

* Update creation tutorial

It primarily corrects function calls and data access patterns to match updated library structure.

* Update quickstart tutorial

It primarily corrects function calls and data access patterns to match updated library structure.

* Remove unused import and function in test_data_type.py

The "string" module was imported but not used within the code, thus it has been removed. The function "_compare_json_strings" was also deleted as it was not being utilized. This helps to streamline the code and make it less cluttered.

* Remove trace descriptor fixture from conftest.py

The TraceDescriptor import statement and fixture have been removed from the conftest.py file. The changes involved the removal of both the import statement for TraceDescriptor and the corresponding fixture that was used in the testing suite. This simplifies the code while maintaining its functionality.

---------

Co-authored-by: Altay Sansal <altay.sansal@tgs.com>


tasansal and Altay Sansal authored Mar 27, 2024

1 parent 3d887c3 commit 8532d91

docs/tutorials/creation.ipynb

            
@@ -103,12 +103,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
    "id": "62c68de0-13da-4d9f-8637-0e26e578613d",
    "metadata": {},
    "outputs": [],
    "source": [
-    "traces = factory.create_traces(data=samples, headers=headers)"
+    "traces = factory.create_traces(samples=samples, headers=headers)"
    ]
   },
   {
@@ -123,7 +123,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 6,
    "id": "34d9906c-7449-4b42-8605-7174857c4093",
    "metadata": {},
    "outputs": [],
@@ -151,7 +151,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 7,
    "id": "3262356a-56a8-46fb-be0d-7feb5928a8fe",
    "metadata": {},
    "outputs": [
@@ -211,7 +211,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 8,
    "id": "fd20f55c-fd27-49d3-987f-09bb29f4d07d",
    "metadata": {},
    "outputs": [
@@ -311,7 +311,7 @@
        "[1 rows x 31 columns]"
       ]
      },
-     "execution_count": 7,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -322,7 +322,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 11,
    "id": "6f70317d-34b7-421c-a6e0-11e048b1aca2",
    "metadata": {},
    "outputs": [
@@ -338,18 +338,18 @@
        "       [ 14.,  15.,  16., ..., 112., 113., 114.]], dtype=float32)"
       ]
      },
-     "execution_count": 8,
+     "execution_count": 11,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "file.data[:]"
+    "file.sample[:]"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 12,
    "id": "ecc7d80e-806c-4103-bef3-b7019d17a839",
    "metadata": {},
    "outputs": [
@@ -525,7 +525,7 @@
        "14              15          1000         10700         10           114"
       ]
      },
-     "execution_count": 9,
+     "execution_count": 12,
      "metadata": {},
      "output_type": "execute_result"
     }