Refactor Tests (#63)

* Change return type from bytearray to bytes
  The return type of several functions in segy/indexing.py has been changed from bytearray to bytes. This includes 'merge', 'decode' methods in abstract classes and derived classes. These changes are intended to bring more accurate typing and therefore avoid potential issues down the line.
* Refactor indexing tests
  The tests related to indexing in the main folder have been refactored and moved to the main test folder for better organization. These tests cover edge cases for bounds checking and array merging methods, along with tests for handling binary data and fake filesystems.
* Move test_transforms.py to main directory
  The test_transforms.py file has been renamed and moved to the main directory in the tests folder. This is a part of the larger restructuring effort to make the tests directory more organized and manageable.
* Move test_ibm_ieee.py and expand docstring
  The test_ibm_ieee.py file was moved from the main directory to the tests directory. Along with this change, the file's docstring was expanded to include information about IBM and IEEE floating point conversions and references for test values.
* Expand docstring in test_indexing.py
* Rename and move test_registry module
  The file tests/standards/test_registry.py has been renamed to tests/test_standards_registry.py. This makes the naming scheme more consistent across the test suite.
* Rename base_spec to standard_spec in tests
* Rename test_factory.py to test_segy_factory.py
* Remove additional_trace_headers field from rev1.py
  The additional_trace_headers field has been deleted from the StructuredFieldDescriptor in the rev1.py file.
* Add minimal SEG-Y standard specification
  A new specification file for the minimal SEG-Y standard has been introduced. It includes only mandatory fields. It is now a part of the implemented SEG-Y standards and is automatically registered upon import.
  This allows easier standard choice for users and simplifies future standards expansion.
* Update pytest fixture scope in conftest.py
  The pytest fixture's scope in conftest.py has been updated from function level to session level. This change means that the fixture will only be invoked once per session, rather than before each individual test, potentially improving test performance.
* Update transformations based on endianness and format
  Conditional transformations are now implemented for both endianness and scalar type. The TransformFactory's creation methods are invoked based on the target endianness and the target format. This helps to selectively apply the necessary transformation, further improving data processing efficiency.
* Add SEG-Y format mapping and related properties
  The update introduces a mapping dictionary for SEG-Y formats in the SegyFactory class. Added properties for 'trace_sample_format' and 'segy_revision'. Also, adjusted the binary file header generation to use these new properties and the format map for 'data_sample_format'.
* Rename 'create_trace_data_template' to 'create_trace_sample_template'
  The function 'create_trace_data_template' is renamed to 'create_trace_sample_template' in both tutorials and source code to ensure consistency and clear communication about its purpose. The tutorial and the factory source code now use the more accurate 'sample_template' instead of 'data_template'.
* Add dimension to sample_template when num_samp is 1
  This commit addresses the rare case where the number of samples (num_samp) is 1. In such cases, the sample template's dimension is now increased by one, ensuring it always conforms to expected array shape norms.
* Refactor code to initialize arrays with zeros
  The code in 'src/segy/factory.py' was refactored to initialize arrays directly with zeros, instead of first creating an empty array and then filling it with zeros.
  The refactoring simplifies the logic and execution, as well as increases code readability. The `fill` parameter, which was previously used to determine whether to fill the array with zeros, has been removed as it is no longer necessary.
* Update trace sample type logic in template
* Update unit tests for SegyFactory class
  The test suite for SegyFactory class has been significantly enhanced. A new fixture has been introduced to generate test cases for SegyFactory. The tests now cover various encodings, trace serialization, trace headers, and binary file headers. Several test scenarios have been added as well to check for mismatches in headers and traces, and incorrect dimensions.
* Handle edge cases for single traces and samples in SEGY factory
  In the SEGY factory, the code has been updated to properly handle edge cases where there are single traces or single samples per trace. The previous version did not account for these scenarios, leading to potential issues in the processing pipeline. Now, these edge cases are explicitly checked for, with samples being appropriately squeezed when only a single trace or sample is present.
* Simplify dtype calculation in sample template creation
  The process for calculating dtype and creating samples has been streamlined in the segy factory. The check for scalar type "IBM32" has been simplified by directly assigning the value "float32". Additionally, sample array initialization has been directly integrated within the return statement, replacing the previously verbose logic.
* Refactor trace serialization test in SEGY factory
  The trace serialization test in the SEGY factory test module was refactored. The update clearly separates and comments the data generation, byte creation, and assertion parts. It also handles different float types and byte order scenarios, making the test more comprehensive and easier to understand.
* Update num_traces parameter in pytest mark
  This is to avoid conflicts with #traces
* clarify field variable names
* Refactor dtype property per deprecation warning.
* simplify sample shape-dtype handling with new numpy dtype spec
* Move test_transforms.py to tests directory
  This update rearranges the test files to enhance readability and maintainability. It moves test_transforms.py directly into the 'tests' directory for easier access and a more organized structure.
* Refactor and enhance SEGY file tests
  The commit renames `tests/main/test_segy_file.py` to `tests/test_segy_file.py` and adds new fixture functions to the latter for SEGY file properties like endianness, standard, and sample format. It also provides functionality for mocking SEGY URI. Additionally, it commences work on a TestSegyFile class with updated revisions test methods commented out for future completion.
* Update SEG-Y file spec inferencing
  The update refactored the file processing part and improved the way specifications are inferred from SEG-Y files. Procedures to read binary file header buffer and infer specification from binary header have been added. Furthermore, a transform is only added if the specification endianness is "BIG". These changes should optimize and clarify the process of handling SEG-Y files.
* Adjust SEGY revision setting in factory
  The code has been modified to only set the SEGY revision value if it is not equal to REV0. This avoids unnecessary operations and grants more efficiency when handling SEGY standards, primarily when the REV0 standard is in use.
  This change is applied within the binary header configuration of the factory module in SEGY.
* Move SEGY_FORMAT_MAP to separate module for reusability
  The SEGY_FORMAT_MAP dict, previously defined in factory.py, was moved into a new module, mapping.py, under the standards directory. This allows for a cleaner organization and a better segregation of responsibilities between modules.
* Replace simple dictionary with bidirectional dictionary in SEG-Y mapping
  The simple dictionary that was used for SEG-Y format mapping is replaced with a bidirectional dictionary, called BiDict, to allow for more robust and flexible operations. This dictionary allows for inverse lookups and prevents the creation of duplicate values with different keys, thereby enhancing the reliability and efficiency of mapping operations.
* Update spec inference to do sample format
* Remove conftest.py and refactor test_segy_file.py
  The conftest.py file used for main tests is removed as its functionalities are now redundant. Also, the test_segy_file.py is thoroughly refactored with the introduction of the dataclass "SegyFileTestConfig" to provide a more structured and configurable test setup process. The tests are then rewritten to utilize this new structure. Along with these, minor variable name adjustments are made to improve clarity.
* Rename IBM IEEE test file
  The test file for IBM IEEE has been renamed to test_ibm_float.py. This new name is more descriptive and accurately represents the content and functionality of the tests contained within the file.
* Remove specific validation condition comment
* Updated condition to include writeability check
  In the transform.py file, the previous transformation check only considered size matching. The added condition prevents in-place modifications for immutable fields. This modification ensures we consider the writeability of the array before performing any in-place changes, preventing errors due to attempts to change immutable array fields.
* Expand SegyFile tests and add new test cases
  The SEGY file testing has been expanded with the introduction of new random test trace data generated using the factory. Additional test cases have been added to test the trace header accessor, trace sample accessor, and trace accessor, including their respective values. The new changes also enable testing for the INT32 scalar type and appropriately type and format the generated data.
* Reorder buffer reading in segy file
* Remove some dependency on settings in `segy/file.py`
  The code has been refactored to remove reliance on settings to define endianness, samples per trace, sample interval, and extended text headers. These are now directly inferred or specified in the file itself without resorting to settings overrides, simplifying the flow and increasing the robustness of the code.
* Refactor create_spec function in segy file
  The function create_spec in file.py (in the segy module) is refactored to eliminate optional spec assignment. The handling of 'endian' parameter is now mandatory for the function, reducing redundancy and improving overall code readability.
* Add SegyFileSettings override tests and update defaults
  This commit introduces a new test class, 'TestSegyFileSettingsOverride', to verify the behavior of setting overrides for SegyFile. It also changes the default endianness for generated test SEG-Y file to be either BIG or None and adjusts sample format in the parameterized test cases.
* remove unnecessary endianness assignment
* Update test descriptions in test_segy_file.py
  The commit revises the descriptions for the tests 'test_revision_override' and 'test_revision_endian_override' in test_segy_file.py. These changes provide clearer descriptions of the operations being tested, specifically the creation and opening of files with override settings.
* Remove tests/schema/conftest.py
  The tests/schema/conftest.py file was removed as part of a code cleanup.
  This file contained several unit tests and fixtures, but it is no longer necessary due to changes in the testing strategy and schema definitions.
* Add tests for TraceDescriptor and refactor TextHeaderDescriptor tests
  The changes introduced new tests that verify the functionality of the TraceDescriptor. This was added within a new Python file named "test_schema_trace.py". It also refactored existing tests for TextHeaderDescriptor, simplifying the code and reducing the number of total lines. Furthermore, the file "test_header.py" was renamed to "test_schema_header.py" to better reflect its contents.
* Add tests for SEGY schema data types
  New test file `test_schema_data_type.py` has been created to validate components that define fields and data types of a SEGY Schema. This includes tests for data type descriptor creation, structured data type descriptor functionality, and validation of StructuredDataTypeDescriptor's recreation from JSON.
* Add tests for customizing SegyDescriptor components
  The tests cover modifications of different parts of SEGY files using the SegyDescriptor class. Specifically, tests are written to validate customizations for the textual file header, binary file headers, and trace headers. Extended text header and trace sample descriptor modifications are also covered. The tests ensure that these customizations adhere to the SEGY standards.
* update lock file
* Replace custom BiDict with bidict library
  The custom BiDict class in `src/segy/standards/mapping.py` has been replaced with the external bidict library for cleaner code and improved efficiency. The bidict library was also added to the dependencies in `pyproject.toml`. This move simplifies the codebase by utilising a well-tested external library, thus removing the need for maintaining the custom BiDict implementation.
* Refactor file handling operations in SEGY module
  Replaced numpy methods for byte handling with their struct equivalents for better performance and reliability in the file handling operations within the SEGY module. Fixed type mappings and validations for sample format detection and made the code more intuitive and efficient.
* Update type references and simplify settings initialization
  Removed an unused import and updated type references in test_segy_file.py, test_segy_factory.py, and test_schema_data_type.py. Also simplified the initialization of SegyFileSettings instances for readability and consistency. Renamed TestConfig to SegyFactoryTestConfig for better context understanding. Changed the revision from constants to literal values to reflect change in the codebase.
* Enable GIL release in SEGY conversion functions
  The modification in the 'ibm.py' file enables the release of the Global Interpreter Lock (GIL) in the IEEE to IBM and IBM to IEEE float conversion functions. This can facilitate multithreaded Python programs and potentially enhance performance. Simultaneously, type hints in the function arguments were removed to prevent linting errors.
* Refactor code and remove unnecessary type ignore comments
  The type ignore comments in various files such as ibm.py, arrays.py, indexing.py, etc., were removed as they were redundant. The unwanted function in standards_registry.py causing a ValueError was also removed. The typing import was rearranged in a couple of files for better readability.

---------

Co-authored-by: Altay Sansal <altay.sansal@tgs.com>
TGSAI · Apr 3, 2024 · 7bb9c79
1 parent face156
commit 7bb9c79
Showing 35 changed files with 1,260 additions and 1,374 deletions.
diff --git a/docs/tutorials/creation.ipynb b/docs/tutorials/creation.ipynb
@@ -80,7 +80,7 @@
     "TRACE_COUNT = 15\n",
     "\n",
     "headers = factory.create_trace_header_template(size=TRACE_COUNT)\n",
-    "samples = factory.create_trace_data_template(size=TRACE_COUNT)\n",
+    "samples = factory.create_trace_sample_template(size=TRACE_COUNT)\n",
     "\n",
     "for trace_idx in range(TRACE_COUNT):\n",
     "    headers[trace_idx][\"trace_seq_file\"] = trace_idx + 1\n",

diff --git a/poetry.lock b/poetry.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -32,6 +32,7 @@ s3fs = {version = ">=2024.2.0", optional = true}
 adlfs = {version = ">=2024.2.0", optional = true}
 eval-type-backport = {version = "^0.1.3", python = "<3.10"}
 click-params = "^0.5.0"
+bidict = "^0.23.1"
 
 [tool.poetry.group.dev.dependencies]
 ruff = "^0.3.4"
@@ -122,6 +123,7 @@ style = "google"
 arg-type-hints-in-docstring = false
 check-return-types = false
 check-yield-types = false
+exclude = 'src/segy/ibm.py'
 
 [tool.coverage.paths]
 source = ["src", "*/site-packages"]
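
Note on the `bidict` dependency added above: per the commit message it backs the SEG-Y format mapping, which needs inverse lookup and must refuse to bind one value to two keys. The sketch below illustrates that behaviour with the standard library only. It is a stand-in for illustration, not the `bidict` library and not the project's `SEGY_FORMAT_MAP`: the `TinyBiDict` class, the `FORMAT_CODES` name, and the string keys are hypothetical, while the integer codes follow the SEG-Y data sample format code table.

```python
class TinyBiDict(dict):
    """Hypothetical dict that also maintains a value -> key view."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.inverse = {}
        # Route every initial item through __setitem__ so the checks apply.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Reject a value that is already bound to a different key.
        if value in self.inverse and self.inverse[value] != key:
            msg = f"value {value!r} is already mapped to {self.inverse[value]!r}"
            raise ValueError(msg)
        # Drop the stale inverse entry when an existing key is rebound.
        if key in self:
            del self.inverse[self[key]]
        super().__setitem__(key, value)
        self.inverse[value] = key


# Hypothetical keys; the codes are the SEG-Y data sample format codes.
FORMAT_CODES = TinyBiDict(
    {"ibm32": 1, "int32": 2, "int16": 3, "float32": 5, "int8": 8}
)

print(FORMAT_CODES["ibm32"])    # forward lookup, prints 1
print(FORMAT_CODES.inverse[5])  # inverse lookup, prints float32
```

A writer would use the forward direction to fill `data_sample_format`, and a reader could use the inverse direction to infer the sample format from a binary header. The real `bidict` type exposes a comparable `.inverse` view.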

diff --git a/src/segy/arrays.py b/src/segy/arrays.py
@@ -12,13 +12,12 @@
 
 from json import dumps as json_dumps
 from typing import TYPE_CHECKING
+from typing import Any
 
 import numpy as np
 from pandas import DataFrame
 
 if TYPE_CHECKING:
-    from typing import Any
-
     from numpy.typing import NDArray
 
 

diff --git a/src/segy/factory.py b/src/segy/factory.py
@@ -7,6 +7,9 @@
 import numpy as np
 
 from segy.schema import Endianness
+from segy.schema import ScalarType
+from segy.schema import SegyStandard
+from segy.standards.mapping import SEGY_FORMAT_MAP
 from segy.transforms import TransformFactory
 from segy.transforms import TransformPipeline
 
@@ -53,6 +56,16 @@ def __init__(
 
         self.spec.trace.sample_descriptor.samples = samples_per_trace
 
+    @property
+    def trace_sample_format(self) -> ScalarType:
+        """Trace sample format of the SEG-Y file."""
+        return self.spec.trace.sample_descriptor.format
+
+    @property
+    def segy_revision(self) -> SegyStandard:
+        """Revision of the SEG-Y file."""
+        return self.spec.segy_standard
+
     def create_textual_header(self, text: str | None = None) -> bytes:
         """Create a textual header for the SEG-Y file.
 
@@ -83,35 +96,33 @@ def create_binary_header(self) -> bytes:
         binary_descriptor = self.spec.binary_file_header
         bin_header = np.zeros(shape=1, dtype=binary_descriptor.dtype)
 
-        bin_header["seg_y_revision"] = self.spec.segy_standard.value * 256
+        if self.segy_revision != SegyStandard.REV0:
+            bin_header["seg_y_revision"] = self.segy_revision.value * 256
+
         bin_header["sample_interval"] = self.sample_interval
         bin_header["sample_interval_orig"] = self.sample_interval
         bin_header["samples_per_trace"] = self.samples_per_trace
         bin_header["samples_per_trace_orig"] = self.samples_per_trace
+        bin_header["data_sample_format"] = SEGY_FORMAT_MAP[self.trace_sample_format]
 
         return bin_header.tobytes()
 
     def create_trace_header_template(
         self,
         size: int = 1,
-        fill: bool = True,
     ) -> NDArray[Any]:
         """Create a trace header template array that conforms to the SEG-Y spec.
 
         Args:
             size: Number of headers for the template.
-            fill: Optional, fill with zeros. Default is True.
 
         Returns:
             Array containing the trace header template.
         """
         descriptor = self.spec.trace.header_descriptor
         dtype = descriptor.dtype.newbyteorder(Endianness.NATIVE.symbol)
 
-        header_template = np.empty(shape=size, dtype=dtype)
-
-        if fill is True:
-            header_template.fill(0)
+        header_template = np.zeros(shape=size, dtype=dtype)
 
         # 'names' assumed not None by data structure (type ignores).
         field_names = header_template.dtype.names
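
Note on the `seg_y_revision` change in the hunk above: the value is multiplied by 256 because the SEG-Y revision field is a 16-bit number whose high byte carries the major revision, so revision 1.0 is stored as `0x0100`. A minimal sketch of that encoding follows, under stated assumptions: the `Revision` enum is a hypothetical stand-in for `segy.schema.SegyStandard` (its member values are assumed), and big-endian packing via `struct` is chosen here only for illustration.

```python
import struct
from enum import Enum


class Revision(Enum):
    """Hypothetical stand-in for SegyStandard; values are assumptions."""

    REV0 = 0
    REV1 = 1


def encode_revision(revision: Revision) -> bytes:
    """Return the 2-byte revision field, packed big-endian."""
    raw = 0  # a zero-initialised header already encodes revision 0
    if revision != Revision.REV0:
        raw = revision.value * 256  # shift the major revision into the high byte
    return struct.pack(">H", raw)


print(encode_revision(Revision.REV0))  # b'\x00\x00'
print(encode_revision(Revision.REV1))  # b'\x01\x00'
```

Because the header array is created with `np.zeros`, skipping the assignment for REV0 leaves the field at zero, which is the same bytes the multiplication would have produced.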
@@ -123,37 +134,33 @@ def create_trace_header_template(
 
         return header_template
 
-    def create_trace_data_template(
+    def create_trace_sample_template(
         self,
         size: int = 1,
-        fill: bool = True,
     ) -> NDArray[Any]:
         """Create a trace data template array that conforms to the SEG-Y spec.
 
         Args:
             size: Number of traces for the template.
-            fill: Optional, fill with zeros. Default is True.
 
         Returns:
             Array containing the trace data template.
         """
         descriptor = self.spec.trace.sample_descriptor
-        dtype = descriptor.dtype.newbyteorder(Endianness.NATIVE.symbol)
+        dtype = descriptor.dtype
 
-        data_template = np.empty(shape=size, dtype=dtype)
+        if self.trace_sample_format == ScalarType.IBM32:
+            dtype = np.dtype(("float32", (self.samples_per_trace,)))
 
-        if fill is True:
-            data_template.fill(0)
-
-        return data_template
+        return np.zeros(shape=size, dtype=dtype)
 
     def create_traces(self, headers: NDArray[Any], samples: NDArray[Any]) -> bytes:
         """Convert trace data and header to bytes conforming to SEG-Y spec.
 
         The rows (length) of the headers and traces must match. The headers
         must be a (num_traces,) shape array and data must be a
         (num_traces, num_samples) shape array. They can be created via the
-        `create_trace_header_template` and `create_trace_data_template` methods.
+        `create_trace_header_template` and `create_trace_sample_template` methods.
 
         Args:
             headers: Header array.
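
Note on the template change in the hunk above: `np.zeros` replaces the `np.empty` plus `fill(0)` pair, and IBM32 samples get a `float32` sub-array dtype of length `samples_per_trace`. The sketch below shows how NumPy treats such a dtype when an array is created. The sizes are hypothetical and the snippet does not use the `SegyFactory` class itself.

```python
import numpy as np

samples_per_trace = 4  # hypothetical value
num_traces = 3         # hypothetical value

# Sub-array dtype: each element is a block of `samples_per_trace` float32 values.
sample_dtype = np.dtype(("float32", (samples_per_trace,)))

# On creation NumPy appends the sub-array shape to the array shape.
template = np.zeros(shape=num_traces, dtype=sample_dtype)
print(template.shape)  # (3, 4)
print(template.dtype)  # float32

# The removed two-step approach yields the same array.
legacy = np.empty(shape=num_traces, dtype=sample_dtype)
legacy.fill(0)
print(np.array_equal(legacy, template))  # True
```

The result is a plain `(num_traces, num_samples)` array, which matches the shape that the `create_traces` docstring asks for.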
@@ -185,19 +192,22 @@ def create_traces(self, headers: NDArray[Any], samples: NDArray[Any]) -> bytes:
             msg = "Header array must have the same number of rows as data array."
             raise ValueError(msg)
 
-        target_endian = trace_descriptor.endianness
-
         header_pipeline = TransformPipeline()
         data_pipeline = TransformPipeline()
 
-        byte_swap = TransformFactory.create("byte_swap", target_endian)
-        ibm_float = TransformFactory.create("ibm_float", "to_ibm")
+        target_endian = trace_descriptor.endianness
+        target_format = trace_descriptor.sample_descriptor.format
+
+        if target_endian == Endianness.BIG:
+            byte_swap = TransformFactory.create("byte_swap", target_endian)
+            header_pipeline.add_transform(byte_swap)
+            data_pipeline.add_transform(byte_swap)
 
-        header_pipeline.add_transform(byte_swap)
-        data_pipeline.add_transform(ibm_float)
-        data_pipeline.add_transform(byte_swap)
+        if target_format == ScalarType.IBM32:
+            ibm_float = TransformFactory.create("ibm_float", "to_ibm")
+            data_pipeline.add_transform(ibm_float)
 
-        trace = np.empty(shape=len(samples), dtype=trace_descriptor.dtype)
+        trace = np.zeros(shape=headers.size, dtype=trace_descriptor.dtype)
         trace["header"] = header_pipeline.apply(headers)
         trace["sample"] = data_pipeline.apply(samples)