Describe the bug
When dynamic schema is enabled and a symbol contains an empty DataFrame, it is possible to append data with a row-range index to that symbol. This should be forbidden because the index types do not match. The append changes the symbol's index type to row-range, yet it is still possible to append datetime-indexed data afterwards, which leads to data corruption.
Steps/Code to Reproduce
import arcticdb
import numpy as np
import pandas as pd
# 4.4.0
print(arcticdb.__version__)
ac = arcticdb.Arctic("lmdb://test")
opts = arcticdb.LibraryOptions(dynamic_schema=True)
lib = ac.get_library("test", create_if_missing=True, library_options=opts)
# Write an empty DataFrame; it is read back with an empty DatetimeIndex
lib.write("test_dyn", pd.DataFrame({"a": []}, dtype="float64"))
assert lib.read("test_dyn").data.index.equals(pd.DatetimeIndex([]))
# This should not be allowed
lib.append("test_dyn", pd.DataFrame({"a": [1.0]}, pd.RangeIndex(start=0, stop=1, step=1)))
# The index appears to be changed
assert lib.read("test_dyn").data.index.equals(pd.RangeIndex(start=0, stop=1, step=1))
# The output looks ok even though the index is messed up
#    a
# 0  1
print(lib.read("test_dyn").data)
# Even though the index appears to be changed, the following will throw:
# lib.append("test_dyn", pd.DataFrame({"a": [1.0]}, pd.RangeIndex(start=1, stop=2, step=1)))
# Appending datetime to this is allowed only if validate_index is False
lib.append("test_dyn", pd.DatetimeIndex(["01/01/2024"]), validate_index=False)
# The data is corrupted
#                         index
# 1                         NaT
# 682406825570164596 2024-01-01
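Until the library rejects such appends itself, a caller-side guard is one way to avoid the corrupting write. The following is only a minimal sketch: `guarded_append` is a hypothetical helper, not part of ArcticDB, and it assumes nothing beyond `lib.read(symbol).data` and `lib.append(symbol, frame)` as used in the reproduction above. It compares index classes only, and it reads the whole symbol, which may be expensive for large symbols.

```python
import pandas as pd


def guarded_append(lib, symbol, frame):
    """Append `frame` only if its index class matches the stored index class.

    Hypothetical helper for illustration. `lib` is any object exposing
    read(symbol).data and append(symbol, frame), as in the reproduction.
    """
    stored_index = lib.read(symbol).data.index
    if type(stored_index) is not type(frame.index):
        # e.g. stored DatetimeIndex([]) vs. incoming RangeIndex
        raise ValueError(
            f"Refusing to append {type(frame.index).__name__} to symbol "
            f"{symbol!r} indexed by {type(stored_index).__name__}"
        )
    lib.append(symbol, frame)
```

With the symbol from the reproduction, `guarded_append(lib, "test_dyn", pd.DataFrame({"a": [1.0]}))` would raise instead of silently switching the index type.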
Expected Results
There are two cases:
The empty index feature is enabled (Implement empty index for 0-rowed columns #1429, Feature flag empty_index #1475)
Anything can be appended to a symbol containing an empty DataFrame. The index type is determined at the time of the first append (or update). Once the index type is determined, mixing index types should not be allowed.
The empty index feature is disabled
The index is determined at write time. With Pandas 2 it is DatetimeIndex([]) for all empty DataFrames (with Pandas 1 it is less consistent; refer to test_empty_column_type.py for the expected behavior). Appending to a symbol containing an empty DataFrame is allowed only if the indexes match.
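The two cases above can be summarised as a small decision rule. The sketch below is purely illustrative and uses only pandas: `append_allowed` and its parameters are hypothetical names, not ArcticDB API, and comparing index classes is a simplification of whatever check the library would actually perform.

```python
import pandas as pd


def append_allowed(stored_index, stored_rows, incoming_index, empty_index_enabled):
    """Return True if an append should be accepted under the expected behaviour."""
    if empty_index_enabled and stored_rows == 0:
        # Case 1: an empty symbol has no index type yet, so the first
        # append (or update) decides it.
        return True
    # Case 1 after the first append, and case 2 always: index kinds must match.
    return type(stored_index) is type(incoming_index)


empty_dt = pd.DatetimeIndex([])
row_range = pd.RangeIndex(start=0, stop=1, step=1)
# The append from the reproduction, with the empty index feature disabled:
print(append_allowed(empty_dt, 0, row_range, empty_index_enabled=False))  # False
```

Under this rule the row-range append from the reproduction is rejected when the feature is disabled, and accepted exactly once (fixing the index type) when it is enabled.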
OS, Python Version and ArcticDB Version
Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: 4.4.0
Numpy: 1.26.3
Pandas: 2.1.4
Backend storage used
No response
Additional Context
No response