Refactor the virtualfile_in function to accept more 1-D arrays #2744

seisman · 2023-10-14T07:25:09Z

Description of proposed changes

Here are the current definitions of the virtualfile_from_data method and the data_kind function:

Lines 1473 to 1483 in c9d6147

    
    def virtualfile_from_data(
        self,
        check_kind=None,
        data=None,
        x=None,
        y=None,
        z=None,
        extra_arrays=None,
        required_z=False,
        required_data=True,
    ):

pygmt/pygmt/helpers/utils.py

Line 110 in c9d6147

    
           def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data=True):

When I started issue #2731, I realized the current function definitions have some limitations:

For some modules, the number of input columns can vary, depending on the given options. For example, binstats usually requires 3 columns (x/y/z), but only requires 2 columns (x/y) if -Cn is used, and requires 4 columns (x/y/z/w) if -W is used. I don't think we want to add w=None and required_w=False to these functions. Also, we don't check if the input table has the required number of columns.
The data_kind function does three things: (1) determines the kind of the input data; (2) checks if the data/x/y/z combinations are valid; and (3) checks if the matrix-type data has 3 columns. The data_kind function is called inside virtualfile_from_data, but sometimes we also need to know the data kind when wrapping GMT modules, for example, in Figure.plot and Figure.plot3d. This means the data_kind function is called twice, which is unnecessary.
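To make the first limitation concrete, here is a minimal sketch of how a wrapper might derive the expected column names from the options it receives. The function name `binstats_column_names` and its two boolean parameters are hypothetical and not part of PyGMT; only the column counts (2 with -Cn, 3 by default, 4 with -W) come from the description above.

```python
def binstats_column_names(count_only=False, weighted=False):
    """Return the column names a binstats-like module would expect.

    Hypothetical helper: ``count_only`` stands for -Cn and ``weighted``
    stands for -W. The combination of both is not covered in the
    discussion, so this sketch refuses it instead of guessing.
    """
    if count_only and weighted:
        raise ValueError("Combination not covered by this sketch.")
    names = ["x", "y"]
    if not count_only:
        names.append("z")  # the default case needs x/y/z
    if weighted:
        names.append("w")  # -W adds a weight column
    return names


print(binstats_column_names())
print(binstats_column_names(count_only=True))
print(binstats_column_names(weighted=True))
```

A fixed x/y/z signature cannot express this variation without growing a new keyword argument for every extra column, which is the motivation for passing a list instead.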

Solutions:

Refactor the virtualfile_from_data function to take parameters like data=None, vectors=None, names=["x", "y"]. vectors is a list of vectors (e.g., vectors=[x, y]) and names is a list of column names. The wrappers are responsible for preparing the list of 1-D arrays (vectors) and the list of column names (names).
Let the data_kind function focus on determining the data kind, and add a separate function to check whether the input data/vectors are valid.
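The split proposed above might look roughly like the following sketch. It is not the PyGMT implementation: `data_kind_sketch`, `validate_input_sketch`, and `InvalidInputSketch` are hypothetical names, the set of kinds is reduced to three, and the error messages are illustrative.

```python
from pathlib import Path


class InvalidInputSketch(ValueError):
    """Stand-in for pygmt.exceptions.GMTInvalidInput."""


def data_kind_sketch(data=None):
    """Only classify the input; no validity checks happen here."""
    if isinstance(data, (str, Path)):
        return "file"
    if data is None:
        return "vectors"
    return "matrix"


def validate_input_sketch(data=None, vectors=None, names=("x", "y"), kind=None):
    """Check that data/vectors are consistent with the expected names."""
    if kind is None:
        kind = data_kind_sketch(data)
    if kind == "vectors":
        count = 0 if vectors is None else len(vectors)
        if count < len(names):
            raise InvalidInputSketch(
                f"Requires {len(names)} 1-D arrays but got {count}."
            )
        for index, (name, vector) in enumerate(zip(names, vectors)):
            if vector is None:
                raise InvalidInputSketch(
                    f"Column {index} ('{name}') can't be None."
                )
    elif vectors is not None and any(v is not None for v in vectors):
        raise InvalidInputSketch("Use either data or 1-D arrays, not both.")
    return kind
```

Because the classifier no longer raises, a wrapper can call it on its own to branch on the kind and hand the result to the validator through the `kind` argument, so the kind is computed only once.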

seisman · 2023-10-15T09:11:33Z

pygmt/helpers/utils.py

@@ -15,127 +15,133 @@
 from pygmt.exceptions import GMTInvalidInput


-def _validate_data_input(
-    data=None, x=None, y=None, z=None, required_z=False, required_data=True, kind=None
+def validate_data_input(


I think it's more useful to pass the list of column names instead, i.e., replacing ncols=2 with names=["x", "y"].

So, for most modules, vectors=[x, y] and names=["x", "y"], or vectors=[x, y, z] and names=["x", "y", "z"].

For more complicated modules like plot or plot3d, the names can be
names=["x", "y", "direction_arg1", "direction_arg2", "fill", "size", "symbol", "transparency"].

The column names will be very useful when the GMTInvalidInput exception is raised.
For example, instead of "Column 5 can't be None.", we can say "Column 5 ('size') can't be None.". Instead of "data must have at least 8 columns.", we can say

data must have at least 8 columns: x y direction_arg1 direction_arg2 fill size symbol transparency
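As a small illustration of the messages described above, the two formatting helpers below build them from a list of names. The helper names are hypothetical; the message wording and the plot-style column names are taken from this comment, and the column index is assumed to be zero-based so that index 5 maps to 'size'.

```python
def missing_column_message(index, names):
    """Message for a required 1-D array that was passed as None."""
    return f"Column {index} ('{names[index]}') can't be None."


def too_few_columns_message(names):
    """Message for a table with fewer columns than expected."""
    return f"data must have at least {len(names)} columns: {' '.join(names)}"


names = [
    "x",
    "y",
    "direction_arg1",
    "direction_arg2",
    "fill",
    "size",
    "symbol",
    "transparency",
]
print(missing_column_message(5, names))
print(too_few_columns_message(names))
```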

Done in f37413b

weiji14 · 2023-12-25T00:44:50Z

pygmt/helpers/utils.py

+        if len(vectors) < len(names):
+            raise GMTInvalidInput(
+                f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
+            )


Missing unit test for this if-condition.
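A test for this branch could look roughly like the sketch below. To keep it self-contained, the check is copied into a hypothetical `check_vector_count` function and `GMTInvalidInputSketch` stands in for `pygmt.exceptions.GMTInvalidInput`; in the real test suite the same assertion would more likely be written with `pytest.raises` against the actual function.

```python
class GMTInvalidInputSketch(Exception):
    """Stand-in for pygmt.exceptions.GMTInvalidInput."""


def check_vector_count(vectors, names):
    """Copy of the if-condition under review, isolated for testing."""
    if len(vectors) < len(names):
        raise GMTInvalidInputSketch(
            f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
        )


def test_too_few_vectors():
    """One array for two names must raise with the expected message."""
    try:
        check_vector_count(vectors=[[1, 2, 3]], names=["x", "y"])
    except GMTInvalidInputSketch as err:
        assert str(err) == "Requires 2 1-D arrays but got 1."
    else:
        raise AssertionError("Expected GMTInvalidInputSketch.")


def test_enough_vectors():
    """Matching counts must pass silently."""
    check_vector_count(vectors=[[1, 2], [3, 4]], names=["x", "y"])


test_too_few_vectors()
test_enough_vectors()
```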

weiji14 · 2023-12-25T00:45:10Z

pygmt/helpers/utils.py

+                if len(data.shape) == 1 and data.shape[0] < len(names):
+                    raise GMTInvalidInput(msg)


Missing unit test for this if-condition.

weiji14 · 2023-12-25T00:47:34Z

pygmt/src/project.py

+    vectors, names = [x, y], "xy"
+    if z is not None:
+        vectors.append(z)


Need to append 'z' to names here? Also, need a unit test for this if-condition.

seisman · 2024-01-19

Actually no. The problem is that project only requires two columns, but three or more columns are allowed. Currently, the variable names is used for two purposes: (1) the names of the passed columns; (2) the number of required columns. So, if we append z to names here, calling pygmt.project(data=data) will fail if data has only two columns. I think we still need to maintain a separate variable for the number of required columns.
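One way to decouple the two purposes is sketched below, under the assumption that the required count is passed separately from the names. The function `check_column_count` and the exception class are hypothetical; the parameter name `required_cols` is borrowed from the title of #3369, but its semantics here are only a guess.

```python
class InvalidInputSketch(ValueError):
    """Stand-in for pygmt.exceptions.GMTInvalidInput."""


def check_column_count(ncols, names, required_cols=None):
    """Check a table with ``ncols`` columns against the required minimum.

    ``names`` labels every column that may be passed, while
    ``required_cols`` says how many of them are mandatory. When it is
    omitted, all named columns are treated as required.
    """
    required = len(names) if required_cols is None else required_cols
    if ncols < required:
        needed = " ".join(names[:required])
        raise InvalidInputSketch(
            f"data must have at least {required} columns: {needed}"
        )


# project-like case: x and y are required, z is optional.
names = ["x", "y", "z"]
check_column_count(ncols=2, names=names, required_cols=2)  # passes
check_column_count(ncols=3, names=names, required_cols=2)  # passes
```

With this separation, appending "z" to the names no longer makes a two-column table fail.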

weiji14 · 2023-12-27T01:53:25Z

pygmt/clib/session.py

+        kind = data_kind(data, required=required_data)
+        validate_data_input(
+            data=data,
+            vectors=vectors,
+            names=names,
+            required_data=required_data,
+            kind=kind,
        )


The validation checks have been moved from within data_kind to virtualfile_from_data here. But in plot.py, we actually use data_kind on its own here:

pygmt/pygmt/src/plot.py

Line 217 in 3076ddc

kind = data_kind(data, x, y)

Are we ok with raising GMTInvalidInput much later here in virtualfile_from_data (after all the keyword argument parsing), rather than early on in data_kind?
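The trade-off raised in this question can be sketched with a toy wrapper that records which steps ran before an error surfaced. Everything here is hypothetical (`plot_sketch`, `validate_sketch`, the `steps` list, the `validate_early` switch); it only illustrates the ordering and is not how Figure.plot is written.

```python
class InvalidInputSketch(ValueError):
    """Stand-in for pygmt.exceptions.GMTInvalidInput."""


def data_kind_sketch(data):
    """Classify only; never raises."""
    return "vectors" if data is None else "matrix"


def validate_sketch(data, vectors, names):
    """Raise if required 1-D arrays are missing or data is over-specified."""
    if data is None:
        missing = [n for n, v in zip(names, vectors) if v is None]
        if missing:
            raise InvalidInputSketch(f"Missing columns: {' '.join(missing)}")
    elif any(v is not None for v in vectors):
        raise InvalidInputSketch("Use either data or 1-D arrays, not both.")


def plot_sketch(steps, data=None, x=None, y=None, validate_early=True):
    """Toy wrapper that appends each completed step to ``steps``."""
    kind = data_kind_sketch(data)
    steps.append("data_kind")
    if validate_early:
        validate_sketch(data, [x, y], ["x", "y"])
        steps.append("validate")
    steps.append("parse_options")  # placeholder for keyword parsing
    if not validate_early:
        validate_sketch(data, [x, y], ["x", "y"])  # as inside the virtual file call
        steps.append("validate")
    return kind


for early in (True, False):
    steps = []
    try:
        plot_sketch(steps, x=[1, 2, 3], validate_early=early)  # y is missing
    except InvalidInputSketch:
        pass
    print(early, steps)
```

In the late variant the invalid call still gets through option parsing before it fails, which is the behavior change being asked about; calling the validator right after the kind is determined would keep the early failure.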

seisman · 2024-10-03T14:36:02Z

Closing this PR since it will be superseded by #3369.

seisman force-pushed the refactor/virtualfile-to-data branch from 49b1b3f to 5512d2f Compare October 14, 2023 07:53

seisman changed the title ~~POC: Refactor the virtualfile_to_data and data_kind function to accept more 1-D arrays~~ WIP POC: Refactor the virtualfile_to_data and data_kind function to accept more 1-D arrays Oct 14, 2023

seisman changed the title ~~WIP POC: Refactor the virtualfile_to_data and data_kind function to accept more 1-D arrays~~ WIP/POC: Refactor the virtualfile_to_data and data_kind function to accept more 1-D arrays Oct 14, 2023

seisman force-pushed the refactor/virtualfile-to-data branch from 5512d2f to 70fc9e4 Compare October 14, 2023 12:15

Refactor the data_kind and the virtualfile_to_data functions

66c4b97

seisman force-pushed the refactor/virtualfile-to-data branch from 70fc9e4 to 66c4b97 Compare October 14, 2023 12:19

seisman added the maintenance (Boring but important stuff for the core devs) and needs review (This PR has higher priority and needs review.) labels Oct 14, 2023

seisman added this to the 0.11.0 milestone Oct 14, 2023

Update more functions

78c28cd

seisman commented Oct 15, 2023

seisman added 8 commits October 15, 2023 18:41

Merge branch 'main' into refactor/virtualfile-to-data

f849e5a

Change ncols to names

f37413b

Fix more tests

3de7666

Fix project

93b91d0

Merge branch 'main' into refactor/virtualfile-to-data

2eecf48

Fix more tests

1d6e568

Fixes

6f9fc19

Merge branch 'main' into refactor/virtualfile-to-data

68034ed

seisman mentioned this pull request Oct 17, 2023

Figure.plot: Refactor to increase code readability #2742

Merged

seisman added 8 commits October 17, 2023 17:05

Fix triangulate

0db21bc

Fix text

7cf5290

Fix more failing tests

b0b6d2a

More fixes

fa875ef

Fix linting issues

2ee0df2

Fix linting issues

d5c8340

Fix linting issues

30bacb1

Merge branch 'main' into refactor/virtualfile-to-data

4465f9b

seisman commented Oct 20, 2023

Update pygmt/clib/session.py

593f252

seisman modified the milestones: 0.11.0, 0.12.0 Dec 11, 2023

weiji14 reviewed Dec 25, 2023

Merge branch 'main' into refactor/virtualfile-to-data

872fd59

weiji14 reviewed Dec 27, 2023

seisman added 2 commits January 16, 2024 22:06

Merge branch 'main' into refactor/virtualfile-to-data

3ed0eb2

Merge branch 'main' into refactor/virtualfile-to-data

efa7a11

seisman removed this from the 0.12.0 milestone Feb 26, 2024

Merge branch 'main' into refactor/virtualfile-to-data

23fc3ea

seisman marked this pull request as draft March 1, 2024 04:51

seisman changed the title ~~Refactor the virtualfile_from_data and data_kind function to accept more 1-D arrays~~ Refactor the virtualfile_in and data_kind function to accept more 1-D arrays Mar 4, 2024

seisman added 3 commits July 11, 2024 17:57

Merge branch 'main' into refactor/virtualfile-to-data

aa05333

Fix plot and plot3d

5c10fc4

Fix errors in merging the main branch

525a353

This was referenced Jul 12, 2024

Add the Session.virtualfile_from_stringio method to allow StringIO input for certain functions/methods #3326

Merged

Refactor the data_kind and validate_data_input functions #3335

Merged

seisman added 2 commits July 20, 2024 14:07

Merge branch 'main' into refactor/virtualfile-to-data

2f3fcc4

Fix merging issue

b55a9ad

seisman changed the title ~~Refactor the virtualfile_in and data_kind function to accept more 1-D arrays~~ Refactor the virtualfile_in afunction to accept more 1-D arrays Jul 20, 2024

seisman changed the title ~~Refactor the virtualfile_in afunction to accept more 1-D arrays~~ Refactor the virtualfile_in function to accept more 1-D arrays Jul 20, 2024

Merge branch 'main' into refactor/virtualfile-to-data

46be0fa

seisman added this to the 0.13.0 milestone Jul 23, 2024

seisman modified the milestones: 0.13.0, 0.14.0 Aug 4, 2024

seisman removed this from the 0.14.0 milestone Sep 5, 2024

seisman mentioned this pull request Sep 11, 2024

Refactor Session.virtualfile_in, removing 'extra_arrays'/'required_z' and add 'required_cols' #3369

Draft

seisman mentioned this pull request Oct 2, 2024

data_kind: Add more tests to demonstrate the data kind of various data types #3480

Merged

seisman closed this Oct 3, 2024

seisman deleted the refactor/virtualfile-to-data branch October 3, 2024 14:36
