Simplified the FastConv1d API. Ensured that all sequence lengths passed to kernel are int32. Prepped for 0.3.5.2 release.
jlparkI committed May 17, 2024
1 parent 2cc1306 commit c8967c6
Showing 53 changed files with 37 additions and 133 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -201,3 +201,8 @@ kernels. Memory consumption now scales as O(1) with sequence length
for both CPU / GPU. Added the simplex random features (Reid et al. 2023)
modification as an option for most kernels, with the exception of
MiniARD and (of course) Linear.

### Version 0.3.5.2
Simplified the FastConv1d API and updated the docs. FastConv1d now double-checks
that sequence lengths are int32, and pre_prediction_checks now also ensures that
sequence lengths are int32.
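
As a quick illustration of the simplified API (a sketch added for this changelog entry, not code from the library itself), a FastConv1d layer is now used roughly as follows. The array shapes below are invented for the example; the constructor arguments and the `predict` call mirror the docs and test changes in this commit:

```python
# Illustrative sketch only -- input shapes are invented for the example.
import numpy as np
from xGPR import FastConv1d

rng = np.random.default_rng(123)
# 100 sequences, 50 elements each, 21 features per element: (N, seq_len, seq_width)
x = rng.standard_normal((100, 50, 21)).astype(np.float32)
# Sequence lengths should be int32; FastConv1d now also casts them defensively.
seq_lengths = np.full(100, 50, dtype=np.int32)

conv_layer = FastConv1d(x.shape[2], device="cpu", random_seed=123,
                        conv_width=3, num_features=512)
features = conv_layer.predict(x, seq_lengths)   # expected shape: (100, 512)
```
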
8 changes: 4 additions & 4 deletions docs/advanced/fitting_tutorial.rst
@@ -110,7 +110,7 @@ As a side note: SGD is popular in the literature because it works well for
deep learning. Most papers that recommend some flavor of SGD haven't tried
to use it for high-dimensional linear problems that may sometimes be
ill-conditioned and where a tight fit is desired -- SGD does NOT work well
under this particular set of circumstances, at least Adam, AMSGrad, SVRG,
and the other usual suspects. The amount of learning rate tuning required
to get SGD to work well is simply not acceptable for an out of the box
fitting routine.
compared to preconditioned CG under this particular set of circumstances,
at least Adam, AMSGrad, SVRG, and the other usual suspects. The amount of
learning rate and learning rate schedule tuning required to get SGD to work
well is simply not acceptable for an out of the box fitting routine.
2 changes: 1 addition & 1 deletion docs/advanced/initialization_tutorial.rst
@@ -117,4 +117,4 @@ and time series:

.. autoclass:: xGPR.FastConv1d
:special-members: __init__
:members: conv1d_pretrain_feat_extract, conv1d_x_feat_extract
:members: predict
7 changes: 4 additions & 3 deletions docs/approximation_background.rst
@@ -9,8 +9,8 @@ million matrix for a million-datapoint dataset.)
xGPR approximates the kernel matrix by representing each datapoint using a
set of "random features", a representation constructed such that the
dot product between any two representations is approximately equivalent
to the exact kernel measurement. This converts Gaussian processes and
kernel discriminant classifiers into ridge regression and LDA.
to the exact kernel measurement. This converts Gaussian process regression
into ridge regression.
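
To make the dot-product claim concrete, here is a generic random Fourier features sketch for an RBF kernel (a textbook illustration added for this note, not xGPR's own, more efficient, implementation), e.g.:::

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    num_rffs, lengthscale = 4096, 1.0

    # Random frequencies and phases define the feature map z(.)
    freqs = rng.standard_normal((num_rffs, x.shape[0])) / lengthscale
    phases = rng.uniform(0., 2. * np.pi, num_rffs)
    z = lambda v: np.sqrt(2. / num_rffs) * np.cos(freqs @ v + phases)

    exact_kernel = np.exp(-np.sum((x - y)**2) / (2 * lengthscale**2))
    approx_kernel = z(x) @ z(y)   # close to exact_kernel; error shrinks as num_rffs grows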

The more random features you use, the more accurate the approximation
and the closer to an exact kernel machine. The error decreases exponentially
@@ -31,7 +31,8 @@ xGPR model is performing pretty well but we'd like it to perform
a little better -- just use more random features. If performance is dramatically below
what we need, by contrast, and increasing the number of RFFs isn't helping
very much, we know we need to switch to a different kernel, feature set or
modeling technique.
use another modeling technique instead of a GP.


How many random features do I need?
------------------------------------
12 changes: 6 additions & 6 deletions docs/basic_tutorial.rst
@@ -138,7 +138,7 @@ change it subsequently as well, at least right up until we fit). ``num_rffs`` co
how accurately the kernel is approximated. The error in the kernel approximation falls
off exponentially with larger ``num_rffs`` values, so increasing ``num_rffs`` generally
makes the model more accurate, but with diminishing returns. It also increases
computational expense (fitting using ``num_rffs=4096`` will be much faster than fitting
computational expense (fitting using ``num_rffs=1024`` will be much faster than fitting
with ``num_rffs=32,768``).

Finally, notice that when calling ``model.predict`` just as when building a dataset,
@@ -214,8 +214,8 @@ Here's an example:::
Now, we just minimize the value returned by this function -- again, we can use Optuna,
grid search, Bayesian optimization, what have you.

Notice one funny trick in the function above. ``exact_nmll`` is much faster if the
number of RFFs is small. On GPU, it can be reasonably fast up to about 8,192 RFFs.
Notice one thing in the function above. ``exact_nmll`` is much faster if the
number of RFFs is small. On GPU, it can be reasonably fast up to about 8,000 RFFs or so.
It has cubic scaling, however, so for large numbers of RFFs it can get very
slow very quickly. ``approximate_nmll`` has much better scaling and so is your
friend if you want to tune using a large ``num_rffs``. It does involve an additional
@@ -248,7 +248,7 @@ None (the default) for ``bounds``; if None, xGPR uses some default search bounda
``tune_hyperparams_crude`` uses an SVD, which means it doesn't scale well
-- it can get pretty slow for ``num_rffs = 3,000`` or above. Fortunately, we've generally
found that the hyperparameters which give good NMLL with a small number of RFFs
(a sketchy kernel approximation) are *usually* not too terribly far away from those which give
(a sketchy kernel approximation) are usually not too terribly far away from those which give
good NMLL with a larger number of RFFs (a better kernel approximation).
(This is a rule of thumb, and like all rules of thumb should be used with caution.)
So, one way to use these two functions together is to use ``tune_hyperparams_crude`` for a
@@ -285,7 +285,7 @@ Remember that when calculating NMLL, we could use ``exact_nmll`` or
``approximate_nmll``. The function ``tune_hyperparams`` offers you the same choice:
you can set ``nmll_method`` to either ``nmll_method=exact`` or ``nmll_method=approximate``,
and the considerations are the same. Again, ``exact`` is faster if ``num_rffs`` is small,
maybe < 8,192 or so, while ``approximate_nmll`` has better scaling.
while ``approximate_nmll`` has better scaling.
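
As a rough sketch of that choice (the ``nmll_method`` values are as documented here, but the dataset argument name and the ``num_rffs`` variable are illustrative assumptions rather than the exact xGPR signature), e.g.:::

    # Hypothetical sketch: pick the NMLL method from the number of RFFs in use.
    nmll_method = "exact" if num_rffs <= 8000 else "approximate"
    model.tune_hyperparams(training_dataset, nmll_method=nmll_method)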

Finally, one important thing to keep in mind. Most of these methods run at reasonable
speed on GPU. On CPU, however, tuning with a large ``num_rffs`` can be a slow slow slog.
@@ -295,7 +295,7 @@ Setting the ``num_threads`` parameter on your model can help a little, e.g.:::

``num_threads`` is ignored if you're fitting on GPU. But that can only help so much. We strongly
recommend doing hyperparameter tuning and fitting on GPU whenever possible. Making predictions,
by contrast, is reasonably fast on CPU (even if not quite as fast as GPU). So fitting on GPU and
by contrast, is reasonably fast on CPU. So fitting on GPU and
doing inference on CPU is a perfectly viable way to go if desired.
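
A workflow along those lines might look like the sketch below (the ``device`` attribute and the ``fit`` / ``predict`` argument lists are assumptions made for illustration, not the verbatim xGPR API), e.g.:::

    # Hypothetical sketch of the fit-on-GPU, predict-on-CPU workflow described above.
    model.device = "gpu"          # assumed attribute, mirroring the static layers
    model.fit(training_dataset)   # tuning / fitting are fastest on GPU
    model.device = "cpu"
    preds = model.predict(test_x, test_seq_lengths)   # inference is fine on CPU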

That's really all you absolutely need to know! For lots of useful TMI, see Advanced Tutorials.
30 changes: 4 additions & 26 deletions docs/kernel_info/static_layers.rst
@@ -44,30 +44,8 @@ Linear or Matern or some other in principle, but we've never found that to be
terribly useful.)

For each feature extractor, you can supply additional arguments to control
what kind of features it generates. These are summarized below.
what kind of features it generates. More details are below.

.. list-table:: static layer arguments
:header-rows: 1

* - Argument
- Description
* - ``seq_width``
- | The number of features per sequence element in the
| input (i.e. shape[2] when the input is a 3d numpy
| array).
* - ``device``
- | One of "gpu", "cpu".
* - ``random_seed``
- | The random seed (for reproducibility).
* - ``conv_width``
- | An integer -- the convolution width.
* - ``num_features``
- | The number of features ``FastConv1d`` should generate.
| Larger numbers will improve accuracy but increase
| computational expense. Try 1000 - 2000 to start with
| and increase if needed.
* - ``simplex_rffs``
- | Applies the simplex random features modification from
| Reid et al. 2023 (an experimental feature that may
| sometimes improve performance but slightly increases
| computational cost).
.. autoclass:: xGPR.FastConv1d
:special-members: __init__
:members: predict
3 changes: 2 additions & 1 deletion docs/purpose.rst
@@ -6,7 +6,8 @@ models and approximate kernel discriminant classifiers to datasets ranging
in size from hundreds to millions of datapoints.
It runs on either CPU or GPU, models tabular data, sequence & time series
data and graph data, and fits datasets too large to load into memory in a
straightforward way.
straightforward way. It is primarily designed for regression but also
supports classification.


Limitations of xGPR
2 changes: 1 addition & 1 deletion setup.py
@@ -180,7 +180,7 @@ def main():
long_description = long_description,
long_description_content_type="text/markdown",
install_requires=["numpy>=1.10", "scipy>=1.7.0",
"cython>=0.10"],
"cython>=0.10", "scikit-learn"],
ext_modules = ext_modules,
package_data={"": ["*.h", "*.c", "*.cu", "*.cpp",
"*.pyx", "*.sh"]}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -19,26 +19,19 @@ class CheckStatLayerConstruction(unittest.TestCase):

def test_static_layer_builders(self):
"""Test static layer construction and basic functions."""
test_online_dataset, test_offline_dataset = build_test_dataset(conv_kernel = True,
test_online_dataset, _ = build_test_dataset(conv_kernel = True,
xsuffix = "testxvalues.npy", ysuffix = "testyvalues.npy")
train_online_dataset, _ = build_test_dataset(conv_kernel = True)

conv_statlayer = FastConv1d(test_online_dataset.get_xdim()[2],
device = "cpu", random_seed = RANDOM_SEED, conv_width = 3,
num_features = 512)
conv_dset = conv_statlayer.conv1d_pretrain_feat_extract(test_offline_dataset,
os.getcwd())

xchunks = list(train_online_dataset.get_chunked_x_data())
x_trans = conv_statlayer.conv1d_feat_extract(xchunks[0][0], xchunks[0][1])
x_trans = conv_statlayer.predict(xchunks[0][0], xchunks[0][1])
self.assertTrue(x_trans.shape[1] == 512)
self.assertTrue(xchunks[0][0].shape[0] == x_trans.shape[0])

for xfile in conv_dset.get_xfiles():
os.remove(xfile)
for yfile in conv_dset.get_yfiles():
os.remove(yfile)


if __name__ == "__main__":
unittest.main()
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion xGPR/__init__.py
@@ -1,6 +1,6 @@
#Version number. Updated if generating a new release.
#Otherwise, do not change.
__version__ = "0.3.5"
__version__ = "0.3.5.2"

#Key imports.
from .xgp_regression import xGPRegression
4 changes: 3 additions & 1 deletion xGPR/model_baseclass.py
@@ -176,7 +176,9 @@ def pre_prediction_checks(self, input_x, sequence_lengths, get_var:bool):
mempool.free_all_blocks()
x_array = cp.asarray(input_x)
if sequence_lengths is not None:
sequence_lengths = cp.asarray(sequence_lengths)
sequence_lengths = cp.asarray(sequence_lengths).astype(cp.int32)
elif sequence_lengths is not None:
sequence_lengths = sequence_lengths.astype(np.int32)

return x_array, sequence_lengths

84 changes: 4 additions & 80 deletions xGPR/static_layers/fast_conv.py
@@ -8,7 +8,6 @@
like a fully connected layer on top of a convolutional layer.
"""
import sys
import os

import numpy as np
try:
@@ -17,7 +16,7 @@
pass

from ..kernels.convolution_kernels.conv_feature_extractor import FHTMaxpoolConv1dFeatureExtractor
from ..data_handling.offline_data_handling import OfflineDataset


class FastConv1d:
"""Provides tools for performing convolution-based feature
@@ -72,82 +71,7 @@ def __init__(self, seq_width:int, device:str = "cpu", random_seed:int = 123,
self.device = device




def conv1d_pretrain_feat_extract(self, input_dataset, output_dir:str):
"""Performs feature extraction using a 1d convolution kernel,
saves the results to a specified location, and returns an
OfflineDataset. This function should be used if it is
desired to generate features for sequence / timeseries data
prior to training. By use of this feature, the GP is essentially
turned into a three-layer network with a convolutional layer
followed by a fully-connected layer. Note that when
making predictions, you should use conv1d_feat_extract,
which takes an x-array as input rather than a dataset.
Args:
input_dataset: Object of class OnlineDataset or OfflineDataset.
You should generate this object using either the
build_online_dataset, build_offline_fixed_vector_dataset
or build_offline_sequence_dataset functions under
data_handling.dataset_builder.
output_dir (str): A valid directory filepath where the output
can be saved.
Returns:
conv1d_dataset (OfflineDataset): An OfflineDataset containing
the xfiles and yfiles that resulted from the feature
extraction operation. You can feed this directly to
the hyperparameter tuning and fitting methods.
Raises:
ValueError: If the inputs are not valid a detailed ValueError
is raised explaining the issue.
"""
start_dir = os.getcwd()
try:
os.chdir(output_dir)
except:
raise ValueError("Invalid output directory supplied to the "
"feature extractor.")


input_dataset.device = self.device
xfiles, yfiles = [], []
fnum, chunksize, max_class = 0, 0, 0

for xbatch, ybatch, seqlen in input_dataset.get_chunked_data():
xfile, yfile = f"CONV1d_FEATURES_{fnum}_X.npy", f"CONV1d_FEATURES_{fnum}_Y.npy"
xtrans = self.conv_kernel.transform_x(xbatch, seqlen)
if self.device == "gpu":
ybatch = cp.asnumpy(ybatch)
xtrans = cp.asnumpy(xtrans)

np.save(xfile, xtrans)
np.save(yfile, ybatch)
xfiles.append(xfile)
yfiles.append(yfile)
if ybatch.shape[0] > chunksize:
chunksize = ybatch.shape[0]
if ybatch.max() > max_class:
max_class = int(ybatch.max())
fnum += 1

xdim = (input_dataset.get_ndatapoints(), self.num_features)
updated_dataset = OfflineDataset(xfiles, yfiles, None,
xdim, input_dataset.get_ymean(),
input_dataset.get_ystd(),
max_class = max_class,
chunk_size = chunksize)

if self.device == "gpu":
mempool = cp.get_default_memory_pool()
mempool.free_all_blocks()
os.chdir(start_dir)
return updated_dataset


def conv1d_feat_extract(self, x_array, sequence_lengths, chunk_size:int = 2000):
def predict(self, x_array, sequence_lengths, chunk_size:int = 2000):
"""Performs feature extraction using a 1d convolution kernel
and returns an array containing the result. This function should
be used if it is desired to generate features for sequence /
@@ -186,10 +110,10 @@ def conv1d_feat_extract(self, x_array, sequence_lengths, chunk_size:int = 2000):

if self.device == "gpu":
x_in = cp.asarray(x_array[i:cutoff,:,:]).astype(cp.float32)
seqlen_in = cp.asarray(sequence_lengths[i:cutoff])
seqlen_in = cp.asarray(sequence_lengths[i:cutoff]).astype(cp.int32)
else:
x_in = x_array[i:cutoff,:,:]
seqlen_in = sequence_lengths[i:cutoff]
seqlen_in = sequence_lengths[i:cutoff].astype(np.int32)

xtrans = self.conv_kernel.transform_x(x_in, seqlen_in)

