Training and calibrating machine learning models requires benchmarking datasets that provide a point of reference to compare and evaluate algorithms. Benchmarking datasets minimize variation in model performance that stems from differences in the type and quality of input data, data format and processing, and choice of assessment metrics, which allows modelers and developers to focus on model improvement and fine-tuning.
TreeForCaSt-s, a stand-level Spatio-Temporal Asset Catalog (STAC) for Modeling Forest Composition and Structure in the Pacific Northwest, is a proof-of-concept benchmarking dataset for modeling forest composition and structure using field inventory stands and remote sensing data. TreeForCaSt-s has the following features: 1) it is provided as a STAC, a standard data structure for representing geospatial information; 2) it is a multi-sensor and multi-resolution dataset; and 3) it is open source and based entirely on public datasets available online. TreeForCaSt-s is intended for training multi-task machine learning models and models that are robust to missing data.
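To make the STAC structure and the missing-data use case more concrete, the following is a minimal sketch of what a stand-level STAC Item could look like and how a per-sensor availability mask might be derived from it. It uses only the Python standard library and follows the required fields of the STAC Item specification. The stand identifier, asset keys (`sensor_a`, `sensor_b`, `sensor_c`), file paths, date, and helper names are all hypothetical and are not taken from the TreeForCaSt-s catalog itself.

```python
import json
from datetime import datetime, timezone

# Hypothetical asset keys; the real catalog may name its sensors differently.
EXPECTED_SENSORS = ("sensor_a", "sensor_b", "sensor_c")


def make_stand_item(stand_id, bbox, assets):
    """Build a minimal STAC Item (a GeoJSON Feature) for one inventory stand.

    `assets` maps an asset key to the href of a raster file.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": stand_id,
        "bbox": [west, south, east, north],
        # The stand footprint is approximated here by its bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # Placeholder acquisition date, chosen only for illustration.
        "properties": {
            "datetime": datetime(2020, 7, 1, tzinfo=timezone.utc).isoformat()
        },
        "links": [],
        "assets": {
            key: {"href": href, "type": "image/tiff; application=geotiff"}
            for key, href in assets.items()
        },
    }


def availability_mask(item, expected=EXPECTED_SENSORS):
    """Return one boolean per expected sensor: True if the item has that asset."""
    return [key in item["assets"] for key in expected]


# A stand for which one of the three hypothetical sensors is missing.
item = make_stand_item(
    "stand-0001",
    (-123.0, 44.0, -122.9, 44.1),
    {"sensor_a": "./stand-0001/a.tif", "sensor_c": "./stand-0001/c.tif"},
)
mask = availability_mask(item)
print(json.dumps(item["assets"], indent=2))
print(mask)
```

A mask of this kind is one simple way to tell a model which inputs are present for a given stand, so that missing sensors can be skipped or imputed during training rather than causing the whole sample to be dropped.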