-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new spatial foundational types #219
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a draft set of changes to the abstract spec detailing the new foundational types (e.g., PointCloud)?
(similar question for the composed types introduced or modified in #220 and #221)
In particular, the SpatialDataFrame introduces some significant new concepts, such as an object that is indexed by something other than a joinid. We need to document these concepts and nail down terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No concerns with code, but would very much like to see a companion PR that includes a draft revision of the design (aka abstract) spec.
- Add new MultiscaleImage - Add new SpatialDataframe abstract base class - Add PointCloud subclass of the SpatialDataFrame - Add GeometryDataFrame subclass of the SpatialDataFrame - Add shapely as a dependency
* Update docstrings * Add ValueError error messages in `SpatialRead` * Rename method `read_level`->`read_region`
34f7878
to
462678a
Compare
I added a new (WIP) PR #223 here. It should be moving out of draft form soon. |
Co-authored-by: nguyenv <vivian@tiledb.com>
The individual implementations may find a shared base class useful, but it doesn't need to be included in the abstract specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should SpatialDataFrame
and PointCloud
have a coordinate_space
property? MultiscaleImage
has this.
To me, it's more important for the "coordinate" types since they can have units other than pixels.
Using the `value_filter` is a less confusing API.
The name `transform` was misleading.
Clarify that they should be in the same order in the docstring.
Co-authored-by: nguyenv <vivian@tiledb.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I have a few specific comments, questions, and suggestions.
Also wondering if we could/should hold off on adding the GeometryDataFrame
class? Unless I'm mistaken, I don't believe it's needed for the spot-based assays we're targeting for the initial release? If we do leave it in, I think the docstrings for PointCloud
and GeometryDataFrame
should make it clearer how they differ from SOMADataFrame
and from each other.
uri: str, | ||
*, | ||
schema: pa.Schema, | ||
index_column_names: Sequence[str] = (options.SOMA_JOINID, "x", "y"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of point clouds, do you think it makes sense to always index by the defined axes (and joinid) to facilitate efficient spatial queries? User-defined indices make sense for SOMADataFrames, which are more general-purpose, but for PointClouds, it seems like indexing by the axes is the most common use case and what distinguishes a PointCloud from a DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spatial axes need to be index columns (see doc string), but this is still an argument so there is flexibility with:
- The order of the index columns.
- The option to include or exclude
soma_joinid
(or other columns) as index columns.
|
||
@property | ||
@abc.abstractmethod | ||
def coordinate_space(self) -> Optional[coordinates.CoordinateSpace]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should create()
have an argument to define the coordinate space at create tiem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generated from the axis names. We could switch the parameter to coordinate space instead. How would you feel about leaving that as a question for alpha users?
raise NotImplementedError() | ||
|
||
|
||
class GeometryDataFrame(base.SOMAObject, metaclass=abc.ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to align naming of the PointCloud
and GeometryDataFrame
classes. Both are essentially specialized DataFrame
s so the <specialization>DataFrame
naming scheme makes sense. Although PointCloudDataFrame
is a bit of a mouthful. I don't have strong feelings about which approach we use but I do think we should be consistent, unless there's a good reason not to.
index_column_names: Sequence[str] = ( | ||
options.SOMA_JOINID, | ||
options.SOMA_GEOMETRY, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too I'm not sure if we need to allow users to define the index columns on these specialized dataframes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. The user needs to include the geometry as an index column. An error is thrown if they do not. But there is flexibility in the other columns they may want to include.
Co-authored-by: Aaron Wolen <aaron@wolen.com>
Co-authored-by: Aaron Wolen <aaron@wolen.com>
Co-authored-by: Aaron Wolen <aaron@wolen.com>
* Expand PointCloud doc string * Expand GeometryDataFrame docstring * Double back ticks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
New classes:
Note: This PR is ready for review, but will be merged after PR #218