Support reading from GDAL virtual file systems (e.g. cloud storage) #1398
Labels
datasets
Geospatial or benchmark datasets
Comments
Edit: Better proposal below.

Proposed changes:

    class RasterDataset(GeoDataset):
        def __init__(
            self,
            ...,  # existing params
            filenames: Optional[List[str]] = None,
        ) -> None:
            ...
            # Populate the dataset index
            i = 0
            if not filenames:
                # Default behaviour: recursive glob under root
                pathname = os.path.join(root, "**", self.filename_glob)
                filepaths = list(glob.iglob(pathname, recursive=True))
            else:
                # Explicit file names are joined onto root without globbing
                filepaths = [os.path.join(root, filename) for filename in filenames]
            for filepath in filepaths:
                # continue on line 366 in the original code

The entries in filenames should include any subdirectories relative to root.
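The branching in this proposal can be tried out in isolation. The following is a minimal sketch, assuming a hypothetical standalone helper named `collect_filepaths` (not part of torchgeo) that takes `root` and `filename_glob` as plain arguments instead of reading them from the dataset:

```python
import glob
import os
from typing import List, Optional


def collect_filepaths(
    root: str, filename_glob: str, filenames: Optional[List[str]] = None
) -> List[str]:
    """Return candidate file paths for the dataset index (hypothetical helper)."""
    if not filenames:
        # No explicit list: fall back to a recursive glob, which only works
        # on file systems that Python's glob module can walk.
        pathname = os.path.join(root, "**", filename_glob)
        return sorted(glob.iglob(pathname, recursive=True))
    # Explicit list: no directory listing is needed, so root may be a
    # virtual file system prefix that glob cannot see.
    return [os.path.join(root, filename) for filename in filenames]
```

With an explicit list, nothing checks that the files exist; a wrong name would only surface later, when the file is opened.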
Just found fiona's listdir method. It does not support recursive walks, but it does list sub-blobs in virtual file systems.

    import os

    import fiona
    from fiona.errors import FionaValueError


    def listdir_vsi_recursive(root):
        """Walk root and return every path that cannot be listed as a directory."""
        dirs = [root]
        files = []
        while dirs:
            path = dirs.pop()  # renamed from dir to avoid shadowing the builtin
            try:
                subdirs = fiona.listdir(path)
                dirs.extend([os.path.join(path, subdir) for subdir in subdirs])
            except FionaValueError:
                # Not a listable directory, so treat it as a file
                files.append(path)
        return files


    class RasterDataset(GeoDataset):
        def __init__(
            self,
            ...,  # existing params
            vsi: bool = False,
        ) -> None:
            ...
            # Populate the dataset index
            i = 0
            filename_regex = re.compile(self.filename_regex, re.VERBOSE)
            if vsi:
                # filename_glob is not applied here; filename_regex still filters below
                filepaths = listdir_vsi_recursive(root)
            else:
                pathname = os.path.join(root, "**", self.filename_glob)
                filepaths = list(glob.iglob(pathname, recursive=True))
            for filepath in filepaths:
                # continue on line 366 in the original code
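The same walk can be written so that it runs without a bucket. This is a sketch only, assuming a hypothetical function `listdir_recursive` that takes the listing function and its "not a directory" exception as arguments. It also joins paths with `posixpath`, since VSI paths use forward slashes on every platform while `os.path.join` would insert backslashes on Windows:

```python
import posixpath
from typing import Callable, List


def listdir_recursive(
    root: str,
    listdir: Callable[[str], List[str]],
    not_a_directory: type = Exception,
) -> List[str]:
    """Return every path under root that listdir refuses to list."""
    pending = [root]
    files = []
    while pending:
        path = pending.pop()
        try:
            children = listdir(path)
        except not_a_directory:
            # The listing function rejected this path, so treat it as a file.
            files.append(path)
        else:
            # Always join with "/" so the result stays a valid VSI path.
            pending.extend(posixpath.join(path, child) for child in children)
    return sorted(files)
```

With fiona this would presumably be called as `listdir_recursive(root, fiona.listdir, FionaValueError)`. Note that every file costs one failed listing call, which may be slow on large buckets.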
Note that we technically support this in 0.5.0, although the user has to manually pass in a list of files.
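When the list is assembled by hand from a remote listing, the effect of filename_glob can be approximated by filtering on the base name. This is a rough sketch, assuming a hypothetical helper `select_files` that is not part of torchgeo; how the resulting list is passed to the dataset is left out because the parameter is not named in this thread:

```python
import fnmatch
import posixpath
from typing import Iterable, List


def select_files(paths: Iterable[str], filename_glob: str) -> List[str]:
    """Keep only the paths whose base name matches filename_glob."""
    return [
        path
        for path in paths
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatch.fnmatchcase(posixpath.basename(path), filename_glob)
    ]
```

Unlike a recursive glob, this never touches the file system, so it works the same for local and virtual paths.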
torchgeo/torchgeo/datasets/geo.py
Lines 363 to 367 in 9e57f27
GDAL virtual file systems, such as reading directly from Google Cloud Storage buckets (/vsigs/), are natively supported by rasterio (through GDAL). The glob matching (source code linked above) is the only thing currently preventing this.
What do you think is the best way to do this? My initial guess is that supporting glob matching for all the different file systems would take some effort.
The quickest fix (for me at least) would be to add an optional parameter filenames: List that is iterated over; the (already existing) try/except would handle a wrong filename.