
Lazy loading, on-the-fly TorchGeo dataset creator method for ML use cases #1

@print-sid8

Description


I imagine it would be useful to have a TorchGeo dataset creator for ML model training and inference, one that can create image chips from satellite data.
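To make "image chips" concrete, here is a minimal sketch of tiling a raster into fixed-size chip windows. It assumes a simple grid with an optional stride and drops partial chips at the edges. The names `chip_windows` and `Window` are hypothetical and are not part of Rasteret or TorchGeo. The fields mirror `rasterio.windows.Window` (`col_off`, `row_off`, `width`, `height`), so each chip could in principle drive a windowed read.

```python
from typing import Iterator, NamedTuple, Optional


class Window(NamedTuple):
    """Pixel window of one chip (hypothetical helper type)."""
    col_off: int
    row_off: int
    width: int
    height: int


def chip_windows(width: int, height: int, chip_size: int,
                 stride: Optional[int] = None) -> Iterator[Window]:
    """Yield square chip windows covering a width x height raster.

    stride defaults to chip_size (no overlap). Chips that would extend past
    the raster edge are skipped, so only full-size chips are produced.
    """
    stride = stride or chip_size
    # Row-major order: walk down the rows, and across the columns in each row
    for row_off in range(0, height - chip_size + 1, stride):
        for col_off in range(0, width - chip_size + 1, stride):
            yield Window(col_off, row_off, chip_size, chip_size)
```

For example, a 1024 x 512 scene with `chip_size=256` yields 8 non-overlapping chips. With `stride=128` it yields 21 overlapping chips, a common trick for inference near chip borders.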

I'm thinking the API might look something like this:

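```python
import torch
from shapely.geometry import box
from torch.utils.data import DataLoader

from rasteret import Rasteret

# Create regular collection
processor = Rasteret(
    custom_name="sentinel2",
    data_source="sentinel-2-l2a",
)

collection = processor.create_collection(
    bbox=[10.1, 45.5, 10.5, 45.8],
    date_range=["2023-01-01", "2023-12-31"],
)

# Proposed (not yet implemented): convert to ML dataset
aoi_polygon = box(10.2, 45.6, 10.4, 45.7)  # example area of interest
dataset = collection.to_ml_dataset(
    chip_size=256,
    bands=["B02", "B03", "B04", "B08"],  # Blue, Green, Red + NIR
    geometries=[aoi_polygon],  # Optional
)

# Use with PyTorch/TorchGeo
# (a TorchGeo GeoDataset would also need a sampler, e.g. GridGeoSampler,
# and collate_fn=stack_samples)
loader = DataLoader(dataset, batch_size=32)

# Load trained model (a fully pickled model needs weights_only=False)
model = torch.load("path/to/model.pth", weights_only=False)
model.eval()

# Run inference
predictions = []
with torch.no_grad():
    for batch in loader:
        # Assumes TorchGeo-style samples: dicts with the tensor under "image"
        pred = model(batch["image"])
        predictions.append(pred)
```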
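The "lazy loading, on-the-fly" part of the request could work roughly like this. The dataset stores only lightweight chip keys and reads pixels when a sample is requested. This is a minimal sketch, not Rasteret's or TorchGeo's implementation. `LazyChipDataset`, the `ChipKey` layout, and the `read_chip` callback (standing in for something like a windowed COG read) are all hypothetical.

```python
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical key: (scene_id, col_off, row_off, chip_size) identifies one chip
# without touching any pixel data.
ChipKey = Tuple[str, int, int, int]


class LazyChipDataset(Generic[T]):
    """Map-style dataset that defers all I/O until __getitem__ is called."""

    def __init__(self, keys: Sequence[ChipKey],
                 read_chip: Callable[[ChipKey], T]) -> None:
        # Only metadata is held in memory; no reads happen at construction
        self.keys: List[ChipKey] = list(keys)
        self.read_chip = read_chip

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> T:
        # The actual (possibly remote) read happens here, once per sample
        return self.read_chip(self.keys[index])
```

Because the class implements `__len__` and `__getitem__`, it follows PyTorch's map-style dataset protocol and could be passed to `torch.utils.data.DataLoader`. With `num_workers > 0`, the reads would then run in parallel worker processes.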

TorchGeo's GeoDatasets and most of its other classes already work with remote COGs. I'm going to try creating the TorchGeo dataset via Rasteret to see whether that makes it any faster.
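For a comparison like this, a small timing helper keeps runs consistent. The sketch below reports the median wall-clock time over several runs. The name `median_runtime` is hypothetical. The two callables being compared (for example, one iterating a plain TorchGeo loader and one iterating a Rasteret-backed loader) are placeholders you would supply.

```python
import statistics
import time
from typing import Callable, List


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds of calling fn() `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one slow (e.g. cold-cache) run
    return statistics.median(timings)
```

Remote COG reads are often affected by HTTP and GDAL caching, so the first run may be much slower than later ones. It may be worth reporting the cold first run separately from the median.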

Would love to hear thoughts on this.


Labels: enhancement (New feature or request)
