
GMM with Mini-Batches #51

@justuswill

Description


Hi,

Like #7 and #19, I am trying to fit a GMM to a large dataset of shape [10^10, 50] and want to (in fact, need to) use mini-batching.

However, in contrast to the previous answers, gmm.fit only accepts a TensorLike and won't work with my data, which comes as a torch.utils.data.DataLoader. Even if I pass a torch.utils.data.Dataset, it only computes a GMM on the first batch.
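
For concreteness, a toy version of what I am attempting (random data as a stand-in for mine):

import torch
from torch.utils.data import DataLoader, TensorDataset
from pycave.bayes import GaussianMixture

# Toy stand-in; the real data is [10^10, 50] and does not fit in memory.
dataset = TensorDataset(torch.randn(10_000, 50))
loader = DataLoader(dataset, batch_size=256)

gmm = GaussianMixture(num_components=3)
gmm.fit(loader)  # won't work: fit expects a TensorLike, not a DataLoader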

What is the preferred way to do what I want to do?

Ideally, I would want my code to work like this:

from pycave.bayes import GaussianMixture as GMM
from torch.utils.data import Dataset, DataLoader

data = Data(DATA_PATH).dataloader(batch_size=256)  # Data is my own dataset wrapper
assert isinstance(data, DataLoader)

gmm = GMM(num_components=3, batch_size=256, trainer_params=dict(accelerator='gpu', devices=1))
class_labels = gmm.fit_predict(data)
means, stds = gmm.model_.means, gmm.model_.covariances
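
A workaround that avoids touching the library might be to memory-map the data so that PyCave's internal DataLoader can mini-batch it via batch_size, without ever loading everything into RAM. A sketch (untested at my scale; the file name, dtype, and shape are placeholders for my setup):

import numpy as np
import torch
from pycave.bayes import GaussianMixture as GMM

# Placeholder: a raw float32 file on disk holding the [10^10, 50] matrix.
mmap = np.memmap('data.float32.bin', dtype=np.float32, mode='r', shape=(10**10, 50))
data = torch.from_numpy(mmap)  # zero-copy view; rows are paged in from disk on access
# (PyTorch warns that the array is read-only, which is fine since we only read.)

gmm = GMM(num_components=3, batch_size=256, trainer_params=dict(accelerator='gpu', devices=1))
class_labels = gmm.fit_predict(data)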

Alternatively, manually changing the code in gmm/estimator.py (among others) from

num_features = len(data[0])
...
loader = DataLoader(
    dataset_from_tensors(data),
    batch_size=self.batch_size or len(data),
    collate_fn=collate_tensor,
)
is_batch_training = self._num_batches_per_epoch(loader) == 1          # Also, shouldn't this be > 1 anyway?

to

num_features = data.dataset[0].shape[1]  # infer the feature count from the wrapped dataset
...
loader = data              # use the supplied DataLoader directly
is_batch_training = True   # force mini-batch training

allows for error-free fitting and prediction, but I am not sure whether the output is trustworthy.
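
To check that, I would probably compare a mini-batch fit against a full-batch fit on a subsample that fits in memory, along these lines (sizes are placeholders):

import torch
from pycave.bayes import GaussianMixture as GMM

sample = torch.randn(100_000, 50)  # stand-in for a real subsample of my data

full = GMM(num_components=3)                  # full-batch EM
full.fit(sample)
mini = GMM(num_components=3, batch_size=256)  # mini-batch EM
mini.fit(sample)

print(full.model_.means)
print(mini.model_.means)  # should roughly agree, up to component permutation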
