Data Generators

DataGenerator

class sconce.data_generators.DataGenerator(data_loader)

A thin wrapper around a DataLoader that automatically yields tuples of torch.Tensor objects, placed either on the CPU or on a CUDA device. A DataGenerator iterates endlessly: it does not stop at the end of an epoch.

Like the underlying DataLoader, a DataGenerator’s __next__ method returns two values, which we refer to as the inputs and the targets.

Parameters: data_loader (DataLoader) – the DataLoader to wrap.
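The endless iteration described above can be pictured with a small plain-Python sketch. It assumes the generator simply starts a new pass over the wrapped loader whenever the current pass is exhausted; EndlessPairs is a hypothetical name, not sconce's implementation, and a list of (inputs, targets) pairs stands in for a DataLoader so the example runs without PyTorch.

```python
from typing import Any, Iterable, Iterator, Tuple

Pair = Tuple[Any, Any]  # (inputs, targets)


class EndlessPairs:
    """Hypothetical stand-in for DataGenerator: cycles over a re-iterable loader forever."""

    def __init__(self, loader: Iterable[Pair]) -> None:
        self._loader = loader
        self.reset()

    def reset(self) -> None:
        # Begin a fresh pass over the loader (compare DataGenerator.reset()).
        self._iterator: Iterator[Pair] = iter(self._loader)

    def __iter__(self) -> "EndlessPairs":
        return self

    def __next__(self) -> Pair:
        try:
            inputs, targets = next(self._iterator)
        except StopIteration:
            # The pass is exhausted: start over instead of stopping.
            # (An empty loader still raises StopIteration here.)
            self.reset()
            inputs, targets = next(self._iterator)
        return inputs, targets


batches = [([1, 2], [0, 1]), ([3, 4], [1, 0])]  # two fake batches
gen = EndlessPairs(batches)
print([next(gen) for _ in range(3)])  # the third call wraps around to the first batch
```

Because a DataLoader can be iterated again after each pass (and draws a new shuffle order when shuffle=True), restarting it is enough to turn a finite loader into an endless stream.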
batch_size

the wrapped data_loader’s batch_size

cuda(device=None)

Put the inputs and targets (yielded by this DataGenerator) on the specified device.

Parameters: device (int or bool or dict) – if int or bool, sets the behavior for both inputs and targets; for example, False keeps both on the CPU. To set them individually, pass a dictionary with keys {‘inputs’, ‘targets’} instead. See torch.Tensor.cuda() for details.

Example

>>> g = DataGenerator.from_dataset(dataset, batch_size=100)
>>> g.cuda()
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 0)])
>>> g.cuda(False)
>>> g.next()
(Tensor containing:
 [torch.FloatTensor of size 100x1x28x28],
 Tensor containing:
 [torch.LongTensor of size 100])
>>> g.cuda(device={'inputs':0, 'targets':1})
>>> g.next()
(Tensor containing:
 [torch.cuda.FloatTensor of size 100x1x28x28 (GPU 0)],
 Tensor containing:
 [torch.cuda.LongTensor of size 100 (GPU 1)])
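The three accepted forms of device can be thought of as being expanded into one setting per tensor. The helper below is a hypothetical sketch of that expansion, not sconce's code. It assumes that a bare int, bool or None applies to both keys, that a dict must contain exactly ‘inputs’ and ‘targets’, and (following the example above and torch.Tensor.cuda()) that None and True both mean the current CUDA device.

```python
from typing import Dict, Optional, Union

Setting = Optional[Union[bool, int]]  # None, a bool, or a GPU index
KEYS = ("inputs", "targets")


def normalize_device_spec(device: Union[Setting, Dict[str, Setting]] = None) -> Dict[str, Setting]:
    """Hypothetical helper: expand a cuda() argument into one setting per tensor."""
    if isinstance(device, dict):
        # Per-tensor form, e.g. {'inputs': 0, 'targets': 1}.
        # Assumption: both keys must be present and no others are allowed.
        if set(device) != set(KEYS):
            raise ValueError(f"expected keys {sorted(KEYS)}, got {sorted(device)}")
        return {key: device[key] for key in KEYS}
    # A single None/bool/int applies to both inputs and targets.
    # (bool is a subclass of int, so isinstance(True, int) is True.)
    if device is not None and not isinstance(device, int):
        raise TypeError(f"unsupported device spec: {device!r}")
    return {key: device for key in KEYS}


def describe(setting: Setting) -> str:
    """Assumed reading of one setting, following the example above."""
    if setting is False:
        return "cpu"
    if setting is None or setting is True:
        return "cuda (current device)"
    return f"cuda:{setting}"


spec = normalize_device_spec({"inputs": 0, "targets": 1})
print({key: describe(value) for key, value in spec.items()})
```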
dataset

the wrapped data_loader’s Dataset

classmethod from_dataset(dataset, **kwargs)

Create a DataGenerator from an instantiated dataset.

Parameters:
  • dataset (Dataset) – the PyTorch dataset.
  • **kwargs – passed directly to the DataLoader constructor.
classmethod from_pytorch(batch_size=500, data_location=None, dataset_class=<class 'torchvision.datasets.mnist.MNIST'>, fraction=1.0, num_workers=0, pin_memory=True, shuffle=True, train=True, transform=ToTensor())

Note

This method is deprecated as of 0.8.0 and will be removed in 0.9.0.

Create a DataGenerator from a torchvision dataset class.

Parameters:
  • batch_size (int) – how large the yielded inputs and targets should be. See DataLoader for details.
  • data_location (path) – where the downloaded dataset should be stored. If None, a system-dependent temporary location will be used.
  • dataset_class (class) – a torchvision dataset class that supports the constructor arguments {‘root’, ‘train’, ‘download’, ‘transform’}. For example, MNIST, FashionMNIST, CIFAR10, or CIFAR100.
  • fraction (float) – how much of the original dataset’s data to use, as a value in the interval (0.0, 1.0].
  • num_workers (int) – how many subprocesses to use for data loading. See DataLoader for details.
  • pin_memory (bool) – if True, the data loader will copy tensors into CUDA pinned memory before returning them. See DataLoader for details.
  • shuffle (bool) – set to True to have the data reshuffled at every epoch. See DataLoader for details.
  • train (bool) – if True, creates dataset from training set, otherwise creates from test set.
  • transform (callable) – a function/transform that takes in a PIL image and returns a transformed version.
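The dataset_class requirement can be checked before construction with inspect.signature. The sketch below is hypothetical: supports_required_args is not part of sconce, and the two stand-in classes only imitate dataset constructors so the example runs without torchvision (MnistLike mirrors the argument names of torchvision's MNIST).

```python
import inspect
from typing import Iterable

REQUIRED_ARGS = ("root", "train", "download", "transform")


def supports_required_args(dataset_class: type, required: Iterable[str] = REQUIRED_ARGS) -> bool:
    """Hypothetical check: does dataset_class accept all required keyword arguments?"""
    params = inspect.signature(dataset_class).parameters  # __init__ signature, without self
    # A **kwargs parameter accepts any keyword, so treat it as supporting everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return all(name in accepted for name in required)


class MnistLike:
    # Stand-in with the same constructor arguments as torchvision's MNIST.
    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
        self.root = root


class FolderLike:
    # Stand-in for a dataset without train/download arguments (e.g. folder-based data).
    def __init__(self, root, transform=None):
        self.root = root


print(supports_required_args(MnistLike))   # True
print(supports_required_args(FolderLike))  # False
```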
num_samples

the number of samples (the len) of the wrapped data_loader’s Dataset

real_dataset

the wrapped data_loader’s underlying Dataset, found by reaching through any Subset wrappers
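One way to reach through Subset wrappers is to follow their dataset attribute until a non-Subset is found. In real code an isinstance check against torch.utils.data.Subset would be natural; this sketch instead duck-types on the dataset and indices attributes that Subset exposes, and uses a hypothetical SubsetLike stand-in so it runs without PyTorch. It also shows why num_samples and the length of real_dataset can differ.

```python
from typing import Any, Sequence


class SubsetLike:
    """Hypothetical stand-in with the same `dataset` and `indices` attributes as torch.utils.data.Subset."""

    def __init__(self, dataset: Sequence[Any], indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, i: int) -> Any:
        return self.dataset[self.indices[i]]

    def __len__(self) -> int:
        return len(self.indices)


def unwrap_subsets(dataset: Any) -> Any:
    """Follow `.dataset` links through Subset-style wrappers to the innermost dataset."""
    # Assumption: a wrapper is recognised by having both `dataset` and `indices` attributes.
    while hasattr(dataset, "dataset") and hasattr(dataset, "indices"):
        dataset = dataset.dataset
    return dataset


base = list(range(100))                                  # the "real" dataset: 100 samples
nested = SubsetLike(SubsetLike(base, range(0, 100, 2)), range(10))
print(len(nested))                  # like num_samples: 10
print(len(unwrap_subsets(nested)))  # like len(real_dataset): 100
```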

reset()

Start iterating through the data_loader from the beginning.

ImageDataGenerator

class sconce.data_generators.ImageDataGenerator(*args, **kwargs)

A DataGenerator subclass with some handy methods for image data.

New in 0.7.0

classmethod from_image_folder(root, loader_kwargs=None, **dataset_kwargs)

Create a DataGenerator from a folder of images. See torchvision.datasets.ImageFolder.

Parameters:
  • root (path) – the root directory path.
  • loader_kwargs (dict) – keyword args provided to the DataLoader constructor.
  • **dataset_kwargs – keyword args provided to the torchvision.datasets.ImageFolder constructor.
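torchvision's ImageFolder expects one subdirectory per class under root (for example root/cat/xxx.png and root/dog/yyy.png) and assigns class indices in sorted order of the subdirectory names. The plain-Python sketch below imitates that indexing so it runs without torchvision; index_image_folder is hypothetical and checks only a few extensions, whereas ImageFolder accepts a longer list.

```python
import os
import tempfile
from typing import Dict, List, Tuple

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # illustrative subset only


def index_image_folder(root: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Hypothetical: map each subdirectory of root to a class index and list (path, index) samples."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for dirpath, _, filenames in sorted(os.walk(os.path.join(root, name))):
            for filename in sorted(filenames):
                # Files without an image extension are skipped.
                if filename.lower().endswith(IMAGE_EXTENSIONS):
                    samples.append((os.path.join(dirpath, filename), class_to_idx[name]))
    return class_to_idx, samples


with tempfile.TemporaryDirectory() as root:
    for rel in ("dog/1.png", "dog/2.png", "cat/1.jpg", "cat/notes.txt"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()  # empty placeholder files are enough for indexing
    class_to_idx, samples = index_image_folder(root)
    print(class_to_idx)                      # {'cat': 0, 'dog': 1}
    print([label for _, label in samples])   # [0, 1, 1]
```

Under the parameter split documented above, options such as transform would go in **dataset_kwargs (for ImageFolder), while options such as batch_size would go in loader_kwargs (for the DataLoader).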
classmethod from_torchvision(batch_size=500, data_location=None, dataset_class=<class 'torchvision.datasets.mnist.MNIST'>, fraction=1.0, num_workers=0, pin_memory=True, shuffle=True, train=True, transform=ToTensor())

Create a DataGenerator from a torchvision dataset class.

Parameters:
  • batch_size (int) – how large the yielded inputs and targets should be. See DataLoader for details.
  • data_location (path) – where the downloaded dataset should be stored. If None, a system-dependent temporary location will be used.
  • dataset_class (class) – a torchvision dataset class that supports the constructor arguments {‘root’, ‘train’, ‘download’, ‘transform’}. For example, MNIST, FashionMNIST, CIFAR10, or CIFAR100.
  • fraction (float) – how much of the original dataset’s data to use, as a value in the interval (0.0, 1.0].
  • num_workers (int) – how many subprocesses to use for data loading. See DataLoader for details.
  • pin_memory (bool) – if True, the data loader will copy tensors into CUDA pinned memory before returning them. See DataLoader for details.
  • shuffle (bool) – set to True to have the data reshuffled at every epoch. See DataLoader for details.
  • train (bool) – if True, creates dataset from training set, otherwise creates from test set.
  • transform (callable) – a function/transform that takes in a PIL image and returns a transformed version.
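Setting fraction below 1.0 shrinks both the number of samples and the number of batches per epoch. The sketch below works through that arithmetic for MNIST's 60,000-image training split with fraction=0.1 and the default batch_size=500. It is hypothetical: fraction_indices assumes the kept count is rounded down and the kept indices are a random sample, which may not match how sconce selects them; batches_per_epoch follows the DataLoader rule that a final partial batch is kept unless drop_last=True.

```python
import math
import random
from typing import List, Optional


def fraction_indices(num_samples: int, fraction: float, seed: Optional[int] = None) -> List[int]:
    """Hypothetical: choose indices covering `fraction` of a dataset of num_samples items."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0.0, 1.0]")
    # Assumption: round the kept count down, but always keep at least one sample.
    keep = max(1, int(num_samples * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_samples), keep))


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    # DataLoader yields a final partial batch unless drop_last=True.
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


indices = fraction_indices(60_000, 0.1, seed=0)  # MNIST's training split has 60,000 images
print(len(indices))                              # 6000
print(batches_per_epoch(len(indices), 500))      # 12
```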
get_summary_df()

Return a pandas dataframe that summarizes the image metadata in the dataset.

num_channels

The number of image channels, based on looking at the first image in the dataset.

plot_label_summary()

Generate a bar chart showing how many images there are of each label.

plot_size_summary()

Generate a scatter plot showing the sizes of the images in the dataset.
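The summary and plotting methods above work from per-image metadata. A minimal pandas sketch of that kind of table follows; the column names (height, width, num_channels, label) and the hand-written records are assumptions for illustration, since the columns get_summary_df actually returns are not listed here. value_counts gives the per-label counts a label bar chart would show, and grouping by (height, width) gives the size counts behind a size scatter plot.

```python
import pandas as pd

# Hypothetical per-image metadata: (height, width, num_channels, label).
records = [
    (28, 28, 1, "3"),
    (28, 28, 1, "7"),
    (32, 28, 1, "3"),
]

summary = pd.DataFrame(records, columns=["height", "width", "num_channels", "label"])

# Number of images per label (the data a label bar chart would show).
label_counts = summary["label"].value_counts().sort_index()

# Number of images per (height, width) pair (the data a size plot would show).
size_counts = summary.groupby(["height", "width"]).size()

print(label_counts.to_dict())  # {'3': 2, '7': 1}
print(size_counts)
```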