antisplodge package
AntiSplodge functions
-
class antisplodge.CelltypeDeconvolver(num_feature, num_class, number_of_layers_per_part, first_part_size, second_part_size, last_part_size, out_part_size, input_dropout, normalize_output=False)
Bases: torch.nn.modules.module.Module
A class extending nn.Module from PyTorch.
- Parameters
num_feature (int) – Number of input elements/features (usually gene-based).
num_class (int) – Number of output elements, usually cell types or similar classes.
number_of_layers_per_part (int) – Number of hidden layers per layer block/part.
first_part_size (int) – Number of neurons per layer in the first block/part.
second_part_size (int) – Number of neurons per layer in the second block/part.
last_part_size (int) – Number of neurons per layer in the last block/part.
out_part_size (int) – Number of neurons in the last layer immediately before the final output layer.
input_dropout (float) – Dropout in the input layer, used to simulate sparseness or missing genes during training.
normalize_output (bool) – Normalize the output by scaling each output vector to sum to 1, directly in the model and before computing the error. This sometimes speeds up training for datasets with a low number of classes.
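As a rough illustration of how the block parameters could translate into a layer layout, the sketch below lists layer widths for a network built from three blocks, one extra hidden layer, and the output layer. The helper name layer_widths and the exact ordering of the layers are assumptions made for illustration; the actual wiring is defined in the class source.

```python
def layer_widths(num_feature, num_class, number_of_layers_per_part,
                 first_part_size, second_part_size, last_part_size,
                 out_part_size):
    """Hypothetical helper: list layer widths from input to output."""
    widths = [num_feature]
    # Each of the three blocks/parts repeats its width once per layer in the part.
    for part_size in (first_part_size, second_part_size, last_part_size):
        widths.extend([part_size] * number_of_layers_per_part)
    # Assumed: one hidden layer of out_part_size just before the output layer.
    widths.append(out_part_size)
    widths.append(num_class)
    return widths


widths = layer_widths(2000, 10, 1, 512, 256, 128, 64)
print(widths)  # [2000, 512, 256, 128, 64, 10]
```

With number_of_layers_per_part set to 2, each of the three block widths would simply appear twice in the list.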
-
Get(key)
Used to retrieve members in the class’s member dictionary.
- Parameters
key (str) – Key in the member dictionary.
- Returns
Returns the value of the member.
- Return type
Anything
-
Set(key, val)
Used to store members in the class’s member dictionary.
- Parameters
key (str) – Key in the member dictionary.
val (Anything) – Value to be stored.
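The Get/Set pair behaves like a thin wrapper around a dictionary. The sketch below uses a hypothetical stand-in class (MemberStore, not part of antisplodge) to show the pattern; how the real class treats a missing key is not documented here, so the KeyError behaviour is an assumption.

```python
class MemberStore:
    """Hypothetical stand-in for a class that keeps members in a dictionary."""

    def __init__(self):
        self._members = {}

    def Set(self, key, val):
        # Store (or overwrite) a member under the given key.
        self._members[key] = val

    def Get(self, key):
        # Assumed behaviour: a missing key raises KeyError.
        return self._members[key]


store = MemberStore()
store.Set("num_class", 10)
value = store.Get("num_class")
print(value)  # 10
```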
-
forward(x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training
-
class antisplodge.DeconvolutionExperiment(SC)
Bases: object
A deconvolution experiment class used to keep track of everything that is required to do a full AntiSplodge experiment.
- Parameters
SC (AnnData) – A single-cell dataset, formatted as an AnnData object.
-
generateTrainTestValidation(num_profiles, CD)
Generate training, testing, and validation profiles. This function calls multinomialSampler, getConvolutedProfilesFromDistributions, and getProportionFromCountVector, in that order, for each dataset. This creates the members: X_train_counts, X_val_counts, X_test_counts, X_train, X_val, X_test, Y_train, Y_val, Y_test, Y_train_prop, Y_val_prop, Y_test_prop, num_features.
- Parameters
num_profiles (list of ints, length = 3) – A list of length 3, controlling the number of profiles used for training, testing, and validation (index 0, 1, and 2, respectively).
CD (list of ints, length = 2) – A list of length 2, controlling the range of cell densities (CDs) used (index 0 is the minimum CD, and index 1 is the maximum CD). The same CD range is used for the training, testing, and validation datasets.
-
loadCheckpoint(checkpoint)
Loads a checkpoint file (.pt) containing the state of a neural network onto the model member variable.
- Parameters
checkpoint (str) – The path to the checkpoint file.
-
setCellTypeColumn(name)
Sets the column in the SC dataset that holds the cell types. This creates the members: celltypes_column, celltypes, num_classes.
- Parameters
name (str) – Name (key) of the column.
-
setVerbosity(verbose)
Sets the verbosity level of the prints of the experiment, either True or False.
- Parameters
verbose (bool) – Verbosity of the prints (True or False). This is False when the experiment is initialized.
-
setupDataLoaders(batch_size=1000)
Processes the profiles generated by the generateTrainTestValidation method into ready-to-use data loaders. This creates the members: train_loader, val_loader, test_loader.
- Parameters
batch_size (int (1000, optional)) – The number of samples in each batch, defaults to 1000.
-
setupModel(cuda_id=1, dropout=0.33, fps=512, sps=256, lps=128, ops=64, lp=1, normalize_output=False)
Initialize the feed-forward neural network model. We recommend roughly halving the number of nodes per layer from each part to the next. The first part should be smaller than the input; check the member variable num_features. This creates the members: model, device.
- Parameters
cuda_id (int (or "cpu") (1, optional)) – The id of the CUDA device, this can be either an int for the id or “cpu” (to use CPU device), defaults to 1
dropout (float (0.33, optional)) – Dropout applied in the input layer of the model (see input_dropout in CelltypeDeconvolver), defaults to 0.33
fps (int (512, optional)) – Nodes for each layer for the first part/block, defaults to 512
sps (int (256, optional)) – Nodes for each layer for the second part/block, defaults to 256
lps (int (128, optional)) – Nodes for each layer for the last part/block, defaults to 128
ops (int (64, optional)) – Number of nodes in the last hidden layer just before the output layer, defaults to 64
lp (int (1, optional)) – Layers per part/block, defaults to 1
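The halving recommendation can be turned into a small rule of thumb. The sketch below is a hypothetical helper (suggest_part_sizes is not part of antisplodge) that picks the largest power of two below num_features and halves it for each subsequent part; the choice of powers of two is an assumption made for illustration, not a requirement of the library.

```python
def suggest_part_sizes(num_features, parts=4):
    """Hypothetical helper: sizes for fps, sps, lps and ops, halving each time."""
    size = 1
    # Largest power of two that is strictly smaller than the input size.
    while size * 2 < num_features:
        size *= 2
    # Halve for each subsequent part, never going below one node.
    return [max(size >> i, 1) for i in range(parts)]


sizes = suggest_part_sizes(1000)
print(sizes)  # [512, 256, 128, 64]
```

For an input with 1000 features this happens to reproduce the default values of fps, sps, lps, and ops.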
-
setupOptimizerAndCriterion(learning_rate=0.001, optimizer=None, criterion=None)
Set the optimizer and criterion, and bind them to the model. This creates the members: optimizer, criterion.
- Parameters
learning_rate (float (0.001, optional)) – The learning rate of the optimizer, if you supply another optimizer, remember to set it yourself, defaults to 0.001
optimizer (Pytorch optimizer (None, optional)) – The neural network optimizer, defaults to None, and will then use pytorch’s optim.Adam.
criterion (Pytorch criterion or loss function (None, optional)) – The neural network criterion, defaults to None, and will then use pytorch’s nn.SmoothL1Loss.
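For readers unfamiliar with the default criterion, the sketch below computes a Smooth L1 loss in plain Python, following the definition used by PyTorch's nn.SmoothL1Loss with its default beta of 1.0 and mean reduction. The function name smooth_l1 is chosen for illustration only; in practice the PyTorch implementation is used.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over two equally long sequences of floats."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            # Quadratic region for small errors.
            total += 0.5 * diff * diff / beta
        else:
            # Linear region for large errors.
            total += diff - 0.5 * beta
    return total / len(pred)


loss = smooth_l1([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
print(round(loss, 6))  # 0.003333
```

Because predicted and true proportions both lie between 0 and 1, errors in a deconvolution setting usually fall in the quadratic region.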
-
splitTrainTestValidation(train=0.9, rest=0.5)
Split the SC dataset into training, validation, and test datasets. The splits are stratified on the cell types. This creates the members: trainIndex, valIndex, testIndex, SC_train, SC_val, SC_test.
- Parameters
train (float (0.9, optional)) – A number between 0 and 1 controlling the proportion of samples used in the training dataset, defaults to 0.9 (90%)
rest (float (0.5, optional)) – A number between 0 and 1 controlling how the remaining (non-training) samples are divided between the test and validation datasets, defaults to 0.5 (a 50%/50% split)
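A stratified three-way split can be sketched with the standard library alone. The function below is an illustration, not the library's implementation: the name stratified_split, the seed parameter, and the choice to give the rest fraction of the held-out samples to validation (and the remainder to test) are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.9, rest=0.5, seed=0):
    """Split indices into train/validation/test, stratified on the labels."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        held_out = indices[n_train:]
        # Assumed: `rest` is the share of held-out samples given to validation.
        n_val = round(len(held_out) * rest)
        train_idx += indices[:n_train]
        val_idx += held_out[:n_val]
        test_idx += held_out[n_val:]
    return train_idx, val_idx, test_idx


labels = ["A"] * 100 + ["B"] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 126 7 7
```

Splitting per cell type keeps rare cell types represented in all three datasets, which a purely random split does not guarantee.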
-
class antisplodge.SingleCellDataset(*args, **kwds)
Bases: torch.utils.data.dataset.Dataset
A simple class used to store X and y relations as a paired dataset. We use it to store gene-based profiles (X) that are related to class-based profiles (y). This class is used to store tensors intended to train, validate, or test the generated models.
- Parameters
X_data (Tensor) – A tensor where each element is a list of gene counts or gene proportions.
Y_data (Tensor) – A tensor where each element is a list of cell type counts or cell type proportions.
-
antisplodge.getConvolutedProfilesFromDistributions(adata, cell_types, cell_type_key, distributions, normalize_X=False)
A function that converts the profiles generated with multinomialSampler into gene-based profiles by sampling cells from the SC dataset, corresponding to the number of counts found in each profile.
- Parameters
adata (AnnData) – An AnnData object, this is usually the SC dataset, in the experiment class DeconvolutionExperiment.
cell_types (List) – An ordered list of cell types found in the cell_type_key column of adata.
cell_type_key (str) – The key/column that the method will look for in the observations (obs) data frame of adata.
distributions ([ParamType]) – The profiles that should be processed to be convoluted, usually generated using multinomialSampler.
normalize_X (bool (False, optional)) – If True, each convoluted profile is scaled to sum to 1 (assuming cell types already are scaled to 1), defaults to False.
- Returns
A dict containing three lists. X_list, a gene-based list of convoluted profiles, where each profile is a list of genes. Y_list, a list of the cell types used to produce X_list, where each element is a class-based list. I_list, a list of indices, to trace back which cells were used to generate X_list. The lists are related by index, so the first element of X_list corresponds to the first element of Y_list and the first element of I_list.
- Return type
Dict
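The idea of turning cell type counts into gene-based profiles can be sketched on plain Python lists. This is an illustration under stated assumptions, not the library's code: the function convolute_profiles is hypothetical, it takes lists instead of an AnnData object, it samples cells with replacement, and it sums the gene counts of the sampled cells.

```python
import random


def convolute_profiles(expression, cell_labels, cell_types, distributions,
                       normalize_X=False, seed=0):
    """expression: per-cell gene counts; cell_labels: cell type of each cell."""
    rng = random.Random(seed)
    cells_by_type = {ct: [i for i, lab in enumerate(cell_labels) if lab == ct]
                     for ct in cell_types}
    num_genes = len(expression[0])
    X_list, Y_list, I_list = [], [], []
    for counts in distributions:
        picked = []
        for ct, n in zip(cell_types, counts):
            # Assumed: n cells of this type are drawn with replacement.
            picked += rng.choices(cells_by_type[ct], k=n)
        # Assumed: the profile is the per-gene sum over the sampled cells.
        profile = [sum(expression[i][g] for i in picked) for g in range(num_genes)]
        if normalize_X:
            total = sum(profile)
            profile = [v / total for v in profile] if total else profile
        X_list.append(profile)
        Y_list.append(list(counts))
        I_list.append(picked)
    return {"X_list": X_list, "Y_list": Y_list, "I_list": I_list}


expression = [[1, 0], [2, 1], [0, 3], [1, 1]]
cell_labels = ["A", "A", "B", "B"]
result = convolute_profiles(expression, cell_labels, ["A", "B"], [[2, 1], [0, 3]])
print(result["Y_list"])  # [[2, 1], [0, 3]]
```

The I_list entry of each profile records which cells were drawn, so every convoluted profile can be traced back to its source cells.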
-
antisplodge.getMeanJSD(experiment, split_dataset='test')
Get the mean Jensen-Shannon Divergence for one of the split datasets.
- Parameters
split_dataset (String either "train", "validation", or, "test" (default, "test")) – A string indicating which split dataset to use.
- Returns
A float containing the mean JSD.
- Return type
float
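For reference, the Jensen-Shannon divergence between a true and a predicted proportion profile can be computed as below. This is a minimal sketch: the function names are chosen for illustration, and both the base-2 logarithm and the use of the divergence rather than its square root (the Jensen-Shannon distance, which is what SciPy's scipy.spatial.distance.jensenshannon returns) are assumptions about what getMeanJSD reports.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    # m is the pointwise average of the two distributions.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_jsd(true_profiles, predicted_profiles):
    values = [jensen_shannon_divergence(t, p)
              for t, p in zip(true_profiles, predicted_profiles)]
    return sum(values) / len(values)


score = mean_jsd([[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5], [0.0, 1.0]])
print(score)  # 0.5
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (profiles with no overlap), so lower values indicate better deconvolution.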
-
antisplodge.getProportionFromCountVector(Y_list)
A function that converts count vectors into proportions. This is used to go from count vectors of cell types to proportions of cell types. Each profile will sum to 1.
- Parameters
Y_list (List) – A list of count profiles to be converted to proportion profiles.
- Returns
A list of proportion profiles.
- Return type
List
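The conversion amounts to dividing each count by the total of its profile. The sketch below shows this on plain lists; the function name counts_to_proportions is hypothetical, and it assumes every profile contains at least one cell (a profile of all zeros would raise ZeroDivisionError here).

```python
def counts_to_proportions(Y_list):
    """Convert each count profile into a proportion profile that sums to 1."""
    proportions = []
    for counts in Y_list:
        total = sum(counts)  # assumed to be greater than zero
        proportions.append([c / total for c in counts])
    return proportions


props = counts_to_proportions([[2, 1, 1], [0, 5, 0]])
print(props)  # [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
```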
-
antisplodge.multinomialSampler(Nc, M, CD_min, CD_max)
A multinomial sampler with a temperatured step function, making sampling of classes/cell types go from equally likely to more extreme (singleton-like).
- Parameters
Nc (int) – The number of cell types. Usually in the range of 5-50.
M (int) – The number of profiles generated for each CD. See CD_min and CD_max for more information.
CD_min (int) – CD is cell density, the measure of how many cells contribute to a particular profile. CD_min is the minimum number of cells contributing to a profile, and together with CD_max it forms a range of CDs, going from CD_min to CD_max.
CD_max (int) – CD is cell density, the measure of how many cells contribute to a particular profile. CD_max is the maximum number of cells contributing to a profile, and together with CD_min it forms a range of CDs, going from CD_min to CD_max.
- Returns
Returns a list of profiles, with the number of profiles equal to Nc x M x (CD_max - CD_min + 1). Each profile contains a count value (a non-negative integer) for each class/cell type.
- Return type
List
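The size of the returned list follows directly from the formula given under Returns. The small helper below (expected_profile_count, a hypothetical name) only evaluates that formula, which is useful for estimating memory and training time before sampling; it does not perform any sampling itself.

```python
def expected_profile_count(Nc, M, CD_min, CD_max):
    """Number of profiles according to Nc x M x (CD_max - CD_min + 1)."""
    return Nc * M * (CD_max - CD_min + 1)


n_profiles = expected_profile_count(Nc=10, M=100, CD_min=1, CD_max=10)
print(n_profiles)  # 10000
```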
-
antisplodge.predict(experiment, test_loader=None)
Predict profiles using the current model found in the experiment. This will use the test dataset if test_loader has not been set. You should supply a loader yourself if you want to predict spots.
- Parameters
test_loader (Dataloader (None, optional)) – A test_loader with profiles to deconvolute, defaults to None, in which case the test profiles will be used.
- Returns
A list of deconvoluted cell types (profiles).
- Return type
List
-
antisplodge.train(experiment, patience=25, save_file=None, auto_load_model_on_finish=True, best_loss=None, validation_metric='jsd')
Train the model found in an experiment. This will utilize the train and validation datasets.
- Parameters
patience (int (25, optional)) – Patience counter, the training will stop once a new better loss hasn’t been seen in the last patience epochs, defaults to 25
save_file (str or None (None, optional)) – The file to save the model parameters each time a better setting has been found. This is done each time the validation error is better (lower) than the best seen. Defaults to None, in which case a time-stamped file will be used.
auto_load_model_on_finish (bool (True, optional)) – If the best model settings should be loaded back onto the model when the training stops, defaults to True
best_loss (float or None (None, optional)) – A loss value to beat in order to save the model as the new best, used for warm restarts, defaults to None.
validation_metric (String ("jsd", optional)) – Whether the validation check should be measured in “jsd” (JSD), or based on the loss (any value other than “jsd”).
- Returns
A dictionary with keys: train_loss and validation_loss, containing the train and validation loss for each epoch.
- Return type
Dict
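The interaction between patience and best_loss can be illustrated without a model. The sketch below replays a list of per-epoch validation values and reports when patience-based early stopping would end training. It is a simplified illustration, not the library's training loop: the function name epochs_until_stop is hypothetical, and the exact stopping rule (stop once patience consecutive epochs fail to improve on the best value) is an assumption.

```python
def epochs_until_stop(validation_losses, patience=25, best_loss=None):
    """Return (stop_epoch, best_epoch, best_loss) for a sequence of losses."""
    best = best_loss
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(validation_losses):
        if best is None or loss < best:
            # A new best: this is where a checkpoint would be saved.
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch, best
    # Ran out of epochs before the patience counter was exhausted.
    return len(validation_losses) - 1, best_epoch, best


losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
result = epochs_until_stop(losses, patience=3)
print(result)  # (5, 2, 0.6)
```

In this example a patience of 3 stops training at epoch 5 and misses the later improvement at epoch 6, which shows why a larger patience value can be worthwhile when the validation metric is noisy.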