Utils
- tka.utils.docs()[source]
Calling this function opens the documentation website.
- tka.utils.is_valid_smiles(smiles: str)[source]
Returns True if the SMILES representation is valid and False otherwise.
- tka.utils.load_l1000_ordered_feature_columns(gene_id)[source]
Loads L1000 ordered features in a list format based on the specified gene_id
- Parameters:
gene_id (str) – one of “affyID”, “ensemblID” or “entrezID”
- Raises:
ValueError – If gene_id is not one of the allowed probe types.
- Returns:
L1000 ordered features in a list format based on the specified gene_id
- Return type:
list
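The validation described under Raises can be pictured with a minimal sketch. The helper name `check_gene_id` and the constant `ALLOWED_GENE_ID_TYPES` are hypothetical and not part of `tka.utils`; only the three allowed values come from the parameter description above.

```python
# Allowed probe types, taken from the parameter description above.
ALLOWED_GENE_ID_TYPES = ("affyID", "ensemblID", "entrezID")


def check_gene_id(gene_id: str) -> str:
    """Hypothetical helper: return gene_id if it is allowed, else raise ValueError."""
    if gene_id not in ALLOWED_GENE_ID_TYPES:
        raise ValueError(
            f"gene_id must be one of {ALLOWED_GENE_ID_TYPES}, got {gene_id!r}"
        )
    return gene_id
```

Under this sketch, passing a value such as "symbol" raises ValueError, while "entrezID" is returned unchanged.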
- tka.utils.load_mobc_ordered_feature_columns(model_id: str = '2023-02-mobc-es-op')[source]
Loads cell morphology ordered features in a list format. Currently all models use CellProfiler features.
- Parameters:
model_id (str) – One of [“2023-02-mobc-es-op”, “2023-01-mobc-es-op”, “2021-02-mobc-es-op”, “2024-01-mobc-es-op”].
- tka.utils.transform_l1000_ids(from_id, to_id, gene_ids, dataset_path='l1000_mapped.csv', ignore_missing=False) → Dict [source]
Transforms L1000 gene IDs from one format to another.
- Parameters:
from_id (str) – The source probe type (“affyID”, “entrezID”, “ensemblID”).
to_id (str) – The target probe type (“affyID”, “entrezID”, “ensemblID”).
gene_ids (list) – List of L1000 gene IDs to transform.
dataset_path (str) – Path to the DataFrame containing L1000 gene IDs for each probe type.
ignore_missing (bool) – If set to True, it will not raise an error on missing or invalid probe IDs.
- Raises:
ValueError – If either from_id or to_id is not one of the allowed values.
ValueError – If any of the gene IDs in the dataset is not within the scope of L1000.
- Returns:
Original and transformed L1000 gene IDs as keys and values respectively.
- Return type:
dict
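The mapping behaviour described above can be approximated with a short, dependency-free sketch. It is not the library's implementation: the function `transform_ids_sketch`, the in-memory table `EXAMPLE_CSV`, and every identifier in it are invented placeholders, and the choice to silently skip unknown IDs when `ignore_missing` is True is an assumption.

```python
import csv
import io

ALLOWED_ID_TYPES = ("affyID", "entrezID", "ensemblID")

# Placeholder mapping table. These values are invented for illustration only
# and are not real L1000 identifiers.
EXAMPLE_CSV = """affyID,entrezID,ensemblID
AFFY_A,ENTREZ_A,ENSEMBL_A
AFFY_B,ENTREZ_B,ENSEMBL_B
"""


def transform_ids_sketch(from_id, to_id, gene_ids,
                         table_text=EXAMPLE_CSV, ignore_missing=False):
    """Map gene_ids from one probe type to another using a CSV lookup table."""
    for name, value in (("from_id", from_id), ("to_id", to_id)):
        if value not in ALLOWED_ID_TYPES:
            raise ValueError(f"{name} must be one of {ALLOWED_ID_TYPES}, got {value!r}")

    # Build a source-ID -> target-ID lookup from the table.
    rows = csv.DictReader(io.StringIO(table_text))
    lookup = {row[from_id]: row[to_id] for row in rows}

    result = {}
    for gene_id in gene_ids:
        if gene_id in lookup:
            result[gene_id] = lookup[gene_id]
        elif not ignore_missing:
            raise ValueError(f"{gene_id!r} is not a known {from_id}")
        # Assumption: with ignore_missing=True, unknown IDs are skipped.
    return result


mapping = transform_ids_sketch("affyID", "entrezID", ["AFFY_A", "AFFY_B"])
```

Here `mapping` has the original IDs as keys and the transformed IDs as values, matching the return description above.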
- tka.utils.transform_moshkov_outputs(identifier_col_vals: List[str], output: List[List], model_id: str, auc_threshold: float = 0.0, use_full_assay_names: bool = False) → DataFrame [source]
Transform Moshkov outputs into a Pandas DataFrame.
- Parameters:
identifier_col_vals (List[str]) – List of id strings corresponding to input data points (or any other identifiers).
output (List[List]) – List of lists containing output data (shape: X, 270).
auc_threshold (float, optional) – If supplied, assays whose prediction accuracies are lower than auc_threshold will be dropped. Allowed values are floats between 0.5 and 1.0.
model_id (str) – One of [“2023-02-mobc-es-op”, “2023-01-mobc-es-op”, “2021-02-mobc-es-op”, “2024-01-mobc-es-op”].
use_full_assay_names (bool, optional) – Whether to use full assay names from the CSV. Defaults to False.
- Returns:
DataFrame with identifier_col_vals as the first column, followed by the assay data columns.
- Return type:
pd.DataFrame
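To make the shape of the result concrete, the following sketch reproduces the described layout with plain Python lists and dicts instead of pandas. Everything in it is illustrative: `tabulate_outputs_sketch`, the column name "identifier", the assay names, and the AUC values are hypothetical, and it uses three assays where the real output has 270 columns.

```python
def tabulate_outputs_sketch(identifier_col_vals, output, assay_names,
                            assay_aucs, auc_threshold=0.0):
    """Build one row per input, keeping only assays with AUC >= auc_threshold."""
    if len(identifier_col_vals) != len(output):
        raise ValueError("identifier_col_vals and output must have the same length")

    # Indices of assays that survive the threshold (lower AUCs are dropped).
    kept = [i for i, auc in enumerate(assay_aucs) if auc >= auc_threshold]

    rows = []
    for identifier, predictions in zip(identifier_col_vals, output):
        row = {"identifier": identifier}  # identifier comes first
        for i in kept:
            row[assay_names[i]] = predictions[i]
        rows.append(row)
    return rows


# Placeholder inputs, invented for illustration.
rows = tabulate_outputs_sketch(
    identifier_col_vals=["cmpd_1", "cmpd_2"],
    output=[[0.1, 0.9, 0.4], [0.7, 0.2, 0.5]],
    assay_names=["assay_a", "assay_b", "assay_c"],
    assay_aucs=[0.55, 0.80, 0.95],
    auc_threshold=0.7,
)
```

With a threshold of 0.7, "assay_a" (placeholder AUC 0.55) is dropped and each row keeps the identifier plus the two remaining assay columns.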