drugforge.ml.dataset.GroupedDockedDataset
- class drugforge.ml.dataset.GroupedDockedDataset(*args: Any, **kwargs: Any)[source]
Bases: Dataset
Version of DockedDataset where data is grouped by compound_id, so all poses for a given compound can be accessed at once.
- __init__(compound_ids: list[str] = [], structures: dict[str, dict] = {}, random_iter=False)[source]
Constructor for GroupedDockedDataset object.
- Parameters:
compound_ids (list[str]) – List of compound ids. Each entry in this list must have a corresponding entry in structures
structures (dict[str, dict]) – Dict mapping compound_id to a pose dict
random_iter (bool, default=False) – Iterate through the dataset randomly each time
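The requirement that every entry in compound_ids has a matching key in structures can be checked before the dataset is constructed. The following is a minimal sketch in plain Python. The helper name find_missing_structures and the placeholder pose contents are hypothetical and are not part of drugforge.

```python
def find_missing_structures(compound_ids, structures):
    """Return the compound ids that have no entry in structures."""
    return [cid for cid in compound_ids if cid not in structures]


# Illustrative inputs only; the real contents of a pose dict are not shown here.
compound_ids = ["cpd-1", "cpd-2"]
structures = {"cpd-1": {"note": "placeholder pose dict"}}

# Any id listed here would violate the documented constructor requirement.
missing = find_missing_structures(compound_ids, structures)
```

If missing is empty, compound_ids and structures are consistent with the documented constraint and can be passed to the constructor.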
Methods
__init__([compound_ids, structures, random_iter]) – Constructor for GroupedDockedDataset object.
from_complexes(complexes[, exp_dict, ...]) – Build from a list of Complex objects.
from_files(str_fns, compounds[, ignore_h, ...])
- classmethod from_complexes(complexes: list[Complex], exp_dict={}, ignore_h=True, random_iter=False)[source]
Build from a list of Complex objects.
- Parameters:
complexes (list[Complex]) – List of Complex schema objects to build into a DockedDataset object
exp_dict (dict[str, dict[str, int | float]], optional) – Dict mapping compound_id to an experimental results dict. The dict for a compound will be added to the pose representation of each Complex containing a ligand with that compound_id
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
random_iter (bool, default=False) – Iterate through the dataset randomly each time
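The way exp_dict is described (one experimental results dict per compound_id, added to every pose of that compound) can be illustrated with plain dicts. This is only a sketch of the described behavior, not the library's implementation. The helper attach_experimental and the keys "compound_id", "pose_index", and "pIC50" are assumptions made for illustration.

```python
def attach_experimental(poses, exp_dict):
    """Return copies of the poses with exp_dict entries merged in by compound id."""
    merged = []
    for pose in poses:
        # Compounds without experimental data get no extra keys.
        extra = exp_dict.get(pose["compound_id"], {})
        merged.append({**pose, **extra})
    return merged


# Hypothetical pose representations: two poses of one compound, one of another.
poses = [
    {"compound_id": "cpd-1", "pose_index": 0},
    {"compound_id": "cpd-1", "pose_index": 1},
    {"compound_id": "cpd-2", "pose_index": 0},
]
exp_dict = {"cpd-1": {"pIC50": 6.5}}

merged = attach_experimental(poses, exp_dict)
```

Both poses of cpd-1 receive the same experimental values, while the pose of cpd-2, which has no entry in exp_dict, is left unchanged.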
- Return type:
- classmethod from_files(str_fns, compounds, ignore_h=True, extra_dict=None, num_workers=1, random_iter=False)[source]
- Parameters:
str_fns (list[str]) – List of paths for the PDB files. Should correspond 1:1 with the names in compounds
compounds (list[tuple[str]]) – List of (crystal structure, ligand compound id) tuples
ignore_h (bool, default=True) – Whether to remove hydrogens from the loaded structure
extra_dict (dict[str, dict], optional) – Extra information to add to each structure. Keys should be compounds, and dicts can be anything as long as they don’t have the keys [“z”, “pos”, “lig”, “compound”]
num_workers (int, default=1) – Number of cores to use to load structures
random_iter (bool, default=False) – Iterate through the dataset randomly each time
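Two constraints from the parameter list above can be checked before calling from_files: str_fns and compounds should correspond 1:1, and the dicts in extra_dict should not use the keys "z", "pos", "lig", or "compound". The sketch below is a hypothetical pre-check in plain Python and is not part of drugforge. It assumes that the keys of extra_dict are the same tuples that appear in compounds, and the file names are placeholders.

```python
RESERVED_KEYS = {"z", "pos", "lig", "compound"}


def check_from_files_inputs(str_fns, compounds, extra_dict=None):
    """Return a list of problems found; an empty list means the inputs look consistent."""
    problems = []
    if len(str_fns) != len(compounds):
        problems.append(f"{len(str_fns)} files but {len(compounds)} compounds")
    for key, extra in (extra_dict or {}).items():
        # Flag any extra-info key that collides with a reserved key.
        clash = RESERVED_KEYS & set(extra)
        if clash:
            problems.append(f"{key}: reserved keys {sorted(clash)}")
    return problems


# Placeholder inputs for illustration.
str_fns = ["structure_a.pdb", "structure_b.pdb"]
compounds = [("xtal-a", "cpd-1"), ("xtal-b", "cpd-2")]
extra_dict = {
    ("xtal-a", "cpd-1"): {"pIC50": 6.5},
    ("xtal-b", "cpd-2"): {"pos": [0.0]},  # collides with a reserved key
}

problems = check_from_files_inputs(str_fns, compounds, extra_dict)
```

Here problems contains a single entry reporting the reserved key "pos" for the second compound. Mismatched list lengths would be reported in the same way.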