This page was generated from /home/docs/checkouts/readthedocs.org/user_builds/asapdiscovery/checkouts/latest/docs/_collections/notebooks/running_md_simulations.ipynb.

Running MD simulations

Running vanilla molecular dynamics (MD) simulations on protein-ligand complexes can provide information about the dynamics of a given ligand inside the binding site and can be used to assess the quality and stability of a proposed binding mode.

Below we will demonstrate an example of running MD simulations on a protein ligand complex, following docking, using the drugforge framework.

We will be running this example on an ASAP target, the SARS-CoV-2 nsp3 Mac1 macrodomain; this removes ADP ribose from viral and host cell proteins. The removal of this post-translational modification reduces the inflammatory and antiviral responses to infection — facilitating replication. For more information on Mac1, follow this link.

Required modules

To execute this example, the following drugforge modules to be installed:

data,
docking,
modeling
simulation

To enable the visualization of the docking results, the following modules will also need to be installed:

dataviz
spectrum.

Note this example requires users have an OpenEye license.

Please refer to the installation instructions for more details.

[1]:

# import some dependencies
from drugforge.data.testing.test_resources import fetch_test_file
from drugforge.docking.docking import DockingInputPair
from drugforge.docking.openeye import POSITDocker
from drugforge.data.schema.complex import Complex, PreppedComplex
from drugforge.data.schema.ligand import Ligand
from drugforge.simulation.simulate import VanillaMDSimulator

Docking an arbitrary ligand to Mac1

Let us fetch an example structure of Mac1, hosted in the ASAP testing repository. For convenience, we will utilize the fetch_test_file function (part of the data module) to download this file and return its location.

[2]:

# fetch a PDB file from the test suite, in this case a PDB from the COVID MOONSHOT.
protein_pdb_file = fetch_test_file("SARS2_Mac1A-A1013.pdb")

# print out the location of the pdb file we just downloaded.
print(protein_pdb_file)

/Users/hugomacdermott/Library/Caches/asapdiscovery_testing/SARS2_Mac1A-A1013.pdb

Now we will create a Complex object from the .pdb file (see the tutorial on base level ASAP abstractions for more info)

[3]:

# make a complex
mac1_complex = Complex.from_pdb(protein_pdb_file, ligand_kwargs={"compound_name": "A1013"}, target_kwargs={"target_name": "SARS2_Mac1A"})

We will need to create a Ligand object to dock into the structure. In this case we will generate the ligand from a SMILES string. Note, in addition to SMILES, the ligands can also be created from InChI strings, SDF files, and OEMol instances.

[4]:

# make the ligand we want to dock, a simple alkane
ligand = Ligand.from_smiles("CCCCCCC", compound_name="alkane")

Next, we will run protein preparation.

[5]:

# prepare our structure
prepped_mac1_complex_complex = PreppedComplex.from_complex(mac1_complex)
# pair it up with the ligand we want to dock.
docking_input_pair = DockingInputPair(complex=prepped_mac1_complex_complex, ligand=ligand)

Warning: No BioAssembly transforms found, using input molecule as biounit: ---_LIG
DPI: 0.12, RFree: 0.28, Resolution: 1.48
Processing BU # 1 with title: ---_LIG, chains AB

Now, we dock it to our protein. Note, use_dask is set to False, disabling parallel execution, as it is not required for this example (Note, dask is discussed at the end of this tutorial).

[6]:

# run OpenEye POSIT docking,
docker = POSITDocker(use_omega=False)
results = docker.dock([docking_input_pair], use_dask=False)

[7]:

print(results[0].posed_ligand.tags)

{'docking-confidence-POSIT': 0.019999999552965164, '_POSIT_method': 'FRED'}

Vizualise the docked pose

Let us now vizualise our results! For more information on vizualisations, see the vizualisation notebook hosted in the examples directory.

[8]:

# create a visualization factory.
from drugforge.dataviz.html_viz import HTMLVisualizer

html_vizualizer = HTMLVisualizer(
        target="SARS-CoV-2-Mac1",
        color_method="subpockets",
        align=True,
        output_dir="tutorial_files/running_md_simulations/",
        write_to_disk=True,
    )
vizs_from_docked =  html_vizualizer.visualize(inputs=results, outpaths=["visualise_docked.html"], use_dask=False)

2024-05-10 13:03:23,875 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 13:03:23,875 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 13:03:23,875 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 13:03:23,875 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 13:03:24,022 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmpsqm6hmwx/

[9]:

from IPython.display import IFrame
IFrame(vizs_from_docked["html_path_pose"][0], 1000, 1000)

[9]:

Running a single MD simulation

Great, our pose looks good! Lets use the VanillaMDSimulator to run simulations of the protein-ligand complex.

The VanillaMDSimulator has many options for running simulations in different configurations, however a basic configuration should be sufficient for this example. For the purposes of this tutorial we will keep the simulations very, very short (simulation time <1 minute for a single simulation on a typical GPU-enabled computer).

[10]:

VanillaMDSimulator?

Init signature:
VanillaMDSimulator(
    *,
    output_dir: pathlib.Path = 'md',
    debug: bool = False,
    collision_rate: pydantic.types.PositiveFloat = 1,
    openmm_logname: str = 'openmm_log.tsv',
    openmm_platform: asapdiscovery.simulation.simulate.OpenMMPlatform = <OpenMMPlatform.Fastest: 'Fastest'>,
    temperature: pydantic.types.PositiveFloat = 300,
    pressure: pydantic.types.PositiveFloat = 1,
    timestep: pydantic.types.PositiveFloat = 4,
    equilibration_steps: pydantic.types.PositiveInt = 5000,
    reporting_interval: pydantic.types.PositiveInt = 1250,
    num_steps: pydantic.types.PositiveInt = 2500000,
    rmsd_restraint: bool = False,
    rmsd_restraint_atom_indices: list[int] = [],
    rmsd_restraint_type: Optional[str] = None,
    rmsd_restraint_force_constant: pydantic.types.PositiveFloat = 50,
    truncate_steps: bool = True,
    small_molecule_force_field: str = 'openff-2.2.0',
    **extra_data: Any,
) -> None
Docstring:      Base class for Simulators.
Init docstring:
Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
File:           ~/Desktop/asap/asapdiscovery/asapdiscovery-simulation/asapdiscovery/simulation/simulate.py
Type:           ModelMetaclass
Subclasses:

Let us set up the simulator:

[11]:

md_simulator = VanillaMDSimulator(
            output_dir="tutorial_files/running_md_simulations/",
            equilibration_steps=1,
            num_steps=1,
            reporting_interval=1)

To run the simulation, we will pass the output from the docking performed early (saved as results) to the simulation function of the VanillaMDSimulation instance. This will launch a simulation using the OpenMM simulation package.

[12]:

simulation_results = md_simulator.simulate(
            results,
            use_dask=False)

This function returns a list, where each entry contains an instance of SimulationResult corresponding to each of the simulations executed. In this case, we only ran a single simulation, and to query the output we just need to set the index to be 0. For example, let us check to see if the simulation completed successfully:

[13]:

simulation_results[0].success

[13]:

True

The simulator makes unique paths for the resulting simulations, which can be access via the simulation_results output. The pathes that are returned are relevative to the directory where the simulation was launched. Note, future releases will add additional flexibility with regards to simulation output.

[14]:

print(simulation_results[0].traj_path)

tutorial_files/running_md_simulations/SARS2_Mac1A-b27f22555232d2d68273612ffce5a119d6e22526d95ce3eb0db9012632bcdaf6+FHHVXLFEHODNRQ-XCZWEQHLNA-M_alkane-IMNFDUFMRHMDMM-UHFFFAOYNA-N/traj.xtc

[15]:

print(simulation_results[0].final_pdb_path)

tutorial_files/running_md_simulations/SARS2_Mac1A-b27f22555232d2d68273612ffce5a119d6e22526d95ce3eb0db9012632bcdaf6+FHHVXLFEHODNRQ-XCZWEQHLNA-M_alkane-IMNFDUFMRHMDMM-UHFFFAOYNA-N/final.pdb

Running multiple simulations in parallel with dask-cuda

We can pass a series of DockingResults to the VanillaMDSimulator and have dask-cuda parallelize work over available GPU resources (see here).

For this we will need a LocalCUDACluster. You will only see parallelism here if you have more than one GPU, e.g on an HPC cluster, otherwise, the simulations will run sequentially.

NOTE dask_cuda is not available for MacOS computers.

[ ]:

# create a dask_cuda LocalCUDACluster
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
gpu_client = Client(cluster)

[ ]:

# drugforge provides a convenience function to do this
from drugforge.data.util.dask_utils import DaskType, make_dask_client_meta
gpu_client = make_dask_client_meta(DaskType.LOCAL_GPU)

We can see by inspecting the signature of the simulate method that it can accept a dask Client

[ ]:

VanillaMDSimulator.simulate?

Lets extend the results list of inputs so that these could be run in parallel; this will result in 3 simulation instances.

[ ]:

results_par = results + results + results # duplicate X3

[ ]:

md_simulator = VanillaMDSimulator(
            output_dir="tutorial_files/running_md_simulations/",
            equilibration_steps=1,
            num_steps=1,
            reporting_interval=1)

simulation_results_parallel = md_simulator.simulate(
            results_par,
            dask_client=gpu_client,
            failure_mode="skip",
            use_dask=True)

The simulation results are stored in a list, where the outputs for each simulation can be accessed in the same way as demonstrated above:

[ ]:

for i, sim_result in enumerate(simulation_results_parallel):
    print(f"simulation {i}: ", sim_result.traj_path)

[ ]: