Visualizing ASAP targets

A key aspect of communicating computational chemistry is having easy-to-interpret visual aids that assist decision making. To this end, we have developed portable ways to visualize ASAP targets.

This includes visualizing protein-ligand conformations, molecular dynamics simulations and viral fitness data.

HTML views of protein-ligand conformations

Protein-ligand conformations are central to the design-make-test-analyze (DMTA) cycle of drug design and need to be viewed quickly and in large numbers. To this end, we developed a portable, interactive HTML representation of protein-ligand conformations for our targets, based on 3Dmol.js. These representations can easily be shared between team members and outside collaborators, embedded into various platforms and hosted on cloud repositories.

To make one of these HTML representations, follow the steps below!

[1]:
# import some dependencies

from drugforge.data.backend.openeye import oechem
from drugforge.data.schema.complex import Complex, PreppedComplex
from drugforge.data.schema.ligand import Ligand
from drugforge.data.testing.test_resources import fetch_test_file
from drugforge.dataviz.html_viz import HTMLVisualizer
from drugforge.docking.docking import DockingInputPair
from drugforge.docking.openeye import POSITDocker, POSITDockingResults
from drugforge.simulation.simulate import SimulationResult
from IPython.display import display, HTML, IFrame

To learn more about how the base-level abstractions such as Ligand and Complex work, we recommend running through the working_with_data tutorial (see Tutorial index).

We have designed the visualization module (and others) to work seamlessly with multiple levels of abstraction. Here we will make HTML renders from a PDB file, from an in-memory Complex object and from a set of docking results, so you can work with data at whatever level of structure you have.
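
As a rough illustration of what "multiple levels of abstraction" means for a single entry point, the sketch below uses functools.singledispatch to route different input types to different handlers. This is only a toy under stated assumptions: ToyComplex and describe_input are hypothetical names invented for this example, and the sketch does not claim to show how HTMLVisualizer.visualize is implemented.

```python
from dataclasses import dataclass
from functools import singledispatch
from pathlib import Path


@dataclass
class ToyComplex:
    """Hypothetical stand-in for an in-memory protein-ligand complex."""

    name: str


@singledispatch
def describe_input(item) -> str:
    # Fallback for input types the entry point does not know how to handle.
    raise TypeError(f"unsupported input type: {type(item).__name__}")


@describe_input.register
def _(item: Path) -> str:
    # Less structured input: a file on disk that still needs parsing.
    return f"file on disk: {item.name}"


@describe_input.register
def _(item: ToyComplex) -> str:
    # More structured input: an object that is already parsed.
    return f"in-memory complex: {item.name}"


labels = [describe_input(x) for x in (Path("complex.pdb"), ToyComplex("Mpro"))]
```

The caller passes a mixed list and does not need to know which handler runs for which item, which is the convenience the tutorial relies on below.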

From a PDB file

[2]:
protein = fetch_test_file(
    "Mpro-P2660_0A_bound-prepped_complex.pdb"
)  # fetch a PDB file from the test suite, in this case a structure from the COVID Moonshot

We will use the HTMLVisualizer factory class to create our renders. Let’s inspect its arguments:

[3]:
HTMLVisualizer?
Init signature:
HTMLVisualizer(
    *,
    target: asapdiscovery.data.services.postera.manifold_data_validation.TargetTags,
    color_method: asapdiscovery.dataviz.html_viz.ColorMethod = <ColorMethod.subpockets: 'subpockets'>,
    debug: bool = False,
    write_to_disk: bool = True,
    output_dir: pathlib.Path = 'html',
    align: bool = True,
    fitness_data: Optional[Any] = None,
    fitness_data_logoplots: Optional[Any] = None,
    reference_protein: Optional[Any] = None,
) -> None
Docstring:
Class for generating HTML visualizations of poses.

The main method is `visualize`, which takes a list of inputs and returns a list of HTML strings or writes them to disk, optionally a list of output paths can
be provided.

The `visualize` is heavily overloaded and can take a list of `DockingResult`, `Path`, `Complex`, or a tuple of `Complex` and a list of `Ligand`.

Parameters
----------
target : TargetTags
    Target to visualize poses for
color_method : ColorMethod
    Protein surface coloring method. Can be either by `subpockets` or `fitness`
debug : bool
    Whether to run in debug mode
write_to_disk : bool
    Whether to write the HTML files to disk or return them as strings
output_dir : Path
    Output directory to write HTML files to
align : bool
    Whether to align the poses to the reference protein
Init docstring:
Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
File:           ~/Desktop/asap/asapdiscovery/asapdiscovery-dataviz/asapdiscovery/dataviz/html_viz.py
Type:           ModelMetaclass
Subclasses:

For this example:

  • We need to provide an ASAP target for the target argument, e.g. SARS-CoV-2-Mpro

  • We would like to colour by subpocket (more on other options later)

  • We would like to align to a canonical reference structure, so we set align=True

  • For the purposes of this notebook we will write to the folder tutorial_files/visualizing_asap_targets/ (the default is a folder called html)

[4]:
# create a visualization factory.
html_vizualizer = HTMLVisualizer(
    target="SARS-CoV-2-Mpro",
    color_method="subpockets",
    align=True,
    output_dir="tutorial_files/visualizing_asap_targets/",
    write_to_disk=True,
)

Fantastic! Now let’s run our renders, passing in our list of inputs. We can optionally use dask to parallelize over the list of inputs for higher performance. This matters when dealing with many structures, but should be unnecessary for now.

[5]:
# create our visualizations, explicitly specifying an output path
vizs = html_vizualizer.visualize(
    inputs=[protein], outpaths=["subpockets_render.html"], use_dask=False
)
2024-05-10 10:39:08,032 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 10:39:08,032 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 10:39:08,032 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 10:39:08,032 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 10:39:08,251 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmpwk7rm22z/
[6]:
vizs  # result is a dataframe
[6]:
ligand_id target_id SMILES html_path_pose
0 Mpro-P2660_0A_bound-prepped_complex_ligand Mpro-P2660_0A_bound-prepped_complex_target CNC(=O)CN1C[C@]2(CCN(C2=O)c3cncc4c3cc(cc4)Cl)c... tutorial_files/visualizing_asap_targets/subpoc...

Now that our render has been written to disk and its path is recorded in the dataframe, let’s display it in this notebook!

[7]:
from IPython.display import IFrame

IFrame(vizs["html_path_pose"][0], 1000, 1000)
[7]:

Very cool! We now have an interactive way to view ligand-protein complexes of ASAP targets, annotated with key interactions and important protein subpockets for the target of interest. Our medicinal chemists find this very useful for quickly viewing key interactions in docked virtual designs and crystal structures.
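
When rendering many PDB files at once, the inputs and outpaths lists passed to visualize need to line up one-to-one. The helper below is a minimal sketch of one way to build them from a directory; collect_render_jobs is a hypothetical name introduced here, not part of drugforge.

```python
from pathlib import Path


def collect_render_jobs(pdb_dir):
    """Return parallel lists of PDB paths and HTML output file names."""
    # Sort so the pairing between inputs and outputs is reproducible.
    pdb_files = sorted(Path(pdb_dir).glob("*.pdb"))
    # One output name per input, derived from the input file stem.
    outpaths = [f"{pdb.stem}.html" for pdb in pdb_files]
    return pdb_files, outpaths


# Intended use with the factory created above (not executed here):
# inputs, outpaths = collect_render_jobs("my_structures/")
# vizs = html_vizualizer.visualize(inputs=inputs, outpaths=outpaths, use_dask=True)
```

The directory name my_structures/ is a placeholder. Setting use_dask=True follows the note above about parallelizing over inputs; whether it pays off depends on how many structures you have.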

From an in-memory Complex representation

We can follow similar steps to render an in-memory representation of our complex to an HTML view.

[8]:
# make a complex
sars_cov_2_complex = Complex.from_pdb(
    protein,
    ligand_kwargs={"compound_name": "Mpro-P2660-bound-target"},
    target_kwargs={"target_name": "Mpro-P2660"},
)
[9]:
# we can re-use our factory from before
vizs = html_vizualizer.visualize(
    inputs=[sars_cov_2_complex], outpaths=["subpockets_from_complex.html"], use_dask=False
)
2024-05-10 10:39:09,241 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 10:39:09,241 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 10:39:09,241 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 10:39:09,241 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 10:39:09,448 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmpr6r76eog/
[10]:
from IPython.display import IFrame

IFrame(vizs["html_path_pose"][0], 1000, 1000)
[10]:

Note that you can also open the HTML file directly in your web browser, e.g. google-chrome render.html
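
If you would rather stay in Python, the standard-library webbrowser module can open the same file in your default browser. This is a small sketch; render_uri is a hypothetical helper name, and the path in the comment assumes the output_dir and outpaths used earlier in this tutorial.

```python
import webbrowser
from pathlib import Path


def render_uri(html_path):
    """Convert a path to an HTML render into a file:// URI."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(html_path).resolve().as_uri()


def open_render(html_path):
    """Open an HTML render in the system's default web browser."""
    return webbrowser.open(render_uri(html_path))


# Example call (not executed here):
# open_render("tutorial_files/visualizing_asap_targets/subpockets_from_complex.html")
```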

Docking a new structure!

We have shown pre-prepared examples here so far. What if we want to dock and visualize a new structure?

Note that docking will not be covered in depth here (see the Docking and Scoring tutorial for more information). Let’s dock our structure.

[11]:
# make the ligand we want to dock, a simple alkane
ligand = Ligand.from_smiles("CCCCCCC", compound_name="alkane")
[12]:
# prepare our structure
prepped_sars_cov_2_complex = PreppedComplex.from_complex(sars_cov_2_complex)
# pair it up with the ligand we want to dock.
docking_input_pair = DockingInputPair(complex=prepped_sars_cov_2_complex, ligand=ligand)
Warning: No BioAssembly transforms found, using input molecule as biounit: DesignUnit Components_LIG
Warning: Iridium - Structure: DesignUnit Components_LIG has no REMARK data
Processing BU # 1 with title: DesignUnit Components_LIG, chains AB
[13]:
# run OpenEye POSIT docking
docker = POSITDocker(use_omega=False)
results = docker.dock([docking_input_pair], use_dask=False)

# results is a list of POSITDockingResults, lots of info in here
print(results)
[POSITDockingResults(type='POSITDockingResults', input_pair=DockingInputPair(complex=PreppedComplex(target=PreppedTarget(target_name='Mpro-P2660', ids=None, data_format=<DataStorageType.b64oedu: 'b64oedu'>, target_hash='2353f6855b9359b5c6693a8e1dccd24b33c634f839f72d192b68e55b0e7d78b5'), ligand=Ligand(compound_name='Mpro-P2660-bound-target', ids=None, provenance=LigandProvenance(isomeric_smiles='CNC(=O)CN1C[C@]2(CCN(C2=O)c3cncc4c3cc(cc4)Cl)c5cc(ccc5C1=O)Cl', inchi='InChI=1S/C24H20Cl2N4O3/c1-27-21(31)12-29-13-24(19-9-16(26)4-5-17(19)22(29)32)6-7-30(23(24)33)20-11-28-10-14-2-3-15(25)8-18(14)20/h2-5,8-11H,6-7,12-13H2,1H3,(H,27,31)/t24-/m1/s1', inchi_key='JZJCSVMJFIAMQB-XMMPIXPASA-N', fixed_inchi='InChI=1/C24H20Cl2N4O3/c1-27-21(31)12-29-13-24(19-9-16(26)4-5-17(19)22(29)32)6-7-30(23(24)33)20-11-28-10-14-2-3-15(25)8-18(14)20/h2-5,8-11H,6-7,12-13H2,1H3,(H,27,31)/t24-/m1/s1/f/h27H', fixed_inchikey='JZJCSVMJFIAMQB-DLYUOGNHNA-N'), experimental_data=None, expansion_tag=None, tags={}, conf_tags={}, data_format=<DataStorageType.sdf: 'sdf'>)), ligand=Ligand(compound_name='alkane', ids=None, provenance=LigandProvenance(isomeric_smiles='CCCCCCC', inchi='InChI=1S/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3', inchi_key='IMNFDUFMRHMDMM-UHFFFAOYSA-N', fixed_inchi='InChI=1/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3', fixed_inchikey='IMNFDUFMRHMDMM-UHFFFAOYNA-N'), experimental_data=None, expansion_tag=None, tags={}, conf_tags={}, data_format=<DataStorageType.sdf: 'sdf'>)), posed_ligand=Ligand(compound_name='alkane', ids=None, provenance=LigandProvenance(isomeric_smiles='CCCCCCC', inchi='InChI=1S/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3', inchi_key='IMNFDUFMRHMDMM-UHFFFAOYSA-N', fixed_inchi='InChI=1/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3', fixed_inchikey='IMNFDUFMRHMDMM-UHFFFAOYNA-N'), experimental_data=None, expansion_tag=None, tags={'docking-confidence-POSIT': 0.019999999552965164, '_POSIT_method': 'FRED'}, conf_tags={'compound_name': ['alkane'], 'provenance': ['{"isomeric_smiles": "CCCCCCC", "inchi": 
"InChI=1S/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3", "inchi_key": "IMNFDUFMRHMDMM-UHFFFAOYSA-N", "fixed_inchi": "InChI=1/C7H16/c1-3-5-7-6-4-2/h3-7H2,1-2H3", "fixed_inchikey": "IMNFDUFMRHMDMM-UHFFFAOYNA-N"}'], 'data_format': ['sdf'], 'docking-confidence-POSIT': ['0.019999999552965164'], '_POSIT_method': ['FRED']}, data_format=<DataStorageType.sdf: 'sdf'>), probability=0.019999999552965164, provenance={'oechem': '20230910', 'oeomega': '20230910', 'oedocking': '20230910'})]
[14]:
vizs_from_docked = html_vizualizer.visualize(
    inputs=results, outpaths=["subpockets_from_docked.html"], use_dask=False
)
2024-05-10 10:39:55,212 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 10:39:55,212 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 10:39:55,212 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 10:39:55,212 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 10:39:55,387 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmplo04f74m/
[15]:
from IPython.display import IFrame

IFrame(vizs_from_docked["html_path_pose"][0], 1000, 1000)
[15]:

We can see our alkane was docked nicely to the active site!

Note that for embedding into applications you can also set write_to_disk=False to get the raw HTML string, for example:

[16]:
# create a visualization factory.
html_vizualizer = HTMLVisualizer(
    target="SARS-CoV-2-Mpro",
    color_method="subpockets",
    align=True,
    write_to_disk=False,
)

vizs_from_docked_raw = html_vizualizer.visualize(
    inputs=results, use_dask=False
)
2024-05-10 10:39:57,096 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 10:39:57,096 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 10:39:57,096 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 10:39:57,096 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 10:39:57,257 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmpes4h5c1i/
[17]:
vizs_from_docked_raw
[17]:
0
0 <!DOCTYPE HTML>\n<html lang="en">\n <head>\n ...
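
Once you have the raw HTML string, one common way to embed a complete HTML document inside another page is an iframe with a srcdoc attribute. The sketch below uses only the standard library; wrap_in_iframe is a hypothetical helper, and the commented line that pulls the string out of the dataframe assumes the single-row, single-column layout shown above.

```python
from html import escape


def wrap_in_iframe(raw_html, width=1000, height=1000):
    """Wrap a full HTML document in an iframe using the srcdoc attribute."""
    # Escape quotes and angle brackets so the document survives as an attribute value.
    return (
        f'<iframe srcdoc="{escape(raw_html, quote=True)}" '
        f'width="{width}" height="{height}"></iframe>'
    )


# Stand-in document; with real results you might use (assumption about layout):
# raw_html = vizs_from_docked_raw.iloc[0, 0]
raw_html = '<!DOCTYPE HTML>\n<html lang="en"><body>demo</body></html>'
snippet = wrap_in_iframe(raw_html)
```

The resulting snippet can be dropped into a host page or template. Keeping the render in its own iframe isolates its scripts and styles from the surrounding application.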

HTML views of protein-ligand conformations with fitness data

ASAP’s targets are viral proteins, and thus are highly mutable. An effective therapeutic must bind not only to the predominant variant currently circulating, but also to variants in accessible regions of sequence space. For this reason it is beneficial to select for interactions with highly conserved residues.

ASAP has worked with the Bloom lab to obtain Deep Mutational Scanning (DMS) data for SARS-CoV-2-Mpro (https://doi.org/10.1101/2023.01.30.526314) and SARS-CoV-2-Mac1 (https://doi.org/10.1101/2023.01.30.526314). This data can be visualized on the 3D protein structure to show medicinal chemists whether designed compounds interact with conserved or non-conserved residues.

These visualizations also contain logoplots that inform the viewer about the sequence space for each residue.

We are in the process of spinning out this fitness viewer into a self-contained package called choppa (https://github.com/asapdiscovery/choppa), so watch this space!

You can easily make these visualizations by setting the color_method keyword to fitness.

Residues highlighted in red are highly mutable, white residues are less mutable and blue residues are missing data.

[18]:
# create a visualization factory.
html_vizualizer = HTMLVisualizer(
    target="SARS-CoV-2-Mpro",
    color_method="fitness",
    align=True,
    write_to_disk=True,
    output_dir="tutorial_files/visualizing_asap_targets/",
)

vizs_from_docked_fitness = html_vizualizer.visualize(
    inputs=results, outpaths=["fitness_from_docked.html"], use_dask=False
)
/Users/hugomacdermott/Desktop/asap/asapdiscovery/asapdiscovery-dataviz/asapdiscovery/dataviz/html_viz.py:815: UserWarning: Warning: no unfit residues found for residue 108 in chain A.
  warnings.warn(
2024-05-10 10:40:16,776 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 10:40:16,776 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 10:40:16,776 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 10:40:16,777 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 10:40:16,957 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmpz9xu9gin/
[19]:
from IPython.display import IFrame

IFrame(vizs_from_docked_fitness["html_path_fitness"][0], 1000, 1000)
[19]:
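
To make the colour legend above concrete, here is a toy mapping from a per-residue mutability score to a colour. Everything about the scoring is an assumption made for illustration: the score is taken to be normalized to the range 0 to 1, None stands for missing data, and mutability_to_hex is a hypothetical function. The actual viewer may bin or scale its fitness data differently.

```python
def mutability_to_hex(score):
    """Map a mutability score in [0, 1] to a hex colour; None means missing data."""
    if score is None:
        return "#0000ff"  # blue: no fitness data for this residue
    # Clamp so out-of-range scores still give a valid colour.
    clamped = min(max(score, 0.0), 1.0)
    # Fade the green and blue channels from 255 (white) down to 0 (red).
    fade = round(255 * (1.0 - clamped))
    return f"#ff{fade:02x}{fade:02x}"
```

With this mapping a score of 0 gives white, a score of 1 gives pure red, and intermediate scores give progressively deeper shades of red.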

Working with your own fitness data

Currently the process of adding fitness data to the drugforge repo is convoluted and labour intensive. It is better to use choppa, our prototype standalone fitness renderer. We are really excited about choppa, which makes it easy to work with your own fitness data and to render HTML and PyMOL views.