Using the drugforge-alchemy CLI

The drugforge-alchemy CLI provides a series of automated workflows and convenience functions that, when combined, create an end-to-end pipeline enabling the routine running of state-of-the-art alchemical free energy calculations at (Alchemi)scale! The CLI is designed to get you up and running as quickly as possible with tried and tested defaults, but also allows you to customise every part of the workflow if required. To build custom workflows, see the Alchemy API tutorial, which explains the API in detail, including the customisation options available. Here we give a very quick overview of the CLI commands and how they should be used in production.

drugforge-alchemy Pipeline

The drugforge-alchemy CLI allows for the preparation, planning and prediction of alchemical free energy calculations at scale. Each step of the pipeline can be run via the command line. The available commands can be viewed at any time by running:

drugforge-alchemy --help

Now let's walk through a typical application, starting with prep.

drugforge-alchemy Prep

Prep offers a pipeline of tools to prepare a ligand series for binding free energy calculations, including state enumeration, constrained pose generation and partial charge generation. To view the default prep workflow, use the following command to write it to a file. The file can then be edited by hand, although customising the workflow is much easier using the API:

drugforge-alchemy prep create -f "prep-workflow.json"

The prep workflow can then be executed on a set of ligands (in a local .smi or .sdf file) using the following command:

drugforge-alchemy prep run --factory-file "prep-workflow.json"  \
                      --dataset-name "example-dataset"     \
                      --ligands "ligand_file.sdf"          \
                      --receptor-complex "receptor.json"   \
                      --processors 4

Or, if you use PostEra, you can provide the name of the molecule set to pull the ligands from, as long as your POSTERA_API_KEY is exported as an environment variable:

drugforge-alchemy prep run --factory-file "prep-workflow.json"   \
                      --dataset-name "example-dataset"      \
                      --postera-molset-name "ligand-series" \
                      --receptor-complex "receptor.json"    \
                      --processors 4

Warning

Automatic selection of the reference structure (described next) is highly experimental, and it is recommended that you check the selected reference structure carefully.

If you are not sure which reference crystal structure you would like to use when generating the ligand poses, you can provide a directory of structures prepared with the drugforge-prep CLI and one will be selected for you:

drugforge-alchemy prep run --factory-file "prep-workflow.json"   \
                      --dataset-name "example-dataset"      \
                      --postera-molset-name "ligand-series" \
                      --structure-dir "receptor-cache"      \
                      --processors 4

Warning

Injection of experimental compounds (described below) is highly experimental, and it is recommended that you check the injected compounds carefully.

Note

You must export CDD_API_KEY and CDD_VAULT_NUMBER as environment variables to enable the CDD interface.

Experimentally measured ligands can also be injected into the series at this stage via an interface to the CDD vault. If you provide a protocol name, the prep workflow will automatically download all ligands screened as part of this protocol and keep only those with an activity within the assay sensitivity range, fully defined stereochemistry and no covalent warhead. These will then be posed using the same protocol as the target ligands and marked as experimental via an SD tag:

drugforge-alchemy prep run --factory-file "prep-workflow.json"   \
                      --dataset-name "example-dataset"      \
                      --postera-molset-name "ligand-series" \
                      --structure-dir "receptor-cache"      \
                      --processors 4                        \
                      --experimental-protocol "assay-1"

Once the prep workflow has finished, you will find a new directory named after the --dataset-name argument. It contains a PDB file of the receptor, an SDF file of the ligands in their constrained poses, and a CSV file listing any ligands for which a pose could not be generated, along with the reason why. A prepared_alchemy_dataset.json file will also be present, which is used in the next stage of the workflow.
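As a quick sanity check after prep, it can help to count how many ligands made it into the posed SDF file. The sketch below relies only on the SDF convention that each record ends with a line starting with `$$$$`. The helper name count_sdf_records is hypothetical, and because the exact output file names are not listed here, it simply loops over every .sdf file in the dataset directory (assumed to be example-dataset).

```shell
#!/bin/sh
# Hypothetical helper, not part of drugforge-alchemy.
# Counts molecule records in an SDF file: each record ends with a "$$$$" line.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}

# Assumed directory name, taken from the --dataset-name used above.
dataset_dir="example-dataset"

for sdf in "$dataset_dir"/*.sdf; do
    # Skip the unexpanded glob when the directory has no SDF files.
    [ -e "$sdf" ] || continue
    echo "$sdf: $(count_sdf_records "$sdf") ligand record(s)"
done
```

Comparing this count with the number of input ligands, together with the CSV of failed ligands, gives a quick picture of how many poses were generated.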

drugforge-alchemy Plan

We are now ready to plan an alchemical free energy network using a state-of-the-art workflow built on the OpenFE infrastructure. Our default workflow plans a minimal spanning tree network with redundancy to ensure each ligand is connected to at least two other ligands in the network, using the Lomap atom mapping and scoring function. Again, this can be configured via the API or by manually editing the workflow file, which can be generated using:

drugforge-alchemy create "alchemy-factory.json"

We can now plan our network using the default workflow and the ligands we have just posed using the prep pipeline from the previous stage. The prepared_alchemy_dataset.json file contains everything needed for this next stage including the ligands, a dataset name and the receptor. The network is then generated by running:

drugforge-alchemy plan --alchemy-dataset "prepared_alchemy_dataset.json"

Or, if you have posed the ligands using some other pipeline, you can provide them as an SDF file together with the receptor as a PDB file, which should already be protonated:

drugforge-alchemy plan --name "my-network"      \
                  --ligands "ligands.sdf"  \
                  --receptor "protein.pdb"

If you use the CDD vault to store experimental data and wish to upload your results to PostEra later, you can also set the name of the assay protocol and the biological target associated with this network. This saves you from having to supply them each time you make a prediction later in the workflow:

drugforge-alchemy plan --name "my-network"                \
                  --ligands "ligands.sdf"            \
                  --receptor "protein.pdb"           \
                  --experimental-protocol "assay-2"  \
                  --target "SARS-CoV-2-Mac1"

After running the plan workflow, you will find another new directory named after the --name argument. It contains the free energy calculation network in a file named planned_network.json and a ligand_network.graphml file, which can be viewed as an interactive network using the OpenFE CLI:

openfe view-ligand-network ligand_network.graphml

drugforge-alchemy Submit

Note

The commands submit, status, restart, stop, gather and predict assume the network file is in the working directory, so you do not need to pass it explicitly.

At ASAP we make extensive use of the fantastic Alchemiscale:

a high-throughput alchemical free energy execution system for use with HPC, cloud, bare metal, and Folding@Home

This allows us to plan and execute thousands of OpenFE-based calculations on distributed compute simultaneously, and provides a convenient API to track and manage calculations rather than having to manually sort through hundreds of local files.

Note

Make sure your ALCHEMISCALE_ID and ALCHEMISCALE_KEY are exported as environment variables.
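A missing credential is easier to diagnose before a command is launched than after it fails. The sketch below is a small, hypothetical shell helper (require_env is not part of drugforge-alchemy) that reports any variable that is unset, empty or not exported. The same helper could be used for POSTERA_API_KEY, CDD_API_KEY and CDD_VAULT_NUMBER.

```shell
#!/bin/sh
# Hypothetical helper, not part of drugforge-alchemy.
# Returns non-zero and names every variable that is unset, empty or not exported.
require_env() {
    missing=0
    for name in "$@"; do
        # printenv only sees exported variables, which are the ones a child process such as the CLI receives.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_env ALCHEMISCALE_ID ALCHEMISCALE_KEY; then
    echo "Alchemiscale credentials found"
fi
```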

We can now submit our planned_network.json and execute the tasks on Alchemiscale using:

drugforge-alchemy submit --network "planned_network.json"    \
                    --organization "my_org"             \
                    --campaign "testing_asap_alchemy"   \
                    --project "target_1"

This command creates the network on Alchemiscale under a Scope defined by the combination of the organization, campaign and project, then creates a task for each transformation and submits them to be executed! A unique network key is generated during this process and stored in the planned_network.json file, which allows you to quickly look up the network on Alchemiscale.

drugforge-alchemy Status

To track the progress of the alchemical network on Alchemiscale, you can use the following command:

drugforge-alchemy status

If your network has some errored tasks, you can also retrieve the errors and tracebacks using:

drugforge-alchemy status --errors --with-traceback

Or, if you would like to view the status of all currently actioned networks on Alchemiscale under your scope, you can use:

drugforge-alchemy status --all-networks
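Because networks can take a long time to complete, you may want to re-run the status command at a fixed interval instead of typing it repeatedly. The sketch below is a generic, hypothetical helper (repeat_every is not part of drugforge-alchemy). It does not parse the status output, whose format is not described here; it simply re-runs whatever command it is given.

```shell
#!/bin/sh
# Hypothetical helper, not part of drugforge-alchemy.
# Usage: repeat_every SECONDS COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, sleeping SECONDS between consecutive runs.
repeat_every() {
    interval="$1"
    count="$2"
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
        # No need to wait after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (commented out so the sketch can be pasted safely):
# check the network every 10 minutes, 6 times in total.
# repeat_every 600 6 drugforge-alchemy status
```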

drugforge-alchemy Restart

Calculations can sometimes fail for a variety of reasons, some of which can be cleared by simply restarting the tasks. Until automatic restarting is built into Alchemiscale, we provide a command which allows you to restart all the errored tasks in a network:

drugforge-alchemy restart

drugforge-alchemy Stop

If for any reason you want to stop a network, which removes all currently actioned tasks, you will need the network key, which can be found in the output of the status command:

drugforge-alchemy stop --network-key "network-key"

drugforge-alchemy Gather

Once our network has completed all its tasks we can gather the results and store them locally for analysis using:

drugforge-alchemy gather

If the network has some incomplete edges, this command will fail. You can, however, bypass this check using:

drugforge-alchemy gather --allow-missing

The gather command creates a new copy of the network, including the results, called result_network.json.

drugforge-alchemy Predict

Finally, with our local results we can now estimate the binding affinity of our ligands using:

drugforge-alchemy predict

This will produce two CSV files: one containing the relative and the other the absolute binding affinity predictions.

If you provided the --experimental-protocol argument during the plan stage, experimental data will be extracted from the named protocol in the CDD vault and automatically used to assess the accuracy of the calculations. The absolute estimates will also be shifted to be centred around the mean of the experimental values, and interactive HTML reports will be generated to help analyse the results in more detail.
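To make the centring step concrete: shifting adds one constant offset to every absolute estimate so that the mean of the predictions matches the mean of the experimental values, leaving the differences between ligands unchanged. The sketch below illustrates that arithmetic with awk on two plain-text files holding one value per line. It is an illustration only, not the code used by predict. The helper name, the file layout and the choice of averaging over every value in each file are assumptions.

```shell
#!/bin/sh
# Hypothetical helper illustrating the centring arithmetic; not the drugforge-alchemy implementation.
# Usage: center_on_experiment PREDICTED_FILE EXPERIMENTAL_FILE
# Both files hold one number per line, in the same units.
center_on_experiment() {
    awk '
        # First file read by awk: experimental values.
        NR == FNR { exp_sum += $1; n_exp++; next }
        # Second file: predicted absolute values, kept in order.
        { pred[++n] = $1; pred_sum += $1 }
        END {
            if (n_exp == 0 || n == 0) exit 1
            offset = exp_sum / n_exp - pred_sum / n
            for (i = 1; i <= n; i++) printf "%.2f\n", pred[i] + offset
        }
    ' "$2" "$1"
}

# Example: predictions -1.5, 0.5, 1.0 (mean 0.0) and experiments -8.0, -9.0, -10.0 (mean -9.0)
# give an offset of -9.0, so the centred predictions are -10.50, -8.50 and -8.00.
```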

If you did not provide the protocol earlier you can provide it as an argument to the prediction command:

drugforge-alchemy predict --experimental-protocol "assay-1"

Or, if you keep your experimental data in a different source, you can provide it as a CSV file formatted to match the CDD data:

drugforge-alchemy predict --reference-dataset "assay_data.csv" --reference-units "pIC50"

If you use PostEra and would like to upload the results, you can provide the molecule set name and, if not provided earlier, a biological target:

drugforge-alchemy predict --target "SARS-CoV-2-Mac1" --postera-molset-name "alchemy-ligands-1"
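Putting the stages together, the commands in this guide can be chained into a small script. The sketch below only uses commands and options shown above. The run wrapper, the DRY_RUN switch, the function names and the my-network, my_org, testing_asap_alchemy and target_1 values are placeholders. It assumes ligands.sdf and protein.pdb are in the current directory and that plan writes planned_network.json into a directory named after --name, as described in the Plan section. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stage 1: plan the network and submit it to Alchemiscale.
plan_and_submit() {
    run drugforge-alchemy plan --name "my-network" \
        --ligands "ligands.sdf" --receptor "protein.pdb" || return 1
    # The later commands expect the network file in the working directory.
    run cd "my-network" || return 1
    run drugforge-alchemy submit --network "planned_network.json" \
        --organization "my_org" --campaign "testing_asap_alchemy" \
        --project "target_1" || return 1
}

# Stage 2: once status reports the tasks as complete, collect results and predict.
gather_and_predict() {
    run drugforge-alchemy gather || return 1
    run drugforge-alchemy predict || return 1
}

plan_and_submit && run drugforge-alchemy status
# Later, from inside the my-network directory: gather_and_predict
```

Splitting the script into two stages reflects the fact that the calculations run remotely: gather and predict are only useful once the submitted tasks have finished.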