Introduction
AHoJ is a structural bioinformatics tool that allows automated search and alignment of APO structures for a given HOLO structure (and vice versa) in the PDB.
It can be used to create customized Apo-Holo datasets.
AHoJ is an application for finding structures that belong to the same protein as a user-specified query structure, and annotating them as apo or holo. It can be used for a single search, with the results visualized or downloaded, or run over multiple queries to generate a dataset of structures. The user starts the search by providing their ligand of interest (or its binding site) and setting the preferred parameters.
How it works
Quick description
AHoJ features multiple search modes, but its main functionality is ligand-centric: the user specifies a particular binding site (by entering a binding residue) or a ligand directly, and AHoJ finds the same binding site in the other structures (by superimposing two structures at a time) and interrogates it for ligands.
Specifying the ligand or the binding site
This is done by providing the i) structure, ii) chain, iii) name of the ligand or residue (using the PDB 3-character code) and iv) position of the ligand or residue in the sequence (the PDB residue index is used). Specifying all four arguments is recommended, as it avoids ambiguity when more than one ligand molecule of the same type exists within the same chain. When specifying a residue, the position is obligatory. However, the minimum number of arguments is one: the structure. In that case, AHoJ will try to detect chains, ligands and their positions automatically. AHoJ works with ligands that are designated as heteroatoms in the PDB, which means small and medium-sized ligands but not protein subunits.
The main and default search therefore starts with a holo structure, where the user knows the ligand that will be used as a starting point. This user-specified ligand then defines the search and the annotation of the results. Any other ligands in the query structure play no role in characterising results as apo or holo. If the user does not know the ligand or the binding site, AHoJ can automatically detect available ligands in the query structure if told to do so. If the query structure does not bind any ligands (apo), the user can still use the "reverse search" mode, where AHoJ looks for structures that belong to the same protein as the query without focusing on a particular binding site. Instead, it lists any ligand that it detects in the resulting chains.
Search for Apo-Holo pairs
Query format & examples
Query Format
<pdb_id> <chains> <ligand> <position> # comment
pdb_id
: The 4-character code of a PDB protein structure (e.g. “1a73”, “3fav” or “3FAV”). This argument is obligatory, and only one PDB ID can be entered per line. If it is the only argument (because the user does not know the ligand that binds to the structure, or is using "reverse search"), it triggers automatic detection of ligands in the structure.
chains
: A single chain, multiple chains separated by commas (without whitespace), or “ALL” or “” for all chains (e.g. “A”, “A,C,D”, “ALL” or “”). This argument is obligatory if the user intends to provide any argument after it (i.e. ligand or position).
ligand
: A single ligand, multiple ligands separated by commas (without whitespace), or no ligand (e.g. “HEM”, “hem”, “ATP”, “ZN” or “HEM,ATP,ZN”). This argument is optional. If it is omitted, the user should activate automatic detection of ligands in the structure from the available options, unless the user is starting with an apo structure, in which case they need to activate the reverse mode (search for holo from apo). Note: a query that specifies the position argument cannot contain more than one ligand.
position
: An integer (e.g. “260” or “1”). It refers to the PDB index of the previously specified ligand or binding residue, and can only be given when exactly one ligand or residue is specified.
All arguments other than pdb_id are optional.
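The query format described above can be illustrated with a small parser. This is only a sketch under stated assumptions, not AHoJ's own code: the names `Query` and `parse_query` are hypothetical, lowercasing the PDB ID and uppercasing ligand names are assumed normalizations, and an omitted chains argument is treated here as “ALL”.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Query:
    pdb_id: str
    chains: List[str]        # ["ALL"] stands for every chain
    ligands: List[str]       # empty list: ligands are to be detected automatically
    position: Optional[int]  # PDB residue index, or None when not given


def parse_query(line: str) -> Query:
    """Parse one '<pdb_id> <chains> <ligand> <position> # comment' line."""
    text = line.split("#", 1)[0].strip()  # drop the trailing comment
    fields = text.split()
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected 1 to 4 arguments, got {len(fields)}")

    pdb_id = fields[0].lower()  # assumed normalization: "3FAV" -> "3fav"
    if len(pdb_id) != 4:
        raise ValueError(f"PDB ID must have 4 characters: {fields[0]!r}")

    # Assumption: a missing chains argument means all chains.
    chains = fields[1].split(",") if len(fields) > 1 else ["ALL"]
    ligands = [name.upper() for name in fields[2].split(",")] if len(fields) > 2 else []
    position = int(fields[3]) if len(fields) > 3 else None

    # The position is only meaningful for a single ligand or residue.
    if position is not None and len(ligands) != 1:
        raise ValueError("position requires exactly one ligand or residue")
    return Query(pdb_id, chains, ligands, position)
```

For instance, `parse_query("1a73 A,B ZN")` yields chains `["A", "B"]`, ligands `["ZN"]` and no position, while a line that combines several ligands with a position is rejected.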
Example Query
1a73 A ZN 201 # consider ZN ligand in position 201 in chain A of 1a73
The application will fetch the structure 1a73, get chain A, and look for the zinc ion (ZN) at position 201 to verify the input arguments. If ZN is found at position 201 of chain A of 1a73 (1a73A), it will retrieve all other known chains that belong to the same protein as 1a73A, align each of them with 1a73A, and look for ZN (and other ligands) at the superimposed binding site of ZN in 1a73A. Chains with ZN at that site are listed as HOLO; chains where the superimposed site is empty of ligands are listed as APO. If another ligand is detected at that site instead of ZN, the chain is listed as APO or HOLO depending on the value of the --lig_free_sites parameter: if the user wants APO sites to be free of any other ligands, it is listed as HOLO; if the user does not mind other ligands in this binding site, it is listed as APO.
Example of an alternative query that leads to the same result as the previous example:
1a73 A HIS 134 # consider ligands near residue HIS134 in chain A of 1a73 (the detected ligand will be ZN 201 in chain A)
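The APO/HOLO labelling rule described for the ZN example can be written out as a small decision function. This is a minimal sketch of the rule as stated in the text, not AHoJ's implementation: the function name `classify_chain` is hypothetical, `--lig_free_sites` is modelled as a boolean (an assumption), and the list of ligands found at the superimposed site is taken as given, since the superposition and ligand detection steps are not shown.

```python
def classify_chain(query_ligand, ligands_at_site, lig_free_sites=True):
    """Label a candidate chain as 'HOLO' or 'APO' for one superimposed binding site."""
    found = {name.upper() for name in ligands_at_site}
    if query_ligand.upper() in found:
        return "HOLO"  # the query ligand itself occupies the site
    if not found:
        return "APO"   # the site is empty of ligands
    # Another ligand occupies the site: the label depends on lig_free_sites
    # (modelled here as a boolean, which is an assumption).
    return "HOLO" if lig_free_sites else "APO"
```

With this sketch, a chain whose site holds MG instead of ZN is labelled HOLO when `lig_free_sites` is on and APO when it is off.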
More examples
1a73 A,B ZN # consider ZN ligands in chains A and B of 1a73
1a73 ALL ZN # consider ZN ligands in all chains of 1a73
1a73 # find and consider all ligands in all chains of 1a73
1a73 A # find and consider all ligands in chain A of 1a73
1a73 A ZN,MG # consider ZN and MG ligands in chain A of 1a73
1aax # protein tyrosine phosphatase - long search
4est # porcine pancreatic elastase