A: 3D-numbers make life easy
Log in at app3dm.bio-prodict.nl with your 3DM account. If you don't have a 3DM account, you can request one via the "Sign-up" button. To do this course you need at least a course login. After you have requested an account, you can request a course login by sending an email to firstname.lastname@example.org.
For this exercise, you need either Pymol or Yasara installed with the 3DM plugin functional. If you don't have one of these or you are missing the 3DM functionalities, please consult the installation instructions. Before you start this exercise make sure you have the latest version of Yasara or Pymol installed.
After entering the login details you will see a link to the “Nuclear Receptors Ligand Binding Domain 2012” database. Open this 3DM system.
At the starting page of each 3DM database you see the 3DM data cycle. The icons in the circle represent links to the most important 3DM options. These options are also available on the left.
Let’s say you found a paper that reports a mutation (Y401A) of the Pig vitamin D3 receptor to have an effect on specificity. Your research is actually about the human androgen receptor. You are wondering if a mutation at the same position in your protein would have the same effect. The first step would of course be searching the literature.
This is normally very difficult to do. Due to gaps and insertions, homologous proteins have different numberings. This means that in other homologous sequences Y401 (or better: the structurally equivalent residue of Y401) often has a different number and can also be a different residue type. It is therefore very difficult to know that E322, for instance, actually is the structural equivalent of Y401 of the Pig vitamin D3 receptor.
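The numbering problem can be illustrated with a minimal Python sketch. This is not how 3DM computes its 3D numbers (those come from structure-based alignments); the sequence fragments, start numbers and the function name `column_to_residue` are made up for illustration. The sketch only shows how a single gap makes the same alignment column carry different residue numbers in two proteins.

```python
def column_to_residue(aligned_seq, first_residue_number):
    """Map each alignment column (1-based) to (residue letter, residue number).

    Gap characters ('-') occupy a column but do not consume a residue number.
    """
    mapping = {}
    number = first_residue_number
    for column, letter in enumerate(aligned_seq, start=1):
        if letter != "-":
            mapping[column] = (letter, number)
            number += 1
    return mapping


# Two hypothetical aligned fragments; protein_b has a gap at column 3.
protein_a = column_to_residue("MKYLAE", first_residue_number=10)
protein_b = column_to_residue("MK-TAE", first_residue_number=50)

# Column 4 is the same alignment position in both fragments,
# but it carries a different residue type and a different residue number.
print(protein_a[4])  # ('L', 13)
print(protein_b[4])  # ('T', 52)
```

A shared column index plays the role that the 3D number plays in 3DM: it is the one label that stays the same across all aligned proteins.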
The protein name is VDR_PIG or A3RGC1. Searching with the keyword “vitamin D3” in the filter option will give a list of proteins. Searching with “pig” in the resulting webpage will give the correct protein. You can click on the protein name. This will link to the “protein detail page”. The conversion between WT sequence numbering and 3D numbers can be found on this page under the sequence projection tab. Residue 401 has 3D number 183.
Note the overall structure of 3DM: The menu on the left, containing the icons for the 3DM tools, is available on all 3DM pages. In the menu the highlighted option tells you what type of data is currently displayed. Often the data is split up over several subpages. If you click on the 'Alignment statistics' item, for instance, the page now shows you two tabs containing 'Data plots' and 'Custom plots' options.
Finding articles describing structurally equivalent residues in homologous sequences with 3DM is very easy. Open the alignment page by selecting this item in the menu. In the upper right corner, you will find several options to customize the visualization of the alignment. The sequences displayed in the default view (“consensus alignments”) give a quick overview of the trends that evolutionary pressure has left behind in the alignment. Click on alignment position 183 in the overall consensus. Consult the “Mutations” tab. Here you see a list of mutations that were retrieved from the literature.
Below the “Mutations” tab the data is shown in a table. At the right top of this table there is a filter option. A keyword search using this filter option only searches in the text in the table. This is why the filter option is located right above the table. Giving the keyword "pig" or "Y401" will result in several proteins among which one is the pig vitamin D3 receptor. If you click on “PubMed” you will be redirected to the abstract in PubMed.
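The behavior of such a table filter can be sketched in a few lines of Python. The rows and column names below are hypothetical, and the case-insensitive substring match is an assumption about how the filter behaves, not a description of the 3DM implementation.

```python
def filter_rows(rows, keyword):
    """Keep the rows in which any cell contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        row for row in rows
        if any(needle in str(cell).lower() for cell in row.values())
    ]


# Hypothetical table content; only the text in these cells is searched.
rows = [
    {"protein": "VDR_PIG", "description": "Pig vitamin D3 receptor", "mutation": "Y401A"},
    {"protein": "P10275", "description": "Human androgen receptor", "mutation": "T877A"},
]

print([row["protein"] for row in filter_rows(rows, "pig")])       # ['VDR_PIG']
print([row["protein"] for row in filter_rows(rows, "androgen")])  # ['P10275']
```

Because only the cell text is searched, a keyword that appears in the linked article but not in the table itself returns nothing.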
The table shows only 10 mutations by default. At the top left corner you can switch to another page of the results table. A search for “androgen” will reveal that the WT human AR is P10275. There are many articles describing a mutation at 3D position 183 in the human AR. You can see that there are 166 articles describing mutation T877A in this 3DM system.
Human AR has a Threonine at 3D position 183 and has residue number 877. Without the 3D numbers synchronizing the sequences in this protein family it would be difficult to find this because you don’t know that T877 is the structural equivalent of Y401.
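As a minimal sketch, the role of the 3D number as a shared key can be written as a lookup in Python. The two entries are the ones given in the text (Y401 of VDR_PIG and T877 of P10275, both at 3D number 183); the dictionary layout and the function name `equivalent_residue` are assumptions made for illustration.

```python
# (protein, residue number) -> 3D number; both entries are taken from the text.
THREE_D_NUMBER = {
    ("VDR_PIG", 401): 183,
    ("P10275", 877): 183,
}
# Residue type at each of those positions.
RESIDUE_TYPE = {
    ("VDR_PIG", 401): "Y",
    ("P10275", 877): "T",
}


def equivalent_residue(source, residue_number, target):
    """Return the residue in `target` that shares its 3D number with the query."""
    wanted = THREE_D_NUMBER[(source, residue_number)]
    for (protein, number), three_d in THREE_D_NUMBER.items():
        if protein == target and three_d == wanted:
            return f"{RESIDUE_TYPE[(protein, number)]}{number}"
    return None  # no residue of the target protein has this 3D number


print(equivalent_residue("VDR_PIG", 401, "P10275"))  # T877
```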
There are many papers describing mutations in the human androgen receptor and we still haven’t found a paper reporting a mutation that has an effect on specificity. You might just read all these papers, but there is an easier way to do this using Yasara or Pymol:
B: Easy visualization of data in structure files
Structural biologists will confirm that the visualization of data in structures is time-consuming. 3DM offers many different ways to visualize many different data types in any of the available structures. 3DM uses Yasara or PyMol for this purpose. Let’s have a look:
- Go to the "Visualize" page. In the first 'Structures' section you will see the template structure 2OCFA. This structure is the "System default template" and is selected by default. (You can change this by clicking the settings icon.)
- Click on the clear selection button to clear away this structure. For this demonstration we will not be using this structure.
- Now click the "Add structure" button. You should see a list of structures appear below.
- Disable the "Include compounds" checkmark right above the list of structures. When enabled, this option automatically includes the associated compounds when a structure is selected, but for this demonstration we will disable it.
- We will select two templates from the structures list: 1G2NA and 1HG4A. You should now see both in the 'Structures' section at the top. These two structures are part of a list of templates that were used to build the sequence alignments of the subfamilies.
- You can use the "Quick filter" to quickly select between "show only templates", "show only models" or "show all" structures. "Show only templates" is selected by default. Let's select the "Show all" option in the quick filter.
- The first structure in the list should now be 1A28A. The last column ('Compounds') shows the compound that is associated with this structure. Use the "Select compounds" add button to add the ligand compound 'Progesterone'. You should now see Progesterone in the 'Compounds' section at the top.
In the 'Positions' section at the top you can select different data types (e.g. “Correlated mutation”, “conservation”) that can be visualized in the selected structures. Have a look at these by clicking the "Add positions" button.
- Go to the Correlated mutations tab and select the top checkmark to select the correlated mutations.
- Now go to the conservation tab and select the top checkmark to select the conserved positions.
If you followed the steps correctly you should see two structures (1G2NA and 1HG4A) in the 'Structures' section, one Progesterone ligand in the 'Compounds' section and correlated and conserved positions in the 'Positions' section.
- You can toggle between the visualization programs (YASARA or PyMOL) with the option on top of the visualize button. Your visualization program preference will be saved for future use.
- Now click on the 'Visualize' button.
You will see a loading bar above the 'Visualize' button and after a few seconds you can download a Yasara or Pymol scene. Save the file and open it with Yasara or Pymol.
Note that the first time you will probably need to manually select Yasara or Pymol to open the file, as your computer doesn't know yet that .sce files or .pse files are scene files. You can tell your computer to always use Yasara to open .sce files or Pymol to open .pse files.
- Can you see there are two protein structures and one ligand? Do you see that the two structures are superimposed? Note that all protein structures and co-crystalized compounds are superimposed in 3DM. You can therefore insert any of the co-crystalized compounds in any of the protein structures.
In Yasara the 3DM module is usually active, but it may happen that you will need to install it. To do this you will need Python 2.6 or Python 2.7 in your path. If you are a Pymol user you first need to start the plugin. To start the 3DM plugin use "plugin" → "initialize plugin system". Then "plugin" → "legacy plugins" → "3DM". A new window should start. Use your 3DM credentials to log in. At the left top of this window, the 3DM menu should appear.
Conserved residues are the most important for performing the function of a protein. Often these residues can’t be mutated without destroying the function. In enzymes, conserved residues usually surround the substrate pocket (they perform the reaction). In contrast, in receptors the conserved residues often are not located around the ligand-binding pocket. Although the binding of the ligand is important, it often is not the most important feature of receptors. In nuclear receptors the most important feature is the binding of the co-factor. The co-factor binds to the region located at the conserved FDQ motif (3D numbers 47, 54 and 55). Position 47 is the most conserved, with 98.43% of the sequences having an F.
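A conservation percentage like the one quoted for position 47 can be sketched as the frequency of the most common residue in one alignment column. The toy column below is invented, and ignoring gaps is an assumption; the text does not say how 3DM treats gaps when it computes conservation.

```python
from collections import Counter


def conservation(column):
    """Return (most common residue, percentage) for one alignment column.

    Gap characters ('-') are ignored in this sketch.
    """
    residues = [residue for residue in column if residue != "-"]
    residue, count = Counter(residues).most_common(1)[0]
    return residue, 100.0 * count / len(residues)


# Hypothetical column: nine sequences with F and one with L.
toy_column = "F" * 9 + "L"
print(conservation(toy_column))  # ('F', 90.0)
```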
The purple residues are important for ligand binding. They show this correlated mutation behavior because there are different types of NRs that each bind a different ligand. These positions are conserved within such a group, but they mutate between the different groups to facilitate the binding of the different ligands.
What are correlated mutations? In large superfamily alignments correlated mutations (also called co-evolution of residues) are almost always functionally related. Residues that mutate simultaneously often share a function. These can be different functions. Sometimes correlated mutations are related to enzyme activity, or enantioselectivity, or co-factor binding, but most of the time they are related to changes in specificity. Being important for a certain function created an evolutionary pressure that resulted in restricted mutation rates. If a function changes during evolution (e.g. the specificity of the enzyme changed) then the residues involved in this function need to mutate to facilitate this change (e.g. the binding of a new substrate).
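One common way to quantify how strongly two alignment positions mutate together is mutual information. The Python sketch below uses it as a stand-in; the text does not state which measure 3DM uses, and the three toy columns are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Toy columns: one letter per sequence, same sequence order in each column.
pos_x = "TTTTYYYY"
pos_y = "AAAAGGGG"  # changes exactly when pos_x changes
pos_z = "LLMMLLMM"  # varies independently of pos_x

print(mutual_information(pos_x, pos_y))  # 1.0
print(mutual_information(pos_x, pos_z))  # 0.0
```

In this toy case the first pair behaves like two subfamilies that each conserve their own residue pair, which is the pattern described above for positions that change together with specificity.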
Take home message for protein engineers: Correlated mutations can often be found surrounding the ligand/substrate pocket. When they do, they often are correlated with specificity changes and are therefore specificity hotspots. If you want to change specificity make a mutant library at these positions. If your library becomes too large, take only residues that are common in the alignment.
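The effect of restricting a library to residues that are common in the alignment can be made concrete with a short calculation. In this sketch position 183 comes from the text, while positions 180 and 190 and all residue choices are hypothetical.

```python
from math import prod

ALL_20 = "ACDEFGHIKLMNPQRSTVWY"


def library_size(allowed):
    """Number of variants when each position may take any of its allowed residues."""
    return prod(len(residues) for residues in allowed.values())


# Full saturation at three positions versus a hypothetical alignment-guided choice.
full_library = {180: ALL_20, 183: ALL_20, 190: ALL_20}
focused_library = {180: "HY", 183: "TAS", 190: "LM"}

print(library_size(full_library))     # 8000
print(library_size(focused_library))  # 12
```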
3DM is connected to Yasara and Pymol. The 3DM menu contains different options to select different data types to visualize in protein structures. The "literature hotspots" option is a really powerful tool. Select "Specificity" in this option and log in with your 3DM login data if asked. It will select positions for which mutations have been reported in the literature to have an effect on specificity.
There are 33 unique mutations reported in the literature that have an effect on specificity at position 183.
As explained in the answer to question 6, correlated mutation behavior often is correlated with specificity changes. So mutations occurred when the specificity changed during evolution.
One article that according to 3DM contains mutation data reporting specificity changes at position 183 is: "Broadened ligand responsiveness of androgen receptor mutants obtained by random amino acid substitution of H874 and mutation hot spot T877 in prostate cancer". Open the paper here. If you read the first sentence of the abstract you can see that the mutation indeed has an effect on specificity. The title reveals that T877 is a prostate cancer hotspot. You wonder if there are mutations at other positions known to cause prostate cancer. How would you normally solve this problem (don't actually do it)?
Yes, position 183 really is the hotspot for prostate cancer. There are 94 mutations reported related to prostate cancer for position 183. There are reports at other positions too. Position 180 is hotspot number two with 34 reports.
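Ranking hotspots amounts to counting reports per 3D number and sorting the result. The sketch below uses an invented list of reports; the real counts quoted above (94 for position 183 and 34 for position 180) come from 3DM and are not reproduced by this toy data.

```python
from collections import Counter


def rank_hotspots(reported_positions):
    """Return (3D number, number of reports) pairs, most reported first."""
    return Counter(reported_positions).most_common()


# Hypothetical input: the 3D number of each reported disease-related mutation.
toy_reports = [183, 180, 183, 42, 183, 180]
print(rank_hotspots(toy_reports))  # [(183, 3), (180, 2), (42, 1)]
```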
Look in Yasara or Pymol to see if position 183 makes a contact with the ligand. Now you would like to know if position 183 is also a hotspot where ligands bind. The best way to find out is to open all 789 available structure files and count the number of contacts this position makes with co-crystalized ligands, right?
- In Yasara or Pymol: select "Ligand contacts" from the "show super-family data" 3DM option and click OK.
- In Yasara the HUD displays a list that shows the data that was selected from 3DM.
- In Pymol you have to select the structure in which the program should show the ligand contacts. Just select 1G2NA_prot. Then use the "3DM" → "show scene content details" option again to get the list of ligand contacts.
Yes, it is number twelve in the list of positions that contact ligands. We will come back to why mutations at this position can cause prostate cancer.
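What such a ligand-contact count involves can be sketched in Python with a simple distance criterion. The 4.0 Å cutoff, the data layout and the coordinates are assumptions made for illustration; the text does not describe how 3DM defines a contact.

```python
from collections import Counter
from math import dist


def in_contact(residue_atoms, ligand_atoms, cutoff=4.0):
    """True if any residue atom lies within `cutoff` angstrom of any ligand atom."""
    return any(dist(a, b) <= cutoff for a in residue_atoms for b in ligand_atoms)


def contact_counts(structures, cutoff=4.0):
    """Count, per 3D number, the structures in which that position touches the ligand."""
    counts = Counter()
    for structure in structures:
        for position, atoms in structure["residues"].items():
            if in_contact(atoms, structure["ligand"], cutoff):
                counts[position] += 1
    return counts


# Two hypothetical structures with one ligand atom and one atom per residue.
structures = [
    {"ligand": [(0.0, 0.0, 0.0)],
     "residues": {183: [(3.0, 0.0, 0.0)], 47: [(10.0, 0.0, 0.0)]}},
    {"ligand": [(1.0, 1.0, 1.0)],
     "residues": {183: [(2.0, 2.0, 2.0)], 47: [(1.0, 1.0, 20.0)]}},
]

print(contact_counts(structures).most_common())  # [(183, 2)]
```

Because all structures share the same 3D numbering, the counts from different PDB files can be added up per position without any renumbering.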
Nuclear receptors can either be activated or inhibited by small molecules (3DM calls these ligands). Activating compounds are called agonists and inhibiting compounds are called antagonists. Say you would like to know where activating compounds bind, where inhibiting compounds bind, and if there is a difference.
C: The search options and subset generation.
In 3DM many different search options are available to select a subset of sequences (consult the different tabs of the search module and see if you understand all the search options). The resulting sequences of a search can be saved in a subset with the subset window. To display the subset window click on the 'SUBSETS' button on the right hand-side of the page header. With this subset window a mini 3DM can be generated for a subset. All 3DM functionalities will work in all 3DM webpages and even in Yasara if you switch to using subset data. How to generate a subset and use the subset window will be discussed in more detail below.
Nuclear receptors can be inhibited or activated by ligands (small organic molecules). We want to investigate the difference between inhibiting and activating ligands. The differences between these two might reveal the mechanism of inhibition/activation. To investigate the different binding modes, two subsets need to be generated: one containing only PDB files that have an activating compound in the ligand-binding pocket and a second containing PDB files with an inhibitor. Comparing the positions where ligands bind in one group to the other could reveal a different binding mode.
Because the generation of the mini 3DM systems for these two subsets takes time, we pre-generated an activator and inhibitor subset. To generate the subsets we used the 3DM search option (structures subpage). Because some PDB files contain both an agonist and an antagonist we have to use a trick, which nicely indicates how to use the subset window. Try the searches yourself to see if you find the same number of sequences. To make the antagonists subset we first used the keyword “antagonist”. The antagonist search results in 90 structures. If you open the subset window by clicking "Subsets" in the top right corner to make a new subset you can add these 90 by first selecting all of them (with the All proteins radio button in the yellow 'Edit subset' panel) and then by clicking on "add to subset". Note that you can select and deselect sequences manually with the checkboxes in front of each sequence.
Next we ran a search for the keyword "agonist" with the 'Match whole words' option selected (to exclude all "antagonists" in this search). You can remove those by first selecting all of them and then clicking "remove from subset" in the yellow box. This deletes the overlap of the two searches, so the PDB files that contain both an agonist and an antagonist are deleted from the 90 antagonists. Now 68 structures are left in the subset.
Next we ran a search for the keyword "agonists" with "match whole words" active (184 structures) and repeated the subtraction step from the previous search. Now we have 66 structures left in the subset. We saved the subset, named this subset “inhibitors for course” and generated a mini 3DM for it (option: save and regenerate). Note that the option “save” will only save the sequences in the subset window for later use without making a mini 3DM for the subset.
The reverse search was done to generate the agonist subset. Searching for “agonist” without "match whole word" gives 440 structures. Removing with "match whole word" the "antagonist" (81) and "antagonists" (2) gives 357 structures. This subset is called “activators for course”.
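The keyword trick described above can be sketched in Python with regular expressions and set subtraction. The structure IDs and annotation texts are hypothetical, and the `search` function only mimics the behavior described for the keyword search with and without whole-word matching; it is not the 3DM implementation.

```python
import re


def search(annotations, keyword, whole_word=False):
    """Return the IDs of the structures whose annotation contains the keyword."""
    pattern = re.escape(keyword)
    if whole_word:
        pattern = rf"\b{pattern}\b"  # "agonist" no longer matches inside "antagonist"
    regex = re.compile(pattern, re.IGNORECASE)
    return {sid for sid, text in annotations.items() if regex.search(text)}


# Hypothetical structure IDs and annotation texts.
annotations = {
    "s1": "complex with an antagonist",
    "s2": "complex with an agonist",
    "s3": "antagonist and agonist bound",
    "s4": "two agonists bound",
    "s5": "antagonist plus two agonists",
}

# Inhibitor subset: every antagonist hit minus anything that also mentions an agonist.
inhibitors = (
    search(annotations, "antagonist")
    - search(annotations, "agonist", whole_word=True)
    - search(annotations, "agonists", whole_word=True)
)

# Activator subset: the reverse search.
activators = (
    search(annotations, "agonist")
    - search(annotations, "antagonist", whole_word=True)
    - search(annotations, "antagonists", whole_word=True)
)

print(sorted(inhibitors))  # ['s1']
print(sorted(activators))  # ['s2', 's4']
```

The sketch also shows why the plural needs its own whole-word search: `\bagonist\b` does not match "agonists". The counts in the text follow the same subtraction, for example 440 - 81 - 2 = 357 for the activators.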
Please note that in this example we have many structures and therefore this trick actually works. In real life, you would manually add the chains we have now deleted to the correct subset, depending on whether they contain an activator or an inhibitor. But for this course that is too much work, so we take this simple shortcut for now.
From the 'Subset' dropdown menu on the top of the page, choose the 'activators for course' subset - now all data will include only the sequences that are part of this subset.
To show the effects of subsets first go to the "Alignment statistics" page by clicking the Alignment statistics icon. The histograms plot different data in relation to the 3D numbers (x-axes). Have a look at what kind of data is plotted in these histograms by scrolling down. You can switch between subsets using the middle menu item at the top of 3DM. Using this selection menu you can see the "activators for course" and "inhibitors for course" subsets. The data in these histograms depends on the subset that is selected.
Go to the second option (Custom plots) of the alignment statistics. Here you can select different data types and subsets. In the left box select "activators for course" and "inhibitors for course", in the right box select ligand contacts and click the "Generate" button. Evaluate the resulting histogram.
Position 31 is contacted only by inhibiting compounds.
There are many more structures available with an activating compound (357 versus 66), making the comparison a bit “unfair”. If you normalize the data, 3DM acts as if there are just as many structures in both sets.
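The comparison of the two subsets can be sketched in Python. The subset sizes (357 activator and 66 inhibitor structures) are the ones given earlier in the text; the contact counts are invented, and dividing by the number of structures is only one plausible way to normalize, since the text does not say how 3DM does it.

```python
def normalize(contact_counts, n_structures):
    """Convert raw contact counts into contacts per structure."""
    return {pos: count / n_structures for pos, count in contact_counts.items()}


def exclusive_positions(counts_a, counts_b):
    """Positions that have contacts in counts_a but none in counts_b."""
    return sorted(
        pos for pos, count in counts_a.items()
        if count > 0 and counts_b.get(pos, 0) == 0
    )


# Hypothetical raw ligand-contact counts per 3D number.
activator_counts = {34: 5, 183: 300}
inhibitor_counts = {31: 20, 34: 25, 183: 60}

activators = normalize(activator_counts, 357)
inhibitors = normalize(inhibitor_counts, 66)

# Raw counts favor the larger subset; per-structure values can reverse the picture.
print(round(activators[183], 2), round(inhibitors[183], 2))  # 0.84 0.91

# Positions contacted only by inhibiting compounds in this toy data.
print(exclusive_positions(inhibitor_counts, activator_counts))  # [31]
```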
- Use the "visualize" option of 3DM and open structure 1BSXA in either Yasara or Pymol (you need to switch back to the 'Full dataset' subset to be able to find this structure on the "visualize" page and choose "Show all" in the Quick filter). Visualize the amino acids with 3D numbers 31, 34 and 192 in 1BSXA.
- In Yasara you can right-click on residues (either in the structure or in the residue bar at the bottom of Yasara) and select "show" → "residue". You can also give them a color if you want.
- In Pymol:
  - Click on a residue inside the structure (or inside the residue bar at the top); it will be highlighted.
  - Click on the command bar at the bottom (indicated by `PyMOL>`).
  - To show the selected residue, type `show sticks, sele` and press Enter.
  - Or, to color the selected residue, type `color red, sele` and press Enter.
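Instead of clicking residues, the same PyMOL commands can be typed with an explicit selection such as `resi 31+34`. The small Python helper below only builds such command strings; the function name and the residue numbers are hypothetical. Note that PyMOL selects by the residue numbers of the loaded PDB file, not by 3D numbers, so the 3D numbers have to be converted for the structure at hand first.

```python
def pymol_command(action, residue_numbers, chain=None):
    """Build a PyMOL command line such as 'show sticks, resi 31+34'.

    `action` is the command plus its first argument, e.g. 'show sticks' or 'color red'.
    """
    selection = "resi " + "+".join(str(number) for number in residue_numbers)
    if chain:
        selection = f"chain {chain} and {selection}"
    return f"{action}, {selection}"


# Hypothetical PDB residue numbers; paste the printed lines into the command bar.
print(pymol_command("show sticks", [701, 877]))        # show sticks, resi 701+877
print(pymol_command("color red", [877], chain="A"))    # color red, chain A and resi 877
```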
They contact each other, but they don’t contact the ligand from 1A28A.
In Yasara or Pymol load the ligand from 1ERRA using the "3DM" → "Structures" → "load structure from 3DM" option.
The ligand from PDB file 1ERRA, which is an inhibitor, makes clear contacts with 31 and 34. This ligand is located where residue 192 normally binds and 192 won’t bind there anymore.
Inhibiting compounds are in competition with helix 12. Helix 12 docks to positions 31 and 34 in the active state. If an inhibitor is bound, it pushes helix 12 away to bind to the helix of position 42, so the co-factor can’t bind there anymore.
D: Designing drugs
In 3DM use "Search" → "Structures" to find a human androgen receptor structure with its natural ligand dihydrotestosterone bound in the ligand-binding pocket.
If you use the “structures” subpage from the search menu and you give the keyword “dihydrotestosterone” in the “Keyword or part of a keyword” search box you will find 57 hits.
Use the "visualize" option of 3DM to visualize in Yasara or Pymol both the protein chain and the ligand from the first hit from the previous search (1I37A). An easy way to do this is to click on 1I37A in the search result which links to the "protein detail page" of 1I37A.
You can use the visualize icon to go directly to the "visualize" module of 3DM where this PDB is then selected. Just click on "visualize" and you are ready (note that on some computers you need to close Yasara first before opening a new one). Once you have this structure loaded in Yasara or Pymol use the 3DM → Literature Hotspot → Specificity option to show the hotspots for specificity.
This is position 183 with 33 mutations published.
Considering the fact that 183 is both a specificity hotspot and a hotspot for prostate cancer this very well might be the reason.
Take home message: With 3DM you try to find correlations between different data types and learn the underlying biological meaning from these correlations. You have seen above that if correlated mutations are close to the ligand/substrate they are likely important for specificity. You have combined two different data types, e.g. correlated mutation data with structural location. They overlap and therefore you have learned something. If you want to answer a biological question with 3DM you should ask these two general questions: 1. Which subsets do I need to generate? 2. What data do I need to compare? Sometimes you don’t need to make subsets and just comparing two data types is already sufficient.
If specificity changes in the androgen receptor T877A mutation are indeed the underlying cause for prostate cancer it is not unlikely that the mutant can now be activated by a different ligand present in humans. It is also not unlikely that it can now accept another nuclear receptor ligand because of the similarity of the ligands that are recognized by the members of this protein family. A scientist working in this area would realize that progesterone is very similar to dihydrotestosterone. So it seems a good hypothesis that a T877A mutated human androgen receptor is overly active because this mutant has changed specificity and can now be activated by progesterone thereby causing cancer, but we need additional proof for this idea.
Use the 3DM → "structures" option in Yasara or Pymol to load the ligand of structure 1A28A (progesterone) and compare this ligand with dihydrotestosterone that should already be loaded in Yasara. If it is not loaded also load the ligand of 1I37A. For easy comparison change the ligands to stick visualization.
The screenshot below shows what you should have in Yasara now.
The two compounds are almost identical except at the left side of the molecules. DHT has an oxygen (a hydroxyl group) whereas progesterone has a bigger acetyl group (two carbons and an oxygen).
The picture clearly shows that this difference is located exactly at the famous threonine 877 (3D number 183).
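The size difference between the two ligands also shows up in their molecular formulas. The sketch below compares them in Python; the formulas (C19H30O2 for dihydrotestosterone and C21H30O2 for progesterone) are standard reference values rather than data taken from 3DM, and the small parser only handles simple formulas without brackets.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count the atoms per element in a simple formula such as 'C19H30O2'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


dht = parse_formula("C19H30O2")
progesterone = parse_formula("C21H30O2")

# Net difference in composition: elements whose atom count differs.
difference = {
    element: progesterone[element] - dht[element]
    for element in sorted(set(dht) | set(progesterone))
    if progesterone[element] != dht[element]
}
print(difference)  # {'C': 2}
```

The net difference of two carbon atoms matches the larger substituent seen on progesterone next to position 183.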
Yes. It all seems to fit nicely with our hypothesis.
Yes, mutating to an alanine might very well make room for the bigger acetyl group.
- In Yasara you can make the mutation by right-clicking on the threonine and choosing swap → residue → alanine.
- In Pymol:
  - In the main PyMol menu, click "Wizard" → "Mutagenesis".
  - Select the residue you want to mutate.
  - On the right side of the screen, click on "No mutation" and select the residue you want to swap in (alanine).
  - Click apply.
After this experiment, it seems very likely that progesterone is capable of activating the T877A androgen receptor mutant. Most drugs that are used to treat prostate cancer are general androgen receptor inhibitors but are not specific for treating prostate cancer caused by the T877A mutation. Some drugs are available, such as hydroxyflutamide (also called S-1), and some derivative compounds, but these are not very effective in patients with the T877A mutation. It would therefore be helpful to have an alternative drug that specifically targets the T877A androgen receptor.
We should therefore design a compound that can compete with the binding of progesterone in the T877A mutant but does not activate this mutated androgen receptor. We would have to make a compound that is similar to progesterone (simply because we now know progesterone can bind) but has a larger group around position 183, because we know this is how inhibitors work, as we have learned during this exercise (do you agree with this hypothesis?). Luckily, using simple chemistry, it is easy to add groups at the acetyl group (O=C-C-R) of progesterone that is near position 183.
If this idea succeeds, we will have designed a drug for treating prostate cancer in patients with the T877A mutation. Wouldn't that be great? Unfortunately, we are not the first to make this type of compound for treating prostate cancer. Several progesterone derivatives have been published as androgen receptor T877A inhibitors. But the above process nicely shows how combining data from a 3DM system can be used as guidance in the early phases of drug design.