...

Note: Before doing this practical, it is best to first do the practical on NR ligand binding (/wiki/spaces/DOC/pages/327684), which is the introduction to 3DM.

...

Answer

If the networks are independent, the residues within one network mutate in sync with each other, but the residues of the other network mutate at a different rate. For example, if you compare two subfamilies, the positions of one network may each carry a different residue in the two subfamilies (so they mutate between the two subfamilies), while the positions of the other network might be 100% conserved across both subfamilies.
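To make the subfamily example concrete, here is a minimal sketch that flags positions conserved within each of two subfamilies but carrying a different residue in each. These are positions that switch together between the subfamilies. This is not 3DM's correlated-mutation algorithm. The function names, the toy sequences, and the 80% "conserved" threshold are all hypothetical.

```python
from collections import Counter

def dominant(column):
    """Return (most common residue, its fraction) for one alignment column."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)

def group_specific_positions(group_a, group_b, min_fraction=0.8):
    """0-based positions conserved within each group but different between groups.

    group_a, group_b: lists of equal-length aligned sequences (strings).
    min_fraction: hypothetical threshold for "conserved within a group".
    """
    hits = []
    for pos in range(len(group_a[0])):
        res_a, frac_a = dominant([seq[pos] for seq in group_a])
        res_b, frac_b = dominant([seq[pos] for seq in group_b])
        if frac_a >= min_fraction and frac_b >= min_fraction and res_a != res_b:
            hits.append(pos)
    return hits

# Toy subfamilies: positions 1 and 3 switch together, position 2 is variable.
subfamily_x = ["AKLDG", "AKVDG", "AKLDG"]
subfamily_y = ["AELRG", "AEVRG", "AELRG"]
print(group_specific_positions(subfamily_x, subfamily_y))  # [1, 3]
```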

You can explore this idea by making an alignment with two groups of proteins. Each group should contain functionally similar proteins, but the function should differ between the groups. For instance, one group can contain enzymes that all convert the same substrate, and a second group enzymes that convert another substrate. The correlated mutations then often reveal hotspots that determine the change in substrate specificity.

A very important note is that correlated mutations reflect functional changes and not environmental changes. For example, you cannot use correlated mutations to find hotspots for thermostability. You could make an alignment of two groups: one composed of proteins from thermophilic organisms and another of proteins from non-thermophilic organisms. Then the differences between these groups should show up as correlated mutations, right? This is not the case, because these two groups together simply cover all sequences of the complete superfamily alignment. Instead, you can investigate the differences between these groups by comparing the amino acid contents of the groups. This concept will be discussed later.

One more important note: the problem with functional grouping is that functional annotations of proteins are often based only on sequence similarity and can therefore be wrong. You need to make sure that at least a high percentage of the sequences share the same function to make function-based hotspot finding work. There are, however, examples where the alignment was successfully grouped simply by keyword searches in the protein descriptions and then used to find novel enzymes. The group of Prof. Uwe Bornscheuer found R-selective amine transferases (previously only S-selective enzymes were known) by deleting from the protein family all groups of proteins with other activities, using motifs associated with those other functions. This exercise was repeated with 3DM.
The protein family was first grouped using keywords related to enzymes with an unwanted reaction mechanism. The keyword "lyase", for instance, was used to select enzymes that likely have the unwanted lyase activity. This set of sequences was then used to find a sequence motif specific for lyase activity. All sequences with this lyase-specific motif were subsequently deleted from the alignment. This step ensures that lyases that were not annotated as such are deleted as well. The procedure was repeated for every unwanted enzyme activity and resulted in 42 sequences that are most likely all R-selective amine transferases.
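The keyword-then-motif filtering can be sketched as follows, assuming a simple list of (description, sequence) records. The motif is passed in as a regular expression. The sketch does not derive the motif, because the step that finds a lyase-specific motif from the keyword hits is done by hand or in 3DM. The function name, the records, and the motif `H.G` are hypothetical.

```python
import re

def remove_unwanted_activity(records, keyword, motif_regex):
    """Keyword selection followed by motif-based deletion (illustrative only).

    records: list of (description, ungapped_sequence) tuples.
    keyword: word marking the unwanted activity in descriptions, e.g. "lyase".
    motif_regex: motif assumed to be specific for that activity (hypothetical).
    Returns (keyword-annotated records, records kept after deletion).
    """
    motif = re.compile(motif_regex)
    # Step 1: records annotated with the unwanted activity (used to derive the motif).
    annotated = [r for r in records if keyword.lower() in r[0].lower()]
    # Step 2: delete every record carrying the motif, whether annotated or not.
    kept = [r for r in records if not motif.search(r[1])]
    return annotated, kept

records = [
    ("putative lyase", "MKTHDGLA"),
    ("hypothetical protein", "MRTHDGVA"),  # unannotated, but carries the motif
    ("aminotransferase", "MKTAEGLA"),
]
annotated, kept = remove_unwanted_activity(records, "lyase", r"H.G")
print([d for d, _ in kept])  # ['aminotransferase']
```

The "hypothetical protein" is removed even though its description never mentions a lyase. This is the point of filtering on the motif rather than on the keyword alone.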

...

Answer

If set to 2, 3DM will only show positions in the network for which at least two different published mutations had an effect on specificity. This ensures that no false positives are included. In protein families for which many mutations are available, it is usually smart to leave this at two or more. If not much data is available, it might be better to set it to 1 to get reliable E-scores.
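The setting acts as a simple threshold on per-position mutation counts. The minimal sketch below shows this, assuming the counts are already available as a dictionary. The input format and the function name are hypothetical, not 3DM's internals.

```python
def filter_network_positions(mutation_counts, min_mutations=2):
    """Keep positions with at least `min_mutations` published specificity mutations.

    mutation_counts: dict mapping alignment position -> number of distinct
    published mutations with an effect on specificity (hypothetical format).
    """
    return sorted(pos for pos, n in mutation_counts.items() if n >= min_mutations)

counts = {12: 3, 45: 1, 78: 2, 101: 0}
print(filter_network_positions(counts))                   # [12, 78]
print(filter_network_positions(counts, min_mutations=1))  # [12, 45, 78]
```

Lowering the threshold to 1 admits position 45, which is supported by a single report. This trades some risk of false positives for more positions when data is scarce.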

...

Answer

They have the same number in the class A numbering scheme: 3.25a. In the class B alignment this residue has number 3.29b. Apparently there are more structurally conserved positions before this residue in the class B alignment than in the class A alignment. When these numbering schemes were designed, nobody realized that a single numbering scheme could have been made that applies to all classes. As everyone uses these numbering schemes now, it is too late to synchronize the numbering of the different families.
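In the Ballesteros-Weinstein scheme used for class A GPCRs, the most conserved residue of each transmembrane helix is labelled x.50. Every other residue is numbered by its offset from that reference. A class B scheme built on the same principle picks its own reference residues, so the same structural position can receive a different number. The sketch below shows only the labelling arithmetic. The function name and the example sequence indices are hypothetical.

```python
def bw_label(helix, residue_index, reference_index):
    """Generic label for a residue in transmembrane helix `helix`.

    reference_index: sequence index of the helix's most conserved residue,
    which by definition is labelled helix.50. Other residues are numbered
    by their offset from it.
    """
    return f"{helix}.{50 + residue_index - reference_index}"

# Hypothetical class A receptor: the TM3 arginine of the DRY motif (3.50) at index 135.
print(bw_label(3, 110, 135))  # 3.25
print(bw_label(3, 135, 135))  # 3.50
```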

...

Select a template and make a model. Make sure you have the GPCR class A alignment selected. Making the model may take a few minutes. Open it in Yasara or Pymol. If 3DM wasn't able to model parts of the sequence, these parts will be missing from your model. The missing positions are shown as purple dots in Yasara and as lines in Pymol. Usually this is because the sequence cannot be reliably aligned to the template in those parts (often the parts outside the core) due to very low sequence similarity. Realize that those parts cannot be modeled reliably with the selected template: where the sequence similarity is that low, the two proteins will likely fold differently. Sometimes choosing a different template (if available) helps, but usually it means that those parts simply cannot be modeled reliably.
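Target residues that align to gaps in the template have no template coordinates, so a template-based model has nothing to build them from. Those residues are the ones that end up missing. This minimal sketch lists them as 1-based target residue ranges. It assumes the pairwise target-template alignment is available as two gapped strings, which is a hypothetical input, since 3DM performs this alignment internally.

```python
def unmodelled_ranges(target_aln, template_aln):
    """1-based target residue ranges that align to gaps ('-') in the template."""
    ranges, start, resnum = [], None, 0
    for t_res, tpl_res in zip(target_aln, template_aln):
        if t_res == "-":
            continue  # gap in the target: no target residue at this column
        resnum += 1
        if tpl_res == "-":
            if start is None:
                start = resnum  # a run of uncovered target residues begins
        elif start is not None:
            ranges.append((start, resnum - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        ranges.append((start, resnum))  # run extends to the end of the target
    return ranges

print(unmodelled_ranges("MKTAYIAKQR", "MK--YIA--R"))  # [(3, 4), (8, 9)]
```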

...

Click on "alignment statistics" in 3DM. Make sure you have the GPCR class A family selected and consult the "Human variation/Position" histogram. You can compare this histogram with other data types using the "compare with" option above the histogram. Choose "Amino acid conservation" here. Set the cut-off for conservation to 50% and the SNP cut-off to 0.3.
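This sketch shows how two cut-offs combine into a position filter. It assumes that the comparison highlights positions passing both cut-offs. It also assumes conservation is the fraction of the most common residue in a column, with gaps ignored. 3DM may define both quantities differently, and the per-position SNP values here are made up. The function names are hypothetical.

```python
from collections import Counter

def conservation(column):
    """Fraction of non-gap sequences carrying the most common residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)

def conserved_snp_positions(alignment, snp_scores, cons_cutoff=0.5, snp_cutoff=0.3):
    """0-based positions at or above both the conservation and the SNP cut-off.

    alignment: list of equal-length aligned sequences.
    snp_scores: hypothetical per-position human-variation values.
    """
    hits = []
    for pos, snp in enumerate(snp_scores):
        column = [seq[pos] for seq in alignment]
        if conservation(column) >= cons_cutoff and snp >= snp_cutoff:
            hits.append(pos)
    return hits

aln = ["ADKL", "ADRL", "SDKV", "TDEL"]
print(conserved_snp_positions(aln, [0.5, 0.4, 0.1, 0.35]))  # [0, 1, 3]
```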

...

A hotspot basket is nothing more than a selection of alignment positions. Using 3DM, you can generate hotspot baskets for different protein features (e.g. correlated mutations, specificity hotspots, thermostability hotspots). At a later stage you can open a basket in different 3DM tools. Let's see how this works.
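Because a basket is just a selection of alignment positions, a plain set is a natural way to picture it. The sketch below uses set operations to see which positions two baskets share. The representation, the basket names, and the position numbers are hypothetical, not how 3DM stores baskets.

```python
# Hypothetical baskets: each is simply a set of alignment positions.
specificity_basket = {12, 45, 78, 101, 130}
stability_basket = {45, 66, 130, 201}

in_both = specificity_basket & stability_basket           # positions in both baskets
either = specificity_basket | stability_basket            # merged basket
specificity_only = specificity_basket - stability_basket  # specificity positions only

print(sorted(in_both))  # [45, 130]
```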

...

Answer

The B-factor method doesn't seem to work well for GPCRs. Likely this is because the transmembrane helices are packed so tightly that the rest of the protein is automatically much more flexible. It is known that mutations in the transmembrane helices can make GPCRs more stable. This is nicely demonstrated by the "thermostability" literature search. Clearly the two methods do not overlap, indicating that the B-factor method does not apply to GPCRs. In this protein family it is probably much better to first try the positions that have already been published to affect stability. The best way to do this is to find amino acids described in the literature that are known to stabilize GPCRs. If the stabilizing residue is not present in your target GPCR, it might be a good idea to try that residue. If your target sequence has a different residue than the consensus, it might be smart to try the consensus residue, as several papers demonstrate that mutating to the consensus often has a beneficial effect on stability. (Note that if it is easy to screen your protein for thermostability, then randomizing each hotspot might be smarter.) There are many other tricks too. Here are some examples:
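The consensus idea can be sketched as a small check over a set of hotspot positions. At each position the sketch compares the target residue with the most common residue in the family alignment and suggests a back-to-consensus mutation where they differ. The function name, toy alignment, and hotspot list are hypothetical. The labels use 1-based alignment positions, not the target's own residue numbering.

```python
from collections import Counter

def consensus_suggestions(alignment, target, hotspots):
    """Suggest target-to-consensus mutations at hotspot positions (0-based).

    alignment: list of equal-length aligned family sequences.
    target: the aligned target sequence (same length).
    Returns labels like 'A3G': target residue, 1-based alignment position, consensus.
    """
    suggestions = []
    for pos in hotspots:
        residues = [seq[pos] for seq in alignment if seq[pos] != "-"]
        if not residues or target[pos] == "-":
            continue  # nothing to compare at this position
        consensus = Counter(residues).most_common(1)[0][0]
        if target[pos] != consensus:
            suggestions.append(f"{target[pos]}{pos + 1}{consensus}")
    return suggestions

family = ["MLGKV", "MLGRV", "MIGKV", "MLAKV"]
target = "MIAKL"
print(consensus_suggestions(family, target, hotspots=[1, 2, 4]))  # ['I2L', 'A3G', 'L5V']
```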

  1. Introducing prolines at positions where a proline is common and your target doesn't have a proline
  2. Inserting negative charges at the N-terminal side of a helix (if there aren't any), or positive residues at the C-terminal side. This is known as helix capping: the N-terminus of a helix is slightly positive and the C-terminus slightly negative, so opposite charges at the helix ends are stabilizing.
  3. Creating salt-bridges by inserting positive or negative residues at positions on the outside of the protein. Always check if the residue you want to use is actually common in the alignment at that position.
  4. Replacing glycines where, according to the alignment, glycines can be replaced by something else.
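Tricks 1 and 4 are both alignment lookups, so they can be sketched together. The sketch flags positions where proline is common but the target lacks it. It also flags positions where the target has a glycine that the alignment suggests can be replaced, meaning glycine is rare at that position. The function names and both thresholds are illustrative assumptions, not 3DM settings.

```python
def residue_fraction(column, residue):
    """Fraction of non-gap residues in a column that equal `residue`."""
    residues = [r for r in column if r != "-"]
    return residues.count(residue) / len(residues) if residues else 0.0

def proline_glycine_candidates(alignment, target, pro_min=0.5, gly_max=0.2):
    """Return (proline candidates, glycine candidates) as 0-based positions.

    Proline candidate: proline frequency >= pro_min, target has no proline.
    Glycine candidate: target has glycine, but glycine frequency <= gly_max.
    """
    proline, glycine = [], []
    for pos, target_res in enumerate(target):
        column = [seq[pos] for seq in alignment]
        if target_res not in ("P", "-") and residue_fraction(column, "P") >= pro_min:
            proline.append(pos)
        if target_res == "G" and residue_fraction(column, "G") <= gly_max:
            glycine.append(pos)
    return proline, glycine

aln = ["APGL", "APSL", "SPAL", "APGL"]
print(proline_glycine_candidates(aln, "AAGG"))  # ([1], [3])
```

Position 2 is not flagged for glycine replacement, because half the family has a glycine there.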

...

The idea of the panel design tool is to select sequences from the alignment such that the selected sequences are maximally distributed over the superfamily. This is done in two steps. First, the sequences of the superfamily are grouped. This can simply be based on sequence similarity (similar sequences end up in the same group), but groups can also be based on sequence motifs at user-selected positions. The latter option is used to group sequences based on a protein feature. For instance, the user can pick positions important for specificity. The idea is that sequences with exactly the same residues (the same motif) at those positions are likely to have the same specificity. Both methods can be combined. In the second step, a user-defined number of sequences (usually one or two) is selected from each group. The selection step has several options to maximize the chance that the selected proteins will express. Let's see how it works.
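The motif-based grouping step and a simple selection step can be sketched as follows. Sequences are grouped by the residues they carry at the hotspot positions, and a fixed number is taken from each group. The selection here just takes the first names in sorted order. 3DM's actual selection applies additional criteria (for example, likelihood of expression) that are not modelled. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def design_panel(sequences, hotspots, per_group=1):
    """Group sequences by their motif at hotspot positions, then pick from each group.

    sequences: dict name -> aligned sequence.
    hotspots: 0-based alignment positions, e.g. a specificity hotspot basket.
    per_group: number of sequences taken from each group.
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        motif = "".join(seq[pos] for pos in hotspots)
        groups[motif].append(name)
    # Naive selection: first `per_group` names per motif, in sorted order.
    return {motif: sorted(names)[:per_group] for motif, names in groups.items()}

seqs = {"p1": "AKLDG", "p2": "AKVDG", "p3": "AELRG", "p4": "SELRG"}
print(design_panel(seqs, hotspots=[1, 3]))  # {'KD': ['p1'], 'ER': ['p3']}
```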

Select the "panel design" option in 3DM. First we will divide the superfamily based on sequence motifs. Because we want to maximize the specificity range in the panel, we will use the "specificity hotspot" basket you generated in question 30. This way, all sequences with the same motif at our specificity hotspots (which will thus likely have the same specificity) end up in one group. Use the "add hotspots" button to select the hotspot basket you made in Q30 (it should contain 9 positions).

...

Note that sometimes you need to make a panel from a subset of sequences. For instance, say you want to find the most active enzyme with a certain specificity. Then you should first make a subset that contains only sequences with this specificity. This sounds simple, but it is tricky because proteins are often wrongly annotated. The best way to do this is to first do a keyword search to find enzymes likely to have the correct specificity and make a subset of these sequences. Then use this subset to define a motif of 4 to 7 amino acids that is specific for this subset. The "subset specific residues" plot is the best tool for this (consult the OAH questions about this plot). Make a new subset that contains all sequences with this motif. With this approach you will not only find sequences with the correct specificity that are annotated as "hypothetical protein", but you will also remove sequences that are wrongly annotated. This approach doesn't guarantee that you have all sequences with the correct specificity, but it does maximize the chance that the ones in your subset indeed all have the correct specificity.
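The motif approach can be sketched in two functions. The first finds positions where one residue dominates the keyword subset but is rare in the remaining sequences. The second rebuilds the subset from every sequence carrying that motif. This is a stand-in for the "subset specific residues" plot, not its actual scoring. The function names, the toy sequences, and both thresholds are assumptions. The background threshold is not zero because unannotated members of the subset (like "hyp" below) sit in the rest.

```python
from collections import Counter

def subset_specific_residues(subset, rest, min_in=0.9, max_out=0.3):
    """Return {0-based position: residue} for residues dominant in subset, rare in rest."""
    specific = {}
    for pos in range(len(subset[0])):
        residue, count = Counter(seq[pos] for seq in subset).most_common(1)[0]
        frac_in = count / len(subset)
        frac_out = sum(seq[pos] == residue for seq in rest) / len(rest)
        if frac_in >= min_in and frac_out <= max_out:
            specific[pos] = residue
    return specific

def has_motif(seq, motif):
    """True if the aligned sequence carries every residue of the motif."""
    return all(seq[pos] == res for pos, res in motif.items())

all_seqs = {
    "kw1": "MKDWG",  # found by the keyword search
    "kw2": "MKDWA",  # found by the keyword search
    "hyp": "LKDWS",  # annotated as "hypothetical protein"
    "o1": "MRELG",
    "o2": "MREFG",
    "o3": "MRELG",
}
keyword_hits = ["kw1", "kw2"]
subset = [all_seqs[n] for n in keyword_hits]
rest = [s for n, s in all_seqs.items() if n not in keyword_hits]
motif = subset_specific_residues(subset, rest)
new_subset = [n for n, s in all_seqs.items() if has_motif(s, motif)]
print(motif, new_subset)  # {1: 'K', 2: 'D', 3: 'W'} ['kw1', 'kw2', 'hyp']
```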

...

  • Select the "Manual" tab in the left box and type V117C. To add this mutation to the experiment, simply drag this mutation into the right box.
  • Save the mutations by clicking "save" in the right box.
  • Click on "Sequences" on the left.
  • At "Maximum # of mutations per sequence" select 4 and select "fill up each sequence to contain maximum # of mutations". This will ensure that all sequences will have 4 mutations. If you don't use this option then sequences can have single, double, and triple mutations. Be sure to check the option “create demo measurement data”.
  • Select 96 sequences to generate, set the minimum number of observations to 2, and click on "convolute mutations".
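The settings above describe a combinatorial library. It contains 96 variants, each with exactly 4 mutations (the "fill up" option), and every mutation occurs in at least 2 variants. The sketch below builds such a library by seeded random sampling and re-draws until the observation constraint holds. It also assumes that two mutations at the same position are never combined. This is an illustration of the constraints, not 3DM's "convolute mutations" algorithm. Only V117C comes from the steps above; the other mutations and all names are hypothetical.

```python
import random
from collections import Counter

def position(mutation):
    """Residue number in a label such as 'V117C' (wild type, number, mutant)."""
    return int(mutation[1:-1])

def convolute(mutations, n_sequences=96, per_sequence=4, min_observations=2,
              seed=0, max_tries=1000):
    """Randomly combine mutations into variants with exactly `per_sequence` mutations."""
    if len({position(m) for m in mutations}) < per_sequence:
        raise ValueError("not enough distinct positions to fill every sequence")
    rng = random.Random(seed)
    for _ in range(max_tries):
        library = []
        for _ in range(n_sequences):
            pool = list(mutations)
            rng.shuffle(pool)
            variant, used = [], set()
            for m in pool:
                if position(m) not in used:  # at most one mutation per position
                    variant.append(m)
                    used.add(position(m))
                if len(variant) == per_sequence:
                    break
            library.append(sorted(variant, key=position))
        counts = Counter(m for variant in library for m in variant)
        if all(counts[m] >= min_observations for m in mutations):
            return library
    raise ValueError("could not reach min_observations; relax the constraints")

mutations = ["K33R", "L45A", "L45F", "S80T", "V117C", "G150D", "Y200W"]
library = convolute(mutations)
print(len(library), library[0])
```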

...