How to find a 3DM System by sequence?
System search summary
Important things to look for when choosing the right System for your project:
Does the System contain the region that you are interested in (query sequence coverage)?
What is the System size?
How close is the best hit to your query sequence?
Introduction
If you have a target protein sequence, the best way to pick the right 3DM System is to use the Find by sequence option. In this article we explain the important things to take into account when picking a 3DM System for your project.
First, from the 3DM dashboard, click on the Find by sequence item either in the menu on the left or via the link on the dashboard (Figure 1). Doing so will take you to the System search page.
Here, simply paste your sequence and click the SEARCH button. Your query sequence can be either a plain amino acid sequence (one-letter codes) or a FASTA-formatted sequence.
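Because the search accepts either plain one-letter sequences or FASTA, it can help to check your input before pasting it. The sketch below is a hypothetical helper (normalize_query is not part of 3DM). It assumes a single-record FASTA input, drops the header line, removes whitespace and a trailing stop symbol (*), and rejects characters that are not one-letter amino acid codes.

```python
import re

# One-letter amino acid codes, including ambiguity codes (B, Z, X)
# and the rare residues U (selenocysteine) and O (pyrrolysine).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZXUO")

def normalize_query(text: str) -> str:
    """Hypothetical helper: turn plain or single-record FASTA input into one-letter codes."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the FASTA header line
    # Join the sequence lines, remove all whitespace, uppercase, strip a trailing stop '*'.
    seq = re.sub(r"\s+", "", "".join(lines)).upper().rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    invalid = set(seq) - AMINO_ACIDS
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return seq

print(normalize_query(">example query\nMKT AYI\nAKQR\n"))  # MKTAYIAKQR
```

Both input styles end up as the same plain string, so either one can be pasted into the search page.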
A lot of data needs to be inspected to find the right Systems. Therefore, the search might take up to 2 minutes.
Query sequence coverage
The colourful bars at the top of the Systems list in the output represent the Pfam domains found in your query sequence. When choosing a 3DM System, make sure that the part of the sequence you are interested in is actually covered by that System. This is indicated by the darker sections in the Query sequence column; if part of the bar is light grey, that section is not covered by the System (Figure 2).
For example, if you are specifically interested in the Malt_amylase_C domain (displayed in green), you can already exclude the second System in the list, since it does not cover this domain.
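The coverage check above amounts to asking whether the residue range of your domain of interest lies inside the segments of the query that a System covers. The sketch below illustrates that logic; covered_fraction is a hypothetical helper, and the coordinates are made up for illustration rather than taken from real 3DM output or real Malt_amylase_C boundaries. Ranges are assumed to be 1-based and inclusive.

```python
def covered_fraction(domain, covered_segments):
    """Fraction of the domain (start, end; 1-based, inclusive) inside any covered segment."""
    start, end = domain
    positions = set()
    for seg_start, seg_end in covered_segments:
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo <= hi:  # the segment overlaps the domain
            positions.update(range(lo, hi + 1))
    return len(positions) / (end - start + 1)

# Made-up coordinates for illustration only.
domain_of_interest = (500, 580)
systems = {"System 1": [(1, 600)], "System 2": [(1, 450)]}
keep = [name for name, segs in systems.items()
        if covered_fraction(domain_of_interest, segs) == 1.0]
print(keep)  # ['System 1']
```

Using a set of positions avoids double counting when covered segments overlap each other.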
System size
The next thing to pay attention to is how much data a System contains, which you can estimate from the Sequences and Mutations values. In general, the more sequences a System contains, the more useful it is likely to be for your project.
Best protein hit
The Best protein hit and its BLAST E-value (displayed when you expand the row by clicking on its right side) tell you how close the closest protein in this System is to your query sequence. When a System contains your query sequence, the BLAST E-value is 0 and a ✓ exact match tag is displayed. Systems with an exact match to your query are preferred. If no such System is available, pick the System with the lowest E-value.
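If you are comparing many Systems that already cover your region of interest, the preferences described above can be written down as a sort order. The sketch below is one possible reading, not a 3DM feature: the SystemHit record and its fields are hypothetical, the values are invented, and breaking E-value ties by the number of sequences and then mutations is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SystemHit:
    """Hypothetical record of one row in the Systems list."""
    name: str
    evalue: float       # BLAST E-value of the best protein hit
    exact_match: bool   # True when the ✓ exact match tag is shown
    sequences: int
    mutations: int

def rank_systems(hits):
    # Exact matches first, then lowest E-value, then larger Systems (assumed tie-breakers).
    return sorted(hits, key=lambda h: (not h.exact_match, h.evalue, -h.sequences, -h.mutations))

# Invented example values.
hits = [
    SystemHit("A", 1e-50, False, 20000, 500),
    SystemHit("B", 0.0, True, 3000, 100),
    SystemHit("C", 1e-50, False, 50000, 10),
]
print([h.name for h in rank_systems(hits)])  # ['B', 'C', 'A']
```

Sorting on a tuple keeps each criterion strictly subordinate to the one before it, so size only matters between Systems whose best hits are equally close.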
Detailed System view
When you click on the right side of a table row, you can unfold the System view to see some more information about this particular System (Figure 3).
On the left-hand side you can find some overall information about the System itself, e.g. the number of sequences, mutations, subfamilies, and sequence clusters. Additionally, a word cloud of the protein descriptions in this System is shown: the larger a word is displayed, the more often it occurs in the protein descriptions. The word cloud is especially helpful for PDB clusters Systems, which do not have informative names; in such cases, the word cloud shows you which family or domain the System covers.
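A word cloud of this kind is driven by word frequencies: words that occur in more descriptions are drawn larger. The sketch below only illustrates that principle and is not how 3DM builds its cloud; the tokenization, the stopword list, and the example descriptions are all assumptions.

```python
import re
from collections import Counter

# Assumed stopwords; a real word cloud may filter different words.
DEFAULT_STOPWORDS = frozenset({"protein", "putative", "the", "of", "and"})

def description_word_counts(descriptions, stopwords=DEFAULT_STOPWORDS):
    """Count lowercase words across protein descriptions, skipping stopwords."""
    counts = Counter()
    for description in descriptions:
        words = re.findall(r"[a-z0-9]+", description.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts

# Invented descriptions for illustration.
counts = description_word_counts(["Alpha-amylase", "Alpha-amylase precursor", "Maltogenic amylase"])
print(counts.most_common(2))  # [('amylase', 3), ('alpha', 2)]
```

In this toy example "amylase" would be drawn largest, hinting that the System covers an amylase family.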
If your sequence is not present in a System, you can use the ADD AS PRIVATE PROTEIN button to add your sequence to this System. In Figure 3, the button is disabled because the query sequence is already present in the System.
In the right-most part of the row you find information about the best protein hit, e.g. the subfamily in which it is aligned and how close it is to the subfamily template (core identity). As a rule of thumb, the closer the hit, the higher the alignment quality.
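To make "how close it is to the subfamily template" concrete, the sketch below computes a percent identity over aligned core positions. It assumes the core positions of both sequences are already extracted as equal-length aligned strings with "-" for gaps, and that positions with a gap in either sequence are ignored; 3DM's exact definition of core identity may differ, and core_identity is a hypothetical helper.

```python
def core_identity(query_core: str, template_core: str) -> float:
    """Percent identity over aligned core positions, skipping positions with a gap in either string."""
    if len(query_core) != len(template_core):
        raise ValueError("aligned core strings must have equal length")
    pairs = [(q, t) for q, t in zip(query_core, template_core) if q != "-" and t != "-"]
    if not pairs:
        return 0.0  # no comparable positions
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)

print(core_identity("ACDEF", "ACDKF"))  # 80.0
```

Under this assumption, a higher value means the best hit is aligned closer to the template, which the rule of thumb above links to better alignment quality.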