How to find a 3DM system by sequence

 

System search summary

Important things to look for when choosing the right system for your project:

  • whether it contains the region that you’re interested in (query sequence coverage)

  • system size (number of sequences and mutations in the system)

  • how close the best hit is to your query sequence (BLAST E-value)

 

If you have a target protein sequence, the best way to pick the right 3DM system is to use the Find by sequence option. This article explains what you should take into account when picking a 3DM system for your project.

First, from the 3DM dashboard, click on the Find by sequence item - this will take you to the system search page.

 

Here, you can simply paste your sequence and click the SEARCH button.

We have to go through a lot of data to find the right systems, so the search might take up to 2 minutes.

The sequence can be either a plain amino acid sequence (one-letter code) or a FASTA sequence.
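If you want to check a sequence before pasting it into the search box, a small script can help. The following is a minimal sketch, not part of 3DM: the function name `normalize_sequence` is hypothetical, and it assumes that only the 20 standard amino acid letters are expected (the search page itself may accept more).

```python
# Assumption: only the 20 standard one-letter amino acid codes are allowed.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def normalize_sequence(text: str) -> str:
    """Return a plain one-letter sequence from plain or FASTA input."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Drop FASTA header lines (they start with '>') and blank lines.
    residues = "".join(
        line for line in lines if line and not line.startswith(">")
    )
    # Remove any remaining whitespace and use upper case throughout.
    sequence = "".join(residues.split()).upper()
    if not sequence:
        raise ValueError("no residues found")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(invalid)}")
    return sequence
```

Both a FASTA record and a bare sequence give the same plain string, and anything that is not a standard residue letter raises an error before you start a search.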

Query sequence coverage

The coloured bars at the top of the systems list represent the Pfam domains found in your sequence, and the bars below them show which part of the sequence each system covers.

When choosing the right 3DM system, we have to make sure that the part of the sequence we’re interested in is actually covered by the system. This is indicated by the grey bars in the Query sequence column (if part of a bar is light grey, that part is not covered by the system):

 

Let’s say you’re interested in the catalytic domain (orange). You can then exclude, for example, the 2nd system in the list, since it only covers the C-terminal hemopexin domains (red). In terms of coverage, the last system seems like the best choice because it covers the full target sequence (this is not always the case; sometimes there are only systems that cover parts of the sequence).
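The coverage bars answer this question visually. If you happen to know the residue range of your region of interest, the same check can be written as a simple interval overlap. This is only an illustrative sketch: the function `covered_fraction` and the residue numbers in the example are hypothetical and are not taken from 3DM.

```python
def covered_fraction(region, covered):
    """Fraction of `region` that lies inside `covered`.

    Both arguments are (start, end) residue ranges, 1-based and inclusive.
    """
    start, end = region
    cov_start, cov_end = covered
    # Length of the overlap between the two ranges (negative if disjoint).
    overlap = min(end, cov_end) - max(start, cov_start) + 1
    return max(overlap, 0) / (end - start + 1)


# Hypothetical example: a domain at residues 100-200.
domain = (100, 200)
print(covered_fraction(domain, (1, 450)))    # system spanning the whole query
print(covered_fraction(domain, (300, 450)))  # system covering only the C-terminus
```

A value of 1.0 means the region is fully covered by the system, and 0.0 means the system can be excluded for that region.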

System size

The next thing to check is how much data these systems contain. You can estimate this from two values: Sequences and Mutations. In this case, if you’re interested in the catalytic domain, the first system will be your best choice: it has the highest number of both sequences (7297) and mutations (5653). However, if you want to investigate the interactions of the catalytic domain with the hemopexin domains, you should pick the last system, which covers both parts, although it is significantly smaller (only 1580 sequences).

 

Best protein hit

The Best protein hit and BLAST E-value can also help identify the best system: they tell you how close the closest protein in the system is to your query sequence.

For this protein, all of the systems contain the query sequence, hence the BLAST E-value of 0 and the ✓ exact match tag. If this weren’t the case, you’d prefer the system with the lowest possible E-value.
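The three criteria (coverage, system size, and E-value) can be combined into a rough shortlist. The sketch below is one possible way to order candidates, not how 3DM ranks them: the `SystemHit` class, the `rank_systems` function, the priority given to each criterion, and all values in the example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SystemHit:
    """One row of the search results (hypothetical structure)."""
    name: str
    covers_region: bool  # does the system cover the region of interest?
    sequences: int
    mutations: int
    e_value: float       # BLAST E-value of the best protein hit


def rank_systems(hits):
    """Keep systems that cover the region, then sort by lowest E-value,
    breaking ties by more sequences and then more mutations."""
    candidates = [h for h in hits if h.covers_region]
    return sorted(
        candidates,
        key=lambda h: (h.e_value, -h.sequences, -h.mutations),
    )


# Illustrative values only, not taken from a real search.
hits = [
    SystemHit("system A", True, 7000, 5000, 0.0),
    SystemHit("system B", False, 3000, 2000, 0.0),
    SystemHit("system C", True, 1500, 1000, 0.0),
    SystemHit("system D", True, 9000, 8000, 1e-50),
]

for hit in rank_systems(hits):
    print(hit.name)
```

Which criterion matters most depends on your project, so treat the ordering as a starting point rather than a rule.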

Detailed system view

If you click on the downward-pointing arrows on the right side, you can unfold the system view to see more information about that particular system.

On the left-hand side is some overall information about the system itself: the number of sequences, mutations, subfamilies, and sequence clusters, and a word cloud of protein descriptions in this system (the larger the word, the more abundant it is in the protein descriptions). The word cloud can be especially helpful with the pdb clusters systems that don’t have informative names, because it shows you which family/domain is covered by the system.

In the middle part you can find information about your query sequence. If your sequence is not present in this system you can use the ADD AS PRIVATE PROTEIN button to add your sequence to this system. In this case the button is disabled because the query sequence is already present in the system.

In the right-most part of this view you’ll find information about the best protein hit: which subfamily it is aligned in and how close it is to the subfamily template (Core identity). As a rule of thumb, you can assume that the closer it is, the higher the alignment quality. If you want to go directly to analysing this protein in the system, you can click on either the OPEN PROTEIN DETAIL or the OPEN PROTEIN ANALYSIS button to go to the protein detail or protein analysis page, respectively.