Index

1 Index
2 Introduction
- 2.1 Setting up
  - 2.1.1 Registration
  - 2.1.2 YASARA or PyMol
- 2.2 System selection
  - 2.2.1 System overview
- 2.3 Protein detail page
  - 2.3.1 General protein information
  - 2.3.2 Models
  - 2.3.3 Mutations
- 2.4 Alignment
- 2.5 Alignment statistics
- 2.6 Correlated mutations
- 2.7 Visualize
  - 2.7.1 YASARA
- 2.8 Panel design
- 2.9 Search options
- 2.10 Other functionalities

Introduction

Welcome to the 3DM walkthrough! Here, the most important elements of our 3DM Systems are explained.

3DM is designed to facilitate the exploration of sequence-structure-function relationships. Our platform builds on high-quality alignments and integrates many types of protein data. From discovery to optimization, you can enhance your research with advanced AI and analytics.

To get maximum value out of 3DM, we advise you to follow the freely available course at https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/229377. The course contains a large practical component in which participants learn to solve real-life problems from different fields (protein engineering, homology modeling, and drug design) using our 3DM Systems.

In case you have any questions about this

Setting up

Registration

To follow this walkthrough, you will need a Bio-Prodict account. If you already have one, you can skip this step. Otherwise, please start by filling out the form on the registration page. Are you eligible for an academic license? Make sure to register with your academic email address. Once you submit the registration form, you will receive a confirmation email with further instructions on how to activate your account.

YASARA or PyMol

If you would like to use 3DM's protein visualization features, you need to install YASARA or PyMol and the 3DM plugin. Please refer to https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/8323108 for guidance through this process.

This walkthrough uses YASARA to demonstrate the visualization options. In case you prefer to use PyMol, this is also possible. However, keep in mind that if you decide to use PyMol, the screenshots and instructions below might not all be applicable.

System selection

Once you have successfully registered for a Bio-Prodict account and installed your protein visualization tool of choice, go to the login page of 3DM and enter your credentials. After logging in, you will land on the dashboard page where you can select the system that you want to work on. In case you have already purchased any systems, they will be listed here. For this walkthrough we will use a nuclear receptors system, which is publicly available. You can read more about finding and selecting a suitable 3DM System here.

To select the 3DM System that is used in this walkthrough, click on PUBLIC and choose the Nuclear Receptors - Ligand Binding Domain (Demo 2008) system from the list (Figure 1). You will be redirected to the system's main page.

Figure 1. Selecting the Nuclear Receptors system.

System overview

The start page of each 3DM System shows some system statistics and links to quickly navigate to specific tools or pages (Figure 2). All tools are also available from the menu panel on the left. From this panel, click on System > System info. The system info page contains a summary of the system, which gives you an idea of the amount of data that this system contains, e.g. how many sequences are aligned and how many mutations are mapped onto proteins in the system.

Figure 2. System start page.

Protein detail page

The content of the 3DM protein detail page will be discussed using the human androgen receptor (ANDR_HUMAN, UniProt accession P10275) as an example. You can follow this link to visit the page yourself, or go to Search > Proteins by keyword in the left menu and search for “ANDR_HUMAN”.

Figure 3. Protein detail page for ANDR_HUMAN (P10275).

General protein information

The INFORMATION tab on the protein detail page displays the most basic information about your protein, e.g. a description, the gene name, and the organism that is associated with the protein (Figure 3).

The Mutations field shows the number of mutations that were found in the literature for this protein. Notice that for ANDR_HUMAN (P10275), 1540 mutations were found. Gathering all that information manually would be practically impossible, which is a clear example of the power of 3DM. You can investigate these mutations in more detail in the MUTATIONS tab (described below).

The Aligned in subfamily field gives you more information about the subfamily alignment of your protein. The identifier shown there is the PDB identifier of the structure that was used as a template for the subfamily alignment. When you click on Show protein in subfamily alignment, you will be redirected to the Alignment page, which displays the alignment of this specific subfamily with your protein highlighted. The content of the Alignment page is described in more detail below.

The core identity is the sequence identity of the query protein to the subfamily template. Usually, the higher the core identity, the higher the quality of the protein's alignment.
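To make the idea concrete, the sketch below computes a percent identity restricted to core alignment columns. It is a minimal illustration under stated assumptions, not 3DM's documented formula: the function name `core_identity`, the gap character `-`, the boolean `core_mask`, and the choice to skip columns where the template has a gap are all hypothetical.

```python
def core_identity(query: str, template: str, core_mask: list[bool]) -> float:
    """Percent identity over core columns that the template occupies.

    Hypothetical assumptions (not 3DM's documented method):
    - query and template are aligned strings of equal length, '-' marks a gap
    - core_mask[i] is True when alignment column i belongs to the core
    """
    if not (len(query) == len(template) == len(core_mask)):
        raise ValueError("aligned sequences and mask must have equal length")
    compared = 0
    identical = 0
    for q, t, is_core in zip(query, template, core_mask):
        if not is_core or t == "-":
            continue  # only score core columns where the template has a residue
        compared += 1
        if q == t:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy example: 3 of the 4 scored core columns are identical
mask = [True, True, True, False, True]
print(core_identity("ACD-E", "ACGKE", mask))  # 75.0
```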

On the right side of the page, a word cloud is displayed. It shows the most abundant keywords annotated on the proteins that are most closely related to your protein of interest. These keywords can hint at, for example, the most important functions or classifications of your protein. Below the word cloud, you can view, copy, or BLAST the FASTA sequence of the protein.
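The keyword counting behind such a word cloud can be sketched as a frequency count over the annotations of related proteins. The function name `top_keywords`, the input format, the once-per-protein counting rule, and the example keywords are assumptions for illustration; how 3DM selects the related proteins and weights their keywords is not described here.

```python
from collections import Counter


def top_keywords(neighbor_keywords: list[list[str]], n: int = 5) -> list[tuple[str, int]]:
    """Return the n keywords annotated on the largest number of related proteins."""
    counts = Counter()
    for keywords in neighbor_keywords:
        counts.update(set(keywords))  # count each keyword at most once per protein
    return counts.most_common(n)


# Hypothetical keyword annotations of three closely related proteins
related = [
    ["Receptor", "Steroid-binding", "DNA-binding"],
    ["Receptor", "Steroid-binding"],
    ["Receptor", "Zinc"],
]
print(top_keywords(related, n=2))  # [('Receptor', 3), ('Steroid-binding', 2)]
```

In a word cloud, the counts returned here would determine the font size of each keyword.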

At the very bottom of the page there is a view in phylogenetic tree button. After you click OPEN in the pop-up window that appears, a new window will open that displays the generated tree (Figure 4). Your protein of interest is marked with a red circle. The nodes in the tree represent all the structural alignment templates. You can choose to include a custom number of alignment clustering representatives (50 by default) to give you a better idea of your protein in the context of the whole alignment. After changing the number of these additional proteins (Representatives), make sure to regenerate the tree to make them visible.

You can also access the phylogenetic tree from any other page within a 3DM System by clicking on Phylogeny in the left menu.

Figure 4. Phylogenetic tree for ANDR_HUMAN (P10275), including subfamily templates and 50 alignment clustering representatives.

Models

In the MODELS tab you can create a homology model for your sequence (Figure 5). Under Available model templates, a selection of potentially good templates is displayed, based on their sequence similarity with the protein that you are investigating. You can select the template(s) that you are interested in and download the homology model in the required format at the bottom of the page. In the free course material (https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/229377) you can learn more about selecting the best template for your use case.

Figure 5. Models tab for protein ANDR_HUMAN (P10275).

Mutations

The first graph in the MUTATIONS tab displays the number of mutations that are found per residue position in a protein (Figure 6). Mutations in core positions are colored green and mutations in variable positions red. Further down, you will find tables with mutations retrieved from the literature, possible mutations retrieved from the literature, and residue annotations.

Figure 6. Mutations tab for protein ANDR_HUMAN (P10275).
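The per-position counts in this graph can be reproduced in spirit with a small sketch. It assumes mutations are written in the common wild-type/position/mutant notation (for example `T877A`) and that the set of core residue numbers is known; the function names, the example mutation list, and the core positions are hypothetical and do not come from 3DM's data.

```python
import re
from collections import Counter

# Assumed notation: one-letter wild type, residue number, one-letter mutant
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def mutations_per_position(mutations: list[str]) -> Counter:
    """Count mutations per residue number, e.g. 'T877A' counts towards 877."""
    counts = Counter()
    for m in mutations:
        match = MUTATION.fullmatch(m)
        if match is None:
            raise ValueError(f"unrecognized mutation notation: {m!r}")
        counts[int(match.group(2))] += 1
    return counts


def split_core_variable(counts: Counter, core_positions: set[int]) -> tuple[dict, dict]:
    """Separate counts into core (green bars) and variable (red bars) positions."""
    core = {pos: n for pos, n in counts.items() if pos in core_positions}
    variable = {pos: n for pos, n in counts.items() if pos not in core_positions}
    return core, variable


# Hypothetical input: a few mutations and an assumed set of core residue numbers
counts = mutations_per_position(["T877A", "T877S", "W741C", "L701H"])
core, variable = split_core_variable(counts, core_positions={741, 877})
print(core)      # {877: 2, 741: 1}
print(variable)  # {701: 1}
```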

Alignment

Figure 7. Alignment of subfamily 1XJ7A, highlighting your protein of interest (ANDR_HUMAN).

After clicking on Show protein in subfamily alignment on the protein information page as described above, you are redirected to the Alignment page. It displays the alignment of the subfamily to which your protein of interest is aligned, with your protein marked in a slightly lighter shade (Figure 7). Only residues in the core regions (the parts of the alignment that are aligned across the whole superfamily) are displayed at first. In this subfamily view, you can display the variable regions by clicking the variable regions toggle at the top right. The variable regions appear in a lighter shade, whereas the core regions keep their bright color. Since only the aligned parts of the variable regions are displayed, you generally do not see the full sequence in this view.

When you click on the black arrow above the protein identifiers on the left side, you are redirected to the main alignment. This is also the view you see when you open the Alignment page from the left menu. Here, you see the alignment of all the subfamily templates in your system. Clicking on the identifier of a subfamily template in the list takes you back to a subfamily view similar to the one displayed in Figure 7.

Alignment statistics

Let's now navigate to the alignment statistics page, which you can access from the menu on the left. The plots here show different kinds of data, e.g. ligand contacts, amino acid conservation, and molecular contacts, mapped onto the alignment positions (Figure 8). Next to the title of each plot is a grey icon; hover over it to see more detailed information about the data presented in the plot. You can also view the protein data available for a specific alignment position by clicking on its bar in one of the plots. All plots have blue sliders that let you adjust the value cut-offs of the displayed data.

Figure 8. Alignment statistics page.

Besides investigating the individual plots, looking for correlations between the data in different plots can be very insightful. For example, residues with a high correlated mutation score are often involved in the same function (e.g. ligand binding). The compare with dropdown menus allow you to display two distinct kinds of data and investigate any correlations (Figure 9). This information can also be visualized in YASARA. To prepare the required file, click the button with the protein helix symbol. You will then be redirected to the Visualize page, which is explained later in this walkthrough.

Figure 9. Comparing mutations.
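If you want a single number in addition to the visual comparison, one common option is a Pearson correlation between two per-position data tracks. This is an illustrative sketch, not necessarily how 3DM relates the plots: the `pearson` helper and the example values are hypothetical, and for skewed count data a rank-based measure may be more appropriate.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two per-position data tracks of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long tracks with at least two positions")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant track has no defined correlation")
    return cov / (sx * sy)


# Hypothetical values for five alignment positions (not real 3DM data)
ligand_contacts = [0, 12, 30, 2, 25]
mutation_counts = [1, 8, 20, 0, 15]
print(round(pearson(ligand_contacts, mutation_counts), 3))
```

A value close to 1 would indicate that positions with many ligand contacts also tend to carry many mutations.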

Correlated mutations

A network of correlated mutations is visualized on the Correlated mutations page under the CORRELATION NETWORKS tab. Use the slider on the right to change the cut-off value and filter the network by correlation score.
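Filtering the network with the cut-off slider amounts to keeping only the edges whose correlation score passes a threshold. The sketch assumes edges are stored as position pairs mapped to scores and that the cut-off is inclusive; the names and example positions are hypothetical.

```python
def filter_network(edges: dict[tuple[int, int], float], cutoff: float):
    """Keep edges scoring at or above the cut-off; return them plus the remaining nodes."""
    kept = {pair: score for pair, score in edges.items() if score >= cutoff}
    nodes = sorted({pos for pair in kept for pos in pair})
    return kept, nodes


# Hypothetical correlation scores between alignment positions
edges = {(22, 95): 0.91, (22, 130): 0.78, (40, 61): 0.55}
kept, nodes = filter_network(edges, cutoff=0.7)
print(nodes)  # [22, 95, 130]
```

Raising the cut-off shrinks the network to the most strongly correlated positions, which is what moving the slider does visually.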

In the Literature & Mutations section you can investigate whether positions in the full network have any mutations that are associated with certain keywords in the literature. In Figure 10, we searched for literature mutations associated with the keyword “specificity”. The positions for which mutations associated with ligand binding specificity were found in the literature are displayed in cyan in the correlation network. The table below the keyword search shows how many of these mutations were found per position.

Figure 10. Correlation network with positions associated with the keyword “specificity” marked in cyan.

When you click on a node in the network, the Amino acid distribution section on the right shows which residues are most abundant at this alignment position (Figure 11). Clicking on an edge between two positions gives you a pie chart of the correlating residue pairs at those positions.

Figure 11. Amino acid distribution of position 22.
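Both views can be derived by counting residues in alignment columns. In this sketch, the alignment is a list of equally long aligned strings, columns are 0-based Python indices (unlike 3DM's position numbers), and gaps (`-`) are ignored; the function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_distribution(alignment: list[str], pos: int) -> Counter:
    """Residue counts at one alignment column (0-based index), ignoring gaps."""
    return Counter(seq[pos] for seq in alignment if seq[pos] != "-")


def pair_distribution(alignment: list[str], pos_a: int, pos_b: int) -> Counter:
    """Counts of residue pairs at two columns, the data behind an edge's pie chart."""
    return Counter(
        (seq[pos_a], seq[pos_b])
        for seq in alignment
        if seq[pos_a] != "-" and seq[pos_b] != "-"
    )


# Toy alignment of four sequences
alignment = ["ATKL", "ATRL", "GSKV", "GS-V"]
print(column_distribution(alignment, 1))  # Counter({'T': 2, 'S': 2})
print(pair_distribution(alignment, 0, 3))  # Counter({('A', 'L'): 2, ('G', 'V'): 2})
```

In this toy data, columns 0 and 3 co-vary perfectly (A always pairs with L, G with V), which is the kind of pattern a correlated mutation edge captures.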

Scrolling down the Correlated mutations page, you will see two types of enrichment plots. The first provides an overview of the abundance of mutation keywords in the correlated mutations network. The second plot shows the same type of information, but for the position with the highest conservation. The plot in Figure 12 is generated for the sequences that have a threonine (T) at position 22. It shows that 'substrate' and 'specificity' are the keywords with the most significant enrichment, a strong hint that the residue at position 22 might influence ligand binding specificity.

Figure 12. Enrichment plot for automatically generated subset 22T.
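The walkthrough does not state which statistic the enrichment plots use. As one plausible way to score keyword enrichment in a subset such as 22T, the sketch below uses fold enrichment plus a one-sided hypergeometric test, a common choice for over-representation analysis; the function name, argument names, and example counts are assumptions, not 3DM's method.

```python
from math import comb


def enrichment(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """Fold enrichment and one-sided hypergeometric p-value P(X >= k).

    k: subset mutations mentioning the keyword, n: all subset mutations,
    K: all mutations mentioning the keyword,   N: all mutations.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N and k <= K):
        raise ValueError("inconsistent counts")
    fold = (k * N) / (n * K) if n and K else 0.0
    # Sum the hypergeometric probabilities of seeing k or more keyword hits
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return fold, tail / comb(N, n)


# Hypothetical counts: 8 of 10 subset mutations mention the keyword,
# versus 20 of 100 mutations overall
fold, p = enrichment(k=8, n=10, K=20, N=100)
print(fold)  # 4.0
print(f"{p:.2e}")
```

A large fold enrichment with a small p-value is what would make a keyword such as 'specificity' stand out in the plot.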

Visualize

When you open the Visualize page via the menu on the left, one structure is already preselected: the template of the subfamily to which our protein of interest is aligned. When you click ADD STRUCTURE, a structures section appears where you can select more structures and compounds to add to your visualization (Figure 13). Once you are satisfied with your selection, select your program of choice (we use YASARA in this walkthrough) and click VISUALIZE. You can then open the downloaded scene to view your selected structures.

Figure 13. Adding structures to a visualization.

YASARA

After you open the downloaded scene in YASARA, all your selected structures are visible. At the top right, you can configure the scene content to include or exclude structures of your choice. Some parts of the backbone are green(ish) and others are grey (Figure 14). Grey residues fall in the variable regions, while green(ish) residues are core residues.

Figure 14. 1A28A visualized in YASARA.

You can access additional 3DM data from within YASARA using the 3DM plugin, which is located in the menu bar under the 3DM tab. There are numerous options for mapping data onto the structure(s) in your scene. For example, you can project mutation data by going to the 3DM tab > Show superfamily data > Mutations. Click OK in the pop-up window to continue with the default settings. Residues corresponding to the alignment positions with the highest number of mutations are now shown in your visualization. Most of them are located in the pocket where you previously saw a lot of residues with ligand contacts. This makes sense: these are the residues that researchers mutate most often to investigate their function and the effect of mutations on ligand binding. To get back to your main visualization, switch views with the tabs right below the menu bar.

If you want to learn more about how to work with YASARA and gain some hands-on experience with the tool, please follow the course at https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/229377.

Panel design

The panel design tool facilitates the design of enzyme selection panels. Using this tool, you can select sequences from the alignment such that they are maximally distributed over the superfamily.
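One standard way to pick sequences that are spread out over a family is greedy max-min (farthest-point) selection. The sketch below applies it to aligned sequences with a simple Hamming distance; it is an assumption for illustration, not the panel design tool's actual algorithm, and the names and toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing aligned columns (a gap counts as just another symbol)."""
    return sum(x != y for x, y in zip(a, b))


def diverse_panel(seqs: dict[str, str], size: int) -> list[str]:
    """Greedy max-min selection: repeatedly add the sequence farthest from the panel."""
    names = list(seqs)
    if size <= 0 or not names:
        return []
    panel = [names[0]]  # arbitrary seed; a real tool might start elsewhere
    while len(panel) < min(size, len(names)):
        best = max(
            (n for n in names if n not in panel),
            key=lambda n: min(hamming(seqs[n], seqs[p]) for p in panel),
        )
        panel.append(best)
    return panel


# Toy aligned sequences
aligned = {"s1": "AAAA", "s2": "AAAT", "s3": "TTTT", "s4": "TTTA"}
print(diverse_panel(aligned, 2))  # ['s1', 's3']
```

Each step maximizes the distance to the closest already selected sequence, which avoids picking near-duplicates.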

A selection of positions can be entered to create motif groups based on sequence similarity, sequence motifs found at user-selected positions, and/or a specific protein feature. The minimum group size can be adjusted to exclude small groups. Motifs that contain gaps can be excluded with the switch under options. You can also divide the alignment phylogenetically, and this separation can be combined with the motif grouping. In the box under phylogenetic groups, you can enter the number of groups into which the superfamily will be divided, purely based on phylogenetic distances.
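Grouping by the motif at user-selected positions, with a minimum group size and optional gap exclusion, can be sketched as follows. Columns are 0-based indices into hypothetical aligned strings, and the function name and defaults are assumptions rather than the tool's settings.

```python
from collections import defaultdict


def motif_groups(
    seqs: dict[str, str],
    positions: list[int],
    min_size: int = 2,
    exclude_gaps: bool = True,
) -> dict[str, list[str]]:
    """Group aligned sequences by the residues found at the selected columns."""
    groups = defaultdict(list)
    for name, seq in seqs.items():
        motif = "".join(seq[p] for p in positions)
        if exclude_gaps and "-" in motif:
            continue  # mimics switching off motifs that contain gaps
        groups[motif].append(name)
    # Drop groups smaller than the minimum group size
    return {m: members for m, members in groups.items() if len(members) >= min_size}


# Toy aligned sequences; group on the last two columns
family = {"a": "MKTY", "b": "MRTY", "c": "MKSY", "d": "MKT-", "e": "LKSY"}
print(motif_groups(family, positions=[2, 3]))  # {'TY': ['a', 'b'], 'SY': ['c', 'e']}
```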

After defining the groups, you can select sequence(s) from each group for your panel. There are several options that can be used to determine which sequences to select. Using these options increases the chance that the selected sequences can be expressed.

Are you interested in learning more about the panel design tool and working your way through an example yourself? Please visit https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/8257537.

Search options

There are multiple ways to search for proteins in a 3DM System, e.g. by keyword or BLAST. While most of them are quite straightforward, we will have a closer look at the Search proteins by position motif and Search proteins by sequence motif options. These two search modes provide similar functionality; the difference is that in a sequence motif search, the specified motif can appear at any position in the sequence.

When searching proteins by position motif, you are looking for specific residues at specific alignment positions. For example, you might want to find all proteins that have a tyrosine at alignment position 35 and an isoleucine at alignment position 90. In that case, use the search term "Y35,I90" (Figure 15). With Search proteins by sequence motif, searching for the motif “YXI” finds all proteins whose sequence contains a tyrosine, followed by any amino acid, followed by an isoleucine (Figure 16).

Figure 15. Search proteins by position motif.

Figure 16. Search proteins by sequence motif.
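Both query styles are easy to emulate on your own data. The sketch assumes position motifs are comma-separated one-letter residues followed by alignment position numbers, as in "Y35,I90", and that `X` in a sequence motif stands for any single amino acid; the helper names and the example sequence are hypothetical and only mirror the behavior described above.

```python
import re


def parse_position_motif(query: str) -> dict[int, str]:
    """Parse e.g. 'Y35,I90' into {35: 'Y', 90: 'I'}."""
    motif = {}
    for term in query.split(","):
        m = re.fullmatch(r"\s*([A-Z])(\d+)\s*", term.upper())
        if m is None:
            raise ValueError(f"cannot parse term {term!r}")
        motif[int(m.group(2))] = m.group(1)
    return motif


def matches_position_motif(residue_at: dict[int, str], motif: dict[int, str]) -> bool:
    """residue_at maps alignment position -> residue for one protein."""
    return all(residue_at.get(pos) == aa for pos, aa in motif.items())


def sequence_motif_regex(motif: str) -> re.Pattern:
    """Turn 'YXI' into a regex in which X matches any single residue."""
    return re.compile("".join("." if c == "X" else re.escape(c) for c in motif.upper()))


print(parse_position_motif("Y35,I90"))  # {35: 'Y', 90: 'I'}
print(bool(sequence_motif_regex("YXI").search("MKAYLIR")))  # True
```

The position motif is checked against fixed alignment positions, whereas `search` scans the whole sequence, which reflects the difference between the two search modes.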

Other functionalities

Do you want to learn more about how to get the most out of your 3DM System, for example by creating subsets of proteins for your analysis, creating custom numbering schemes, or using other advanced tools? Please follow the course at https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/229377 and consult https://bioprodict.atlassian.net/wiki/spaces/DOC/pages/595394573. If you want to learn more about a specific page in 3DM, click the Help button at the top right to find the relevant documentation. Any remaining questions? Get in touch with us!
