Index

1 General
2 Introduction
3 Correlated mutations
4 Families and numbering schemes
5 Making a structural model with 3DM
6 Visualize data in structures
7 Panel design
8 Patent analysis tool
9 Designing mutation experiments with the de-convolution tool

Before doing this exercise it is best to first do the Introduction: 3DM applied to the nuclear receptorscourse.

General

The complete course consists out of three parts:

To follow these courses, you need a Bio-Prodict account. If you do not have one yet, please start by filling out the form on the registration page. Are you eligible for an academic license? Make sure to register with your academic email address. Once you submit the registration form, you will receive a confirmation email with further instructions on how to activate your account.

For some exercises in this section, you need either PyMOL or YASARA installed which has the 3DM plugin. If you do not have YASARA or PyMOL or you are missing the 3DM functionality, please consult the Installation instructions 3DM. Make sure you have the latest version installed before you start this exercise.

In case you have any questions about this course, please get in touch with our support team via support@bio-prodict.com.

Introduction

After entering the login details on the login page, you land on the 3DM dashboard page. Here, you can find an overview of all available systems. In the 3DM COURSE tab, click on the GPCR (2016) system. This is the 3DM system we will be working with during this course.

By default, the GPCR 3DM database shows data of the GPCR family class A sequences. There are other classes of GPCRs too which you will investigate later. Unless specifically requested otherwise, always use the GPCR class A protein family to answer the questions in this course.

Correlated mutations

Open 3DMs correlated mutation analysis tool CorNet by clicking on Correlated mutations in the left menu. You can see two big networks and a few smaller ones. Start thinking about whether the two bigger networks really are two separate networks, or if they are simply disconnected because of the fact that the default cut-off for the number of positions in the network is 25.

Question 1: What does "independent networks" mean? When are two networks independent?

Question 2: How could you investigate if two networks truly are two independently co-evolving networks?

Question 3: Enter the keyword "specificity" in the searchbox of the literature & mutations window on the right. What do you find? Does this provide a clue about the separation of the two networks?

Question 4: Do you think it is likely that position 92 or 93 have an important role for specificity? How would you find out?

Correlated mutations often divide the superfamily into several groups based on sequence motifs. Sometimes this is just an evolutionary separation, for example in the case of bacterial sequences versus those of eukaryotes. However, because 3DM uses superfamily alignments containing proteins with different functions, the correlations are often a result of the changes between the different functional groups.

Question 5: What is the difference between evolutionary grouping and grouping based on functional differences?

Click on the main nodes of the two big networks (position 88 and 74). Clicking on a node gives information about this position, such as the amino acid distributions.

Question 6: How many different residues do you see in the distribution pie if you click on position 88? And what about 74?

Question 7: Click on the other members of each network. Do you see a pattern? What does this tell you about the separate co-evolving networks?

You can investigate the correlation between residues by selecting them simultaneously. This can be done by making a selection box around the positions, or by selecting them while holding the Ctrl or command key.

Question 8: Select position 105 and 197 simultaneously. Looking at the amino acid distribution pie chart, do you think the residues on these positions might form a salt-bridge? You can visualize this by clicking on SELECTED NODES in the Visualize alignment position in structure section below the pie chart.

Again, enter the keyword “specificity” in the keyword search box of the Literature & Mutations section. Now, look at the enrichment-score (overall enrichment) below the search box.

Question 9: What does this value tell you? Is this a high/meaningful value? Is it what you expect after seeing the network or do you expect something to be going wrong? You hover over the information icon if you do not know what the enrichment-score (E-score) is.

In the Literature & Mutations section you can change the Keyword cut-off. With the default value of 2, 3DM will only show positions in the network for which at least two different mutations have been published that had an effect on specificity. This is to ensure no false positives are included. In protein families for which a lot of mutations are available, it is usually smart to leave this at at least two. If there is not so much data available you can change it to 1 to get reliable E-scores.

Question 10: Now select all the nodes (both the green and the light blue ones) of the sub-network with the light blue position (e.g. specificity position). What is the enrichment score of this selection?

Different protein features (e.g. activity, dimerisation) can be the underlying cause of an evolutionary pressure that has resulted in correlated mutations. In 3DM we pre-calculate the E-scores of the different protein features at different correlated mutation cut-offs. The resulting plot shows quickly which evolutionary pressure is behind a correlated mutation network. At high correlated mutation cut-offs specificity mutations are clearly over-represented in the GPCR CorNet network (Figure 1).

Figure 1. E-scores of different keywords.

Question 11: With the keyword cut-off set to 1, you can see that there is only one article describing a mutation that has effects on specificity at position 93. Do think that it is likely to be a false positive or is it more likely that this really is a specificity related position? And what about position 248?

With the keyword cut-off set back to the default value of 2, select the other big network (with position 74 in its center) and color the nodes blue with the node coloring option that appears on the right when you select positions in the window. Select positions 247 and 248 and make them yellow. Select positions 42 and 32 and make them Magenta. Finally, select positions 60, 84, and 98 and make them red. Click on ALL NODES in the Visualize alignment position in structure menu. Keep the default settings on the Visualize page and create your scene by clicking on VISUALIZE. Open the downloaded file in YASARA or PyMol.

Question 12: Do the nodes of the specificity related network cluster together in the structure? How about the other big network from the correlated mutation (coloured in blue) ?

Question 13: Do you think that position 60 really belongs to the specificity related network?

Question 14: What about the small separate network 247 and 248 (in yellow) for which we found one article describing effects on specificity? Might they actually belong to the specificity related network? And what about the other two small networks (e.g 32-42 and 153-251). Do you think they are related to the bigger network too?

In case you are using YASARA, load the 1F88A co-crystalized compound using via 3DM → Structures → Load structure from 3DM.

If you use Pymol, start the 3DM plugin using plugin → initialize plugin system. Then, plugin → legacy plugins → 3DM. A new window should start. Use your 3DM credentials to login and click in this window on 3DM menu → structures → load structures from 3DM, select 1F88A and only select Download drug.

Question 16: Do you think it makes sense to find the ligand there?

Families and numbering schemes

There are different families in the GPCR database. An alignment has been generated for each distinct GPCR class. These alignments are structure based and all have their own structural conserved core. Because these cores differ in length, the different alignments have their own numbering schemes. Via the Alignment menu at the top of the 3DM website you can switch between the different families.

In the left menu, click on System → System info to open a nice overview of the different family alignments.

Question 17: How many aligned sequences do you see in the class A family?

Question 18: How many sequences are in the other alignments?

Question 19: Group A is by far the largest group of GPCRs. Superfamily 3DM is a combination of all the classes. Why do you think the superfamily 3DM is not as big as all groups in total?

Now using the Alignment scroll down menu select Superfamily 3DM and look at where the conserved residues are (which 3D numbers) using the Alignment statistics page. Now do the same for the group A sequences (this you can do best in another tab so you can switch between the tabs).

Question 20: What is the first residue that is conserved > 95% in the full alignment? What is it's 3D number in the full alignment? And in the class A alignment? Can you explain this?

Select from the Numbering scheme scroll-down menu at the top of 3DM the gpcr_A numbering in both alignments.

Question 21: Compare the 3D numbers of the two conserved Cys residues again. What do you see? What is the number when using the class B numbering scheme?

Realise that between different 3D alignments, the 3D numbers can not be compared (you can noy select gprc_a numbers if you have the GPCR class B alignment selected). Because the size of the structural conserved cores differ between the different families, different numbering schemes have been developed for the different subclasses. Each of these numbering schemes indicates for which subclass the numbering is. Although it would have been even better if they had used a common numbering for the transmembrane parts that are shared by all classes, such that the conserved Cys would have had the same number in all classes. Although this is not the case, the commonly used numbering schemes designed for the GPCRs is a luxury that is only available in the GPCR protein family and has not been developed for any other protein family (as far as we know).

Making a structural model with 3DM

3DM contains an automated homology-modeling module. It uses the 3DM alignment between a sequence and a structure to generate models. At the protein detail page of a sequence you can find a MODELS tab. Here you can select a suitable template to generate a homology model of the protein. 3DM already makes a pre-selection of suitable templates based on sequence similarity. You can choose any of the pre-selected templates. Which of these templates is best is for you to decide. This decision should be based on what you want to do with your model. For instance, enzymes can be in the open or closed conformation. If you want your model in the closed conformation, then start with a template that is also in the closed conformation. Other factors can be e.g. with or without ligand, with an activating or inhibiting ligand bound in the ligand binding pocket, etc. So, always first investigate the different starting templates to see which one fits best to your needs.

Hirschsprung disease is a disease that is caused by mutations in a GPCR.

Question 22: Using the Literature option of the Search module, try to find a protein that is known to be involved in Hirschsprung disease.

Question 23: Go to the protein detail page of P24530. In 3DM you can simply click on the protein name. On this page you see MODELS tab where you can create a structural model of a protein. Explain the different numbering options that are available under Download models.

Select a template and make a model. Make sure that you have the GPCR class A alignment selected. Use YASARA or PyMol to open it. Creating the model may take a few minutes. If 3DM was not able to model parts of the sequence, these parts will be missing from your model. These missing positions are indicated as purple dots in YASARA and as lines in PyMol. Usually, this is because the alignment between the sequence and the template can not reliably be made (often the parts outside the core) due to very low sequence similarities. Realise that those parts can not be modeled reliably using the selected template, because the sequence similarity is so low that the two proteins will likely fold differently in those parts. Sometimes it helps to make a model choosing a different template (if available), but usually this means that those parts can simply not be modeled reliably.

Question 24: Find positions where mutations can cause Hirschsprung disease using the Literature hotspots option of the 3DM menu in YASARA or PyMol. How many positions can you find?

Visualize data in structures

To visualize data in a structure you can use the Visualize tool in 3DM. Here you can select multiple structures that are all superimposed. You can select one or more of the templates that are used to generate the alignment if you select the Show only templates option from the quick filter menu. If you activate the advanced setting, a PDB type can be selected. The Aligned setting gives any of the other structures that could be superimposed on the templates. The Non-aligned setting contains structures that were detected by BLAST but which 3DM could not superimpose on the templates. These structures usually do not belong to the superfamily, or are structurally too distantly related. Models that you have generated with 3DM can be selected from the Model PDB type setting. The name of a model indicates which protein is modeled, and between brackets you see which structure was used to make the model. In the Compounds section you can select any co-crystalized compound. These are also superimposed, so usually they will be in the pocket of the structures. You can select any combination of protein structure and ligand, so you can insert any ligand into any structure.

Question 25: Select a couple of structures and ligands. Click on, ADD POSITIONS in the Positions section, go to the CORRELATED MUTATIONS tab and select the top five correlated mutations. From the CONVERSATION tab, select all residues that are >90% conserved. From the CONTACTS tab, select all positions make at least 100 contacts with a ligand (note the different contact types). Click on VISUALIZE and open the resulting file. Did you get back what you expected?

Navigate to Alignment statistics in 3DM. Scroll down to the Keyword mutations plot. Here, you can enter a keyword to see how many mutations related to this specific keyword are found for a given position.

Make sure you have the GPCR class A family selected under Alignment at the top of the page and consult the Human variation/Position histogram. Compare this histogram with the Amino acid conservation histogram using the compare with option above the histogram. Set the cut-off for conservation on 50% and the SNP cut-off on 0,3.

Question 26: Do you find more SNPs at Conserved regions? Does this finding make any sense?

Question 27: Now compare the Human variation/Position histogram with the histogram of Ligand contacts. Do you see a common SNPs at a ligand contacting position?

Question 28: Reset the SNP graph by clicking Reset all. Move the cut-off to 1% using the slider bar and export the result to YASARA or PyMol using the Visualize structure button left to the slider bar. Which positions were exported to the structure? Which of these very common SNPs is located near the ligand binding pocket?

Note that the

button can be found at many places in 3DM for direct visualization of data in.

There are other export buttons too, such as the Export to hotspot basket button. Let's see how this works. Put the cut-off of the SNP graph again on 1 and open the hotspot basket tool by opening the Hotspots menu in the top right corner of 3DM. At the SNP histogram this sign will appear (possibly with a different numnber):

. This icon is for the insertion of selected positions in a hotspot basket. Click this button to upload the SNP positions into a new hotspot basket, give the basket a name, and save the basket.

Question 29: Go back to YASARA or PyMol and use the 3DM → Hotspot Baskets → Load a Hotspot Basket to open the basket that you just created. In PyMol you need to define which hotspots you want to visualize from the basket. Simply select all of them. Is the result the same as you had before?

A hotspot basket is nothing more than a selection of alignment positions. You can generate hotspots for different protein features (e.g. correlated mutations, specificity hotspots, thermostability hotspots, etc.) and those can be found using 3DM. At a later stage you can open the basket in different 3DM tools. Let's see how this works.

Question 30: Go to the Correlated mutations page in 3DM and use the keyword “specificity” in the Literature & Mutation window with a keyword cut-off of 2. Make sure you have the GPCR class A selected. Open the hotspot basket tool and select a NEW hotspot basket. Select the 9 positions of the sub-network that contains the mutations reported to effect specificity upon mutation. Click on the Add to hotspots button at the left top corner of the network visualisation to store these specificity related mutations in a NEW basket. Give it a logical name and save it. We will use it later on.

A good trick for increasing the thermostability of proteins is to make mutations at flexible regions. Go to the Alignment Statistics module of 3DM. Here you will find two histograms that display flexible positions, the RMSD (a measure for how tight the structures superimposed) and the average B-factor. The B-factor is a measure for how sharp the X-ray diffraction was for the atoms of a residue. A very sharp X-ray indicates that the amino acid is tightly positioned in the structure. The average B-factor is the average over all amino acids from all structures at an alignment position. Often these two plots show a similar pattern. If at a position both plots are high usually this is a good hotspot for changing thermostability.

Question 31:

A: Make sure you have the hotspot basket window open. Increase the B-factor cut-off using the slider bar until you have <20 positions selected and store these in a new basket. Open YASARA or PyMol and open this basket in any GPCR you prefer. Does the result make any sense?
B: Do you think these are positions that are likely good candidates to improve thermostability of GPCRs, or is there an artifact in the B-factor method in GPCRs? Can you think of a way to solve this?
C: Use the Literature hotspot option with the keyword “thermostability” in YASARA or PyMol to find mutations from the literature that have an effect on thermostability. Do these overlap with the B-factor hotspots?
D: Do you think this is a better way of finding hotspots for thermostability in this protein family?

Panel design

The idea of the panel design tool is to select sequences from the alignment such that the selected sequences are maximally distributed over the superfamily. This is done in two steps: First, the sequences of the superfamily are grouped. This can simply be based on sequence similarity (similar sequences are within the same group). Groups can also be based on sequence motifs found at user selected positions. The last option is used to group sequences based on a protein feature. For instance, the user can pick positions important for specificity. The idea is that sequences that have the exact same residues (the same motif) at those positions they are likely to have the same specificity. Both methods can be combined. In the second step a user defined number of sequences (usually one or two) are selected from each group. The selection step contains all kinds of options to maximise the chance that these proteins are likely to express. Let's see how it works.

Select the Panel design option in 3DM. First, we will divide the super-family based on sequence motifs. Because we want maximise the specificity range in the panel, we will use the "specificity hotspot" basket you have generated in question 30. This way, all sequences with the same motif at our specificity hotspots (thus with likely the same specificity) will be in one group. Use the Add hotspots option to select the hotspot basket that you made in question 30.

Once they are selected, click on SHOW GROUPS.

Question 32: Based on these criteria, into how many protein groups is the alignment divided?

Question 33: By default, the tool only selects groups with at least two members and deletes sequences that have a gap in one of the selected positions. What is the difference between "unique motifs" and "total protein groups"?

You can also divide the alignment phylogenetically. This separation can be combined to the motif grouping. In the box under Phylogenetic groups you can give a number. The superfamily will then be divided into this number of groups purely based on phylogenetic distances. Reload the panel design tool (refresh the page), reload the hotspot basket, type 10 in the Phylogenetic groups size box and click again on SHOW GROUPS.

Question 34: How many groups are added by this exercise?

After defining the groups, you have to select from each group the sequences that you would like to have in the panel. There are several selection options that can be used to pick sequences from each group. First, you select the number of sequences per group with the Proteins per group option in the expanded Protein selection panel. Then, several options can be used to determine which sequences are selected. These options are there to maximize the chance that selected sequences can be expressed. If literature is available for a sequence for instance, or if there is a structure available, the chance that this protein can be expressed is higher since someone else has done it before. The different selection options can be combined using Must have or Prefer options. Must have will result in a smaller number of groups, because not all groups will have a structure available. Usually, it is a good idea to start with a larger number of groups that you want to have in your panel and delete some groups with these different options. Now play with the different options and see if you understand the result. To see the effect that the different options have, you can click on SELECT PROTEINS. Make a panel of approximately 96 sequences for which you require SWISS-PROT proteins, literature- and structure data.

Note that you can make the total panel smaller by clicking the Panel size option. This will remove sequences by ensuring maximum diversity in the remaining panel sequences.

Sometimes you need to make a panel of a subset of sequences. For instance, say you want to find the most active enzyme with a certain specificity. In this case you should first make a subset that contains only sequences that have this specificity. This sounds simple, but due to wrong notation of proteins it is actually quite tricky. The best way to do this is to first do a keyword search to find enzymes that are likely to have the correct specificity and make a subset of this set of sequences. Use this subset to make a motif with 4 to 7 amino acids that is specific for this subset. It is best to use the Subset specific residues plot (consult the OAH questions for more information). Make a new subset that contains all sequences that have this motif. With this approach you will not only find sequences with the correct specificity that are annotated as hypothetical protein, but you will also delete the sequences that are wrongly annotated. This approach does not make sure you have all sequences with the correct specificity, but it does maximise the chance that the ones that are in your subset all indeed have the correct specificity.

Patent analysis tool

3DM contains an advanced tool for the analysis of patent data for complete protein families. How the tool works is explained in the Patent analysis presentation. Go through the slides and then try out the tool yourself in the GPCR protein family via the Patents option in the 3DM menu.

3DM also contains a tool for analysing patent data in a target sequence. You upload a target sequence to 3DM via System → Custom proteins → Add custom protein.

Designing mutation experiments with the de-convolution tool

The de-convolution tool is a protein-engineering tool that is made for designing mutation experiments for a specific target sequence. With this tool, many positions can be targeted at once without having to screen a large number of clones. The results of the measurements can be uploaded to the system and the system will calculate which mutations have the biggest effect on the feature(s) (e.g. activity or thermostability). The positive mutations can be combined to make the best performing sequence that can subsequently be used for a next round of engineering. Let's see how the tool works.

Go to Genomes (genomes.bio-prodict.nl) and login with these details:

Login: course_demo@bio-prodict.nl
Password: K7XlQAK23QvE8Bl%1i^L5#Ax460lC3

Once logged in you will see that we have set-up the de-convolution tool for one GPCR protein (P07550). At the right top you see an Erlenmeyer flask.

Click on the Erlenmeyer to open the tool.
Click on Create new project.
Give the project a name.
Select the target protein as the template of this protein engineering experiment.
Create the project
Click on this project once it is available (sometimes the page needs a refresh to make it visible).
Click on Add round to start designing the first round of mutations.
Give it the name Round 1 and save it.
Select Use the round 0 starting sequence (wild-type protein). Note that if you already have uploaded experiment data in a previous round, you can use a sequence from that round.
Click on Save base sequence.
Now we can select mutations to introduce to the sequence. Click on Mutations on the left.
We will use the hotspot basket generated previously. Select the Hotspots tab and select the thermostable basket.
Click on Add all with most frequent as variant type. This option ensures that each hotspot is mutated to the most common amino acid according to the alignment. If the target sequence already has the most common amino acid it selects the second most common residue.
Click Save.

Note that the left box is a tool for the selection of mutations. Once you have the desired mutations you can drag them into the box on the right. Now the system automatically generated the mutations for all of the hotspots based on amino acid occurrences, but you can also manually add mutations.

Select edit.
Select the "Manual" tab in the left box, type “V117C” and hit enter. Click Add all.
Save the mutations by clicking Save.
Click on Sequences on the left side of the page.
At Maximum # of mutations per sequence enter 4 and select Fill up each sequence to contain maximum # of mutations. This will ensure that all sequences will have 4 mutations. If you do not use this option, sequences can have single, double, and triple mutations. Be sure to check the option Create demo measurement data.
Select 96 sequences to generate and set the minimum number of observations to 2.
Click on Convolute mutations.

This convolution step calculates the best sequences to measure. The tool maximally distributes the mutations over the 96 sequences. In real life the 96 sequences are ordered at a gene synthesis company and the thermostability of these 96 clones are then measured in the lab. These measurements must be stored in an excel file which can be uploaded to the system. You can download the template of this file at the bottom of the sequence list (export CSV). In this file, you should enter the measured values and then upload the file via the "measurements" option on the left.

Because you are doing the course and we, of course, can not ask you to make the measurements in the lab we generate random measurement values for your 96 constructs and you can right away jump to the "analysis" option on the left.

Click on Analysis in the menu on the left.

To calculate the contributions of each mutation to the observed thermostability, click on Deconvolute measurements. When done, the results will appear in a chart. There is a lot of information in these charts, but for now, let's keep it simple. In the top chart, deselect all coloured boxes except the lm: single mutations option. This will leave only the contributions calculated from a simple linear model visible. For more details about the various models available, see box 1.

Question 35: What is the best performing mutation? Are there mutations that have a destabilising effect?

Clicking on the point in the chart will load a box with details about that mutation on the right side of the chart. Select the top-performing mutations. You can do this by dragging a box around them in the chart. Note that the selected mutations appear below the chart as Selected mutations. Click on Add to selection. This makes the selected mutations available for further use. After clicking, they end up in a box on the right of the chart. Here you can visualise them in the structure. Select a structure and make a visualisation. Review it in YASARA or PyMol.

Structure visualisations combined with detailed information about individual variants allow you to get a very thorough understanding about what might be happening in your protein.

Now, let’s build a novel round.

In the left bar, add a novel round by clicking on the Add round link.
Give your round a nice name and click on Create round.

Now you would like to use a different base sequence than what you used in the first round. It makes sense to pick the best performing sequence from the previous round as the starting point for the new round. Let's do that.

Click Select the best performing sequence from a previous Round based on a metric. Select the first round and pick the thermostability metric.
Hit the Save base sequence button.

Now it is time to pick the mutations that you want to test in the next round.

Click on Mutations in the bar on the left.
Open the Selections tab.

Here you can see the mutations that you selected in the previous round. These were the best performing mutations from your previous round. It makes sense to include them in this round.

Click on Add all to move them to the right.

Question 39: You might see warnings like this: 'wildtype of mutation M279V did not match round base sequence'. How can this happen?

Your list of mutations for the next round is not complete yet. What you would do now in a real life scenario is to use all the insights you gained by analysing the mutations you previously made to select novel ones. Which positions worked, which did not, etc. When your set of mutations is complete, you would generate novel sequences, measure them in the lab, analyse the individual contributions and build novel rounds. Iterate you way towards a great protein!

Box 1. Detailed descriptions of the statistical models used in ‘classic’ deconvolution.

Linear model: single mutations

The simplest model available is the linear model that takes into account only single mutations. This is a very straightforward model, basically just one formula in the form of:

EffectSize = a * mutation1 + b * mutation2 + c * mutation3 + ...

Given all the measurements (effect sizes) the numbers a,b,care optimised so that the final formula approaches the actual measured values best. The numbers a,b,c. are what you see in the deconvolution chart.

Glmnet: single mutations

This is basically a little smarter version of the linear model described above. What this model tries to do is the same as the linear model, but it tries to throw out all the parameters (or, mutations) that minimally affect the models performance. This model will usually consist of less mutations, and that is because it has thrown away mutations it deemed unnecessary. So, without those mutations, the model is still able to predict the actual measurements.

Same as with the linear model, the numbers a,b,c ... are what you see in the deconvolution chart.

Random forest regressor: single mutations

This is the least interpretable of the models I guess. The predictive model that tries to predict the measured values based on the mutations present in the sequences is a random forest model. This model consists of a number of 'decision trees' that are optimised to predict the measure value. A tree is basically a series of decisions, like "if mutation1 is present, but not mutation2 , predict measurement value x". The good thing about this is that interactions between mutations are inherently taken into account. The values that you see in the deconvolution chart are derived from the number of times a mutation is used in a "split" in the tree, together with the position of the split in the tree, where higher-up splits are more important.

Then there are a number of models that include only interactions, or interactions combined with single mutations. Interactions are terms in the models that basically 'group' multiple mutations as one term. So if you have a sequence where mutation1 and mutation2 are present, their interaction will be one term (e.g. mutation1+2 ). So in the case of a linear model you get something like

EffectSize = a * mutation1 + b * mutation2 + c * mutation3 + d * mutation1+2

This is an example of a model with both single mutations and interactions, but you can also do this of course with just the interaction terms.
The downside of including interactions is that you get a very large number of parameters and you tend to overfit very easily, which is why there is no linear model that includes interaction terms, but only glmnet ones.

Bio-Prodict Docs

3DM advanced: 3DM applied to GPCRs