This repository has been archived by the owner on Jul 30, 2019. It is now read-only.

Learning Protein Constitutive Motifs from Sequence Data: RBM toolbox


elifesciences-publications/ProteinMotifRBM


ProteinMotifRBM

This code accompanies the paper by Tubiana et al., "Learning protein constitutive motifs from sequence data", eLife, 2019. http://dx.doi.org/10.7554/eLife.39397

Summary

Restricted Boltzmann Machines (RBMs) are graphical models that jointly learn a probability distribution and a representation of data. We have recently shown (see https://arxiv.org/abs/1803.08718) that RBMs can be tailored to efficiently model the distribution of protein sequences within multiple sequence alignments.

The features inferred by the model are sequence motifs located on coevolving sites that reflect the various structural, functional and phylogenetic constraints of the protein. For instance, a sequence motif may be located on two or more sites that are distant in the sequence but in contact in the structure and that coevolve, e.g. so as to maintain opposite charges. The features may also be related to the protein's function: we find features localized on a protein's binding loops whose input distributions separate protein subclasses with different functions.
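To make the notion of a feature's "input" concrete, here is a minimal sketch. It is not the package's own code: the weight layout (hidden units, sites, 21 symbols), the names `AMINO_ACIDS`, `encode_alignment` and `hidden_inputs`, and the toy weights are all assumptions made for illustration. The input of a hidden unit for a given sequence is taken to be the sum, over sites, of the weight attached to the residue found at that site.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus the alignment gap (hypothetical ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def encode_alignment(sequences):
    """Turn equal-length aligned sequences into an integer array of shape (B, N)."""
    return np.array([[AA_INDEX[aa] for aa in seq] for seq in sequences], dtype=int)


def hidden_inputs(weights, encoded):
    """Return I[b, mu] = sum_i weights[mu, i, encoded[b, i]], shape (B, M)."""
    sites = np.arange(encoded.shape[1])
    # Advanced indexing picks, for every sequence and site, the weight of the
    # residue present there; the result has shape (M, B, N).
    picked = weights[:, sites, encoded]
    return picked.sum(axis=-1).T


# Toy feature on 4 sites: rewards K at site 0 together with D at site 3,
# and penalises the swapped charges (illustrative numbers only).
weights = np.zeros((1, 4, len(AMINO_ACIDS)))
weights[0, 0, AA_INDEX["K"]] = 1.0
weights[0, 0, AA_INDEX["D"]] = -1.0
weights[0, 3, AA_INDEX["D"]] = 1.0
weights[0, 3, AA_INDEX["K"]] = -1.0

encoded = encode_alignment(["KAAD", "DAAK", "KAAK"])
inputs = hidden_inputs(weights, encoded)
# inputs[:, 0] is [2., -2., 0.]
```

In this toy case the two sequences with opposite charges at sites 0 and 3 receive inputs of opposite sign, while the sequence with two like charges receives zero, so a histogram of the inputs would separate the three cases.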

Furthermore, RBMs define a probability distribution over sequences that includes pairwise and higher-order interactions. The learnt probability can be used for sequence scoring, contact map prediction and sequence generation. Combining interpretable features with sequence generation makes it possible to generate sequences with a prescribed phenotype.
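The following sketch shows how such a distribution can score sequences. It is an illustration under simplifying assumptions, not the toolbox's implementation: hidden units are given a quadratic potential gamma/2 * h^2 + theta * h, so that they can be integrated out in closed form, and the function and argument names are hypothetical. With this quadratic choice the marginal over sequences contains only pairwise couplings; non-quadratic hidden-unit potentials, not shown here, are what produce higher-order terms.

```python
import numpy as np


def log_unnormalized_probability(fields, weights, gamma, theta, encoded):
    """Log-probability of each sequence, up to the (unknown) log partition function.

    fields:  (N, q) site-wise fields g_i(a)
    weights: (M, N, q) weights w_{i,mu}(a)
    gamma, theta: (M,) parameters of the assumed quadratic hidden potentials
    encoded: (B, N) integer-encoded sequences
    """
    sites = np.arange(encoded.shape[1])
    field_term = fields[sites, encoded].sum(axis=-1)        # (B,)
    inputs = weights[:, sites, encoded].sum(axis=-1).T      # (B, M)
    # Gaussian integral over each hidden unit:
    # log int dh exp(-gamma/2 h^2 - theta h + h I)
    #   = (I - theta)^2 / (2 gamma) + 0.5 * log(2 pi / gamma)
    hidden_term = ((inputs - theta) ** 2 / (2.0 * gamma)
                   + 0.5 * np.log(2.0 * np.pi / gamma)).sum(axis=-1)
    return field_term + hidden_term


# Toy model with random parameters (illustrative only).
rng = np.random.RandomState(0)
n_hidden, n_sites, n_states = 2, 5, 21
fields = rng.standard_normal((n_sites, n_states))
weights = 0.1 * rng.standard_normal((n_hidden, n_sites, n_states))
gamma = np.array([1.0, 2.0])
theta = np.array([0.0, 0.5])
encoded = rng.randint(n_states, size=(3, n_sites))

scores = log_unnormalized_probability(fields, weights, gamma, theta, encoded)
```

Because the partition function is left out, these scores are only meaningful relative to one another within the same model, for example to rank variants of a sequence.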

Examples:

We provide Jupyter notebooks for training RBMs on the Kunitz domain, WW domain, Hsp70 protein and Lattice Proteins (LP), as well as for reproducing Figures 2-7 of the article. Structures were visualised using the software VMD https://www.ks.uiuc.edu/Research/vmd/ (not included). Please see the examples in the notebooks for an introduction to the package.

Learning Protein Constitutive Motifs of the WW Domain

More specifically:

  • Training RBMs: see WW, Kunitz, LP, Hsp70 notebooks.
  • Visualizing weight logos, input distributions,…: see WW, Kunitz, LP, Hsp70 notebooks.
  • Contact prediction: see Kunitz notebook.
  • Sequence scoring (likelihood function): see WW, Kunitz, LP notebooks.
  • Sequence generation: see WW, Kunitz, LP notebooks.
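For readers who want a feel for sequence generation before opening the notebooks, here is a minimal sketch of alternating (Gibbs) sampling in an RBM. It is not the notebooks' API: the function name, argument layout and the quadratic hidden-unit potential gamma/2 * h^2 + theta * h are assumptions chosen to keep the example short. Each step samples the hidden units given the sequence, then resamples every site given the hidden units.

```python
import numpy as np


def gibbs_sample(fields, weights, gamma, theta, n_chains=10, n_steps=100, seed=0):
    """Run alternating Gibbs sampling and return integer sequences (n_chains, N).

    fields: (N, q), weights: (M, N, q), gamma and theta: (M,).
    """
    rng = np.random.RandomState(seed)
    n_hidden, n_sites, n_states = weights.shape
    sites = np.arange(n_sites)
    v = rng.randint(n_states, size=(n_chains, n_sites))  # random initial sequences
    for _ in range(n_steps):
        # Hidden given visible: Gaussian with mean (I - theta) / gamma, variance 1 / gamma.
        inputs = weights[:, sites, v].sum(axis=-1).T               # (chains, M)
        mean = (inputs - theta) / gamma
        h = mean + rng.standard_normal(mean.shape) / np.sqrt(gamma)
        # Visible given hidden: independent categorical draw at each site.
        logits = fields[None] + np.einsum("bm,mia->bia", h, weights)
        logits -= logits.max(axis=-1, keepdims=True)               # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        # Inverse-CDF sampling, vectorised over chains and sites.
        cdf = probs.cumsum(axis=-1)
        u = rng.uniform(size=(n_chains, n_sites, 1))
        v = np.minimum((u > cdf).sum(axis=-1), n_states - 1)
    return v


# Toy model with random parameters (illustrative only).
rng = np.random.RandomState(1)
toy_fields = rng.standard_normal((6, 21))
toy_weights = 0.1 * rng.standard_normal((3, 6, 21))
samples = gibbs_sample(toy_fields, toy_weights, np.ones(3), np.zeros(3),
                       n_chains=4, n_steps=20)
```

The returned integers index into whatever alphabet the model was trained with. In practice the number of steps needed for the chains to decorrelate depends on the model, so the default values here should not be read as recommendations.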

Installation:

The package requires a standard Python 2.7 installation with numpy, cython, matplotlib,… as well as Jupyter Notebook. See e.g. https://www.anaconda.com/download/

To run the Hsp70 protein example, first download the alignment, data and model by running:

sh download_Hsp70_data.sh
