ProteinMotifRBM

This code accompanies the paper by Tubiana et al., "Learning protein constitutive motifs from sequence data", eLife, 2019. http://dx.doi.org/10.7554/eLife.39397

Summary

Restricted Boltzmann Machines (RBMs) are graphical models that jointly learn a probability distribution and a representation of data. We have recently shown (see https://arxiv.org/abs/1803.08718) that RBMs can be tailored to efficiently model the distribution of protein sequences within multiple sequence alignments.

The features inferred by the model are sequence motifs located on coevolving sites that reflect the various structural, functional and phylogenetic constraints of the protein. For instance, a sequence motif may be located on two or more sites that are distant in the sequence but in contact in the structure, and that coevolve, e.g. so as to maintain opposite charges. The features may also be related to the protein's function: we find features localized on a protein's binding loops, whose input distributions can separate protein subclasses with different functions.
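The "input" of a feature mentioned above is the sum, over alignment sites, of the weight that the feature assigns to the amino acid found at each site. The following is a minimal NumPy sketch of that computation, not the package's own code: the function name `hidden_inputs`, the array layout `(n_hidden, n_sites, n_states)` and the integer encoding of amino acids are assumptions made for illustration.

```python
import numpy as np


def hidden_inputs(seqs, weights):
    """Return I[b, mu] = sum_i weights[mu, i, seqs[b, i]].

    seqs    : integer array, shape (n_seqs, n_sites), entries in [0, n_states)
    weights : float array, shape (n_hidden, n_sites, n_states)
    (Both layouts are assumptions of this sketch.)
    """
    seqs = np.asarray(seqs)
    site_idx = np.arange(seqs.shape[1])
    # Pick, for every hidden unit and sequence, the weight of the observed
    # amino acid at each site: shape (n_hidden, n_seqs, n_sites).
    picked = weights[:, site_idx, seqs]
    # Sum over sites, then transpose to (n_seqs, n_hidden).
    return picked.sum(axis=-1).T


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    toy_weights = rng.normal(size=(3, 5, 21))      # 3 features, 5 sites, 21 states
    toy_seqs = rng.randint(0, 21, size=(4, 5))     # 4 random toy sequences
    print(hidden_inputs(toy_seqs, toy_weights).shape)
```

A histogram of one column of the returned array over all sequences of an alignment gives the input distribution of the corresponding feature.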

Furthermore, RBMs define a probability distribution over sequences that includes pairwise and higher-order interactions. The learnt distribution can be used for sequence scoring, contact map prediction and sequence generation. Combining interpretable features with sequence generation makes it possible to generate sequences with a prescribed phenotype.
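To make sequence scoring concrete, here is a minimal sketch of the unnormalized log-probability (minus the free energy) of a sequence under an RBM with categorical visible units. For simplicity it assumes binary (Bernoulli) hidden units, which is a simplification chosen for this example and not necessarily the hidden-unit potential used in the paper or the notebooks; the names `free_energy`, `fields`, `weights` and `hidden_bias` are hypothetical.

```python
import numpy as np


def free_energy(seqs, fields, weights, hidden_bias):
    """Free energy F(v) such that P(v) is proportional to exp(-F(v)).

    Assumes Bernoulli hidden units h in {0, 1} (an assumption of this sketch):
        F(v) = -sum_i g_i(v_i) - sum_mu log(1 + exp(I_mu(v) + c_mu))

    seqs        : int array, shape (n_seqs, n_sites)
    fields      : float array g, shape (n_sites, n_states)
    weights     : float array w, shape (n_hidden, n_sites, n_states)
    hidden_bias : float array c, shape (n_hidden,)
    """
    seqs = np.asarray(seqs)
    site_idx = np.arange(seqs.shape[1])
    # Contribution of the site-specific fields: shape (n_seqs,)
    field_term = fields[site_idx, seqs].sum(axis=-1)
    # Inputs received by each hidden unit: shape (n_seqs, n_hidden)
    inputs = weights[:, site_idx, seqs].sum(axis=-1).T
    # log(1 + exp(x)) computed stably, summed over hidden units
    hidden_term = np.logaddexp(0.0, inputs + hidden_bias).sum(axis=-1)
    return -field_term - hidden_term


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    g = rng.normal(size=(5, 21))
    w = 0.1 * rng.normal(size=(3, 5, 21))
    c = np.zeros(3)
    seqs = rng.randint(0, 21, size=(4, 5))
    # Higher score = more probable under the toy model (up to a constant).
    print(-free_energy(seqs, g, w, c))
```

The score differs from the true log-likelihood by the log-partition function, which is the same for all sequences, so it can be used directly to rank sequences against each other.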

Examples:

We provide Jupyter notebooks for training RBMs on the Kunitz domain, WW domain, Hsp70 protein and Lattice Proteins, as well as for reproducing Figures 2-7 of the article. Structures were visualised using the software VMD https://www.ks.uiuc.edu/Research/vmd/ (not included). Please see the examples in the notebooks for an introduction to the package.

More specifically:

Training RBMs: See the WW, Kunitz, LP (Lattice Proteins) and Hsp70 notebooks.
Visualizing weight logos, input distributions,…: See the WW, Kunitz, LP and Hsp70 notebooks.
Contact prediction: See the Kunitz notebook.
Sequence scoring (likelihood function): See the WW, Kunitz and LP notebooks.
Sequence generation: See the WW, Kunitz and LP notebooks.
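As background for the sequence generation examples, sampling from an RBM is commonly done by alternating (block) Gibbs sampling between hidden and visible layers. The sketch below illustrates one such step for categorical visible units; it again assumes binary hidden units for simplicity, and `gibbs_step` and its arguments are hypothetical names, not functions of this package. Please refer to the notebooks for the actual interface.

```python
import numpy as np


def gibbs_step(seqs, fields, weights, hidden_bias, rng):
    """One block Gibbs update v -> h -> v' (Bernoulli hidden units assumed).

    seqs        : int array, shape (n_seqs, n_sites)
    fields      : float array, shape (n_sites, n_states)
    weights     : float array, shape (n_hidden, n_sites, n_states)
    hidden_bias : float array, shape (n_hidden,)
    rng         : np.random.RandomState
    """
    seqs = np.asarray(seqs)
    n_states = fields.shape[1]
    site_idx = np.arange(seqs.shape[1])

    # 1. Sample hidden units given the sequences: P(h=1|v) = sigmoid(I + c).
    inputs = weights[:, site_idx, seqs].sum(axis=-1).T + hidden_bias
    p_hidden = 1.0 / (1.0 + np.exp(-inputs))
    hidden = (rng.random_sample(p_hidden.shape) < p_hidden).astype(float)

    # 2. Sample each site given the hidden units:
    #    P(v_i = a | h) proportional to exp(g_i(a) + sum_mu h_mu w_mu,i(a)).
    logits = fields[None] + np.einsum("bm,mia->bia", hidden, weights)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Inverse-CDF sampling of one state per (sequence, site).
    cdf = probs.cumsum(axis=-1)
    u = rng.random_sample(cdf.shape[:-1] + (1,))
    new_seqs = (u > cdf).sum(axis=-1)
    # Guard against rounding error in the last CDF entry.
    return np.minimum(new_seqs, n_states - 1)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    g = rng.normal(size=(5, 21))
    w = 0.1 * rng.normal(size=(3, 5, 21))
    c = np.zeros(3)
    samples = rng.randint(0, 21, size=(8, 5))   # random starting sequences
    for _ in range(100):                         # run the chain for a while
        samples = gibbs_step(samples, g, w, c, rng)
    print(samples.shape)
```

Running many such steps from random or data-initialised sequences yields approximate samples from the model distribution; the number of steps needed for equilibration depends on the model and is not addressed by this sketch.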

Installation:

The package requires a standard Python 2.7 installation with numpy, cython, matplotlib,… as well as Jupyter Notebook. See e.g. the Anaconda distribution: https://www.anaconda.com/download/

To run the Hsp70 protein example, first download the alignment, data and model by running:

sh download_Hsp70_data.sh

