Research

Dark protein space

Nature has explored only a tiny fraction of the sequences that can be constructed from the canonical amino acid alphabet. This project explores the structural potential of unevolved sequence space. It tests the early hypothesis that existing proteins adopt specific three-dimensional folds because natural selection has picked out rare sequences with this ability, which implies that unevolved sequences would mostly be disordered.

Effect of the amino acid alphabet on protein structure


The canonical amino acid alphabet has remained largely unchanged since it arose more than 3 billion years ago. Diverse lines of evidence indicate that earlier life built proteins using a smaller alphabet, and that other amino acids were also available from various potential prebiotic sources. It appears that an “early amino acid alphabet” of approximately 10 members was used to construct early proteins and peptides, and that the remaining canonical amino acids became available through biosynthetic pathways after the genetic code originated. This project studies the relationship between the evolution of the amino acid alphabet and the resulting repertoire of protein structures, using both bioinformatics and experimental techniques.

Protein reverse evolution

It is largely unknown whether contemporary proteins could be built entirely from the “early” amino acid alphabet, and how the effect would differ between types of protein folds. However, several protein design studies agree that functional proteins can be built using approximately half of the canonical amino acid alphabet. This project explores whether the early amino acid alphabet can support protein structures and functions comparable to those of the current alphabet (in which case the genetic code is redundant in this respect), or whether the code had to expand to encompass the protein repertoire as we know it.