Optimization, Energy Landscapes, Protein Folding

The course will introduce mathematical and theoretical concepts and demonstrate their relevance to practical biological problems.
Prerequisite: knowledge of the Computational Chemistry 1 lecture.
The course tries to minimize overlap with the Computational Chemistry 2 lecture.

1. Lecture SS 2005, Optimization, Energy Landscapes, Protein Folding

Content
1 Introduction
  Biomolecular systems: proteins, membranes, phenomena of protein folding, protein complexes
2 Protein folding on lattices
  Review of statistical thermodynamics (deltaG, deltaS)
  Exact enumeration of all states
  Folding via Monte Carlo algorithm: which moves?
  Folding funnel
  Roughness of the energy landscape
3 Protein folding on lattices (II)
  HPCC algorithm à la Ken Dill, work by Rolf Backofen
4 Calculation of energies in biomolecular systems (do we need this?)
  Molecular force fields, solvent effect
  Replace by lecture on membrane protein structure and folding?
5 Off-lattice protein folding simulations involving all-atom simulations
  MD simulations: characterization of the free energy landscape for folding
  Replica exchange simulations
  Restraints to generate partially unfolded states

Content (II)
6 Calculation of chemical rates
  Transition state theory
  Kramers theory
  Folding@home
7 Diffusion: Smoluchowski equation
  Langevin equation → Ermak-McCammon algorithm for Brownian dynamics
8 Application: association kinetics of protein A with protein B
  Energy landscape for 6 degrees of freedom (3× translation, 3× rotation)
  Computation of k_on rates from Brownian dynamics simulations
  Calculation of entropies from trajectory analysis
  Compare Boltzmann-weighted energies for protein B on a lattice with protein A
9 Protein assemblies
10 Electron transfer (Marcus theory), proton transfer
11 Photophysics of photoactive molecules
  Conformational dynamics on electronic surfaces, conical intersections

Literature
Lecture slides will be available 0-2 days prior to the lecture.
Suggested reading: links will be put up on the course website
http://gepard.bioinformatik.uni-saarland.de/teaching...

Schein = successful written exam
Successful participation in the lecture course ("Schein") will be certified upon successful completion of an oral exam in February/March 2006.
Participation in the oral exam is open to those students who have mastered the 3-4 assignments.

Literature

My systems of interest
Proteins
- folding landscape
- membrane proteins: recent progress on folding of membrane proteins!
Protein assemblies
- molecular machines (stable complexes)
- transient complexes
Membranes
- formation
- dynamics
Protein-membrane association
Partitioning of proteins in membranes

The puzzle of protein folding
I What is the problem? "Levinthal's paradox"
II Solution: the energy landscape has the shape of a folding funnel
  Studying the energy landscape with lattice simulations
III Current frontiers
  Unfolded protein segments
  Protein misfolding in the prion protein

Levinthal paradox
For a protein of 100 amino acids with 2 conformations per amino acid, there are 2^100 = 1.27 x 10^30 possible conformations of the protein.
If the protein needed 10^-13 s to visit ("sample") each individual conformation, it would take 10^-13 x 1.27 x 10^30 = 1.27 x 10^17 s = 4 x 10^9 years to search all of its conformations and possibly find the energetically most favorable one.
This is obviously not possible.
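
The Levinthal estimate can be checked with a few lines of Python. This is only a minimal sketch: the function and variable names are illustrative, and the year length of 365.25 days is an assumption used solely for the unit conversion.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length for the conversion


def levinthal_search_time(n_residues=100, states_per_residue=2, time_per_state=1e-13):
    """Return (number of conformations, search time in s, search time in years)."""
    n_conf = states_per_residue ** n_residues  # exact integer, here 2^100
    seconds = n_conf * time_per_state          # time to visit every conformation once
    return n_conf, seconds, seconds / SECONDS_PER_YEAR


n_conf, seconds, years = levinthal_search_time()
print(f"{n_conf:.2e} conformations, {seconds:.2e} s, {years:.1e} years")
# prints: 1.27e+30 conformations, 1.27e+17 s, 4.0e+09 years
```

Changing `states_per_residue` to 3 shows how quickly the estimate grows with even slightly more conformational freedom per residue.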
Therefore there must be folding aids or special folding pathways, so that the protein does not need to search all theoretically possible states.

Folding pathways
There are several hypotheses for the driving forces of protein folding:
• Hydrophobic collapse: the unfolded protein chain collapses into a compact globule. Subsequently the secondary structure elements fold and the correct/optimal three-dimensional contacts form, so that the protein adopts one of the permitted folds.
OR
• The secondary structure elements first fold independently (framework model) and then assemble.
There are experimental examples for both folding scenarios. Often the truth lies "in the middle".

"New view of protein folding": folding along funnel-like energy landscapes
Bryngelson, Wolynes, PNAS (1987)
Gradient accelerates folding
Roughness slows folding: "frustration"
Brooks, Gruebele, Onuchic, Wolynes, PNAS 95, 11037 (1998)

Energy landscapes (H. Frauenfelder/UIUC)
A very simple energy landscape (left) and a very complicated one (right).
Left: energy landscape of ammonia, NH3. The conformational coordinate (x-axis) describes the distance of the nitrogen atom from the plane of the 3 hydrogen atoms.
Right: a strongly simplified energy landscape of a protein. In reality, the energy landscape is a function of 3N coordinates, where N (the number of atoms of the protein) is very large.
Frauenfelder & Leeson, Nature Structural Biology 5, 757-759 (1998)

Molecular chaperones: proteins that help other globular proteins adopt their correct fold ("molecular Red Cross")
• Molecular chaperones such as hsp60 or GroEL (shown on the right) are a class of proteins that help other proteins in the cell adopt their correct fold.
• To do so, molecular chaperones can bind very effectively to outward-facing hydrophobic regions of partially folded structures.
• "Helping into the jacket".

Fold optimization
• Simple lattice models (HP models)
  - Two kinds of side chains: hydrophobic and polar
  - 2-D or 3-D lattice
  - Driving force: hydrophobic collapse; it is favorable to form contacts between hydrophobic side chains
  - Score = number of HH contacts

HP lattice models
Ken Dill ~ 1997
Advantage of such simple models: the conformational space can be searched systematically.

The importance of being unfolded?
Apparently quite a few cellular proteins are partially unfolded for a large part of the time (P.E. Wright, H.J. Dyson, J. Mol. Biol. 293, 321 (1999)).
This sounds very unexpected. What could be the biological advantages?
(1) Unfolded proteins can be degraded faster; this may be required for regulating a fast cell cycle.
(2) Molecular recognition is faster when folding and binding are coupled.
(3) Loop structures can recognize many biological targets; important for communication and regulation, or for the formation of large complexes?
(4) Unfolded proteins can be transported quickly into other cell compartments.
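
Returning to the HP lattice models introduced above: the systematic search of conformational space can be sketched in a few lines of Python. The snippet enumerates every self-avoiding walk of a short HP sequence and scores it by the number of HH contacts between residues that are not chain neighbours. The choice of a 2-D square lattice, the function names and the example sequence are illustrative assumptions, not taken from the lecture, and symmetry-related conformations are not removed.

```python
# Unit steps on a 2-D square lattice (assumed lattice type).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hh_contacts(seq, coords):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    pos = {c: i for i, c in enumerate(coords)}
    count = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only to the right and upwards so each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = pos.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                count += 1
    return count


def enumerate_folds(seq):
    """Yield (coords, score) for every self-avoiding walk of the chain."""
    def grow(coords, occupied):
        if len(coords) == len(seq):
            yield list(coords), hh_contacts(seq, coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                yield from grow(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    start = (0, 0)
    yield from grow([start], {start})


def best_fold(seq):
    """Return one conformation with the maximal number of HH contacts."""
    return max(enumerate_folds(seq), key=lambda fold: fold[1])


coords, score = best_fold("HPPH")
print(score, coords)
```

Exhaustive enumeration is only feasible for short chains, because the number of self-avoiding walks grows roughly exponentially with chain length.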

NORS regions: no regular secondary structure
NORS regions are defined to have at least 70 consecutive residues with less than 12% regular secondary structure (helix or strand). Rost and co-workers found 4 types of proteins.
(A) Connecting loops: long loops that connect two domains or chains (shown: formate dehydrogenase H, 1AA6).
(B) Loopy ends: long N- or C-terminal regions that lack regular secondary structure (shown: hexon from adenovirus type 2, 1DHX).
(C) Loopy wraps: long loopy regions wrapping around globular domains (shown: class II chitinase, 2BAA).
(D) Loopy domains: entire structures that have almost no regular secondary structure (shown: extracellular domain of T beta RI, 1TBI).
Liu, Tan, Rost, J Mol Biol (2002) 332, 53-64

Many NORS regions predicted in proteomes
Liu et al. predicted many NORS regions in 31 entirely sequenced organisms. NORS proteins appeared particularly abundant in eukaryotes.
(A) gives the percentage of proteins in the respective proteome for which at least one NORS region is predicted. High enrichment in eukaryotic proteomes!
(B) illustrates the percentage of all residues of the respective proteome for which a NORS region is predicted.
(C) gives the percentage of all predicted NORS regions that are between N and N+10 residues long (note that, by definition, NORS regions are at least 70 residues long). Surprisingly, almost 15% of all the predicted NORS regions extend over more than 200 residues (inset of C).
Liu, Tan, Rost, J Mol Biol (2002) 332, 53-64

NORS regions use particular amino acids
The height of the one-letter amino acid code is proportional to the abundance of the respective acid in each data set.
The actual value is the difference in occurrence with respect to the frequency observed in a sequence-unique subset of PDB [formula not recoverable]. Inverted letters indicate acids that are less frequent than 'expected'. The amino acids are sorted by 'flexibility', with the more rigid ones on the left. Overall, NORS regions are as abundant in more flexible residues as loop regions in PDB. However, we found considerably more serine (S), glutamine (Q), and glycine (G) and considerably fewer arginine (R), aspartic acid (D), glutamic acid (E), tryptophan (W), and phenylalanine (F) in NORS regions than in loop regions in general.
Liu, Tan, Rost, J Mol Biol (2002) 332, 53-64

Prion: an unresolved example of misfolded proteins
The prion protein PrPc:
- is a normal cellular glycoprotein
- is attached to the plasma membrane via a GPI anchor
- has 209 amino acids
Its exact function is unknown. Cu2+ storage, memory?
Structure known from NMR determinations:
The N-terminal region 23-120 is very flexible and mostly disordered.
The C-terminal region contains 3 α-helices and 2 short β-strands.
PrPc is rapidly degraded by proteinase K.

The disease-associated form PrPsc
PrPsc:
- oligomeric, β-rich structure
- partial resistance to digestion by proteinase K
- strong tendency to aggregate into insoluble plaques
- the 3D structure of PrPsc is not known!
Protein-only hypothesis (Prusiner, 1980s and 1990s): the refolding process PrPc → PrPsc is autocatalyzed by PrP protein.
Stanley Prusiner, Nobel Prize in Physiology or Medicine 1997

Models for the formation of PrP-res from PrPc
Caughey, Trends Biochem Sci 26, 235 (2001)

Models based on polymerization
Caughey, Trends Biochem Sci 26, 235 (2001)

Current understanding of prions
The molecular mechanisms for the conversion of PrPc to PrPsc are still unclear.
Theoretical methods have (unfortunately) not yet been able to contribute much.
The transition PrPc → PrPsc is a cooperative phenomenon. It therefore probably cannot be understood by studying PrP monomers.
The "seed" model seems plausible.
The transition to PrPsc could proceed via a folding intermediate I. This would explain why mutants in which this folding intermediate is more populated, or in which the ground state (F) is less stable relative to I than in healthy individuals, are susceptible to disease.

Fluid mosaic model of the cell membrane
Like a mosaic, the cell membrane is a complex structure made up of many different parts, such as proteins, phospholipids and cholesterol. The relative amounts of these components vary from membrane to membrane, and the types of lipids in membranes can also vary.
The membrane structure is highly dynamic. Its viscosity is only about 100 times larger than that of water.
http://www.nature.com/horizon/livingfrontier/background/membrane.html

Membrane bilayers
Edidin, Nature Reviews Cell Biol 4, 414 (2003)

Membrane bilayers
Membranes are not structureless. "Domains" or "lipid rafts" rich in cholesterol and sphingolipids may form transiently.
Edidin, Nature Reviews Cell Biol 4, 414 (2003)

How do helical membrane proteins fold?
White, FEBS Lett. 555, 116 (2003)

Hydrophobicity scales
White, FEBS Lett. 555, 116 (2003)

Translocon-assisted folding of TM proteins?
White & von Heijne, Curr Opin Struct Biol 14, 397 (2004)

Translocon
Crystal structure of the translocon in the closed state.
White & von Heijne, Curr Opin Struct Biol 14, 397 (2004)

Types of TM proteins
The orientation of the C- and N-termini depends on charge. The cytoplasmic side of the membrane contains more negatively charged lipids. By mutating the charges one can invert the topology.
White & von Heijne, Curr Opin Struct Biol 14, 397 (2004)

Folding paradigm
Back to the folding models of soluble proteins (hydrophobic collapse vs. framework model). Obviously, hydrophobic collapse doesn't apply here.
Using FRET labels (fluorescent non-natural amino acids) it could be shown that the newly synthesized peptide assumes a compact, i.e. partially folded, structure.
White & von Heijne, Curr Opin Struct Biol 14, 397 (2004)

Insertion of TM helices into bilayer
This is an ingenious experiment to identify the code for TM helix partitioning into the bilayer.
Two glycosylation sites (G1 and G2) are engineered around the segment H.
If H is inserted into the membrane, only G1 is glycosylated; otherwise both G1 and G2 are glycosylated.
Hessa et al., Nature 433, 377 (2005)

Hydrophobicity scales
Results from this work correlate well with the partitioning of peptides between water and octanol (Fig. c): partitioning of TM helices into the membrane is determined by standard physico-chemical principles.
Hessa et al., Nature 433, 377 (2005)
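
One way to turn the glycosylation readout described above into a number is to treat insertion as a two-state equilibrium and convert the ratio of singly to doubly glycosylated molecules into an apparent free energy, ΔG_app = -RT ln(f_1g / f_2g). The sketch below assumes this two-state picture; the function name, the default temperature and the example fractions are hypothetical and are not data from Hessa et al.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def apparent_insertion_free_energy(f_single, f_double, temperature=298.15):
    """Apparent free energy of membrane insertion in kcal/mol.

    f_single: fraction glycosylated only at G1 (segment H inserted)
    f_double: fraction glycosylated at G1 and G2 (segment H not inserted)
    """
    if f_single <= 0 or f_double <= 0:
        raise ValueError("both fractions must be positive")
    k_app = f_single / f_double  # apparent equilibrium constant (assumed two-state)
    return -R_KCAL * temperature * math.log(k_app)


# Hypothetical example: 90% singly glycosylated, 10% doubly glycosylated.
print(round(apparent_insertion_free_energy(0.9, 0.1), 2))
```

A negative value means that insertion is favored under these assumptions; equal fractions give a value of zero.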

Open and closed complexes
One can distinguish between two different types of supramolecular complexes:
Closed complexes are relatively stable assemblies of different molecules with a fixed stoichiometry, resulting in large molecular machines like ribosomes, polymerases and ATPases. Although these complexes may be dynamic due to their respective function (like capturing and releasing elongation factors for ribosomes or transient phosphorylation for allosteric proteins), they have a well-defined structure and are degraded only as a whole (typically by proteasomes after ubiquitylation).
In contrast, open complexes are in constant exchange of their molecular components with the environment. Both the total number of components and their relative stoichiometry can vary within a certain range. Typical examples are the cytoplasmic plaques of focal adhesions, which have typical lifetimes of minutes to hours, while the turnover time for the single proteins building up the plaque is on the order of seconds. In contrast to closed complexes, open complexes are not assembled and degraded as a whole, but gradually.

Focal adhesion points
Focal adhesions are the most prominent sites of adhesion when cell-matrix adhesion is studied on rigid surfaces (glass or plastic). In a physiological (soft) environment, similar sites of adhesion exist, although they tend to be smaller and of somewhat different molecular composition.
Focal adhesions consist of four layers (see Fig., from bottom to top):
- an external layer of ECM ligand,
- a layer of transmembrane receptors from the integrin family,
- a cytoplasmic plaque consisting of more than 50 different proteins, and
- a layer of actin connecting the focal adhesion to the cytoskeleton.
Focal adhesions strongly signal to the cytoskeleton, mainly through the small GTPases from the Rho family.
They also trigger other signalling pathways like the MAP kinase pathway, thus influencing gene expression and cell fate.
Focal adhesions are also the main sites for force transmission between the extracellular environment and the cell. They seem to function as mechanosensors which convert both internal and external force into protein aggregation and signalling. In particular, cells might sense the mechanical properties of their environment by actively pulling on it through actomyosin contractility and focal adhesions.
How can one model all this?

Virus assembly: idealized examples of closed complexes
(a) Schematic representation of a T = 3 quasiequivalent lattice, corresponding to a rhombic triacontahedron, the geometrical architecture of black beetle virus (BBV). Each of the trapezoids represents a single subunit with the same amino acid sequence. The T = 3 particle is formed of 180 subunits that lie in three structurally unique positions (labeled A, B, and C). Subunits labeled with the same letter are related by icosahedral symmetry axes corresponding to twofold, threefold, and fivefold rotations, identified by white ovals, triangles, and pentagons, respectively. Subunits marked with different letters are related to one another by quasisymmetry axes corresponding to twofold and threefold local rotation axes, identified by yellow ovals and triangles, respectively. The subunits labeled A, B, and C are related by quasi-threefold symmetry; they form an icosahedral asymmetrical unit (protomer) of the T = 3 particle.
(b) Pseudo T = 3 surface lattice. In this lattice there are three types of trapezoids (VP1, VP2, and VP3) representing subunits with different amino acid sequences. The subunits identified by the same label are related by icosahedral symmetry elements (twofold, threefold, and fivefold), identified by white ovals, triangles, and pentagons.
(c) Black beetle virus (BBV).
Blue, red, green = A, B, and C subunits. The average diameter of the particle is 312 Å. Icosahedral and quasisymmetry elements are identified by white and yellow labels.
(d) Icosahedral asymmetrical unit (protomer) of BBV made up of the A, B, and C subunits and a strand of partially ordered RNA of 10 bases.
Reddy et al., Biophys J 74, 546 (1998)

Virus assembly: compute energies of intermediates
(a) A table showing the top three preferred configurations for each association of subunits in the computed assembly pathway for BBV, with the monomer as the assembling unit. The first column shows the number of associating monomers. Columns 2, 4, and 6 show a schematic of the three best structures for each association. ΔG12 and ΔG23 refer to the negative differences of the association energies of the first and second, and of the second and third, configurations.
(b) The preferred structures, with the trimer as the assembling unit.
It is important to note that the best configurations for both assembly pathways are nearly always the same; in some cases even the second best is the same, emphasizing that the trimer is the likely assembling unit. An exception is the best structure of the 15mer association. In this case the most stable monomer assembly is not made up of a multiple of protomers, but its preference over the second and third most stable structures, which are made of protomers, is marginal.
Reddy et al., Biophys J 74, 546 (1998)

Summary
- The protein folding problem is a well-isolated problem, almost "classical"
- some aspects are reasonably well understood
- interest is currently widening towards studying multi-protein assemblies and superstructural units
- few concepts are available; learn from the protein folding field?
- many interesting phenomena involve membranes