Slide 1

BIOINFORMATIK I UEBUNGEN
http://icbi.at/bioinf
Organisation
•
3 Übungen
•
Kurze Einführung anschließend Labor
•
Protokoll (je 2 Studierende, elektronisch doc, pdf ..)
•
Abgabe der Übungen bis spätestens 17. Mai 2016
Termine
Übungsziele
•
Kennlernen biologischer Datenbanken (NCBI, …)
•
Arbeiten mit DNA/RNA/Protein-Sequenzen
•
Sequenzalignment (BLAST)
•
Arbeiten mit Genome-Browsern (UCSC, Ensembl)
•
Lösung praktischer Beispiele mit Online-Analyse
(keine Programmierübung)
Biologischer Informationsfluss
Chromsome, Chromatin, DNA
DNA
Nomenklatur von Nukleinsäuren
Base
Symbol
Occurrence
Adenin
Guanin
Cytosin
Thymin
Uracil
A
G
C
T
U
DNA, RNA
DNA, RNA
DNA, RNA
DNA
RNA
Symbol
Meaning
Description
R
Y
W
S
M
K
H
B
V
D
N
A or G
C or T
A or T
G or C
A or C
G or T
A, C, or T (U)
G, C, or T (U)
G, A, or C
G, A, or T (U)
G, A, C or T (U)
puRine
pYrimidine
Weak hydrogen bonds
Strong hydrogen bonds
aMino groups
Keto groups
not G, (H follows G)
not A, (B follows A)
not T (U), (V follows U)
not C, (D follows C)
aNy nucleotide
Nomenklatur
DNA sequences are always from 5‘ to 3‘
+ strand
- strand
5´-ACGGTCGCTGTCGGTAGC-3´
3´-TGCCAGCGACAGCCATCG-5´
e.g. in fasta format :
>gene sequence|gi12345|chr17|GCTACCGACAGCGACCGT
Positions in the genome (genome assembly) are chromosome wise
e.g. human GRCh37/hg19
chr11:1-100
chr11:49,686,777-49,689,777
Positions in the chromosome start for both!! strands from position 1
chr11:1
+ strand
- strand
2523
2529
5´-ACGGTCGCTG…………TCGGTAGC-3´
3´-TGCCAGCGAC…………AGCCATCG-5´
chr11:1
2523
2529
Regulation of transcription
mRNA processing
Translation, genetic code and reading frames
Peptid chain, amino acid sequence, proteins
backbone
sidechains
Protein sequences are always form N-terminal end to C-terminal end
E.g.. SCD sequence in fasta format
National Library of Medicine (NLM)
National Center for Biotechnology Information (NCBI)
•
NIH (National Institute of Health)–Campus in Bethesda, Maryland, USA
(gegründet 1836 - Budget >30 Mrd $)
PubMed
•
•
•
•
http://www.pubmed.gov
Datenbank wurde entwickelt um Zugang zu Zitaten und Abstracts
biomedizinischer Literatur zur Verfügung zu stellen (MEDLINE)
>24 Mio Einträge von >5000 Journalen
>700 Mio Online Suchen pro Jahr
GenBank
•
Datenbank zur Verwaltung von Sequenzdaten
•
Frei zugänglich (Nucleotide)
•
Täglicher Datenaustausch mit EBI und DDBJ
•
Neuer „Release“ alle zwei Monate
•
>300.000 organisms (species)
Entrez
RefSeq
•
Best, comprehensive, non-redundant set of sequences
• For genomic DNA (NG_), transcript mRNA (NM_), other RNA (NR_)
and protein (NP_)
• For major research organisms (2645 organisms)
• Based on GenBank derived sequences
• Ongoing curation by NCBI staff and collaborators, with review status
indicated on each record (computational XM_, XP_)
Gene
•
One record represents one single gene from an organism
•
Gene-specific information such as map, sequence, expression, structure,
function, homology, publications, links
•
Can have one or more Refseq transcripts assigned (NM_)
•
Official gene symbol and name, GeneID, aliases and other designations
OMIM
•
Online Mendelian Inheritance in Men
•
Bibliographisches, krankheitszentriertes Kompendium
•
Ursprünglich Buchform (MIM, Johns Hopkins University)
•
Tägliche Updates
•
Für Ärzte, Wissenschafter, Studenten und Ausbildner
•
Links zu vielen Datenbanken (Literatur, Sequenzen...)
Insulin
•
Polypeptid-Hormon
•
Bildung: Betazellen der Langerhansinseln im Pankreas
(Bauchspeicheldrüse)
•
51 Aminosäuren (2 Ketten)
•
A mit 21 AS
•
B mit 30 AS
•
Schweineinsulin (1 AS unterschiedlich)
•
Rinderinsulin (3 AS unterschiedlich)
•
Glucosetransport in die Zelle und Blutzuckerregulation
•
Hemmt in der Fettzelle Lipolyse und fördert Lipogenese
•
In Leber und Muskelzelle wird Glykogenaufbau gefördert
Proinsulin
Vom Preproinsulin zum Insulin
Insulin als Medikament
•
Verwendung von Schweine- und Rinderinsulin
•
•
Bildung von Antikörpern & allergische Reaktionen möglich
Versorgung eines Diabetikers: 50 Pankreata/Jahr
•
Gentechnische Herstellung mit rekombinanter DNA Technologie
•
Unterschiedliche Wirkungsdauer (zB. Dissoziation von
Insulinhexameren) und Insulinanaloga
Exercise 1-1: Find difference between insulin sequence in pig and human
Sus scrofa (pig)
Homo sapiens (human)
DNA sequence
(AY242110)
NCBI Nucleotide (Genbank)
Notepad++
1. extract cds of insulin
blastn
fasta file
blastn
3. Compare cds with mRNA
7. genomic location and adjacent genes
NCBI Gene
using Refseq ID
blastn
2. extract mRNA seq of insulin
NCBI Gene
using Refseq ID
fasta file
Notepad++
4. extract protein sequence of insulin
fasta file
5. extract mRNA seq in human
NCBI Gene
using Refseq ID
6. extract protein sequence of insulin
fasta file
Notepad++
blastp (align two sequences)
8. compare protein sequences
Notepad++
Signal P server
9. find signal peptide
Exercise 1-2: Find information on
SICKLE CELL ANEMIA and KABUKI SYNDROM
2.1
2.2
2.3
2.4
2.5
2.6
2.7
Which genes/proteins are involved?
On which chromosome (arm, cytogenetic band) genes are located?
What is the position and strand on the human reference genome assembly?
Can these genes also found in the mouse (location)?
Are there common mutations i.e. non-synonymous SNPs known?
What is the function of the encoded proteins?
Find recent publications