TY - BOOK
T1 - Deciphering Transcriptional Regulation
T2 - Computational Approaches
AU - Valen, Eivind
N1 - Supervisors: Assoc. Prof. Albin Sandelin; Prof. Anders Krogh; Assoc. Prof. Ole Winther
PY - 2010
Y1 - 2010
N2 - The myriad of cells in the human body are all made from the same blueprint: the human genome. At the heart of this diversity lies the concept of gene regulation, the process in which it is decided which genes are used where and when. Genes do not function as on/off buttons, but more like a volume control spanning the range from completely muted to cranked up to maximum. The volume, in this case, is the production rate of proteins. This production is the result of a two-step procedure: i) transcription, in which a small part of DNA from the genome (a gene) is transcribed into an RNA molecule (an mRNA); and ii) translation, in which the mRNA is translated into a protein. This thesis focuses on the first of these steps, transcription, and specifically its initiation. Simplified, initiation is preceded by the binding of several proteins, known as transcription factors (TFs), to DNA. This takes place mostly near the start of the gene, known as the promoter. This region contains patterns scattered in the DNA that the TFs can recognize and bind to. Such binding can prompt the assembly of the pre-initiation complex, which ultimately leads to transcription of the gene. In order to achieve the regulation necessary to produce the multitude of tissues we observe, there exists a wide range of these TFs having different binding preferences and targeting different genes. By activating different TFs in a context-dependent manner the organism can produce customized sets of proteins for each cell, resulting in different cell types. This thesis presents several methods for analysis and description of promoters. We focus particularly on the binding sites of TFs and computational methods for locating these. We contribute to the field by compiling a database of binding preferences for TFs which can be used for site prediction and provide tools that help investigators use these. In addition, a de novo motif discovery tool was developed that locates these patterns in DNA sequences. This compared favorably to many contemporary methods. A novel experimental method, cap-analysis of gene expression (CAGE), was recently published providing an unbiased overview of the transcription start site (TSS) usage in a tissue. We have paired this method with high-throughput sequencing technology to produce a library of unprecedented depth (DeepCAGE) for the mouse hippocampus. We investigated this in detail and focused particularly on what characterizes a hippocampus promoter. Pairing CAGE with TF binding site prediction we identified a likely key regulator of hippocampus. Finally, we developed a method for CAGE exploration. While the DeepCAGE library characterized a full 1.4 million transcription initiation events it did not capture the complete TSS-ome of hippocampus. We fitted two statistical models to the CAGE data and extrapolated how deep sequencing needs to be to capture most of the events. We concluded that while most genes are discovered, tag clusters and TSSs are not fully explored.
AB - The myriad of cells in the human body are all made from the same blueprint: the human genome. At the heart of this diversity lies the concept of gene regulation, the process in which it is decided which genes are used where and when. Genes do not function as on/off buttons, but more like a volume control spanning the range from completely muted to cranked up to maximum. The volume, in this case, is the production rate of proteins. This production is the result of a two-step procedure: i) transcription, in which a small part of DNA from the genome (a gene) is transcribed into an RNA molecule (an mRNA); and ii) translation, in which the mRNA is translated into a protein. This thesis focuses on the first of these steps, transcription, and specifically its initiation. Simplified, initiation is preceded by the binding of several proteins, known as transcription factors (TFs), to DNA. This takes place mostly near the start of the gene, known as the promoter. This region contains patterns scattered in the DNA that the TFs can recognize and bind to. Such binding can prompt the assembly of the pre-initiation complex, which ultimately leads to transcription of the gene. In order to achieve the regulation necessary to produce the multitude of tissues we observe, there exists a wide range of these TFs having different binding preferences and targeting different genes. By activating different TFs in a context-dependent manner the organism can produce customized sets of proteins for each cell, resulting in different cell types. This thesis presents several methods for analysis and description of promoters. We focus particularly on the binding sites of TFs and computational methods for locating these. We contribute to the field by compiling a database of binding preferences for TFs which can be used for site prediction and provide tools that help investigators use these. In addition, a de novo motif discovery tool was developed that locates these patterns in DNA sequences. This compared favorably to many contemporary methods. A novel experimental method, cap-analysis of gene expression (CAGE), was recently published providing an unbiased overview of the transcription start site (TSS) usage in a tissue. We have paired this method with high-throughput sequencing technology to produce a library of unprecedented depth (DeepCAGE) for the mouse hippocampus. We investigated this in detail and focused particularly on what characterizes a hippocampus promoter. Pairing CAGE with TF binding site prediction we identified a likely key regulator of hippocampus. Finally, we developed a method for CAGE exploration. While the DeepCAGE library characterized a full 1.4 million transcription initiation events it did not capture the complete TSS-ome of hippocampus. We fitted two statistical models to the CAGE data and extrapolated how deep sequencing needs to be to capture most of the events. We concluded that while most genes are discovered, tag clusters and TSSs are not fully explored.
M3 - Ph.D. thesis
BT - Deciphering Transcriptional Regulation
PB - Museum Tusculanum
ER -