Research highlights
DAGIC: Detecting Aberrant Genes in Interaction Context
Delineating the molecular drivers of cancer, i.e. determining cancer genes and the pathways they deregulate, is an important challenge in cancer research. In this study, we aim to identify pathways of frequently mutated genes by exploiting their network neighborhood as encoded in the protein-protein interaction network. To this end, we introduce a multi-scale diffusion kernel framework and apply it to a large collection of murine retroviral insertional mutagenesis data. The diffusion strength plays the role of a scale parameter, determining the size of the network neighborhood that is taken into account. As a result, we identify densely connected components of known and putatively novel cancer genes and show that different functional enrichments and mutual exclusion patterns are observed at different scale levels. Taken together, this demonstrates the importance of analyzing gene mutations in the context of their interaction network in a multi-scale fashion. The source code of this work, DAGIC, for Matlab and R is available at http://bioinformatics.tudelft.nl/users/sepideh-babaei.
People involved
Sepideh Babaei, Jeroen de Ridder, Marcel Reinders
Laboratory evolution yields novel lactate transporters and aneuploidy
Laboratory evolution is a powerful approach in applied and fundamental yeast research, but complete elucidation of the molecular basis of evolved phenotypes remains a challenge. In this study, DNA microarray-based transcriptome analysis and whole-genome resequencing were used to investigate the evolution of novel lactate transporters in Saccharomyces cerevisiae that can replace Jen1p, the only documented S. cerevisiae lactate transporter. To this end, a jen1Δ mutant was evolved for growth on lactate in serial batch cultures. Single-nucleotide changes found in the acetate transporter gene ADY2 were confirmed to convert ADY2 into an efficient lactate transporter. Due to the strong selective advantage of having more copies of this novel lactate transporter, its gene became triplicated by formation of a novel isochromosome III carrying two additional ADY2 copies.
Related publications
People involved
Jurgen Nijkamp, Dick de Ridder
The genome sequence of a yeast for modern industrial biotechnology
Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNVs), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strain were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with the biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains.
Related publications
- De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology
- Integrating genome assemblies with MAIA
People involved
Jurgen Nijkamp, Dick de Ridder, Marcel van den Broek, Marcel Reinders
Materiomics
It is increasingly recognized that material surface topography is able to evoke specific cellular responses, endowing materials with instructive properties that were formerly reserved for growth factors. This opens a window to improve, in a cost-effective manner, the biological performance of any surface used in the human body. Unfortunately, the interplay between surface topographies and cell behavior is complex and still incompletely understood. Rational approaches to search for bioactive surfaces will therefore miss previously unperceived interactions. Hence, in this project, 2176 mathematically designed surface topographies were placed on chips of poly(lactic acid). Human mesenchymal stromal cells (hMSCs) were grown on the chips, and using high-content imaging and an analysis pipeline, surface effects on hMSC proliferation and osteogenic differentiation were found. The analysis pipeline uses image processing and machine learning techniques to process raw chip images into results, such as the best performing surfaces, the surface parameters that play an important role in the tested biological process, and cell response predictors for new surfaces. By using robust statistics and quality measures and by exploiting surface similarity, we are able to significantly improve surface ranking consistency.
Related publications
People involved
Marc Hulsman, Marcel Reinders
Somatic structural rearrangements in genetically engineered mouse mammary tumors
This project reports on the first paired-end sequencing of tumors from genetically engineered mouse models of cancer to determine how faithfully these models recapitulate the landscape of somatic rearrangements found in human tumors. These were models of Trp53-mutated breast cancer, Brca1- and Brca2-associated hereditary breast cancer, and E-cadherin (Cdh1) mutated lobular breast cancer.
It is shown that although Brca1- and Brca2-deficient mouse mammary tumors have a defect in the homologous recombination pathway, there is no apparent difference in the type or frequency of somatic rearrangements found in these cancers when compared to other mouse mammary cancers, and tumors from all genetic backgrounds showed evidence of microhomology-mediated repair and non-homologous end-joining processes. Importantly, mouse mammary tumors were found to carry fewer structural rearrangements than human mammary cancers and to express in-frame fusion genes. Like the fusion genes found in human mammary tumors, these were not recurrent. One mouse tumor was found to contain an internal deletion of exons of the Lrp1b gene, which led to a smaller in-frame transcript. We found internal in-frame deletions in the human ortholog of this gene in a significant number (4.2%) of human cancer cell lines.
Paired-end sequencing of mouse mammary tumors revealed that they display significant heterogeneity in their profiles of somatic rearrangement but, importantly, fewer rearrangements than cognate human mammary tumors, probably because these cancers have been induced by strong driver mutations engineered into the mouse genome. Both human and mouse mammary cancers carry expressed fusion genes and conserved homozygous deletions.
Related publications
People involved
Christiaan Klijn, Lodewyk Wessels
Predicting protein secretion success
The cell-factory Aspergillus niger is widely used for industrial enzyme production. Selecting enzymes for large-scale production requires costly lab work to test for successful high-level secretion of the over-expressed enzyme. To reduce the amount of lab work, we developed a sequence-based classifier that predicts successful high-level secretion of homologous proteins. This enables the selection of a subset of potential enzymes out of a large set of enzymes.
A dataset of 638 proteins was used to train and validate a classifier, using a 10-fold cross-validation protocol. Using a linear discriminant classifier, an average accuracy of 0.85 was achieved, which in practice could halve the amount of lab work.
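The evaluation protocol described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the classifier used in the project: the two numeric features, the synthetic data, and the names `fit_lda`, `predict`, `cross_validate` and `make_toy_data` are all hypothetical, and equal class priors are assumed.

```python
import random

def fit_lda(X, y):
    """Fit a two-class linear discriminant on 2-D points (equal priors assumed)."""
    means = {}
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(p[i] for p in pts) / len(pts) for i in range(2)]
    # Pooled within-class covariance matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, t in zip(X, y):
        d = [x[0] - means[t][0], x[1] - means[t][1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    dof = len(X) - 2
    s = [[v / dof for v in row] for row in s]
    # Closed-form inverse of the 2x2 covariance matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Discriminant direction w = inv(S) * (mean1 - mean0); threshold at the midpoint.
    diff = [means[1][i] - means[0][i] for i in range(2)]
    w = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def predict(model, x):
    """Return 1 if the point falls on the class-1 side of the boundary, else 0."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def cross_validate(X, y, k=10, seed=0):
    """Return the mean accuracy over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for fold in range(k):
        test_idx = idx[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit_lda([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k

def make_toy_data(n_per_class=100, seed=1):
    """Synthetic stand-in for sequence features; NOT the 638-protein dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
print(f"mean 10-fold accuracy: {cross_validate(X, y):.2f}")
```

Each protein is held out exactly once, so the reported accuracy is always measured on data the classifier did not see during training.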
Feature selection results indicate which features matter most for successful protein production, which could be an interesting lead for coupling sequence characteristics to the biological processes involved in protein production and secretion.
Related publications
People involved
Bastiaan v.d. Berg, Jurgen Nijkamp, Marcel Reinders, Dick de Ridder
MAIA: Integrating genome assemblies
De novo assembly of a eukaryotic genome with next-generation sequencing data is still a challenging task. Over the past few years several assemblers have been developed, often suitable for one specific type of sequencing data. Because the number of known genomes is expanding rapidly, it is becoming possible to use multiple reference genomes for assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria.
The developed algorithm was applied to the de novo sequencing of the Saccharomyces cerevisiae CEN.PK113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs covering more than 12 Mbp, a drastic improvement compared with the single assemblies.
Related publications
People involved
Jurgen Nijkamp, Wynand Winterbach, Marcel Reinders, Dick de Ridder
KC-SMART: Finding Significantly Recurrent Copy Number Changes
DNA copy number changes are a hallmark of tumor genomes. Cancers are prone to directed and random gain and loss of DNA. In this study we developed a method to separate the frequently occurring DNA copy number changes in a group of tumor samples from the random copy number changes. We do this in a statistically sound, unbiased manner which does not require any additional pre-processing of the data except for normalization. In addition, since we make use of a Gaussian kernel convolution framework, we are able to analyse the data in a scale space, which allows for more in-depth discovery of important genomic locations of copy number change.
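The idea of Gaussian kernel convolution across several scales can be illustrated with a short sketch. It is a simplified toy under stated assumptions and does not reproduce the method's significance testing or any package interface: the function names `kernel_smoothed_profile` and `peak_position`, the one-dimensional coordinates, and the data are all hypothetical.

```python
import math

def kernel_smoothed_profile(positions, values, grid, sigma):
    """Sum a Gaussian kernel of width sigma, weighted by each measurement."""
    profile = []
    for g in grid:
        total = 0.0
        for p, v in zip(positions, values):
            total += v * math.exp(-((g - p) ** 2) / (2.0 * sigma ** 2))
        profile.append(total)
    return profile

def peak_position(grid, profile):
    """Return the grid position where the smoothed profile is highest."""
    best = max(range(len(grid)), key=lambda i: profile[i])
    return grid[best]

# Toy data (hypothetical): gains pooled over several tumors. Three gains
# cluster around position 50 (recurrent); positions 10 and 80 are isolated.
positions = [48, 50, 52, 10, 80]
gains = [1.0, 1.0, 1.0, 1.0, 1.0]
grid = list(range(0, 101))

# A small sigma resolves narrow regions; a large sigma pools broad regions.
for sigma in (2.0, 20.0):
    profile = kernel_smoothed_profile(positions, gains, grid, sigma)
    print(f"sigma={sigma}: strongest signal near position "
          f"{peak_position(grid, profile)}")
```

Repeating the smoothing for a range of kernel widths yields the scale space: recurrent changes reinforce each other under the kernel, while isolated changes stay low.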
This work, KCsmart, is available as a package in Bioconductor, the popular bioinformatics software collection for the statistical programming language R.
Related publications
People involved
Christiaan Klijn, Jeroen de Ridder, Marcel Reinders, Lodewyk Wessels
Molecular maps of the reorganization of genome – nuclear lamina interactions during differentiation
The three-dimensional organization of chromosomes within the nucleus and its dynamics during differentiation are largely unknown. To visualize this process in molecular detail, we generated high-resolution maps of genome – nuclear lamina interactions during subsequent differentiation of mouse embryonic stem cells via lineage-committed neural precursor cells into terminally differentiated astrocytes. This reveals that a basal chromosome architecture present in embryonic stem cells is cumulatively altered at hundreds of sites during lineage commitment and subsequent terminal differentiation. This remodeling involves both individual transcription units and multi-gene regions, and affects many genes that determine cellular identity. Often, genes that move away from the lamina are concomitantly activated; many others however remain inactive yet become unlocked for activation in a next differentiation step. These results suggest that lamina-genome interactions are widely involved in the control of gene expression programs during lineage commitment and terminal differentiation.
People involved
Wouter Meuleman, Marcel Reinders, Lodewyk Wessels
Searching for Collaborating Cancer Genes
Cancers are caused by an accumulation of multiple independent mutations that collectively deregulate cellular pathways, such as those regulating cell division and cell death. Multiple independent mutations within one tumor hint at cooperation between the mutated genes. In this study we focus on the detection of statistically significant co-mutations by analyzing a collection of publicly available retroviral insertional mutagenesis datasets.
We propose a two-dimensional Gaussian Kernel Convolution method (2DGKC), a computational technique that identifies cooperating mutations in insertional mutagenesis data. We define the Common Co-occurrence of Insertions (CCI), signifying the co-mutations that are statistically significant across all the different screens in the Retroviral Tagged Cancer Gene Database (RTCGD). Significance estimates are made on multiple scales, and the results are visualized in a scale space, thereby providing valuable extra information on the putative cooperation.
Related publications
- Co-occurrence analysis of insertional mutagenesis data reveals cooperating oncogenes
- Discovering cooperating oncogenes by statistical co-occurrence analysis
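The two-dimensional kernel convolution idea behind the co-occurrence analysis can be sketched as follows. This is an illustrative toy under stated assumptions, not the 2DGKC implementation: it omits the significance estimation entirely, uses a single one-dimensional coordinate instead of real genomic positions, and the names `cooccurrence_density` and `peak_pair` as well as the data are hypothetical.

```python
import math
from itertools import combinations

def cooccurrence_density(tumors, grid, sigma):
    """Place a 2-D Gaussian on every pair of insertions found in one tumor."""
    pairs = []
    for insertions in tumors:
        # Each unordered pair within a tumor becomes one point (a, b), a < b.
        pairs.extend(combinations(sorted(insertions), 2))
    density = [[0.0] * len(grid) for _ in grid]
    for i, gx in enumerate(grid):
        for j, gy in enumerate(grid):
            for a, b in pairs:
                dist2 = (gx - a) ** 2 + (gy - b) ** 2
                density[i][j] += math.exp(-dist2 / (2.0 * sigma ** 2))
    return density

def peak_pair(grid, density):
    """Return the (x, y) grid coordinates with the highest density."""
    best = max(
        ((i, j) for i in range(len(grid)) for j in range(len(grid))),
        key=lambda ij: density[ij[0]][ij[1]],
    )
    return grid[best[0]], grid[best[1]]

# Toy screens (hypothetical): insertion positions per tumor. Insertions near
# positions 20 and 70 co-occur in three tumors; the other pairs occur once.
tumors = [[20, 70], [21, 69], [19, 71, 40], [55], [20, 90]]
grid = list(range(0, 101, 5))

# Varying sigma explores the co-occurrence landscape at different scales.
for sigma in (3.0, 10.0):
    density = cooccurrence_density(tumors, grid, sigma)
    print(f"sigma={sigma}: strongest co-occurrence near {peak_pair(grid, density)}")
```

Pairs of loci that are hit together in many tumors accumulate density at the same point of the two-dimensional space, whereas pairs seen in only one tumor contribute a single isolated kernel.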

