Lodewyk Wessels
- T: +31 20 5127987
- E: l.wessels@nki.nl
Plesmanlaan 121
1066 CX Amsterdam
I spend one day per week (typically Tuesdays) at the Delft Bioinformatics Group. The rest of the time I am head of the Bioinformatics and statistics group at the Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital. Our group provides leadership on the collection and analysis of data for the research programs of the institute by performing state-of-the-art analyses of a wide array of data types, including laboratory and animal experiments, clinical trials, and epidemiologic studies. The members of the group also conduct research in bioinformatics and statistics, for example on stratifying tumors into groups with distinct and homogeneous outcome and therapy response, on the function of genes and pathways involved in tumorigenesis, and on understanding molecular regulatory mechanisms. A number of exemplary projects are presented below in more detail.
Extracting oncogenes and oncogenic pathways from insertional mutagenesis screens
To find oncogenic lesions that are collaborating events in tumorigenesis, we developed an approach to detect the significantly frequent co-occurrence of independent insertions within one tumor. We have extended this approach to detect combinatorial association logic networks (CALs): simple logic circuits that employ combinations of co-occurring and mutually exclusive insertions to predict the expression pattern of downstream targets. Classical one-dimensional analyses detect direct associations between the insertion patterns and transcription levels across tumors. However, when the insertion loci themselves interact, direct associations between the individual loci and transcript levels may become undetectable. Therefore, our method detects associations between transcript levels and the outputs of small Boolean logic networks that combine multiple genetic loci. The detection of logic networks requires solving a demanding optimization problem. By reformulating the objective function and applying a customized branch and bound algorithm, we obtain runtimes up to four orders of magnitude faster than exhaustive search. We demonstrated our method on an insertional mutagenesis dataset, combining insertion data with transcriptional information from the same sample, and found known and novel associations between genes involved in Notch signaling.
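As a rough illustration of why logic networks can reveal associations that single-locus tests miss, the sketch below scores every two-locus Boolean gate against an expression vector. It is a minimal example under stated assumptions, not the CAL method itself: it uses plain exhaustive search instead of the branch and bound algorithm, the gate set and the name `best_logic_pair` are hypothetical, and the score (absolute difference in mean expression) is a stand-in for the actual objective function.

```python
import itertools
import numpy as np

# Hypothetical two-input Boolean gates; the names are illustrative only.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "A_AND_NOT_B": lambda a, b: a & ~b,
}

def best_logic_pair(insertions, expression):
    """Exhaustively score every (locus pair, gate) combination.

    insertions: boolean array, shape (n_tumors, n_loci)
    expression: float array, shape (n_tumors,)
    Returns (score, locus_i, locus_j, gate_name) for the gate output with the
    largest absolute difference in mean expression between on and off tumors.
    """
    best = (0.0, None, None, None)
    n_loci = insertions.shape[1]
    for i, j in itertools.permutations(range(n_loci), 2):
        for name, gate in GATES.items():
            out = gate(insertions[:, i], insertions[:, j])
            if out.all() or not out.any():
                continue  # constant gate output, nothing to compare
            score = abs(expression[out].mean() - expression[~out].mean())
            if score > best[0]:
                best = (score, i, j, name)
    return best
```

If expression is high only when exactly one of two loci carries an insertion, each locus on its own shows no difference in mean expression, while the XOR gate over the pair separates the tumors cleanly.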
Identification of networks of co-occurring oncogenic gains and losses
Collaborating oncogenic events can also be induced by copy number alterations. To detect such events in aCGH data, we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively. This demonstrates that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. The detected co-occurrences are highly enriched for functional relationships. The co-occurring losses we find are independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number changes may be an important feature of cancer development or maintenance by affecting the gene dosages of a large interconnected network of functionally related genes.
Integration of clinical and expression data for breast cancer outcome prediction
Several models exist that can be used to predict disease outcome of breast cancer patients. Only a few studies have created a single prediction model using both expression and clinical data. These studies often remain inconclusive regarding an obtained improvement (if any). We rigorously compared three different integration strategies (early, intermediate, and late integration) and no integration (only one data source) using five classifiers of varying complexity. We performed our analysis on a set of 295 breast cancer samples, for which expression data and an extensive set of clinical parameters are available.
A nearest mean classifier employing a logical OR operation on clinical and expression classifier outputs significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when employing an integration strategy. The late integration strategy performed best for four out of five classifiers, and early integration once. A nearest mean classifier that is trained on the originally published clinical variables performs worse than an expression based nearest mean classifier. However, adding the outputs from clinical prediction models, and a set of new pathological variables, results in a performance equivalent to that of the expression based classifier. Thus, there is no longer a significant performance argument to choose one data source over the other, but rather employ a late integration strategy based on nearest mean classifiers for optimal results.
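The late integration strategy with a logical OR can be sketched as follows. This is a minimal illustration, assuming binary labels (0 for good outcome, 1 for poor outcome) and Euclidean distance to the class means; the class `NearestMean` and the function `late_integration_or` are hypothetical names, and no feature selection or cross-validation from the actual study is reproduced here.

```python
import numpy as np

class NearestMean:
    """Nearest mean classifier for binary labels (0 = good, 1 = poor outcome)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        # One mean vector per class, shape (2, n_features).
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every sample to each class mean, shape (n_samples, 2).
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return d.argmin(axis=1)

def late_integration_or(clin_train, expr_train, y, clin_test, expr_test):
    """Train one classifier per data source and OR their predictions."""
    clin_pred = NearestMean().fit(clin_train, y).predict(clin_test)
    expr_pred = NearestMean().fit(expr_train, y).predict(expr_test)
    # A sample is called poor outcome if either source says so.
    return clin_pred | expr_pred
```

Early integration would instead concatenate the clinical and expression features before training a single classifier; with late integration each data source keeps its own model and only the outputs are combined.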
Dynamics of genome - nuclear lamina interactions
In collaboration with the van Steensel group we study genome – nuclear lamina interactions in various cell types. For this, we use DamID data of the LaminB1 protein, which is one of the components of the nuclear lamina. We are not only interested in how the genome is organized in a cell nucleus, but more specifically how it is reorganized during, for example, differentiation. To this end we employed an in vitro differentiation system in which cultured mouse embryonic stem cells are differentiated into neural precursor cells, which in turn are induced to form astrocytes. For all three stages DamID profiles were collected. We developed a statistical test to discriminate between ‘constitutive’ and more dynamic, or ‘facultative’, genomic regions across these stages. Our data are currently obtained using high-density genome-wide tiling arrays, for which a strong dependency between probes adjacent on the genome is observed. The developed test employs the variance between independent biological replicates and autocorrelation levels present in the tiling array data to collectively estimate levels of technical and non-specific biological variance.
Statistical evaluation of biomarkers predicting treatment response
Biomarkers predicting treatment response are useful for tailoring treatment to host characteristics of individual patients in order to maximize treatment benefit and minimize side effects. Before prospective randomized trials are launched to evaluate a promising biomarker candidate, the first evaluation in humans often takes place in relatively small retrospective patient series or trials. Standard analyses use interaction terms in regression models. However, the impact of introducing a predictive biomarker into clinical practice can also be estimated retrospectively by assigning patients to the marker-based and non-marker-based arms of a hypothetical prospective trial. For example, this has been done in a retrospective analysis of phosphorylation of the estrogen receptor and tamoxifen response in a Swedish trial of premenopausal ER-positive breast cancer. There, offering adjuvant tamoxifen treatment to the 52% of patients with phosphorylated tumors (10-year recurrence-free survival of 75%) but not to the remaining 48% (10-year recurrence-free survival of 52%) would result in an estimated 10-year recurrence-free survival of 64% across the marker-guided arm as a whole. This value is equal to the estimated 10-year recurrence-free survival if all patients are treated with adjuvant tamoxifen irrespective of phosphorylation, i.e., phosphorylation-guided treatment may save unnecessary treatment for about half of the patients while maintaining approximately the same 10-year recurrence-free survival. Other examples include homologous recombination deficiency to predict response to high-dose chemotherapy for breast cancer, and EGFR ligands and insulin-like growth factors to predict response to EGFR-inhibitor treatment for lung cancer.
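The 64% figure in the tamoxifen example is consistent with a weighted average of the two subgroup survival rates, with weights given by the subgroup sizes. The short sketch below reproduces that arithmetic from the numbers quoted above; the function name `marker_guided_survival` is a hypothetical label, and the calculation ignores the uncertainty in the subgroup estimates.

```python
def marker_guided_survival(frac_positive, surv_positive_treated,
                           surv_negative_untreated):
    """Expected survival in the marker-guided arm of a hypothetical trial.

    Marker-positive patients are treated, marker-negative patients are not;
    the arm-level survival is the size-weighted average of both groups.
    """
    return (frac_positive * surv_positive_treated
            + (1 - frac_positive) * surv_negative_untreated)

# 52% phosphorylated and treated (75%), 48% not treated (52%).
overall = marker_guided_survival(0.52, 0.75, 0.52)
print(f"{overall:.1%}")  # prints 64.0%
```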
Research Highlights
Somatic structural rearrangements in genetically engineered mouse mammary tumors
This project reports on the first paired-end sequencing of tumors from genetically engineered mouse models of cancer to determine how faithfully these models recapitulate the landscape of somatic rearrangements found in human tumors. These were models of Trp53-mutated breast cancer, Brca1- and Brca2-associated hereditary breast cancer, and E-cadherin (Cdh1) mutated lobular breast cancer.
It is shown that although Brca1- and Brca2-deficient mouse mammary tumors have a defect in the homologous recombination pathway, there is no apparent difference in the type or frequency of somatic rearrangements found in these cancers when compared to other mouse mammary cancers, and tumors from all genetic backgrounds showed evidence of microhomology-mediated repair and non-homologous end-joining processes. Importantly, mouse mammary tumors were found to carry fewer structural rearrangements than human mammary cancers and expressed in-frame fusion genes. Like the fusion genes found in human mammary tumors, these were not recurrent. One mouse tumor was found to contain an internal deletion of exons of the Lrp1b gene, which led to a smaller in-frame transcript. We found internal in-frame deletions in the human ortholog of this gene in a significant number (4.2%) of human cancer cell lines.
Paired-end sequencing of mouse mammary tumors revealed that they display significant heterogeneity in their profiles of somatic rearrangement but, importantly, fewer rearrangements than cognate human mammary tumors, probably because these cancers have been induced by strong driver mutations engineered into the mouse genome. Both human and mouse mammary cancers carry expressed fusion genes and conserved homozygous deletions.
People involved
Christiaan Klijn, Lodewyk Wessels
KC-SMART: Finding Significantly Recurrent Copy Number Changes
DNA copy number changes are a hallmark of tumor genomes. Cancers are prone to directed and random gain and loss of DNA. In this study we developed a method to separate the frequently occurring DNA copy number changes in a group of tumor samples from the random copy number changes. We do this in a statistically sound, unbiased manner that requires no pre-processing of the data other than normalization. In addition, because we use a Gaussian kernel convolution framework, we can analyse the data in a scale space, which allows for more in-depth discovery of important genomic locations of copy number change.
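The idea of smoothing copy number data with Gaussian kernels at several widths can be sketched as below. This is a minimal illustration and not the published implementation: it uses a normalized kernel-weighted average of the mean gain per probe, the function names `kernel_smoothed_gain` and `scale_space` are hypothetical, and the significance testing step of the method is left out.

```python
import numpy as np

def kernel_smoothed_gain(positions, log_ratios, grid, sigma):
    """Kernel-weighted average gain at each grid position.

    positions:  probe coordinates, shape (n_probes,)
    log_ratios: normalized log-ratios, shape (n_samples, n_probes)
    grid:       coordinates at which to evaluate, shape (n_grid,)
    sigma:      kernel width, in the same units as the coordinates
    """
    # Average gain per probe across samples; losses are clipped to zero.
    gains = np.clip(log_ratios, 0.0, None).mean(axis=0)
    # Gaussian weight of every probe for every grid point, (n_grid, n_probes).
    # Assumes each grid point has probes within a few sigma, so the
    # normalizing sum below does not underflow to zero.
    w = np.exp(-0.5 * ((grid[:, None] - positions[None, :]) / sigma) ** 2)
    return (w * gains).sum(axis=1) / w.sum(axis=1)

def scale_space(positions, log_ratios, grid, sigmas):
    """Stack the smoothed profiles for several kernel widths."""
    return np.stack(
        [kernel_smoothed_gain(positions, log_ratios, grid, s) for s in sigmas]
    )
```

Small kernel widths highlight focal changes, while large widths emphasize broad regions; inspecting both in one scale space shows at which scale a recurrent change is most pronounced.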
This work, KCsmart, is available as a package in Bioconductor, the bioinformatics software project for the statistical programming language R.
People involved
Christiaan Klijn, Jeroen de Ridder, Marcel Reinders, Lodewyk Wessels
Molecular maps of the reorganization of genome – nuclear lamina interactions during differentiation
The three-dimensional organization of chromosomes within the nucleus and its dynamics during differentiation are largely unknown. To visualize this process in molecular detail, we generated high-resolution maps of genome – nuclear lamina interactions during subsequent differentiation of mouse embryonic stem cells via lineage-committed neural precursor cells into terminally differentiated astrocytes. This reveals that a basal chromosome architecture present in embryonic stem cells is cumulatively altered at hundreds of sites during lineage commitment and subsequent terminal differentiation. This remodeling involves both individual transcription units and multi-gene regions, and affects many genes that determine cellular identity. Often, genes that move away from the lamina are concomitantly activated; many others, however, remain inactive yet become unlocked for activation in a next differentiation step. These results suggest that lamina-genome interactions are widely involved in the control of gene expression programs during lineage commitment and terminal differentiation.
People involved
Wouter Meuleman, Marcel Reinders, Lodewyk Wessels
Searching for Collaborating Cancer Genes
Cancers are caused by an accumulation of multiple independent mutations that collectively deregulate cellular pathways, such as those regulating cell division and cell death. The presence of multiple independent mutations within one tumor hints at cooperation between the mutated genes. In this study we focus on the detection of statistically significant co-mutations by analyzing a collection of publicly available retroviral insertional mutagenesis datasets.
We propose a two-dimensional Gaussian Kernel Convolution method (2DGKC), a computational technique that identifies the cooperating mutations in insertional mutagenesis data. We define the Common Co-occurrence of Insertions (CCI), signifying the co-mutations that are statistically significant across all different screens in the RTCGD. Significance estimates are made on multiple scales, and the results are visualized in a scale space, thereby providing valuable extra information on the putative cooperation.
Related publications
- Co-occurrence analysis of insertional mutagenesis data reveals cooperating oncogenes
- Discovering cooperating oncogenes by statistical co-occurrence analysis
People involved
Jeroen de Ridder, Jan J. Bot, Lodewyk Wessels, Marcel Reinders
Identification of Cancer Genes using Gaussian Kernel Convolution
A potent method for the identification of novel cancer genes is retroviral insertional mutagenesis. Mice infected with slow transforming retroviruses develop tumors because the virus inserts randomly in their genome and mutates cancer genes. The regions in the genome that are mutated in multiple independent tumors are likely to contain genes involved in tumorigenesis. As the size of these datasets increases, conventional methods to detect these so-called common insertion sites (CISs) no longer suffice, and an approach is required that can control the error independent of the dataset size. The authors introduce a framework that uses a technique called kernel density estimation to find the regions in the genome that show a significant increase in insertion density. This method is implemented over a range of scales, allowing the data to be evaluated at any relevant scale. The authors demonstrate that the framework is capable of compensating for the inherent biases in the data, such as preference for retroviruses to insert near transcriptional start sites. By better balancing the error, they are able to show that from the 361 published CISs, 150 can be identified that have a low probability of being a false detection. In addition, they discover eight novel CISs.
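A kernel density approach to finding common insertion sites can be sketched as follows. This is a minimal illustration under stated assumptions, not the published framework: the null model places insertions uniformly at random (so it does not correct for biases such as the preference for transcriptional start sites), a single kernel width is used per call, and the names `insertion_density`, `permutation_threshold` and `common_insertion_sites` are hypothetical.

```python
import numpy as np

def insertion_density(insertions, grid, sigma):
    """Sum of Gaussian kernels centred on each insertion, evaluated on grid."""
    d = grid[:, None] - insertions[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2).sum(axis=1)

def permutation_threshold(n_insertions, genome_length, grid, sigma,
                          n_perm=200, alpha=0.05, seed=0):
    """Density peak height exceeded in only a fraction alpha of random screens.

    Assumption: under the null, insertions fall uniformly along the genome.
    Taking the maximum over the whole grid controls the error genome-wide.
    """
    rng = np.random.default_rng(seed)
    maxima = [
        insertion_density(
            rng.uniform(0, genome_length, n_insertions), grid, sigma
        ).max()
        for _ in range(n_perm)
    ]
    return np.quantile(maxima, 1 - alpha)

def common_insertion_sites(insertions, genome_length, grid, sigma, **kwargs):
    """Grid positions whose insertion density exceeds the null threshold."""
    density = insertion_density(insertions, grid, sigma)
    threshold = permutation_threshold(
        len(insertions), genome_length, grid, sigma, **kwargs
    )
    return grid[density > threshold]
```

Because the threshold is derived from random screens with the same number of insertions, it adapts to the dataset size; repeating the call for a range of `sigma` values gives the multi-scale view described above.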

