How does CpG island methylation lead to gene silencing?

I often see the blanket statement that CpG island methylation can lead to gene silencing. However, I have been unable to find a straightforward explanation of the mechanism behind this.

Is it a purely physical block of binding for transcription factors? Does the methylation create a tertiary structure that is compacting the DNA? Does it recruit histone modification factors that compact the chromatin?

Please feel free to cite literature so I can go read the papers.

CG methylation has long been associated with gene silencing due to the generally negative correlation between gene promoter methylation and transcription levels.

When CG methylation occurs in a promoter or enhancer region in animals (where these 'CpG islands' tend to be), methylation seems to impede transcription factor (TF) binding to some extent. That is the baseline mechanism for gene repression: the altered chemical properties of the DNA make it a less favorable binding site for its cognate TF.

In some cases, methylation clearly prevents TF binding. For instance, some TFs bind DNA to block methylation, which facilitates access of another TF or set of TFs to the bound DNA (for instance, this paper by the Schübeler lab about NRF1). In the case of other TFs, binding is less reliant on methylation state, possibly due to the strength and breadth of the binding site (for instance, this work from the John Stam. lab about CTCF).

Methylation can also affect nucleosome positioning and stability in some contexts; the nucleosome effects are currently poorly understood because nucleosomes and methylation likely feed back on one another. See this work from the Zilberman lab for an example.

The role of higher-order repressive structure is arguably not yet clear. For instance, histone H1 binding does not appear to be heavily influenced by methylation state (although this is somewhat contentious, because there is often a positive but weak correlation between H1 and methylation at repressed elements in the genome; for instance, search PubMed for 'h1 chip', filter for reviews, and read a recent one). More certain is the relocation of methylated promoters within the nucleus to a place of transcriptional 'silence' (inactivity). These repositioning phenomena are under intense investigation; you can learn more by searching for 'nuclear topology and gene repression' or something similar.

hope this helps

DNA methylation of intragenic CpG islands depends on their transcriptional activity during differentiation and disease

The human genome contains ∼30,000 CpG islands (CGIs), long stretches (0.5–2 kb) of DNA with unusually elevated levels of CpG dinucleotides. Many occur at genes' promoters, and their DNA nearly always remains unmethylated. Conversely, intragenic CGIs are often, but not always, methylated, and thus inactive as internal promoters. The mechanisms underlying these contrasting patterns of CGI methylation are poorly understood. We show that methylation of intragenic CGIs is associated with transcription running across the island. Whether or not a particular intragenic CGI becomes methylated during development depends on its transcriptional activity relative to that of the gene within which it lies. Our findings explain how intragenic CGIs are epigenetically programmed in normal development and in human diseases, including malignancy.


From the initial observation of the presence of DNA methylation differences in the vicinity of beta-globin genes (1, 2), and the characterization of the first tumor-suppressor genes undergoing CpG island-methylation-associated silencing (3–7) to the present-day human epigenome projects (8, 9), the clinical approval of epigenetic therapies (10), and the hypermethylation-associated downregulation of microRNAs (miRNAs) (11, 12), epigenetic gene silencing has been the protagonist in the biomedical arena, and its representation in the scientific literature continues to increase. The scenario is further enriched by the discovery that transcriptional repression mediated by DNA methylation occurs in the chromatin-‘receptive’ context of histone modification and chromatin-remodeling factors (13, 14), and that these histone methylation and acetylation markers are also disrupted in human cancer (15, 16), leading to further aberrations in gene silencing. We should also consider the spectrum of interindividual differences in CpG island DNA methylation patterns (17, 18). The aspects of epigenetic gene silencing are therefore myriad, but in this review I shall focus on promoter CpG island hypermethylation, clarifying some of the unsolved issues and emphasizing the latest relevant ‘hot’ research in the area.

Let us start at the beginning by returning to the essentials of CpG islands. The frequency of the CpG dinucleotide in the human genome is lower than expected (19). The proposed reason for this lack of CpG in our genome is spontaneous deamination in the germline during evolution. However, approximately half of the human gene-promoter regions contain CpG-rich regions with lengths of 0.5 to several kb, known as ‘CpG islands’ (19). Although the majority of these are associated with ‘house-keeping’ genes, half of the ‘tissue-specific’ genes also contain a promoter CpG island (19). The questions of which DNA methylation changes occur in tissue-specific genes in cancer, and how, remain largely unanswered. Maspin is still the main representative gene in this class (20), but larger epigenomic studies have begun to address this issue (18, 21). Due to the complexity of the problem and the small amount of information available, I shall not discuss this in the current review.
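The idea that CpG occurs "less often than expected" can be made concrete with the observed/expected CpG ratio. The sketch below is only illustrative: the function name `cpg_stats` and the two toy sequences are hypothetical, and the thresholds mentioned in the comments (GC content above 50% and an observed/expected ratio above 0.6 over at least 200 bp) are commonly used island criteria, not values taken from the text above.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    # Observed CpG dinucleotides: a C immediately followed by a G.
    observed = seq.count("CG")
    # Expected CpG count if C and G were distributed independently.
    expected = (c * g) / n
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


# Hypothetical toy sequences (far shorter than a real 0.5-2 kb island).
island_like = "CGCGCGATCGCG"
cpg_depleted = "CAGTCCATGGCA"

for name, s in [("island-like", island_like), ("CpG-depleted", cpg_depleted)]:
    gc, ratio = cpg_stats(s)
    # Commonly used (assumed) island criteria: GC > 0.5 and ratio > 0.6.
    print(f"{name}: GC={gc:.2f}, obs/exp CpG={ratio:.2f}")
```

A real scan would slide a window of a few hundred base pairs along the genome and apply the same calculation to each window.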

It should also be noted that although the most significant proportion of CpG islands is located in the 5′-untranslated region and the first exon of the genes, certain CpG islands can occasionally be found within the body of the gene, or even in the 3′-region. CpG islands in these atypical locations are more prone to methylation (22), and the RNA transcript can cross over them without any evident impediment (23). Exceptionally, certain small genes can be considered in their totality as a whole CpG island. Typical CpG islands are entirely unmethylated at all stages of development and allow the expression of a particular gene if the appropriate transcription factors are present and the chromatin structure is accessible to them. In the transformed cell, certain CpG islands of tumor-suppressor genes will become hypermethylated, as I will discuss below.

Author Summary

Polycomb group (PcG) proteins and DNA methylation are fundamental epigenetic regulators of gene expression. The mechanisms underlying such regulation, the crosstalk between these mechanisms, and the role of higher order chromatin folding in mediating transcriptional control of involved genes remain unclear. Abnormal DNA methylation at gene promoters in cancer has been linked to PcG promoter occupancy and PcG-mediated maintenance of genes in a poised, low expression state in embryonic cells. We now strengthen these links and show that PcG occupancy around an entire gene, GATA-4, represses transcription by maintaining a series of long-range chromatin interactions. In embryonic cells, where DNA methylation is largely absent, GATA-4 is in a low, poised transcription state, and the loops can be virtually eliminated by retinoid-induced cellular differentiation, with attendant robust transcriptional up-regulation. When GATA-4 is DNA hypermethylated in colon cancer cells, the intensity of the long-range interactions is increased and is associated with a complete lack of transcription. Removal of DNA methylation in the cancer cells only slightly loosens the loops and restores expression to a low, poised state. Together, these findings suggest that both repressive pathways operate in part by the formation of chromatin higher order structures and provide important translational ramifications for targeting re-expression of epigenetically silenced genes for cancer therapy.

Citation: Tiwari VK, McGarvey KM, Licchesi JD, Ohm JE, Herman JG, Schübeler D, et al. (2008) PcG Proteins, DNA Methylation, and Gene Repression by Chromatin Looping. PLoS Biol 6(12): e306.

Academic Editor: Peter B. Becker, Adolf Butenandt Institute, Germany

Received: June 19, 2008 Accepted: October 28, 2008 Published: December 2, 2008

Copyright: © 2008 Tiwari et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by National Institute of Environmental Health Sciences (ES011858) and National Institute of Health (CA116160) grants (SBB) and by the Novartis Research Foundation (DS).

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: 3C, chromosome conformation capture; ACT, associated chromosome trap; ChIP, chromatin immunoprecipitation; EC, embryonic carcinoma; ES, embryonic stem; MSP, methylation-specific PCR; NTC, non-targeting control; PcG, Polycomb group; PRE, Polycomb Response Element; pre-RCH, pre-repressive chromatin hub


To date, most WGBS analyses of human cancers have been performed on solid tumors [15]–[17]. DNA methylation analyses in AML have tended to employ less comprehensive methods, such as Illumina arrays, reduced representation bisulfite sequencing (RRBS) and methylated DNA immunoprecipitation (MeDIP)-seq. While these studies obviously have their own important strengths, such as throughput of multiple primary samples from patients [9],[34]–[36], no previous study has performed WGBS on primary AML blasts or cell lines. Nor have previous studies examined the effects of DNMTi on DNA methylation and gene expression, employing such comprehensive methods as WGBS and RNA-seq. This is important because DNMTi are used in the clinic, yet the relationship between their effects on DNA methylation and gene expression is unclear.

Accordingly, we report here the first WGBS analysis of DNA methylation in an AML cell line. We also report the effects of AzaC treatment on DNA methylation and gene expression. Based on simple quantitative analyses of methylation at promoters, CpG islands, and shores, there was no significant correlation between loss of DNA methylation and change in gene expression. However, a more sophisticated search algorithm identified a subset of upregulated genes with a signature loss of methylation flanking the TSS. Remarkably, many of these same genes gained methylation in AML3 cells compared to normal hematopoietic stem and progenitor cells and this was typically accompanied by their downregulation in AML3 cells. These genes have functions in cell movement, cell death and survival, and cell growth and proliferation and are preferentially upregulated on decitabine treatment of patient-derived primary AML blasts. Hence, these genes are candidates for genes whose expression is aberrantly repressed by DNA methylation in AML and reversed by DNMTi treatment.

Globally, the DNA methylation landscape of proliferating AML cells, without AzaC treatment, is reminiscent of the solid tumor epigenomes analyzed by WGBS [15]–[17]: large regions of near-complete DNA methylation are interspersed with regions of partial methylation and much more focal regions that are largely depleted of DNA methylation. As in normal genomes, regions lacking DNA methylation are predominantly at promoters containing CpG islands [18]. However, as is typical of cancer genomes, a proportion of CpG islands is methylated [26]; in AML3 cells, about 12% of CpG islands overlapping gene TSSs are methylated. Consistent with the link between CpG island hypermethylation and gene silencing [26], genes with methylated CpG islands tend to be expressed at a lower level than genes with unmethylated CpG islands. As shown previously in other cell types, at gene bodies there is a general trend towards increasing gene body methylation with increasing expression, although this relationship breaks down at the most highly expressed genes [18],[21],[22],[24],[25].

Treatment of AML3 cells with AzaC resulted in a near-uniform decrease in methylation of approximately 50% across the whole genome. Only promoters, CpG islands and 5′ UTRs underwent a slightly more modest decrease, presumably because many of these regions are unmethylated or barely methylated even prior to AzaC treatment. In contrast, previous reports have suggested that AzaC and decitabine cause preferential loss of DNA methylation at some regions of the genome [9],[34],[37]. While differences between cell lines, primary blasts, and DNMTi treatment protocols might account for some differences, the 15× genome-wide coverage achieved in our study unambiguously reveals a uniform decrease across the genome. Of course, a 50% decrease in methylation across the whole genome results in a greater absolute loss of methylation at highly methylated regions, compared to lowly methylated regions. In fact, from this perspective our data appear consistent with those of Ley and coworkers [9]. In sum, at least in AML3 cells, there is no locus-specific preferential loss or retention of DNA methylation.
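The distinction between a uniform relative decrease and the resulting absolute loss can be checked with simple arithmetic. The following sketch assumes two hypothetical regions (90% and 10% methylated before treatment); only the approximately 50% relative decrease comes from the text.

```python
def absolute_loss(initial_methylation, relative_decrease=0.5):
    """Absolute methylation lost when every region drops by the same fraction."""
    return initial_methylation * relative_decrease


# Hypothetical starting levels, expressed as fractions of methylated reads.
regions = {"highly methylated": 0.90, "lowly methylated": 0.10}

for label, level in regions.items():
    lost = absolute_loss(level)
    print(f"{label}: {level:.2f} -> {level - lost:.2f} (absolute loss {lost:.2f})")
```

Under these assumed values the same 50% relative decrease removes 0.45 from the highly methylated region but only 0.05 from the lowly methylated one, which is why a uniform relative loss can look like preferential loss when measured in absolute terms.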

In contrast to the uniform loss of methylation across the genome, AzaC caused highly targeted gene-specific changes in gene expression. Specifically, 792 genes were significantly upregulated, and 426 downregulated. Since about 12% of genes with a CpG island overlapping the TSS harbor a methylated CpG island in AzaC-untreated cells, and since methylation is associated with decreased expression in these cells, we initially asked whether upregulation of gene expression was associated with CpG island hypomethylation. However, based on simple quantitative analyses of methylation at promoters, CpG islands, and shores, there was no significant correlation between loss of DNA methylation and change in gene expression. Previous studies, for example employing Illumina 450 K arrays or Sequenom technology targeted to selected genes, similarly failed to observe a strong link in this regard [9],[10],[38]–[43]. Conceivably, failure to observe widespread upregulation of hypomethylated genes in in vitro studies depends, in part, on lack of appropriate in vivo signals and environmental factors. Obviously, this issue can only be addressed in humans in the context of clinical studies.

Regardless, a major advantage of WGBS data lies in the ability to perform unbiased searches for patterns of methylation (at the single nucleotide level) that correlate with expression [20],[27]. Indeed, a more sophisticated search algorithm, WIMSi [27], identified a subset of 246 upregulated genes with a shared signature loss of methylation flanking the TSS. Increased expression of these genes after AzaC, associated with a common methylation loss signature, tentatively suggests that these genes might be directly regulated by DNA methylation. In further support of this idea, many of these same genes gained methylation in AML3 cells compared to normal hematopoietic stem and progenitor cells and this was typically accompanied by their downregulation. Conceivably, these are genes whose expression is aberrantly repressed by DNA methylation in AML and reversed by AzaC treatment. The remainder of the genes regulated by AzaC in AML3 might be regulated as a secondary consequence of these candidate primary targets, or might be regulated via effects of AzaC on RNA or DNA damage. Consistent with these 246 genes being directly regulated by DNA methylation, these genes were also significantly and preferentially upregulated in decitabine-treated human primary AML blasts, compared to all genes and even genes regulated by AzaC in AML3 but lacking the loss of methylation signature characteristic of the 246 genes. Since AzaC and decitabine share the ability to induce DNA hypomethylation, but differ in some other respects such as AzaC’s preferential incorporation into RNA, this points to the 246 genes being regulated by DNA methylation. Underscoring the power of the WGBS and WIMSi analysis approach, the original authors of this decitabine study reported a limited correlation between change in DNA methylation and expression [9], likely because the array-based approach had insufficient coverage of promoter CpGs.

The 246 AzaC-regulated genes are involved in processes that can reasonably drive anti-neoplastic activity, cell death, cell movement, and cell proliferation, supporting the view that reversal of silencing of these genes by DNMTi contributes to therapeutic activity. If so, methylation and/or expression status of these genes might have utility as biomarkers to predict and/or monitor patient response to DNMTi.

Changes in correlation between promoter methylation and gene expression in cancer

Background: Methylation of high-density CpG regions known as CpG Islands (CGIs) has been widely described as a mechanism associated with gene expression regulation. Aberrant promoter methylation is considered a hallmark of cancer involved in silencing of tumor suppressor genes and activation of oncogenes. However, recent studies have also challenged the simple model of gene expression control by promoter methylation in cancer, and the precise mechanism of and role played by changes in DNA methylation in carcinogenesis remains elusive.

Results: Using a large dataset of 672 matched cancerous and healthy methylomes, gene expression, and copy number profiles across 3 types of tissues from The Cancer Genome Atlas (TCGA), we perform a detailed meta-analysis to clarify the interplay between promoter methylation and gene expression in normal and cancer samples. On the one hand, we recover the existence of a CpG island methylator phenotype (CIMP) with prognostic value in a subset of breast, colon and lung cancer samples, where a common subset of promoter CGIs hypomethylated in normal samples become hypermethylated. However, this hypermethylation is not accompanied by a decrease in expression of the corresponding genes, which are already lowly expressed in the normal samples. On the other hand, we identify tissue-specific sets of genes, different between normal and cancer samples, whose inter-individual variation in expression is significantly correlated with the variation in methylation of the 3' flanking regions of the promoter CGIs. These subsets of genes are not the same in the different tissues, nor between normal and cancerous samples, but transcription factors are over-represented in all subsets.

Conclusion: Our results suggest that epigenetic reprogramming in cancer does not contribute to cancer development via direct inhibition of gene expression through promoter hypermethylation. It may instead modify how the expression of a few specific genes, particularly transcription factors, is associated with DNA methylation variations in a tissue-dependent manner.

DNMT1 is required to maintain CpG methylation and aberrant gene silencing in human cancer cells

Transcriptional silencing by CpG island methylation is a prevalent mechanism of tumor-suppressor gene suppression in cancers [1–4]. Genetic experiments have defined the importance of the DNA methyltransferase Dnmt1 for the maintenance of methylation in mouse cells [5] and its role in neoplasia [6]. In human bladder cancer cells, selective depletion of DNMT1 with antisense inhibitors has been shown to induce demethylation and reactivation of the silenced tumor-suppressor gene CDKN2A [7]. In contrast, targeted disruption of DNMT1 alleles in HCT116 human colon cancer cells produced clones that retained CpG island methylation and associated tumor-suppressor gene silencing [8], whereas HCT116 clones with inactivation of both DNMT1 and DNMT3B showed much lower levels of DNA methylation, suggesting that the two enzymes are highly cooperative [9]. We used a combination of genetic (antisense and siRNA) and pharmacologic (5-aza-2′-deoxycytidine) inhibitors of DNA methyltransferases to study the contribution of the DNMT isotypes to cancer-cell methylation. Selective depletion of DNMT1 using either antisense or siRNA resulted in lower cellular maintenance methyltransferase activity, global and gene-specific demethylation and re-expression of tumor-suppressor genes in human cancer cells. Specific depletion of DNMT1 but not DNMT3A or DNMT3B markedly potentiated the ability of 5-aza-2′-deoxycytidine to reactivate silenced tumor-suppressor genes, indicating that inhibition of DNMT1 function is the principal means by which 5-aza-2′-deoxycytidine reactivates genes. These results indicate that DNMT1 is necessary and sufficient to maintain global methylation and aberrant CpG island methylation in human cancer cells.


Inactivation and reactivation of the APRT gene in strain D422: We have used electroporation in the presence of 5-methyl dCTP to silence the APRT gene. These isolates are selected in medium containing DAP. Table 1 illustrates the frequencies of DAP^R colonies under different inducing conditions. A number of clones were isolated, subjected to 5-aza-CR treatment, and plated on AAT medium. We used the semiquantitative assay for detecting reactivation, and in all cases reactivation of the silent metallothionein gene was also checked by plating treated cells in medium containing cadmium (Holliday and Ho 1990). The reactivation test gives, in effect, an all or none result. Thus, 5-aza-CR treatment can reactivate a methylated gene at a frequency of 1–10%, whereas a nonmethylated mutant gene produces 10^−5 or fewer colonies on AAT medium. Of 17 APRT^− clones picked, 16 were reactivated by 5-aza-CR. Eight of these isolates were used for subsequent molecular analysis. Clones 1C, 3C and 8C were from Experiment 2 (Table 1), 1A from Experiment 3 and 2B, 4B, 5B and 6B were from Experiment 4.

Genomic sequencing of the promoter of the APRT gene: The bisulphite genomic sequencing procedure is able to distinguish cytosine from 5-methyl cytosine (5-mC) in genomic DNA (Frommer et al. 1992; Clark et al. 1994; Grigg and Clark 1994). In brief, the bisulphite treatment deaminates cytosine to uracil, while 5-mC is unaffected. When the region is amplified by PCR using appropriate primers, the uracil residues are replaced by thymidine and the 5-mC residues are replaced by cytosine. The PCR product is cloned and sequenced. In our experiments, only one strand of the PCR product is sequenced, and in most experiments this comprised 323 bases.
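The conversion logic just described can be sketched in a few lines. This is a toy simulation, not the laboratory protocol: the function name `bisulphite_convert` and the example sequence are hypothetical, and only the sequenced strand is modelled.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment followed by PCR on one strand.

    Unmethylated C is deaminated to U and read as T after PCR;
    5-mC is unaffected and is read as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")  # C -> U -> T
        else:
            converted.append(base)  # 5-mC stays C; A, G, T are unchanged
    return "".join(converted)


# Hypothetical 8-base example: only the cytosine at index 1 is methylated.
original = "ACGTCCGA"
print(bisulphite_convert(original, {1}))  # ACGTTTGA
```

Every C that survives in the sequenced product therefore marks a position that was methylated in the genomic DNA, provided deamination went to completion.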

Induction of DAP^R colonies in strain D422 by electroporation and treatment with 5-methyl dCTP

The promoter region of the APRT gene is within a CpG island at the 5′ end of the gene. Genomic DNA from the D422 DAP^S strain consistently showed an absence of cytosine methylation in the promoter region. Genomic sequencing of CHO K1 DNA was also carried out, and the promoter region was consistently shown to be nonmethylated.

D422 DAP^R isolates that are reactivable by 5-aza-CR were grown to a population size of approximately 2 × 10^7 cells in selective medium, and the DNA was isolated. A single DNA preparation was subjected to bisulphite treatment and several clones from the PCR product were sequenced (PCR clones). We detected DNA methylation in the promoter region of all these isolates, and in almost all cases the 5-mC residues were in CpG doublets. These results are presented in Table 2. The sequence analyzed is 323 base pairs in length and this contains 16 CpG doublets. In some clones (1A, 1C, 3C and 5B), the pattern of methylation was the same in all PCR clones, in others (2B and 6B) a majority of PCR clones had the same pattern, and in two (4B and 8C) there was considerable heterogeneity among the PCR clones. This variability presumably arose by the loss or gain of methylation during the time the original isolate was growing prior to the extraction of DNA. The lowest number of methylated CpG doublets was 5 or 6 in clone 2B, and the highest was 16 in clone 4B(iv) and 4B(ix). Two CpG sites (180 and 292) were methylated in all cases.
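Counting methylated CpG doublets per PCR clone amounts to comparing each sequenced clone with the unconverted reference. The sketch below assumes the reads are already aligned to the reference without gaps; the function names, the 11-base reference and the clone labels are hypothetical and are not the APRT promoter data.

```python
def methylated_cpgs(reference, converted_read):
    """Return 0-based positions of CpG cytosines that are still C in the read."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be the same length")
    ref, read = reference.upper(), converted_read.upper()
    return [
        i
        for i in range(len(ref) - 1)
        if ref[i:i + 2] == "CG" and read[i] == "C"  # unconverted C in a CpG
    ]


def count_per_clone(reference, clones):
    """Map each clone name to its number of methylated CpG doublets."""
    return {name: len(methylated_cpgs(reference, read)) for name, read in clones.items()}


# Hypothetical reference with CpG doublets at positions 1, 5 and 8.
reference = "ACGTCCGACGT"
clones = {
    "clone (i)": "ACGTTCGATGT",   # CpGs at 1 and 5 retained C
    "clone (ii)": "ATGTTTGATGT",  # fully converted, no methylation
}
print(count_per_clone(reference, clones))
```

Comparing the per-clone position lists, rather than only the counts, is what reveals whether all PCR clones from one isolate share the same methylation pattern.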

Non-CpG methylation was occasionally seen, at random sites, which was probably the result of the failure of deamination of cytosine by bisulphite. We also examined DNA from 5-aza-CR reactivable DAP^R isolates from CHO K1. In this case, one would expect two different methylated genes to be present, each with a somewhat different pattern. This demands more exhaustive analysis, and although we confirmed that these isolates contained methylated promoter DNA (results not shown), we decided to concentrate on the more straightforward investigation of the D422 hemizygous strain.

Dual inheritance at the APRT gene: The D422 strain provides a simple system in which the APRT gene can be inactivated by DNA methylation and reactivated by 5-aza-CR. The CHO K1 strain with two APRT genes makes it possible to examine both methylated epimutants and mutations induced by EMS. Previously we reported that 5-methyl dCTP produced DAP^R colonies with a frequency of about 7 × 10^−4, which was comparable to the frequencies of BrdU^R TK^− and TG^R HPRT^− colonies (Holliday and Ho 1991). There is only one copy of the HPRT gene as it is X-linked, and the TK gene is hemizygous (Holliday and Ho 1990). In subsequent studies with APRT, we obtained a much lower frequency of DAP^R APRT^− isolates from CHO K1 (3.7 × 10^−5, see Table 3). We assume, but cannot prove, that in the earlier populations the APRT gene had become spontaneously hemizygous by de novo methylation, in at least a substantial proportion of the population, whereas in the more recent populations this had not happened.

The DAP^R colonies obtained by 5-methyl dCTP-induced silencing in strain K1 presumably have two methylated APRT genes (Figure 1C). Of 22 isolates tested, all were reactivated by 5-aza-CR (Table 3; Figure 1D), but one would expect that in most cases only one of the genes would have become active, since this is sufficient for growth on AAT medium. Hemizygosity was confirmed by treatment with 5-methyl dCTP. DAP-resistant strains arose with a frequency of 7 × 10^−4 (Table 3; Figure 1E), which is about 20-fold higher than step C in Figure 1 and Table 3.
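The "about 20-fold" comparison can be verified directly from the two frequencies quoted in the text (7 × 10^−4 for the hemizygous isolate and 3.7 × 10^−5 for step C in strain K1). The variable names below are illustrative only.

```python
# Frequencies of DAP-resistant colonies quoted in the text.
hemizygous_frequency = 7e-4   # one active APRT gene left to silence (step E)
two_copy_frequency = 3.7e-5   # two active APRT genes in CHO K1 (step C)

fold_difference = hemizygous_frequency / two_copy_frequency
print(f"fold difference: {fold_difference:.1f}")  # 18.9
```

The ratio of roughly 19 is consistent with the rounded figure of about 20-fold given above.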

One of the hemizygous APRT isolates was treated with EMS, under conditions that induce TG^R HPRT^− mutants at a frequency of about 10^−3. DAP^R colonies were obtained at a comparable frequency (Table 3; Figure 1F) and these should now contain one silent and one mutant gene. If so, they should be reactivable by 5-aza-CR. Sixteen out of twenty such isolates were shown to be reactivated by this procedure (Table 3; Figure 1G). At this point in the pathway, the cells should contain one inactive mutant gene, and one active gene. One of these isolates was treated with EMS, and again DAP^R colonies were obtained (Table 3; Figure 1H). The prediction is that these isolates have 2 mutant genes, which either do not complement each other, or complement weakly. Twenty colonies were picked and twelve were found to be leaky, that is, they grew on AAT medium. It is possible that this growth is due to interallelic complementation. The remaining eight colonies were found to be nonreactivable by 5-aza-CR (Table 3).

Thus, the results obtained are consistent with the pathways indicated in Figure 1. It should be noted that the mutation pathway (Figure 1, A and B) was previously documented (Chasin 1974; Jones and Sargent 1974). In those studies, the hemizygous strains were shown to have almost 50% of the initial APRT enzyme activity (100%) and the full mutant had no activity. We have examined the enzyme in the initial K1 strain (100% activity) and have shown that the doubly methylated strain (Figure 3C) has no activity. Hemizygous strains have up to 50% activity. (Note that reactivation to give growth on AAT does not require full gene expression.)

Materials and Methods

Tissue specimens. Bone marrow samples of patients diagnosed with ALL at the Ellis Fischel Cancer Center (Columbia, MO) were obtained in compliance with the local Institutional Review Board. Leukemic blasts, due to their high proportion, were concentrated to >90% purity using Ficoll density gradient centrifugation. DNA was isolated using the QIAamp DNA Mini kit (Qiagen, Valencia, CA) according to the manufacturer's specifications from 20 specimens: 6 from patients with precursor T-cell ALL, 10 from patients with precursor B-cell ALL, and 4 from healthy donors used as controls.

Cell lines. Precursor B-cell ALL cell lines representing various immunophenotypic stages of precursor B-cell development, NALM-6 (CD10+, CD19+, and CD20−), MN-60 (CD10+, CD19+, and CD20+), and SD-1 (CD10−, CD19+, and CD20+), and the precursor T-cell ALL cell line, Jurkat, were purchased from the Deutsche Sammlung von Mikroorganismen und Zellkulturen (Braunschweig, Germany) and grown in appropriate medium and resupplemented as necessary with fresh medium. Cells were harvested and DNA was extracted using the QIAamp DNA Mini kit.

Preparation of the CGI microarray. The microarray panel containing 8,640 CGI clones was prepared as described (13). Amplified PCR products were spotted in the presence of 20% DMSO on UltraGap slides (Corning Life Science, Acton, MA). The slides were postprocessed immediately before the hybridization using Pronto Universal Microarray Reagents (Corning Life Science). All CGI clones present on the microarray were sequenced recently by the Microarray Centre of University Health Network (Toronto, Ontario, Canada; ref. 14).

Amplicon development and differential methylation hybridization. Amplicons were generated and differential methylation hybridization (DMH) was done as described previously (10–12). Briefly, 2 μg of genomic DNA were digested with MseI followed by ligation of PCR linkers and digestion with methylation-sensitive endonucleases (HpaII and BstUI). PCR was then done to amplify only methylated fragments or fragments containing no internal HpaII or BstUI sites. The amplicons from the ALL and sex-matched normal control sample, which comprised pooled DNA from multiple donors (Promega, Madison, WI), were labeled with Cy5 or Cy3 fluorescence dye, respectively, and cohybridized to a panel of 8,640 short CGI tags arrayed on a glass slide. The slides were scanned with a GenePix 4200A scanner and the signal intensities of the hybridized probes were analyzed using GenePix 5.1 (Molecular Devices Corp., Sunnyvale, CA).
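The selection step in DMH (which MseI fragments survive the methylation-sensitive digest and can still be amplified) can be illustrated with a toy model. The recognition sequences used are the standard ones (MseI TTAA, HpaII CCGG, BstUI CGCG). Everything else is an assumption made for illustration: the function names, the example fragments, and the simplification that a site counts as protected when any cytosine inside it is methylated.

```python
import re

# Methylation-sensitive enzymes and their recognition sequences.
SENSITIVE_SITES = {"HpaII": "CCGG", "BstUI": "CGCG"}


def mse_i_digest(seq):
    """Cut at every TTAA (T^TAA) and return (start, fragment) pairs."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer("(?=TTAA)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(bounds[i], seq[bounds[i]:bounds[i + 1]]) for i in range(len(bounds) - 1)]


def is_amplifiable(fragment, methylated_positions):
    """True if no sensitive enzyme can cut the fragment.

    methylated_positions holds 0-based indices (within the fragment) of
    methylated cytosines. Simplifying assumption: one methylated cytosine
    anywhere in a site is enough to block cutting at that site.
    """
    fragment = fragment.upper()
    methylated = set(methylated_positions)
    for site in SENSITIVE_SITES.values():
        for m in re.finditer(f"(?={site})", fragment):
            site_range = range(m.start(), m.start() + len(site))
            if not any(p in methylated for p in site_range):
                return False  # unprotected site: fragment is cut, no PCR product
    return True


print(is_amplifiable("AACCGGTT", {3}))    # methylated HpaII site is protected
print(is_amplifiable("AACCGGTT", set()))  # unmethylated HpaII site is cut
print(is_amplifiable("AATTGGCC", set()))  # no internal site at all
```

The third case shows why fragments lacking any HpaII or BstUI site are uninformative: they amplify regardless of methylation status.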

CGI microarray analysis. The 9K chip developed by Huang et al. (10) includes 8,640 MseI fragments. Our method for generating amplicons relies on the presence of BstUI or HpaII recognition sequences; therefore, all of the MseI fragments lacking a recognition sequence were removed from the analysis, resulting in a reduction of the number of CGIs analyzed. Because DNA methylation can vary among individuals depending on age or gender and even in different tissue types within the same individual, we reduced the gender-based variability and the variability due to tissue type by comparing each patient sample (16 total) and each normal bone marrow sample (4 total) with peripheral blood DNA from a pool of sex-matched individuals. To determine which clones were differentially methylated between ALL and normal samples, we globally normalized each microarray and then used the nonparametric Kruskal-Wallis test to do an across-array analysis for each probe.
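For readers unfamiliar with the Kruskal-Wallis test, the per-probe comparison can be sketched as below. This is a minimal pure-Python version of the H statistic using average ranks for ties and no tie correction; it is not the authors' analysis code, and the function name and the log-ratio values are hypothetical. If SciPy is available, `scipy.stats.kruskal` computes the same statistic together with a p-value.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    # Pool all values, remembering which group each one came from.
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical normalized log ratios for one probe.
all_samples = [0.9, 1.1, 1.4]
normal_samples = [0.1, -0.2, 0.3]
print(round(kruskal_wallis_h(all_samples, normal_samples), 3))
```

Because the test works on ranks, it does not assume normally distributed intensities, which suits array signals with occasional outliers.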

Clone sequences. Sequences of the differentially methylated CGI clones were extracted from a Web site (footnote 4). Next, BLAST searches were done to determine if these clone sequences were associated with the promoter region of known genes and if these regions contained CGIs. Finally, primers were developed for real-time reverse-transcription PCR (RT-PCR) and for PCR of bisulfite-treated DNA using Primer3 (15) and MethPrimer (16), respectively.

Methylation-specific PCR and combined bisulfite restriction analysis. Two micrograms of DNA were treated with sodium bisulfite according to the manufacturer's recommendations (EZ DNA Methylation Kit, Zymo Research, Orange, CA). Bisulfite-treated DNA was then used as a template for PCR with primers designed using MethPrimer that were specific to the CGI regions of each tested gene. Purified amplicons produced using combined bisulfite restriction analysis (COBRA) primers (Table 1) were restricted with BstUI, TaqαI, or HpyCH4IV according to the manufacturer's recommendations (New England Biolabs, Ipswich, MA). The methylation-specific PCR (MSP) primers (as reported previously; ref. 11) were used in PCR to differentiate methylated and unmethylated sequences in low-density lipoprotein receptor-related protein 1B (LRP1B). Electrophoresis was done using a 3% agarose gel stained with SYBR Green or a 1.5% agarose gel stained with ethidium bromide to visualize COBRA and MSP products, respectively.
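The logic behind COBRA can be illustrated with a small in silico sketch. Bisulfite treatment followed by PCR reads unmethylated cytosines as thymine, while methylated cytosines remain cytosine, so a CpG-containing restriction site survives only where the template was methylated. The sequence and methylated positions below are invented, and the enzyme table simply lists the standard recognition sequences (CGCG, TCGA, ACGT); this is a simplified top-strand model, not the protocol used in the study.

```python
# Recognition sequences of the enzymes used for COBRA in the text.
# "TaqI" stands in for the TaqI-family enzyme, which recognizes TCGA.
COBRA_SITES = {"BstUI": "CGCG", "TaqI": "TCGA", "HpyCH4IV": "ACGT"}


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return the top-strand sequence as read after bisulfite treatment and PCR.

    Cytosines at 0-based positions listed in `methylated` are protected and
    stay C; every other C is read as T.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def is_cut(enzyme: str, converted_seq: str) -> bool:
    """True if the converted amplicon still contains the enzyme's site."""
    return COBRA_SITES[enzyme] in converted_seq


if __name__ == "__main__":
    template = "AACGCGTT"  # hypothetical amplicon with cytosines at positions 2 and 4
    methylated_copy = bisulfite_convert(template, {2, 4})
    unmethylated_copy = bisulfite_convert(template, set())
    print(methylated_copy, is_cut("BstUI", methylated_copy))
    print(unmethylated_copy, is_cut("BstUI", unmethylated_copy))
```

Digestion of the amplicon therefore indicates methylation at the site, and an uncut amplicon indicates its absence.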

Table 1. Primers used for COBRA and real-time SYBR Green analyses

Quantitative real-time MSP. Quantitative real-time MSP (qMSP) was done with deleted in liver cancer 1 (DLC-1) primers as described previously (12). Briefly, 100 ng of bisulfite-treated DNA and ABgene QPCR Mix (ABgene, Inc., Rochester, NY) were used for PCR amplification as recommended by the manufacturer. The reaction was carried out for 40 to 45 cycles using a SmartCycler (Cepheid, Kingwood, TX).

Cell line treatment. The Jurkat (precursor T cell) and NALM-6 (precursor B cell) ALL cell lines were grown in flasks with RPMI 1640 supplemented with 10% fetal bovine serum, L-glutamine, and gentamicin. Treatment was conducted during the log phase of growth with 5-aza-2′-deoxycytidine (5-aza) and/or trichostatin A (TSA), whereas the control cells were not treated. Jurkat cells were seeded at 8 × 10^6/mL and NALM-6 cells were seeded at 5 × 10^6/mL. In culture, TSA was added at a final concentration of 1 μmol/L and incubated for 6 h, whereas 5-aza was added at a concentration of 1 μmol/L and incubated for 54 h in Jurkat and 78 h in NALM-6 (based on their different doubling times), with a medium change every 24 h. The cell culture that received both TSA and 5-aza treatment was first incubated with 5-aza as described above, followed by an additional 6-h incubation with TSA. RNA and DNA from the cultured cells were extracted for use in RT-PCR and COBRA, respectively, using the previously mentioned kits.

Quantitative real-time RT-PCR. Total RNA (2 μg) from treated and untreated cell lines was pretreated with DNase I to remove potential DNA contaminants and was then reverse transcribed in the presence of SuperScript II reverse transcriptase (Invitrogen, Carlsbad, CA). The generated cDNA was used for PCR amplification with appropriate reagents in the reaction mix with SYBR Green and fluorescein (ABgene) as recommended by the manufacturer. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and hypoxanthine phosphoribosyltransferase 1 (HPRT1; ref. 17) were used as the housekeeping genes in the TaqMan and SYBR Green real-time RT-PCR assays, respectively. The DLC-1 and GAPDH TaqMan probe and primer sets for real-time PCR were purchased from Applied Biosystems Assay-on-Demand services. The TaqMan reactions were carried out using an iCycler real-time PCR instrument (Bio-Rad Laboratories, Hercules, CA). The cycling conditions included an initial 15 min hot start at 95°C followed by 45 cycles at 95°C for 15 s and 60°C for 1 min. Primers for the SYBR Green assays were developed using Primer3 (Table 1). These reactions were also carried out using an iCycler. The cycling conditions included an initial 15 min hot start at 95°C followed by 50 cycles at 95°C for 15 s, 58°C for 30 s, and 72°C for 30 s. All samples were analyzed in triplicate and fold changes were determined using the 2^(−ΔΔCT) method (18).
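As a worked illustration of the fold-change calculation, the sketch below applies the ΔΔCT formula to triplicate threshold-cycle values. The CT numbers are invented, and the function and argument names are hypothetical; the arithmetic is simply ΔCT = CT(target) − CT(housekeeping), ΔΔCT = ΔCT(treated) − ΔCT(control), and fold change = 2 raised to the power −ΔΔCT.

```python
from statistics import mean


def delta_ct(target_cts: list[float], reference_cts: list[float]) -> float:
    """Mean CT of the target gene minus mean CT of the housekeeping gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(
    treated_target: list[float],
    treated_reference: list[float],
    control_target: list[float],
    control_reference: list[float],
) -> float:
    """Relative expression of treated versus control cells, 2 ** (-ddCT)."""
    dd_ct = delta_ct(treated_target, treated_reference) - delta_ct(
        control_target, control_reference
    )
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical triplicate CT values: the target crosses threshold three
    # cycles earlier after treatment, while the housekeeping gene is unchanged.
    fc = fold_change(
        treated_target=[24.0, 24.0, 24.0],
        treated_reference=[18.0, 18.0, 18.0],
        control_target=[27.0, 27.0, 27.0],
        control_reference=[18.0, 18.0, 18.0],
    )
    print(f"fold change: {fc:.2f}")
```

Because each PCR cycle ideally doubles the product, a three-cycle shift in the normalized CT corresponds to roughly an eightfold change in expression.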

Bisulfite genomic sequencing analysis. Genomic DNA was treated with sodium bisulfite as described previously. Primer sequences and PCR conditions were as for the previously described COBRA assay. Amplified PCR products for Rap2-binding protein 9 (RPIB9) fragment 1, RPIB9 fragment 2, and protocadherin-γ subfamily A member 12 (PCDHGA12) were subcloned using the TOPO-TA cloning system (Invitrogen). Plasmid DNA from 6 to 10 insert-positive clones was isolated using the QIAquick Plasmid Mini Prep kit (Qiagen) and sequenced using an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA).

Statistical methods. The geometric mean was used to summarize average fold changes in all RT-PCR analyses. All hypothesis tests were two sided and conducted at the 5% significance level unless otherwise stated. Statistical analysis was done using SAS 9.1 (SAS Institute, Cary, NC) and R (R Foundation for Statistical Computing, Vienna, Austria).
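A short sketch shows why the geometric mean is a natural summary for fold changes. The values are invented for illustration; the only library call used is `statistics.geometric_mean` from the Python standard library.

```python
from statistics import geometric_mean, mean


def summarize_fold_changes(fold_changes: list[float]) -> float:
    """Geometric mean of fold changes (equivalent to averaging on the log scale)."""
    return geometric_mean(fold_changes)


if __name__ == "__main__":
    # Hypothetical replicate results: one twofold decrease and one twofold increase.
    fold_changes = [0.5, 2.0]
    print("arithmetic mean:", mean(fold_changes))
    print("geometric mean: ", summarize_fold_changes(fold_changes))
```

A halving and a doubling cancel out on the multiplicative scale, so the geometric mean is about 1 (no net change), whereas the arithmetic mean of 1.25 would wrongly suggest an increase.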

Spearman's nonparametric correlation was used to evaluate the agreement between bisulfite sequencing and COBRA for a given gene. Only loci that could be detected with both technologies were considered. The proportion of methylated clones (based on bisulfite sequencing) for these loci was computed and correlated with an ordinal measure of COBRA methylation defined with three levels: none, partial, and complete. ‘None’ refers to those cases in which no cuts were observed, ‘partial’ refers to those cases in which partial banding was observed, and ‘complete’ refers to those cases in which all possible bands were observed (i.e., the banding pattern matched that of the SssI and bisulfite-treated control).
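The comparison can be sketched as follows, assuming the three COBRA levels are coded 0, 1, and 2 and that tied values receive average ranks. The data are invented, and this is a minimal standard-library implementation of Spearman's rank correlation rather than the SAS or R routine the authors would have used.

```python
from math import sqrt
from statistics import mean

# Ordinal coding of the COBRA result (an assumption made for this sketch).
COBRA_LEVELS = {"none": 0, "partial": 1, "complete": 2}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical loci: proportion of methylated clones and the COBRA call.
    proportion_methylated = [0.0, 0.1, 0.5, 0.9, 1.0]
    cobra_calls = ["none", "none", "partial", "complete", "complete"]
    cobra_scores = [float(COBRA_LEVELS[c]) for c in cobra_calls]
    print(round(spearman_rho(proportion_methylated, cobra_scores), 3))
```

A rank-based measure suits this comparison because the COBRA scale is ordinal: only the ordering of none, partial, and complete is meaningful, not the spacing between them.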

Although more powerful parametric methods exist for detecting localized clusters on a chromosome, the heterogeneous coverage of loci on the CGI microarray led us to devise a nonparametric randomization test, chosen for its simplicity and general applicability. Let $n_k$ denote the number of loci on the CGI microarray that are located on chromosome $k$, and let $S_k \le n_k$ denote the number of differentially methylated loci on chromosome $k$. The locations (bp) of the $S_k$ methylated loci on chromosome $k$ are given by $x_1, \ldots, x_{S_k}$, and the distances between consecutive methylated loci are given by $d_i = x_{i+1} - x_i$, where $i = 1, \ldots, S_k - 1$. For each chromosome, the median of these distances was computed to serve as the test statistic, $t^*$. Chromosomes that have clustered loci should have smaller consecutive distances and therefore smaller medians. To determine the null distribution for chromosome $k$, a random sample of $S_k$ loci was drawn from the $n_k$ loci, with this set of $S_k$ loci representing the methylated loci under the null hypothesis that loci are methylated at random with respect to chromosomal location. For each sample, the median of the consecutive distances was computed as described above. This process was repeated 1 million times, and the distribution of the 1 million medians formed the distribution of the test statistic under the null hypothesis for each chromosome. The empirical P value was taken as the proportion of the 1 million medians that were less than the observed test statistic, $t^*$, for each chromosome. Note that our method requires no assumption about the distribution of the $n_k$ loci on the CGI microarray. Furthermore, because the null distribution for each chromosome is generated from 1 million random samples of size $S_k$ selected from among the $n_k$ loci, the spatial density of the loci is implicitly incorporated.

It is now almost 26 years since the CpG island—a stretch of DNA with a larger than expected proportion of cytosine followed by guanine bases—was first defined, based on an analysis of the relative proportions of the four bases in the then limited amount of human sequence information available (Gardiner-Garden and Frommer, 1987). At the time, these islands of CpG dinucleotides were presumed to be the location of cis-regulatory elements (regions of DNA that regulate the expression of nearby genes) and, in particular, to be the location of gene promoters (regions of DNA that initiate the transcription of genes).
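The base-composition definition can be made concrete with a small calculation. The thresholds used here (length of at least 200 bp, GC content above 50%, and an observed-to-expected CpG ratio above 0.6) are the ones commonly attributed to Gardiner-Garden and Frommer; the function names are hypothetical and the test sequences are artificial repeats, so this is a sketch of the idea, not a genome annotation tool.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    The expected CpG count is (number of C * number of G) / length, so the
    ratio is (CpG count * length) / (number of C * number of G).
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c_count = s.count("C")
    g_count = s.count("G")
    cpg_count = s.count("CG")
    gc_content = (c_count + g_count) / n
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0
    return gc_content, cpg_count * n / (c_count * g_count)


def looks_like_cpg_island(
    seq: str,
    min_length: int = 200,
    min_gc: float = 0.5,
    min_obs_exp: float = 0.6,
) -> bool:
    """Apply the three base-composition thresholds to a candidate sequence."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_content > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(cpg_stats("CG" * 100), looks_like_cpg_island("CG" * 100))
    print(cpg_stats("AT" * 100), looks_like_cpg_island("AT" * 100))
```

Note that the calculation uses sequence alone and says nothing about methylation state, which is the limitation the rest of this commentary turns on.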

During the past quarter century, we have sequenced numerous whole genomes from a wide range of species, and have witnessed the development of powerful techniques for identifying cis-regulators throughout these whole genomes, yet we still persist with the concept of the CpG island when we annotate those parts of the genome that do not code for proteins. Frequently ignored is the fact that the annotation only works if we exclude the substantial proportion of the genome that is repetitive DNA, mostly the remnants of self-replicating virus-like elements that have all of the sequence characteristics of the CpG island but are rarely found to be regulatory elements (Glass et al., 2007). A defining feature of CpG islands is that they tend to escape DNA methylation (the addition of a methyl group to cytosine), whereas cytosines in the genome as a whole, and in repetitive DNA in particular, tend to be heavily methylated (Yoder et al., 1997). The question that emerges is whether the CpG island annotation merely acts as a surrogate for an absence of DNA methylation, which is much more relevant when we are searching for cis-regulators in the genome.

Now, in eLife, Robert Klose, Chris Ponting and colleagues at Oxford University, Cancer Research UK and the University of Adelaide—including Hannah Long and David Sims of Oxford as joint first authors—highlight the weakness of the CpG island annotation in an innovative way. They report that when they looked for loci that escape DNA methylation in a set of non-human genomes, they found the CpG island annotation to be very poorly associated with these unmethylated loci (Long et al., 2013). They used a technique called biotinylated CxxC affinity purification (Bio-CAP), followed by massively parallel sequencing, to identify islands of non-methylated DNA in seven highly divergent vertebrate species, ranging from fish to humans.

The Bio-CAP approach takes advantage of the fact that CxxC protein domains (where x is an amino acid other than cysteine) bind preferentially to CpG dinucleotides that are not methylated (Voo et al., 2000). Long, Sims and co-workers found that the base composition of the non-methylated islands in the different species varied substantially. Moreover, the non-methylated islands were conserved more between the species than the CpG islands were, which suggests that they are more biologically meaningful. The results also demonstrate that the CpG island annotation performs especially poorly in non-human species.

The Bio-CAP approach is likely to have its own limitations: the CxxC domain is more likely to capture and enrich loci with multiple unmethylated CpG dinucleotides on the same fragment of DNA, so longer stretches of unmethylated sequence, especially if they are rich in CpG dinucleotides, are going to be more readily identified. The use of 51 base pair single-end reads in the Bio-CAP approach also makes it less likely that non-methylated islands in repetitive DNA (where it is more difficult to map such short reads) will be identified, should they happen to exist. However, as a survey technique, the Bio-CAP approach has many strengths. It should also be recognized that shotgun bisulphite sequencing, the gold standard for DNA methylation studies, does not comprehensively test every cytosine in the genome (Harris et al., 2010), strengthening the justification for survey techniques in the short term until a better genome-wide approach is developed.

The use of mixed cell types in the tissues studied might also influence the results, by tending to enrich those non-methylated islands that are found in many different types of cells. However, despite this possibility, Long, Sims and co-workers were able to compare cells taken from the liver and testes and identify non-methylated islands that were specific to each tissue type. The tissue-specific islands were shorter and contained fewer CpG dinucleotides than those found in both types of tissue, a finding that is reminiscent of work at Stanford that identified two classes of gene promoters—one with high levels of CpG dinucleotides and one with lower levels (Saxonov et al., 2006).

So where does this new insight about non-methylated islands leave us? Base composition has served us well for over a quarter of a century in defining the candidate cis-regulatory elements we call CpG islands, but we are now in a different era in which functional elements can be annotated at high resolution based on molecular assays in individual cell types. At first these annotations were generated by large collaborations—such as the ENCODE collaboration (Dunham et al., 2012), the modENCODE collaboration (Celniker et al., 2009), and the Roadmap in Epigenomics (Bernstein et al., 2010)—but it is becoming increasingly feasible for individual investigators to generate such annotations. This has enormous potential value in allowing us to understand the information located at non-protein coding sequences in the genome. Moreover, as Long, Sims and colleagues clearly demonstrate, the ability to do this is a prerequisite for performing comparative studies between species.

The problem that will arise in a new era of functional annotations will be that of community standards—most people have tended to agree what defines a CpG island, but definitions of features based on identifying unmethylated DNA are likely to be more contentious. For example, is there a minimum size for these features? If a single CpG dinucleotide remains unmethylated in all the cell types tested, surely it should be considered as a potentially significant locus? And if a locus is partially unmethylated on a consistent basis, how unmethylated does it have to be to be a candidate regulatory element? Is conservation of DNA methylation patterns the best way to identify candidates for regulatory elements, or are there other ways?

Notwithstanding these concerns, the work described by Long, Sims and colleagues represents the kind of bold and empirically-based approach that we need to develop for every cell type from every research organism. In parallel, the CpG island annotation on every genome browser should now come with a user warning, especially for non-human genomes: after 26 years of service, the CpG island should be allowed to retire with honour.
