Universal clock for humans contained in the telomeric sequences?


I don't know if this makes sense, but imagine that we aged only naturally, with no diseases involved. Is there an estimate of the natural maximum lifespan a human can have?

I learned that the telomeric sequences protect sensitive information in DNA, and that somatic cells lose part of these sequences with each replication. So I wonder whether this acts as a natural clock, and whether anyone has estimated how much time we naturally have.


Telomeres and Early-Life Stress

Stefanie Mayer, ... Kathryn K. Ridout, in Stress: Genetics, Epigenetics and Genomics, 2021

Factors Impacting Telomere Length

Telomere length is impacted by complex interactions between environmental exposures and cellular processes, including oxidative damage, DNA replication stress, epigenetic changes, and genetic polymorphisms. Guanine (G) nucleotides, which make up 50% of the telomere sequence, have a low reduction potential and are thus particularly sensitive to oxidative damage. 15 In states of oxidative stress when reactive oxygen species outnumber antioxidant mechanisms, oxidation can induce telomere shortening, cell senescence, and cell death. 16 Exposure to ionizing radiation or carcinogens can lead to telomere DNA damage. 17 This damage triggers DNA excision repair processes 17 to help restore the integrity of the telomere at the potential cost of shortening telomeres.

Emerging research suggests that epigenetic modifications may impact telomere length. Epigenetic modifications allow for elaboration of the genome beyond what is determined by the DNA. These modifications are carried forward during cellular division, but do not change the DNA sequence. Methylation is one of the most common forms of epigenetic modification 18 and can alter the chromatin state of telomeres. Low telomere methylation can lead to inappropriate signaling of DNA repair pathways and sequence errors and shortening of telomere DNA. 19

Genome-wide association studies (GWAS) have identified genetic loci that may contribute to telomere length variation. In a study of subjects with a history of familial longevity, Lee et al. 20 found three loci (4q25, 17q23.2, and 10q11.21) associated with telomere length. Mangino et al. 21 identified a locus on chromosome 18q12.2 associated with telomere length, and in their metaanalysis, Mangino et al. 22 identified novel genomic regions associated with telomere length variation (17p13.1 and 19p12) and confirmed associations between leukocyte telomere length and loci 3q26.2 and 10q24.33. 21, 23, 24

Genetic polymorphisms appear to have an important role in regulating telomere length in relationship to psychopathology risk. For example, telomere syndromes due to inherited genetic mutations of telomere maintenance machinery are associated with characteristic neuropsychiatric sequelae. 12 Telomere length and longevity have also been associated with specific single nucleotide polymorphisms (SNPs) in genes that produce telomerase. 25, 26 SNP rs2736100, located in the gene that codes for human telomerase reverse transcriptase (hTERT), is associated with shorter telomeres. A recent study of 2026 people found that rs2736100 homozygotes had higher rates of depression than heterozygote carriers or homozygotes for the protective allele. 27 This association only occurred in subjects with no ELS history, suggesting that the effects of ELS on telomere length may be large enough to conceal genetic influences.


5.4: Cladistics

Therefore, the more differences there are in biological sequences, the more time has passed for the differences to accumulate. The more time that has passed since organisms shared a common ancestor, the less closely related the organisms are to each other.
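The idea that sequence differences accumulate with time can be illustrated with a small sketch that counts the fraction of differing positions between aligned sequences (often called the p-distance). The species names and sequences below are hypothetical and invented purely for illustration; real analyses would also need to correct for multiple substitutions at the same site.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Hypothetical aligned fragments (not real data).
toy_alignment = {
    "species_A": "ATGCCGTA",
    "species_B": "ATGCCGTT",
    "species_C": "ATGACGAT",
}

names = sorted(toy_alignment)
# Pairwise distances for every unordered pair of species.
distances = {
    (x, y): p_distance(toy_alignment[x], toy_alignment[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for pair, d in distances.items():
    print(pair, d)
```

In this toy alignment, species_A and species_B differ at the fewest positions, so under the reasoning above they would be taken to share the most recent common ancestor of the three.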

Homologous traits can be used to group organisms into clades. Traits shared among the species or groups in a dataset tend to form nested patterns that provide information about when branching events occurred in the lineage.

Butterflies, moths and flies are all equally related to beetles.

Wasps are more closely related to butterflies, moths and flies than to beetles.

In 1735, Linnaeus published Systema Naturae in which he proposed a system of categorizing and naming organisms using a standard format so scientists could discuss organisms using consistent terminology. Linnaeus's tree of life contained just two main branches for all living things: the animal and plant kingdoms.

In 1866, Haeckel proposed another kingdom, Protista, for unicellular organisms. He later proposed a fourth kingdom, Monera, for unicellular organisms whose cells lack nuclei, like bacteria.

In 1969, Whittaker proposed adding another kingdom—Fungi—in the tree of life. Whittaker's five-kingdom tree was considered the standard for many years.

In the 1970s, Woese created a tree with three Domains above the level of Kingdom: Archaea, Bacteria, and Eukarya.


Function

Binds the telomeric double-stranded 5'-TTAGGG-3' repeat and plays a central role in telomere maintenance and protection against end-to-end fusion of chromosomes. In addition to its telomeric DNA-binding role, required to recruit a number of factors and enzymes needed for telomere protection, including the shelterin complex, TERF2IP/RAP1 and DCLRE1B/Apollo. Component of the shelterin complex (telosome) that is involved in the regulation of telomere length and protection. Shelterin associates with arrays of double-stranded 5'-TTAGGG-3' repeats added by telomerase and protects chromosome ends; without its protective activity, telomeres are no longer hidden from the DNA damage surveillance and chromosome ends are inappropriately processed by DNA repair pathways. Together with DCLRE1B/Apollo, plays a key role in telomeric loop (T loop) formation by generating 3' single-stranded overhang at the leading end telomeres: T loops have been proposed to protect chromosome ends from degradation and repair. Required both to recruit DCLRE1B/Apollo to telomeres and activate the exonuclease activity of DCLRE1B/Apollo. Preferentially binds to positive supercoiled DNA. Together with DCLRE1B/Apollo, required to control the amount of DNA topoisomerase (TOP1, TOP2A and TOP2B) needed for telomere replication during fork passage and prevent aberrant telomere topology. Recruits TERF2IP/RAP1 to telomeres, thereby participating in repressing homology-directed repair (HDR), which can affect telomere length.


Figure 2. TPP1 and different telomeric states. a) TPP1 mediates interactions between POT1, TIN2, and telomerase (4). OB = oligonucleotide/oligosaccharide binding fold; RD = recruitment domain, also called PBD (4); S/T = serine/threonine-rich region; TID = TIN2-interaction domain. Nomenclature is from ref 5. b) Telomeric states. (i) Nonextendible state. POT1 is bound to the telomeric 3′ end, preventing elongation by telomerase. The telomere is shown here in a linear form. The t-loop should also correspond to a nonextendible state (see text). (ii) Extendible state. Unknown mechanisms may dissociate or displace the telomeric-end-bound POT1, or prevent it from binding, to allow telomerase access. Telomerase may be enriched at telomeres via TPP1, which binds indirectly to the double-stranded telomeric tract. (iii) Extending state. During extension, the activity and processivity of telomerase are stimulated by TPP1 bound to more internal POT1 molecules.

The six-component shelterin complex has been known for some time to be involved in telomere length control (1, 2). For example, inhibition of TRF1 leads to telomere elongation, whereas overexpression of TRF1 causes telomere shortening in human telomerase-positive cells without affecting telomerase activity. Reduction of TIN2 protein levels or the overexpression of mutant alleles that disrupt TIN2 interaction with TPP1 leads to telomere elongation (13, 14). Suppression of TPP1 by RNA interference or the disruption of the TPP1–POT1 interaction, which is accompanied by the loss of the POT1 signal at telomeres, also results in telomere lengthening (7, 8). Thus, reinforcement of the shelterin complex seems to inhibit telomerase. Longer telomeres load more shelterin complexes, and this may provide a length-sensing mechanism (1). Furthermore, shelterin, particularly TRF2, promotes formation of t-loops in which the telomeric 3′ overhang is tucked into the double-stranded part of the telomere (15). This may sequester the 3′ overhang from DNA repair factors as well as from telomerase. In addition, shelterin may promote binding of POT1 to the telomeric 3′ end, which also inhibits telomerase (16). Indeed, in vitro studies demonstrate that POT1, when bound to the telomeric 3′ end, prohibits binding and extension by telomerase (Figure 2, panel b, state i) (12, 17). Thus, the shelterin complex so far has been mostly linked to telomerase inhibition. However, telomeres must switch from nonextendible to extendible states, at least in telomerase-positive cells (3, 18).

Now, both new papers strengthen the view that shelterin components also have a telomerase-activating function. A physical interaction between TPP1 and human telomerase is demonstrated by coimmunoprecipitation of TPP1 and telomerase expressed in rabbit reticulocyte lysate and in cell extracts (5). The TPP1 OB fold is necessary and sufficient for this interaction, an indication that TPP1 recruits telomerase to telomeres through its OB fold (5). Because TPP1 does not bind telomeric DNA directly, it could exert this function when bound either to the double-stranded telomeric tract via TIN2/TRF1/TRF2 or to the 3′ overhang via POT1 (5) (Figure 2, panel b, state ii). Wang et al. (4) performed detailed in vitro telomerase activity assays in the presence of POT1 and TPP1 and discovered activating functions, apart from a possible recruitment function. POT1 has been known to inhibit telomerase activity when bound to the telomeric 3′ end, and this inhibition is not overcome upon association with TPP1. Therefore, Wang et al. used a point mutation in the primer DNA to force POT1 to bind at a more upstream register, leaving a telomerase-extendible 3′ tail (Figure 2, panel b, state iii). Indeed, under this experimental condition, POT1 and TPP1 not only improved total telomerase activity but also increased telomerase processivity, an effect requiring the TPP1–POT1 interaction. Moreover, the two proteins together were able to rescue telomerase processivity on a G-quadruplex-forming oligonucleotide, probably because of the ability of POT1 to trap an open form of this structure (19).

In summary, the two papers identify TPP1 as an intimate and direct regulator of human telomerase. TPP1 mediates communication between shelterin and telomerase. It also may provide an ideal target to regulate telomerase activity at individual chromosome ends to trigger preferential recruitment and/or activation at short telomeres. The new data provide a molecular snapshot of the stimulatory effect of the POT1–TPP1 telomere binding protein complex during telomerase-mediated extension. However, the molecular nature of this and other telomeric states and the mechanism of their transition remain to be elucidated and must be studied in further detail. A possible model is that shelterin delivers POT1 to telomeric 3′ ends to prevent their elongation by telomerase, resulting in a nonextendible state (Figure 2, panel b, state i) (12, 16, 17). Unknown mechanisms may dissociate or displace the telomeric-end-bound POT1 or prevent it from binding, thus allowing telomerase access and producing an extendible state (Figure 2, panel b, state ii). This would permit the formation of an extending state in which telomerase is stimulated by TPP1 bound to more internal POT1 molecules (Figure 2, panel b, state iii). This duality of POT1 is reminiscent of studies in budding yeast, where the single-stranded overhang is bound by Cdc13p (cell division cycle 13), another OB-fold-containing protein that bears weak structural similarity with POT1. Cdc13p protein recruits telomerase holoenzyme in the S-phase via a telomerase subunit called Est1p (ever shorter telomere 1) (20). This interaction is critical for telomere maintenance, because est1 yeast strains undergo cellular senescence. However, Cdc13p can also negatively regulate telomere elongation (21). Nonetheless, budding yeast Est1p is not homologous to TPP1, and therefore it may not fulfill an analogous function. In Saccharomyces cerevisiae, the phosphatidylinositol 3-kinase-like protein kinases Tel1 and mitosis entry checkpoint 1 (Mec1) (ATM [ataxia telangiectasia mutated] and ATR [ATM and Rad3-related] in humans) also seem to play a critical role in preferentially activating telomerase at short telomeres. Interestingly, these kinases also associate with human telomeres in a cell-cycle-dependent manner (22), and it will be fascinating to uncover whether similar mechanisms regulate TPP1 function and telomerase activity at human telomeres.


Quantifying telomere length

The most commonly used techniques to measure telomere length are Southern blot, polymerase chain reaction (PCR)-based techniques and in situ hybridization. Southern blotting or telomere restriction fragment analysis (TRF) is the traditional method and still considered the gold standard [22]. The telomeres are represented as smears, and the molecular weight of the smear is representative of the average telomere length. The main disadvantage of this technique is the relatively large amount of DNA required. This technique is therefore not feasible for determining telomere length in single cells, or for different chromosomes, or when DNA availability is limited. The real-time PCR-based method is relatively fast and only requires small amounts of genomic DNA. This technique is based on modified PCR primers to avoid primer-dimer amplification as much as possible [13]. The final measure is the telomere quantity divided by a reference gene quantity (T/S ratio), which is a relative measure: perfectly valid within a given population (as it will correctly rank subjects) but more difficult to compare between populations. The most recent advance is the development of a multiplex assay in which both the telomere and the reference gene are targeted in a single well [14]. The PCR technique suffers the same disadvantage as the TRF method when considering single cell or specific chromosome analysis. The quantitative PCR technique has been widely used and accepted to estimate telomere length in large cohort studies [10, 15, 86, 93]. A specific modification of the previous techniques is single telomere length analysis, which uses Southern blotting techniques to separate PCR-amplified products, by combining specific primers and probes for the telomeres and the subtelomeric regions to measure telomere length per chromosome [5]. This technique is at the moment thought to be the most accurate telomere measurement, but it is labor-intensive and technically challenging, and it can only be used for chromosomes for which the subtelomeric region is known. In situ hybridization techniques make it possible to visualize the telomeres in single cells. Quantitative fluorescence in situ hybridization (Q-FISH) uses a (CCCTAA)3 peptide nucleic acid probe to visualize the telomeres. In metaphase spreads, the telomeres are visible at the end of the chromosomes and they can be quantified also in single chromosomes [52]. An important variation on this technique is the flow fluorescence in situ hybridization, or Flow-FISH. By combining Q-FISH hybridization and flow cytometry analysis, it is possible to measure average telomere length in interphase cells in combination with standard flow cytometry antibodies to select the cell population of interest [71].
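As a rough illustration of why the T/S ratio is a relative measure, the sketch below computes it from qPCR threshold cycles (Ct) using the common comparative-Ct approach. It assumes that product doubles each cycle (efficiency of 2) and that every sample is expressed relative to a shared calibrator DNA; the function name, the calibrator, and the Ct values are all hypothetical and are not taken from the studies cited above.

```python
def relative_ts_ratio(
    ct_telomere: float,
    ct_reference_gene: float,
    calibrator_ct_telomere: float,
    calibrator_ct_reference_gene: float,
    efficiency: float = 2.0,
) -> float:
    """Relative T/S ratio of a sample versus a calibrator DNA.

    Assumes the same amplification efficiency for both amplicons
    (2.0 means perfect doubling per cycle), which is an idealization.
    """
    # Lower Ct means more starting template, hence the negative exponent.
    delta_sample = ct_telomere - ct_reference_gene
    delta_calibrator = calibrator_ct_telomere - calibrator_ct_reference_gene
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the sample's telomere signal crosses threshold
# one cycle earlier than the calibrator's, with equal reference-gene Ct.
ratio = relative_ts_ratio(15.0, 22.0, 16.0, 22.0)
print(ratio)
```

Because the result depends on the chosen calibrator and on assay conditions, two laboratories can rank their own subjects consistently and still report T/S values that are not directly comparable.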


Supporting information

S1 Fig. Peptide alignment of Nxf2 homologs.

We used NCBI web BLAST to search the D. melanogaster Nxf2 peptide sequence against the RefSeq peptide database and identified homologs in 22 Drosophila species. The carboxyl-terminal region of Nxf2 derives from CDS which shares homology with the TART-A TE (gray box). At the peptide level, this region is conserved out to D. virilis, which suggests that, if it was acquired from an insertion of the TART-A TE, the insertion would have occurred in the common ancestor of the entire genus. CDS, coding sequence; TE, transposable element.

S2 Fig. Zoom view of dotplot showing alignments of D. melanogaster TART-A versus D. melanogaster nxf2 and D. yakuba TART-A.

The pink boxes show the 2 segments of shared homology between D. melanogaster TART-A and D. melanogaster nxf2. D. yakuba TART-A aligns to D. melanogaster TART-A at regions directly adjacent to, but not including, the TART-A/nxf2 shared homology. Underlying data can be found in S2 Data.

S3 Fig. Within-species comparisons of nxf2 versus TART-A.

We compared nxf2 transcript sequences from D. melanogaster (A), D. yakuba (B), and D. sechellia (C) to TART-A sequences from the same species using mummer [106]. There is sequence homology present between D. melanogaster nxf2 and TART-A but not for D. yakuba nxf2/TART-A nor for D. sechellia nxf2/TART-A. Underlying data can be found in S2 Data.

S4 Fig. Alignment of nxf2-like region from 71 D. melanogaster TART-A elements.

We identified 71 TART-A elements with 3′ UTRs from 17 long-read D. melanogaster genome assemblies. All 71 elements contain the nxf2-like sequence (gray box) suggesting that this region is present in most, if not all, TART-A elements in D. melanogaster. Note that a portion of the nxf2-like region appears to have been deleted in one of the TART-A elements.

S5 Fig. Illumina sequencing coverage of the nxf2-like region of TART-A across the DGRP.

We compared genomic sequencing coverage for the nxf2-like region of TART-A (blue shading) to its upstream and downstream flanking regions (yellow shading). For each DGRP strain, we divided read coverage by the median coverage of that strain’s TART-A ORF1 and ORF2 to control for copy number differences between strains. We calculated coverage for each strain in 10-bp windows across the region. Each box in the figure summarizes the per-strain coverage values for a single 10-bp segment. Within each box, the internal line represents the median coverage and the hinges correspond to the 25th and 75th percentiles. The whiskers extend to 1.5× the interquartile range. The coverage of the nxf2-like region is similar to the coverage of the downstream region, both of which are reduced relative to the upstream region. This pattern is consistent with truncation of the UTR, which has previously been described for TART [74]. Because the nxf2-like sequence is present in both UTRs, truncation of the 5′ UTR, which is fairly common, should reduce coverage of both the nxf2-like region and downstream flanking region by approximately 50% compared to the upstream region, which is not present in the 5′ UTR (Fig 1B). We observed a reduction in coverage of approximately 30%, consistent with a mixture of TART-A copies, some with truncated 5′ UTRs and some without. The median coverage across all boxes within a region is shown by the colored horizontal bars. Underlying data can be found in S2 Data. ORF, open reading frame.
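The normalization described in this caption can be sketched as follows. This is only an illustration under stated assumptions: per-base read depths are taken as plain lists, each 10-bp window is summarized by its mean depth (the caption does not say which summary was used), and the function names and depth values are hypothetical.

```python
from statistics import median


def windowed_mean_coverage(per_base_depth, window=10):
    """Mean depth in consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    n_full = len(per_base_depth) // window
    return [
        sum(per_base_depth[i * window:(i + 1) * window]) / window
        for i in range(n_full)
    ]


def normalize_by_orf_median(region_depth, orf_depth, window=10):
    """Divide each window's coverage by the strain's median ORF coverage."""
    orf_median = median(orf_depth)
    if orf_median == 0:
        raise ValueError("median ORF coverage is zero; cannot normalize")
    return [w / orf_median for w in windowed_mean_coverage(region_depth, window)]


# Hypothetical depths for one strain: two full windows plus a partial one.
region = [40] * 10 + [20] * 10 + [30] * 5
orf = [40, 40, 38, 42, 40]
normalized = normalize_by_orf_median(region, orf)
print(normalized)
```

Dividing by the strain's own ORF coverage puts all strains on a per-copy scale, so windows can be compared across strains that carry different numbers of TART-A elements.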

S6 Fig. Repetitive element up-regulation in nxf2 knockdown.

Each RepBase repeat for which we observed expression in total RNA-seq data from female ovaries is shown on the y-axis, and the fold change in expression in the nxf2 RNAi knockdown versus a control knockdown of the white gene is shown on the x-axis with a log2 scale. Expression values are the mean of 2 biological replicates for both knockdown and control. For LTR retrotransposons, LTRs are shown separately from the rest of the TE. Underlying data can be found in S2 Data. LTR, long terminal repeat; TE, transposable element.

S7 Fig. Correlation between shRNAs in nxf2 knockdown.

We used 2 shRNAs that target different regions of the nxf2 transcript and calculated expression values for genes as well as TEs for each knockdown. We found that the expression values are highly correlated between the 2 experiments (Spearman’s rho = 0.92 [Genes] and 0.94 [TEs]). Underlying data can be found in S2 Data. shRNA, short hairpin RNA; TE, transposable element.

S8 Fig. nxf2 cleavage products from degradome-seq data.

We analyzed published degradome-seq and Aub-immunoprecipitated small RNA data to determine whether there were nxf2 degradome-seq reads showing the 10-bp sense:antisense overlap with TART-A piRNAs, consistent with cleavage by a Piwi protein. We identified 11 locations (A–K) within the TART-like region of nxf2 where degradome-seq cleavage products (red) overlap with antisense piRNAs (blue) by 10 bp at their 5′ ends. The nxf2 transcript is shown in black. degradome-seq, degradome sequencing; piRNA, Piwi-interacting small RNA.
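The 10-bp overlap criterion in this caption can be expressed as a simple coordinate check. The sketch below assumes 1-based positions on the sense transcript, with each sense cleavage product described by the position of its 5′ end and each antisense piRNA described by the transcript position that pairs with its 5′ end. The function names and coordinates are hypothetical, not values from the study.

```python
def five_prime_overlap(sense_5p: int, antisense_5p: int) -> int:
    """Number of transcript positions spanned by both 5' ends.

    The sense read extends rightward from sense_5p, and the antisense
    read extends leftward from antisense_5p.
    """
    return antisense_5p - sense_5p + 1


def pingpong_pairs(sense_5p_ends, antisense_5p_ends, overlap=10):
    """Sense 5' ends that have an antisense partner at the given overlap."""
    antisense = set(antisense_5p_ends)
    return sorted(
        s for s in set(sense_5p_ends) if s + overlap - 1 in antisense
    )


# Hypothetical coordinates: two of the three sense ends have a partner.
sense_ends = [101, 250, 400]
antisense_ends = [110, 300, 409]
pairs = pingpong_pairs(sense_ends, antisense_ends)
print(pairs)
```

A sense end at position 101 paired with an antisense 5′ end at position 110 shares positions 101 through 110, which is the 10-bp signature the caption describes.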

S9 Fig. Genes up-regulated upon disruption of the piRNA pathway show greater abundance of aligned piRNAs.

We identified 168 genes whose fold change in expression was greater than or equal to nxf2 across RNAi knockdowns of 16 piRNA pathway components. These genes have a significantly larger abundance of aligned piRNAs compared to the remainder of expressed genes, suggesting their expression may be regulated by piRNAs (Wilcoxon test P = 4.1e-06). Underlying data can be found in S2 Data. piRNA, Piwi-interacting small RNA; RNAi, RNA interference.

S10 Fig. PiRNA pathway genes do not show a uniform response to piRNA pathway disruption.

We examined the fold change in expression of 41 known piRNA pathway genes across RNAi knockdowns of 16 piRNA pathway components, excluding the targeted gene from analysis for each experiment. PiRNA pathway genes show a median fold change near 1 (horizontal red line) for most experiments. Underlying data can be found in S2 Data. piRNA, Piwi-interacting small RNA; RNAi, RNA interference.

S11 Fig. The correlation between nxf2 expression and TART-A copy number is reproducible.

We repeated the analysis shown in Fig 7A using a replicate microarray dataset from [125] and found a similar correlation (Spearman’s rho = −0.49), which suggests that the microarray expression measurements are highly reproducible. Underlying data can be found in S2 Data.

S12 Fig. Expression of other piRNA pathway genes (besides nxf2) is not correlated with TART-A copy number.

We were able to obtain expression values for 39 other piRNA pathway genes from the same microarray dataset that we used for nxf2 expression. For each of these genes, we calculated Spearman correlation coefficient for its expression compared to TART-A copy number. All correlation coefficients are at least 2-fold smaller in magnitude than what we observed for nxf2. Underlying data can be found in S2 Data. piRNA, Piwi-interacting small RNA.

S13 Fig. Summary of correlations between piRNA pathway genes and TART-A copy number.

The histogram summarizes the Spearman correlation coefficients between 39 piRNA pathway genes and TART-A copy number (shown in S12 Fig). The red line shows the correlation coefficient for nxf2. Underlying data can be found in S2 Data. piRNA, Piwi-interacting small RNA.

S14 Fig. Per-strain piRNA coverage of nxf2.

We plotted piRNA read depth (normalized as RPM mapped) along the nxf2 transcript for each of the 16 DGRP strains shown in Fig 7. For each strain, the abundance of TART piRNAs is listed in the plot title. We masked the locations of the TART/nxf2 shared homology (gray boxes) before alignment to avoid cross-mapping of TART-derived piRNAs. Underlying data can be found in S2 Data. DGRP, Drosophila Genetic Reference Panel; piRNA, Piwi-interacting small RNA; RPM, reads per million.

S15 Fig. Correlation between TART-A and nxf2 piRNAs.

There is a strong positive correlation between TART-derived piRNAs that align to nxf2 versus the nxf2 piRNAs downstream from the region of shared homology, across 16 DGRP strains (Spearman’s rho = 0.88, P < 2.2e-16). Underlying data can be found in S2 Data. DGRP, Drosophila Genetic Reference Panel; piRNA, Piwi-interacting small RNA.

S16 Fig. The 5 DGRP strains used in the RNA-seq experiment have nxf2 expression levels that are representative of the DGRP population as a whole.

We used the microarray dataset from [125] to select 5 DGRP strains whose median nxf2 expression level is similar to that of the full DGRP population. Underlying data can be found in S2 Data. DGRP, Drosophila Genetic Reference Panel; RNA-seq, RNA sequencing.

S1 Table. Allele-specific counts for TART-derived antisense piRNAs aligned to nxf2. piRNA, Piwi-interacting small RNA.

S1 Data. Multiple sequence alignment of nxf2.

S2 Data. Underlying data for all graphs.

S3 Data. Multiple sequence alignment used for Fig 2B.

S4 Data. FASTA file containing the sequence of the D. simulans TART-A fragment.


Telomeric G-Quadruplexes: From Human to Tetrahymena Repeats

The human telomeric and protozoal telomeric sequences differ in only one purine base in their repeats: TTAGGG in human telomeric sequences and TTGGGG in protozoal sequences. In this study, the relationship between G-quadruplexes formed from these repeats and their derivatives is analyzed and compared. The human telomeric DNA sequence G3(T2AG3)3 and related sequences in which each adenine base has been systematically replaced by a guanine were investigated; the result of complete replacement is Tetrahymena repeats. The substitution does not affect the formation of G-quadruplexes but may cause differences in topology. The results also show that the stability of the substituted derivatives increased with the number of substitutions. In addition, most of the sequences containing imperfections in repeats which were analyzed in this study also occur in human and Tetrahymena genomes. Generally, the presence of G-quadruplex structures in any organism is a source of limitations during the life cycle. Therefore, a fuller understanding of the influence of base substitution on the structural variability of G-quadruplexes would be of considerable scientific value.

1. Introduction

G-rich DNA sequences can form intra- and intermolecular G-quadruplexes based on the association of one or more DNA strands. The nucleotides which intervene between G-runs form loops of folded G-quadruplex structures which can adopt a variety of different topological forms [1, 2]. When the guanine tracts are oriented in the same direction, the double-chain reversal (propeller) loops link two adjacent parallel strands to form a parallel structure [3]. When the guanine tracts are oriented in opposite directions, the edgewise or diagonal loops link two antiparallel strands to form an antiparallel G-quadruplex [4]. In antiparallel hybrid or so-called (3 + 1) structures, a single strand is oriented in a different direction from the others [5–7]. A novel ( ) type fold which has recently been described by Marušič et al. exhibits a conformation in which all three loop types occur in one conformation: edgewise, diagonal, and double-chain reversal loops [8]. In addition, intermolecular multimeric G-quadruplexes can be formed by the association of two or more strands [9].

These structures underline the high degree of G-quadruplex structural polymorphism, a phenomenon which is dependent on many different factors: the length and sequence of nucleic acid, and environmental conditions present during the folding reaction such as the buffer, pH, stabilizing cation, temperature, and the presence of agents causing dehydration [10–14]. G-rich sequences with the propensity to form G-quadruplex structures can be located in many regions of human genomic DNA, especially in several biologically important regions including the end of linear eukaryotic telomeres [15, 16]. However, putative G-rich sequences are not randomly distributed within a genome; such sequences predominantly occur in protooncogene regions (which promote cell proliferation) and are depleted in tumour suppressor genes (which maintain genomic stability) [17]. It is very unlikely that these putative sequences can form in vivo and direct evidence of their existence in living cells is still a topic of discussion [18–20]. Undoubtedly, the most extensively studied G-quadruplex forming sequences are those located at the 3′-ends of human telomeres. Telomeric sequences and specialized nucleoprotein complexes which cap the ends of linear chromosomes are essential for chromosomal stability and genomic integrity [21–23]. Mammalian telomeres consist of tandem repeats of G-rich sequences, d(TTAGGG)n. Several kilobases of this sequence are double-stranded, but more than a hundred nucleotides remain unpaired and form single-stranded 3′-overhangs [24], a state which would provide favourable conditions for the formation of one or more G-quadruplexes in vivo [22]. The structure and stability of telomeres play a significant role in the development of cancer and cell aging [25, 26]. There is also evidence that telomeres serve as a type of biological clock, as telomere structures appear to become shorter with each successive cell cycle. In immortalized cells and in cancer cells, however, a telomerase is activated to maintain the length of the telomere by reelongating the telomeric sequence at the chromosome ends [27, 28]. G-quadruplexes formed by single-stranded human telomeric DNA have also been shown to inhibit the activity of telomerase [29], and this discovery has led to increased interest in the structures as attractive potential drug targets [30].

A broad range of studies of human telomeric G-quadruplexes have been carried out using a wide variety of different techniques [1]. To date, high-resolution structures of four distinct folding topologies with three G-tetrad layers have been identified for the four human telomeric repeats [1–7]. In addition, an additional structure consisting of only two G-tetrad layers has also been revealed which highlights the structural polymorphism of telomeric G-quadruplexes [31]. The structure of human telomeric DNA in crowded solutions has also been investigated by many authors [11], but this structure is likely to be a result of dehydration rather than molecular crowding [12, 32, 33]. The great variety of structures identified to date can also be attributed to the presence of flanking nucleotides outside the core sequence G3(T2AG3)3 and the concentration of ions and to the use of different experimental methods and conditions [1, 34].

A series of systematic studies of sequence derivatives of human telomeric repeats was carried out by Vorlícková et al. [35–38]; these earlier studies analyzed the replacement of guanine by adenine, the introduction of abasic sites, the replacement of adenine by 8-oxoadenine, and the replacement of thymine by 5-hydroxymethyluracil in telomeric repeats [39–41]. In this study, however, the opposite strategy is applied: the replacement of adenine by guanine (see Figure 1). The main aim is to achieve the total conversion of four human repeats to Tetrahymena repeats that retain the ability to form an intramolecular G-quadruplex. Interestingly, G-rich repeats containing imperfections were also found in the human and Tetrahymena genomes; see Table 1 and Supporting Materials.

extinction coefficient at 257 nm.

In this study, we examine the structures formed by the Tetrahymena telomeric sequence, dG4(T2G4)3, which differs from the human sequence by a single G-for-A replacement in each repeat [42]. Since Gs are essential for the formation of G-quadruplexes, we have systematically replaced each of the three adenines in the TTA loops of the G-quadruplex-forming sequence G3(T2AG3)3 with guanine, thereby increasing the number of guanines by up to three per oligonucleotide. Circular dichroism spectroscopy (CD) and polyacrylamide gel electrophoresis (PAGE) were used to observe the effect of base substitution(s) on the formation, thermal stability, and conformation of G-quadruplexes. The measurements were performed in the presence of both Na+ and K+ ions and with concentrations of either PEG-200 or acetonitrile at 0, 15, 30, and 50 wt% at different temperatures. In addition, the formation of G-quadruplex structures was verified and confirmed using Thiazole Orange (TO). TO is an excellent fluorescent probe for DNA structural forms because of its high fluorescence quantum yield [43]. This ligand stabilizes the G-quadruplex structure and can also induce topological changes [44, 45]. The G-quadruplex-TO complex offers a characteristic profile of induced-circular dichroism spectrum in buffers containing sodium cations [44].
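The systematic adenine-to-guanine replacement described above can be enumerated with a short sketch. It expands G3(T2AG3)3 into its 21-nucleotide sequence and generates every combination of replaced adenines; the function name is hypothetical, and the oligonucleotides actually used in the study are those listed in its Table 1, which may differ (for example in flanking nucleotides).

```python
from itertools import product

# G3(T2AG3)3 written out in full: GGG followed by three TTAGGG repeats.
HUMAN_CORE = "GGG" + "TTAGGG" * 3


def adenine_to_guanine_variants(sequence: str) -> dict:
    """Map each replacement pattern to the resulting sequence.

    Keys are tuples of booleans, one per adenine in 5'-to-3' order;
    True means that adenine has been replaced by guanine.
    """
    a_positions = [i for i, base in enumerate(sequence) if base == "A"]
    variants = {}
    for mask in product((False, True), repeat=len(a_positions)):
        bases = list(sequence)
        for pos, replace in zip(a_positions, mask):
            if replace:
                bases[pos] = "G"
        variants[mask] = "".join(bases)
    return variants


variants = adenine_to_guanine_variants(HUMAN_CORE)
for mask, seq in variants.items():
    print(sum(mask), "substitution(s):", seq)
```

With three adenines there are eight patterns, including the unmodified sequence. Replacing all three gives G3(T2G4)3, which is one guanine shorter at the 5′ end than the dG4(T2G4)3 sequence named in the text.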

2. Materials and Methods

All experiments were carried out in a modified Britton-Robinson buffer (mRB; 25 mM phosphoric acid, 25 mM boric acid, and 25 mM acetic acid) supplemented with 50 mM KCl or NaCl, PEG-200 (polyethylene glycol with an average molecular weight of 200), and acetonitrile (Fisher Slovakia). The pH was adjusted with Tris to a final value of 7.0. Oligonucleotides with the sequences shown in Table 1 were purchased from Metabion international AG. The lyophilized DNA samples were dissolved in double-distilled water prior to use to give 1 mM stock solutions. Single-strand DNA concentrations were determined by measuring the absorbance at 260 nm at high temperature (95°C).
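The concentration determination and dilution described above follow from the Beer-Lambert law and the relation C1·V1 = C2·V2. The sketch below is illustrative only: the extinction coefficient and absorbance reading are hypothetical placeholders (the source does not list them here), while the 1 mM stock, the 25 μM working concentration, and the 300 μl volume are the values quoted in this section and in the CD measurements.

```python
def concentration_from_absorbance(a260: float, epsilon: float, path_cm: float) -> float:
    """Molar concentration from the Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a260 / (epsilon * path_cm)


def stock_volume_ul(stock_molar: float, target_molar: float, final_volume_ul: float) -> float:
    """Volume of stock needed for a dilution, from C1 * V1 = C2 * V2."""
    if stock_molar <= 0:
        raise ValueError("stock concentration must be positive")
    return target_molar * final_volume_ul / stock_molar


# Hypothetical values for illustration only (not taken from the paper).
EPSILON_260 = 215_000.0   # M^-1 cm^-1, assumed extinction coefficient
A260_READING = 0.5375     # assumed absorbance reading in a 0.1 cm cuvette

conc = concentration_from_absorbance(A260_READING, EPSILON_260, path_cm=0.1)

# Values quoted in the text: 1 mM stock, 25 uM working concentration, 300 ul volume.
vol = stock_volume_ul(stock_molar=1e-3, target_molar=25e-6, final_volume_ul=300.0)
print(f"concentration = {conc * 1e6:.1f} uM, stock volume = {vol:.1f} ul")
```

With these assumed inputs the reading corresponds to about 25 μM, and about 7.5 μl of the 1 mM stock would be needed per 300 μl sample.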

2.1. CD Spectroscopy

CD and UV-vis spectra were measured using a Jasco model J-810 spectropolarimeter (Easton, MD, USA). The temperature of the cell holder was regulated by a PTC-423L temperature controller. Scans were performed over a range of 220–600 nm in a reaction volume of 300 μl in a cuvette with a path length of 0.1 cm and an instrument scanning speed of 100 nm/min, 1 nm pitch, and 1 nm bandwidth, with a response time of 2 s. CD data represent three averaged scans taken over a temperature range of 0–100°C. All DNA samples were dissolved and diluted in suitable buffers containing appropriate concentrations of ions and dehydrating agent. The amount of DNA oligomers used in the experiments was kept close to 25 μM of DNA strand concentration. The samples were heated at 95°C for 5 minutes and then allowed to cool down to the initial temperature before each measurement. CD spectra are expressed as the difference in the molar absorption of the right-handed and left-handed circularly polarized light (Δε) in units of M⁻¹·cm⁻¹. The molarity was related to DNA oligomers. A buffer baseline spectrum was obtained using the same cuvette and subtracted from the sample spectra. The thermal stability of different quadruplexes was measured by recording the CD ellipticity at 295 and 265 nm as a function of temperature [14, 46]. The temperature ranged from 0 to 100°C, and the heating rate was 0.25°C/min. The melting temperature (Tm) was defined as the temperature of the midtransition point. Tm was estimated from the peak value of the first derivative of the fitted curve. DNA titration was performed with increasing concentrations of TO. TO was solubilized in DMSO to reach a final stock solution concentration of 10 mM. The concentrations of DNA and TO in the 1 mm quartz cell were 30 μM and 0–200 μM, respectively, and the increment of TO was 67 μM. Each sample was mixed vigorously for 3 min following the addition of TO, and CD/UV spectra were measured immediately.
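The Tm estimate described above (the peak of the first derivative of the melting curve) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the melting curve is a synthetic two-state sigmoid, and the 0.25°C sampling step, the transition width, and all function names are assumptions made for the example.

```python
import math


def first_derivative(x, y):
    """Numerical dy/dx: central differences inside, one-sided at the ends."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((y[hi] - y[lo]) / (x[hi] - x[lo]))
    return deriv


def estimate_tm(temps, signal):
    """Temperature at which |d(signal)/dT| is largest (midtransition point)."""
    deriv = first_derivative(temps, signal)
    peak = max(range(len(deriv)), key=lambda k: abs(deriv[k]))
    return temps[peak]


def two_state(t, tm, width=3.0):
    """Synthetic folded fraction for a two-state transition (assumed model)."""
    return 1.0 / (1.0 + math.exp((t - tm) / width))


# Synthetic curve sampled every 0.25 degC from 0 to 100 degC (assumed sampling).
temps = [0.25 * k for k in range(401)]
signal = [two_state(t, tm=63.2) for t in temps]
tm_est = estimate_tm(temps, signal)
print(f"estimated Tm = {tm_est:.2f} degC")
```

For real data, the derivative is best taken on a fitted or smoothed curve, as the text specifies, because numerical differentiation amplifies noise.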

2.2. Electrophoresis

Samples consisting of 0.3 μl of 1 mM stock solutions were separated using nondenaturing PAGE in a temperature-controlled electrophoretic apparatus (Z375039-1EA Sigma-Aldrich, San Francisco, CA) on 15% acrylamide (19 : 1 acrylamide/bisacrylamide) gels. DNA was loaded onto […] cm gels. Electrophoresis was run at 10°C for 4 hours at 125 V (8 V·cm⁻¹). Each gel was stained with StainsAll (Sigma-Aldrich). The gel was also stained using the silver staining procedure in order to improve the sensitivity of the DNA visualization [44].

2.3. Fluorescence Spectroscopy

The fluorescence spectra were acquired at 22 ± 1°C with a Varian Cary Eclipse Fluorescence Spectrophotometer equipped with a temperature-controlled circulator. A quartz cuvette with a 3 mm path length was used in all of the experiments. In the fluorescence measurements, the excitation and emission slits were 5 nm and the scan speed was 240 nm/min. TO (66 μM) was titrated with DNA (3.3, 6.6, and 13.2 μM) in mRB buffer in both the presence and absence of monovalent metal cations. The molar ratios between DNA and ligand were 1 : 20, 1 : 10, and 1 : 5. The excitation wavelength was set to 452 nm.
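The stated DNA : ligand ratios follow directly from the quoted concentrations. The short check below is only an arithmetic sketch; the variable and function names are illustrative.

```python
TO_UM = 66.0                   # ligand concentration quoted in the text (uM)
DNA_UM = [3.3, 6.6, 13.2]      # DNA concentrations quoted in the text (uM)


def ligand_per_dna(ligand_um: float, dna_um: float) -> float:
    """Number of ligand molecules per DNA strand at the given concentrations."""
    if dna_um <= 0:
        raise ValueError("DNA concentration must be positive")
    return ligand_um / dna_um


# Round to the nearest integer to absorb floating-point noise.
ratios = [round(ligand_per_dna(TO_UM, d)) for d in DNA_UM]
labels = [f"1 : {r}" for r in ratios]
print(labels)
```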

3. Results and Discussion

3.1. Sequence Design and CD Spectra

The sequence derived from the human telomeric sequence d(G3(T2AG3)3) and its substituted derivatives were studied under different conditions. The DNA sequences and the abbreviations used in this study are summarized in Table 1. Points 1, 2, and 3 indicate the positions of the base substitution in the first, second, and third loops of the HTR sequence, respectively. Point 0 indicates a flanking guanine at the 5′ end of the oligonucleotide (Figure 1). In the DNA oligonucleotides derived from HTR, guanine (G) was substituted for adenine (A) in the TTA loop with the expectation that the modified sequences would retain the ability to form G-quadruplexes spontaneously, albeit with different topologies than those found in HTR sequences. The HTR derivatives were analyzed in the presence of both 50 mM NaCl and KCl (Figure 2). The first group represents oligonucleotides containing only single point mutations at different positions: HTR1, HTR2, and HTR3 (black lines in Figure 2). The second group represents oligonucleotides containing two point mutations (spectra shown in blue in Figure 2). The first two loops were modified in HTR1,2, the first and last loops in HTR1,3, and the second and third loops in HTR2,3. Oligonucleotides HTR0,1,3, HTR0,2,3, and HTR1,2,3 contained three G-for-A substitutions (spectra in green). The spectrum and melting temperatures of the HTR0,1,2 sequence are very similar to those of the HTR0,2,3 sequence (not shown in this study), while the HTR0,1,2,3 sequence is equivalent to the THR sequence.
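The naming scheme above can be made concrete with a short sketch that builds each derivative from the parent sequence. Table 1 is not reproduced in this section, so the exact sequences generated here are an assumption based on the description: points 1 to 3 replace the A of the corresponding TTA loop with G, and point 0 adds a flanking G at the 5′ end. The function names are illustrative.

```python
HTR = "GGG" + "TTAGGG" * 3     # human repeat, d(G3(T2AG3)3)
THR = "GGGG" + "TTGGGG" * 3    # Tetrahymena repeat, d(G4(T2G4)3)


def substitute(points):
    """Build an HTR derivative.

    points: iterable of 0..3. Points 1, 2, 3 replace the A in the first,
    second, or third TTA loop with G; point 0 adds a 5' flanking G.
    """
    points = set(points)
    if not points <= {0, 1, 2, 3}:
        raise ValueError("points must be drawn from 0, 1, 2, 3")
    loops = ["TTG" if i in points else "TTA" for i in (1, 2, 3)]
    seq = "GGG" + "".join(loop + "GGG" for loop in loops)
    return "G" + seq if 0 in points else seq


def name(points):
    """Abbreviation in the style used in the text, e.g. HTR1,3."""
    return "HTR" + ",".join(str(p) for p in sorted(set(points)))


for pts in [(), (1,), (2,), (1, 3), (0, 2, 3), (0, 1, 2, 3)]:
    print(f"{name(pts):12s} {substitute(pts)}")
```

Under these assumptions the fully substituted derivative reproduces the THR sequence, which matches the statement that HTR0,1,2,3 is equivalent to THR.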


The substituted sequences were also compared with the unmodified HTR and THR sequences. In general terms, each of the guanine residues in any G-run could be involved in the formation of G-tetrads. In the case of the formation of three-layered G-tetrad quadruplexes, loop lengths were found to vary when the base substitution was introduced into the HTR sequence; loops could consist of three or four nucleotides depending on the location and number of substitutions. However, we cannot exclude the possibility of the formation of four-layered G-quadruplexes for sequences containing three substitutions, but it is important to note that such structures would have to consist of at least one heteronucleotide-tetrad in which adenine is also present. To date, the 3D structure of full-length THR sequences in the presence of potassium has not been determined; the only facet of the structure which is known is the tetrameric G-quadruplex structure formed from four shorter sequences d(TTGGGGT) (PDB: 139D) [47]. This structure consists of four G-tetrads and cannot be taken to represent the real structure of a full-length oligonucleotide. The 3D structure of THR has so far been ascertained only in the presence of sodium (PDB: 186D) [48]. This structure consists of three stacked G-tetrads, two edgewise loops, and one double-chain-reversal loop. Despite the fact that the sequences of THR and HTR differ at only one of the six nucleotides, their 3D topologies are quite different because HTR in sodium adopts a three-G-tetrad structure consisting of two edgewise loops and one central diagonal loop (PDB: 143D) [4]. However, the HTR sequence can also adopt a stable basket-type conformation in the presence of potassium consisting of only two G-tetrad layers (PDB: 2KF8) [31].

Several naturally occurring HTR sequences have been identified to date. Forms 1 (PDB: 2HY9) and 2 (PDB: 2JPZ) consist of three G-tetrads, but the order of loops differs; HTR forms one double-chain-reversal and two edgewise loops in both forms [49, 50]. There is some similarity with THR G-quadruplexes which form in solution in the presence of sodium [48]. Form 3 is represented by a parallel G-quadruplex with three double-chain-reversal loops (PDB: 1KF1) [3]. Recently, Lim et al. have also confirmed the structure of a 27-nt HTR derivative in the presence of sodium which differs significantly from those mentioned above (PDB: 2MBJ) [1]. Although both known HTR structures solved in sodium possess the same relative strand orientations, they differ in the hydrogen-bond directionalities and in the loop arrangement. The 2MBJ structure again consists of two edgewise loops and one double-chain-reversal loop.

The sequence derived from the telomere of Oxytricha, d[G4(T4G4)3] (PDB: 201D and 230D), adopts a structure with similar types of loops to those found in HTR in sodium: two edgewise loops and one central diagonal loop [4, 51, 52]. However, the Oxytricha sequence forms a four-layered G-tetrad quadruplex. At the time of writing, the solution structure of the Oxytricha sequence d[G4(T4G4)3] in K+-containing solution had yet to be determined. The main reason for this could be the fact that this sequence and THR in the presence of potassium can adopt different topological forms which coexist in solution; additional bands are observed during electrophoretic separation [14]. Interestingly, the four-layered G-quadruplexes are very stable, exhibiting particularly high melting temperatures in the presence of potassium [14]. Recently, the structure of d(GGGGCC)4 in the presence of potassium has also been determined; the sequence contains cytosines instead of thymine residues and one 8-bromodeoxyguanosine (PDB: 2N2D) [53]. The G-quadruplex structure adopted by this sequence could be closely related to that of THR in potassium. This antiparallel structure is composed of four G-quartets which are connected by three edgewise C-C loops. The CD spectra show many signatures in common with the THR sequence. One of the cytosines in every loop is stacked upon the G-quartet, an arrangement which results in a very compact and stable structure. Similarly, the melting temperature of the structure is higher than 90°C.

It is generally accepted that CD spectroscopy is a very useful and cost-efficient method for offering a first glance at the architecture of folded G-quadruplexes. CD spectra of G-quadruplexes can be used to indicate whether the DNA has folded into a parallel or antiparallel conformation [36, 54].

Although there are up to 25 generic folding topologies of G-quadruplexes, it is possible to classify the structures into three groups based on the sequence of glycosidic bond angles adopted by guanosines of the G-quadruplex [55]. Group I consists of parallel G-quadruplexes with strands oriented in the same direction and with guanosines of the same glycosidic bond angles. Parallel G-quadruplexes (Group I) share the same characteristics irrespective of whether they contain three or four loops: an intense positive maximum at […] 240 nm. Groups II and III consist of antiparallel G-quadruplexes. Group II can be characterized by guanosines of glycosidic bond angles in orientations such as anti-anti and syn-syn and also syn-anti and anti-syn, while Group III consists of stacked guanosines of distinct glycosidic bond angles. Antiparallel G-quadruplexes show a positive band at 295 nm. Positive and negative CD signals at […] 240 nm, respectively, are characteristic for Group II, while Group III shows reverse peaks. In contrast, the CD spectra of high ordered G-quadruplex architecture of Group III forms exhibit negative and positive signals at 240 nm and […]

CD profiles corresponding to distinct G-quadruplex conformations are determined empirically; therefore, the interpretation of CD spectra of unknown putative G-quadruplex sequences can be ambiguous. A number of other factors can also cause a degree of uncertainty over the evaluation of CD spectra, including, for example, the presence of mixed populations of various conformers and/or the presence of multimeric conformations in solution [9, 14, 44, 46].
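The empirical band assignments used in this section can be summarized as a rule of thumb. The sketch below is a deliberately crude illustration under stated assumptions: it takes only the signs of the CD signal near 240, 265, and 295 nm, it assumes the commonly reported parallel signature (positive near 265 nm, negative near 240 nm, no band at 295 nm), and it returns an "ambiguous" label whenever the pattern does not match. It cannot resolve mixed populations, which the text identifies as a major source of uncertainty.

```python
def classify_cd(sign_240: int, sign_265: int, sign_295: int) -> str:
    """Rule-of-thumb assignment from band signs (+1, -1, or 0 for no clear band).

    Hypothetical helper for illustration; not a validated classifier.
    """
    for s in (sign_240, sign_265, sign_295):
        if s not in (-1, 0, 1):
            raise ValueError("signs must be -1, 0, or +1")
    if sign_295 > 0:
        # A positive band near 295 nm points to antiparallel strands.
        if sign_265 > 0 and sign_240 < 0:
            return "Group II (antiparallel)"
        if sign_265 < 0 and sign_240 > 0:
            return "Group III (antiparallel)"
        return "antiparallel, group ambiguous"
    if sign_265 > 0 and sign_240 < 0:
        return "Group I (parallel)"
    return "ambiguous"


print(classify_cd(-1, +1, +1))   # pattern reported for HTR2,3 in sodium
print(classify_cd(+1, -1, +1))   # pattern reported for unmodified HTR in sodium
print(classify_cd(-1, +1, 0))    # pattern after conversion in 50 wt% PEG-200
```

Note that a positive signal at both 265 and 295 nm is also what a mixture of parallel and antiparallel conformers would produce, so a Group II label from this rule should be checked against electrophoretic data.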

CD measurements clearly show that the G-for-A substitutions had a considerable impact on the spectral profile of each sequence. The presence of the G-quadruplex scaffold formed from the unmodified HTR sequence is characterized by a positive peak at 295 nm with two shoulders at around […] 250 nm in the presence of potassium (Figure 2(a), red line). According to CD spectra, these signatures are characteristic for Group II antiparallel G-quadruplexes. This spectrum is indicative of the formation of a two-layered basket-type structure [31, 55]. The HTR sequence adopts a clear antiparallel G-quadruplex conformation of Group II type in the presence of sodium (Figure 2(d), red line). The structure is characterized by a large positive maximum at 295 nm, a smaller one near 245 nm, and a negative CD peak at 265 nm. Previous studies have reported that these sequences form an intramolecular, basket-type antiparallel G-quadruplex [4]. Every sequence shows a clear peak at 295 nm which is characteristic of an antiparallel G-quadruplex topology. The first set of oligomers with a single substitution per oligonucleotide in the presence of potassium shows two separated peaks at 295 and 265 nm; the signal is dominant at 295 nm (spectra shown in black). However, THR and HTR derivatives containing one or more G-for-A substitutions in the presence of potassium show an increase of the peak at 265 nm (Figures 2(b) and 2(c)). This indicates the coexistence of more than one topological structure, that is, both parallel and antiparallel configurations (see also the electrophoretic results in Figure 7). The structural polymorphism was seen to increase with increasing numbers of Gs in the DNA sequence. The CD signal at 265 nm (spectra shown in green) was predominant for oligonucleotides containing three substitutions (Figure 2(c)).

In the presence of sodium, only the HTR2 sequence with a substitution in the second loop exhibited a CD spectrum identical to that of HTR, although even this correspondence displayed lower amplitudes (spectra in dotted black in Figure 2(d)). HTR1 and HTR3 sequences with substitutions in the first and third loop, respectively, also displayed a positive maximum at 295 nm, but the negative peaks at 265 nm were shallower and slightly shifted towards longer wavelengths in comparison to the results of the unmodified sequence. Despite these differences, they are nonetheless likely to form G-quadruplexes of Group III. Only the CD spectra of the THR sequence show signatures of Group II types.

Samples in the second group exhibited a positive maximum at 295 nm with a slight shift to lower wavelengths in the case of HTR1,2 and HTR1,3 (Group III, spectra shown in blue in Figure 2(b)). These two sequences lacked a negative peak at 265 nm, and the smaller positive peak at around 245 nm was shifted slightly to longer wavelengths (Figure 2(e)). HTR2,3 shows a negative signal at 245 nm and positive signals at 265 and 295 nm, results which are indicative of the formation of Group II antiparallel G-quadruplexes.

The CD spectra of HTR0,2,3 are close to those of Group II G-quadruplexes, while the CD spectra of HTR0,1,3 and HTR1,2,3 resemble those of Group III G-quadruplexes. All samples with three mutations exhibited a positive maximum at 295 nm. HTR0,1,3 exhibits negative signals at 235 and 275 nm in the presence of sodium (Figure 2(f)), while HTR0,2,3 shows positive signals at 265 and 295 nm.

In general, all the modified sequences in the presence of both Na+ and K+ were seen to differ to some degree from the HTR spectrum and were also found to differ from each other. The varying CD spectral profiles from sample to sample are a result of slight changes in G-quadruplex topology. However, it was not possible to determine either the group or structure of the G-quadruplexes with any degree of certainty on the basis of CD spectral profiles alone due to the coexistence of various topological forms, a finding which was confirmed by the electrophoretic results discussed in Section 3.5.

3.2. CD Spectra in the Presence of PEG-200 and Acetonitrile

In the presence of K+, the dehydrating agent PEG-200 is known to induce a conformational change of telomeric G-quadruplexes, primarily the transition from an antiparallel structure to a parallel arrangement [2, 9, 11, 13, 14, 56]. Therefore, the influence of PEG-200 and another dehydrating agent, acetonitrile, on the CD spectra and the stability of HTR derivatives was also investigated. The representative CD spectra of HTR and THR in the presence of different concentrations of both dehydrating agents (15, 30, and 50 wt%) and 50 mM KCl are shown in Figure 3. Both types of DNA were found to form G-quadruplex structures with a propeller-like parallel arrangement in the presence of K+. However, when the sequences were studied in the presence of sodium with no potassium present, no structural conversions were observed; this finding remained constant for all of the studied HTR derivatives and THR. Interestingly, at a PEG-200 concentration of 50 wt% the positive peaks at 295 nm were found to disappear and a CD signal at 265 nm was recorded which was 2-fold higher than in the absence of PEG-200. The same effect was observed for acetonitrile. This is an intrinsic property of any converted G-quadruplex molecule. In a recent study, our group presented a hypothesis which explains this fact: the CD signal depends on the number and orientation of stacked glycosyl bonds [9, 14, 57]. We have also previously shown that PEG-200 causes the dimerization of HTR [9]; therefore, electrophoretic analysis of the sequences in the presence of PEG-200 was also performed (Figure 8).

The melting temperatures of HTR and the vast majority of G-quadruplex structures are known to increase in the presence of PEG-200. In order to verify this fact, the melting temperatures were determined in the presence of PEG-200 on the basis of CD melting curves. The results are summarized in Table 2 and clearly confirm that PEG-200 increases the melting temperatures of HTR derivatives. Following a methodology used in our previous studies, dual-wavelength measurements were performed for cases in which the spectra displayed peaks at both 295 and 265 nm [9, 11]. For HTR at 295 nm, a Tm of 63.2°C was obtained in an mRB buffer containing 50 mM KCl, as compared to a value of 50.4°C in a buffer with 50 mM NaCl. The overall picture which emerges from the thermodynamic data is that the stability of G-quadruplexes of HTR derivatives increases with increased numbers of G-for-A substitutions in both Na+ and K+ solutions. The lowest Tm value was recorded for HTR in both 50 mM KCl and 50 mM NaCl. The Tm of HTR derivatives was found to be higher in KCl than in NaCl. All of the studied sequences show a higher Tm value in the presence of both dehydrating agents. The results indicate that PEG-200 stabilizes G-quadruplexes with or without the G-for-A mutation. The melting temperatures summarized in Table 2 clearly demonstrate that both the number of guanine residues in a G-tract and the nature of the stabilizing ion are important determining factors in the thermal stability of G-quadruplexes.

3.3. Titration Measurements

Our group has recently developed a new experimental methodology for the identification of G-quadruplex forming sequences using the cyanine dye Thiazole Orange (TO). TO is an excellent DNA fluorescent probe for various structural motifs due to its high fluorescence quantum yield [58]. This experimental technique can also be used to investigate the hypothesis that HTR derivatives adopt G-quadruplex conformations. TO interacts with various DNA secondary structures, but it has a stronger binding affinity to triplex and G-quadruplex structures than to other structural motifs [43, 45]. Although TO is optically inactive, TO-quadruplex complexes are chiral and display a unique profile of the induced CD (ICD) spectrum in the visible region [44]. Recently we have described the common ICD features shared by many different G-quadruplex structures. The results of TO-quadruplex interaction are the positive peaks at 265 and 295 nm (UV range), and the three peaks in the visible region at […] 473 nm in solution either without metal cations or in the presence of Na+ [44]. TO facilitates the formation of G-quadruplex structures even without the presence of other cations, but the topology induced by TO can differ from that adopted in the presence of sodium or potassium; the CD profile in the UV region can therefore be different. A completely different ICD profile of the TO-DNA complex was observed for sequences unable to adopt a G-quadruplex structure [44]. However, other G-quadruplex ligands tested in our laboratory, for example, Thioflavin T, porphyrin derivatives, Hoechst 33342, and Hoechst 33258, were not suitable for this purpose and provided ambiguous results. This methodology is intended to be used as a supplementary technique because it extends the possibilities of basic spectral methods in terms of distinguishing G-quadruplex structures without the use of more expensive and time-consuming methods. ICD monitoring can be applied in different conditions, but it is the most sensitive in solutions without metal cations; it can also be applied with slightly reduced sensitivity in solutions containing Na+ or low concentrations of K+ (<5 mM). It should be noted here that the interpretation of the ICD profile at higher concentrations of K+ is by no means unambiguous. Nevertheless, we also performed the titration experiments in the presence of 50 mM KCl because this condition is more biologically relevant.

The results of titration analysis in the presence of 50 mM NaCl are shown in Figure 4. The ICD results display the expected positive signals at 510 nm and negative signals at 475 nm, signatures which are characteristic for G-quadruplexes. In addition, the G4C2 sequence was analyzed because of its 3D structure in solutions containing potassium. As expected, this oligonucleotide was also found to form G-quadruplexes under the given conditions. Signals corresponding to those of antiparallel G-quadruplex structures were also clearly detected in the UV region. By increasing the concentration of TO, the signals at 295 and 265 nm were seen to decrease and increase, respectively, phenomena which are indicative of a conversion from antiparallel to parallel folding. This effect was also observed under the influence of PEG-200 (Figure 3).

As was noted above, titration measurements in 50 mM KCl were also performed (Figure 5). The signals observed in the UV region clearly suggest that G-quadruplex structures were formed in the presence of potassium, but the effect of structural conversion was significantly suppressed. An intensive ICD signal with a maximum of around 500 nm is known to correspond to the formation of complexes between DNA and ligands. However, there was a distinct lack of any of the clear common features which are typically observed for profiles obtained in the presence of sodium. Interestingly, the ICD of the THR sequence was inverted, and therefore we suggest that the binding mode of TO with THR is different from that with HTR derivatives. Another explanation is that THR forms at least two distinct topological conformations in solution and that one of these forms can bind with TO more effectively. As was reported in our previous study, THR forms at least three different structures in the presence of potassium [14]; we therefore decided to verify this hypothesis using electrophoretic separation.

It is important to exclude the potential side effect of using DMSO during TO titration experiments. The stock solution of TO contains DMSO, a polar aprotic solvent which may produce an effect similar to that of PEG-200 [56]. The concentration of DMSO used in our experiments did not exceed 4.5 wt%. In order to rule out a dehydrating effect of DMSO in TO titration experiments, titration analysis was also performed in the presence of DMSO alone. No significant effect was observed at concentrations lower than 5 wt% in the absence of Na+, K+, and […] ions. Nevertheless, the presence of DMSO in solution containing K+ could explain the slight differences in ICD profiles at concentrations of K+ greater than 5 mM.

3.4. Fluorescence Analysis

DNA-TO complexes display a clear but wide absorption at around 500 nm. A single positive peak of TO was observed at 452 nm and this wavelength was used for the excitation of the DNA-TO complex. The fluorescence spectra of the HTR and THR sequences are shown in Figure 6. The measurements were performed in three different environments: (i) a mRB buffer without metal cations, (ii) a mRB buffer supplemented with 50 mM NaCl, and (iii) mRB supplemented with 50 mM KCl. For both oligomers, the fluorescence enhancement achieved the highest yield in the solution without metal cations. The fluorescence yield of HTR was greater than the yield of THR in all three of the tested conditions. The goal of this experiment was to demonstrate that the profile of fluorescence is not greatly dependent on the sequence of DNA oligonucleotide for this ligand. The fluorescence enhancement of TO can induce the formation of any type of G-quadruplex structure.

[Figure caption fragment: the S-line represents the mobility of the mixture of oligonucleotides.]

[Figure caption fragment: the loading buffer contained 50 wt% PEG-200.]

3.5. Electrophoresis in the Presence of Na+, K+, and TO

Nondenaturing polyacrylamide gel electrophoresis (PAGE) is an accessible technique which is used to supplement spectroscopic data when the presence of multiple species of G-quadruplexes cannot readily be identified based on CD spectra alone. The mobility of the DNA sample depends on many different factors, including conformation, charge, and molecular mass. Electrophoretic separation can provide valuable information about the molecularity of G-quadruplexes. Intramolecular G-quadruplexes have a compact structure and thus migrate faster through a cation-containing gel than their linear counterparts, while intermolecular G-quadruplexes migrate more slowly due to their higher molecular weight [9, 14, 44]. Oligomers d(AC)9, d(AC)14, and d(AC)18 were used as standards due to their lack of secondary structures. These standards served as benchmarks in comparing the mobility of different electrophoretic patterns. Since none of the sequences used were longer than 22 nt., the oligonucleotides which were observed to have migrated faster than d(AC)9 could be identified as having formed intramolecular G-quadruplexes. It is also reasonable to assume that oligonucleotides which moved more slowly or at a similar speed to d(AC)18 had adopted high-order G-quadruplex structures. Figure 7 shows the electrophoretic records of native 15% polyacrylamide gels illustrating the relative mobilities of the oligomers in the presence of 50 mM NaCl and KCl at 10°C (Figures 7(a) and 7(c)). In addition, the corresponding electrophoretic results, where the gels and loading buffers contain 2 molar equivalents of TO, are shown in Figures 7(b) and 7(d). In general, some clear trends emerge. Gel electrophoresis performed in the presence of sodium shows that all of the oligonucleotides had moved in one bulk, with single bands migrating faster than d(AC)18 in each column. This effect was also observed when TO was present in the gel. 
These results indicate that all DNA oligonucleotides form antiparallel intramolecular G-quadruplexes under these conditions, in agreement with the results obtained by CD spectroscopy. It is important to note that intramolecular structures had formed exclusively in the presence of sodium, even though the introduction of mutations in HTR sequences increases the possibility of the formation of different topologies of G-quadruplexes. The electrophoresis did not reveal any significant anomalous mobility of the oligomers; sequences of the same length were found to move more or less equally.
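The mobility-based reasoning described above can be expressed as a simple decision rule. The sketch below is illustrative only: mobilities are taken as relative migration distances (larger means faster), and the 5% tolerance used to decide what counts as "a similar speed" to d(AC)18 is an assumption, not a value from the study.

```python
def classify_band(mobility: float, m_ac9: float, m_ac18: float, tol: float = 0.05) -> str:
    """Assign molecularity from band mobility relative to unstructured standards.

    mobility, m_ac9, m_ac18: migration distances in the same arbitrary units.
    tol: assumed relative tolerance for "similar speed to d(AC)18".
    """
    if not m_ac9 > m_ac18 > 0:
        raise ValueError("d(AC)9 must migrate faster than d(AC)18")
    if mobility > m_ac9:
        # Faster than the shortest standard: compact intramolecular fold.
        return "intramolecular"
    if mobility <= m_ac18 * (1 + tol):
        # Slower than or similar to d(AC)18: high-order (multimeric) structure.
        return "high-order"
    return "undetermined"


# Hypothetical band positions, for illustration only.
print(classify_band(1.20, m_ac9=1.00, m_ac18=0.60))
print(classify_band(0.61, m_ac9=1.00, m_ac18=0.60))
print(classify_band(0.80, m_ac9=1.00, m_ac18=0.60))
```

Bands that fall between the two standards are left undetermined here, since the text draws conclusions only for the two extremes.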

In contrast to sodium, the presence of potassium led to the formation of both intra- and intermolecular arrangements (Figures 7(b) and 7(d)). In the first group, the HTR1 quadruplex with one substitution in the first loop exhibited the fastest migrating band in comparison to that of HTR. A single smeared band was also observed for the HTR2 sequence. Smears typically arise when two distinct conformers can be formed; a slow isomerization between the two conformers during the electrophoretic separation is the main source of band smearing. The mobilities of HTR and of HTR3, which carries a substitution in the third loop, are similar. The oligonucleotides containing two substitutions per oligomer displayed high levels of polymorphism. These oligonucleotides form several coexisting conformers because each lane contains several bands moving at different rates. Interestingly, the HTR1,2 and HTR1,3 sequences displayed two faster well-recognized bands, results which correspond to the formation of intramolecular conformers, and slower bands representing multimeric structures. The addition of TO also caused the fastest conformers to coalesce and the slowest structures to diminish. HTR2,3 produced a faster intramolecular species and slower intermolecular species (dimer and tetramer). Surprisingly, the oligonucleotides with three substitutions per oligomer were found to be slightly less polymorphic in comparison with the sequences containing two substitutions, displaying only bands with lower magnitudes corresponding to the formation of multiple-molecular G-quadruplexes in the case of the HTR0,1,3 and HTR1,2,3 sequences. TO was found to exert only a limited effect on the multimeric forms of these oligonucleotides.

3.6. Electrophoresis in the Presence of PEG-200

The dependence of HTR dimerization on PEG-200 concentration has been analyzed in previous studies [9]. The formation of both intermolecular dimers and intramolecular monomers was observed in the buffer containing a PEG-200 concentration of 15 wt%. The HTR derivatives containing 2 and 3 substitutions were seen to convert readily to slower migrating dimeric structures even at lower concentrations of PEG-200. At a PEG-200 concentration of 50 wt% and 50 mM KCl, the complete structural conversion to a parallel dimeric G-quadruplex was induced (Figures 3 and 8). This effect was not observed in buffers that did not contain potassium [9, 14]. CD measurements at 50 wt% PEG-200 showed no signal at 295 nm. Based on our previous studies, intermolecular species which migrate more slowly are indicative of the formation of dimers [2, 9, 14]. The 3D structure of HTR containing a flanking sequence under analogous conditions has been determined using NMR [11]. The results show an intramolecular parallel G-quadruplex structure (PDB: 2LD8), but the overhanging nucleotides can cause a steric hindrance for the dimerization of this structure.

4. Conclusion

In this study, we clearly demonstrate that increasing the number of guanines in the loop regions of HTR sequences supports the formation of G-quadruplex structures. Any G-for-A substitution increases the melting temperature, while the introduction of several substitutions was found to facilitate the coexistence of several conformers in the presence of potassium. The systematic introduction of these substitutions finally leads to the formation of sequences which occur in the Tetrahymena telomere. In addition, similar sequences were also found in the human genome. These findings raise an interesting point. Why does the Tetrahymena telomere require sequences which can adopt such highly stable G-quadruplex structures? In general, very stable G-quadruplexes are usually a source of problems in cells during the life cycle of an organism. The THR sequence is more polymorphic than HTR; it forms two different monomeric conformers and one dimeric conformer, as has been shown here and in our previous studies [14]. Our analysis focused on sequences consisting of four G-runs without any overhanging nucleotides at either terminus; this type of arrangement is not an ideal model for extrapolation to natural telomeric repeats, which typically consist of tens to thousands of repeats.

Our results demonstrate that all HTR derivatives including THR can be converted from antiparallel to parallel folds in the presence of potassium and PEG-200. ICD spectra indicate that the binding mode of TO with THR in the presence of KCl might be different from those observed for HTR derivatives, and this is a finding which could also be important for other molecules recognizing the THR structure in nature. It suggests that the structure of THR shows some structural features which are different from those of HTR and HTR derivatives in the presence of potassium. Confirmation of the biological significance of this fact remains an open topic.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Slovak Research and Development Agency under Contracts nos. APVV-0280-11 and APVV-0029-16, European Cooperation in Science and Technology (COST CM1406), Slovak Grant Agency (1/0131/16 and 002UPJŠ-4/2015), and internal university grants (VVGS-PF-2017-251 and VVGS-2016-259). The authors thank G. Cowper for critical reading and correction of the manuscript.

References

  1. K. W. Lim, V. C. M. Ng, N. Martín-Pintado, B. Heddi, and A. T. Phan, “Structure of the human telomere in Na+ solution: An antiparallel (2+2) G-quadruplex scaffold reveals additional diversity,” Nucleic Acids Research, vol. 41, no. 22, pp. 10556–10562, 2013.
  2. V. Víglaský, K. Tlučková, and Ľ. Bauer, “The first derivative of a function of circular dichroism spectra: Biophysical study of human telomeric G-quadruplex,” European Biophysics Journal, vol. 40, no. 1, pp. 29–37, 2011.
  3. G. N. Parkinson, M. P. H. Lee, and S. Neidle, “Crystal structure of parallel quadruplexes from human telomeric DNA,” Nature, vol. 417, no. 6891, pp. 876–880, 2002.
  4. Y. Wang and D. J. Patel, “Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex,” Structure, vol. 1, no. 4, pp. 263–282, 1993.
  5. A. Ambrus, D. Chen, J. Dai, T. Bialis, R. A. Jones, and D. Yang, “Human telomeric sequence forms a hybrid-type intramolecular G-quadruplex structure with mixed parallel/antiparallel strands in potassium solution,” Nucleic Acids Research, vol. 34, no. 9, pp. 2723–2735, 2006.
  6. K. N. Luu, A. T. Phan, V. Kuryavyi, L. Lacroix, and D. J. Patel, “Structure of the human telomere in K+ solution: An intramolecular (3 + 1) G-quadruplex scaffold,” Journal of the American Chemical Society, vol. 128, no. 30, pp. 9963–9970, 2006.
  7. A. T. Phan, V. Kuryavyi, K. N. Luu, and D. J. Patel, “Structure of two intramolecular G-quadruplexes formed by natural human telomere sequences in K+ solution,” Nucleic Acids Research, vol. 35, no. 19, pp. 6517–6525, 2007.
  8. M. Marušič, P. Šket, L. Bauer, V. Viglasky, and J. Plavec, “Solution-state structure of an intramolecular G-quadruplex with propeller, diagonal and edgewise loops,” Nucleic Acids Research, vol. 40, no. 14, pp. 6946–6956, 2012.
  9. P. Tóthová, P. Krafčíková, and V. Víglaský, “Formation of highly ordered multimers in G-quadruplexes,” Biochemistry, vol. 53, no. 45, pp. 7013–7027, 2014.
  10. N. Smargiasso, F. Rosu, W. Hsia et al., “G-quadruplex DNA assemblies: Loop length, cation identity, and multimer formation,” Journal of the American Chemical Society, vol. 130, no. 31, pp. 10208–10216, 2008.
  11. B. Heddi and A. T. Phan, “Structure of human telomeric DNA in crowded solution,” Journal of the American Chemical Society, vol. 133, no. 25, pp. 9824–9833, 2011.
  12. M. C. Miller, R. Buscaglia, J. B. Chaires, A. N. Lane, and J. O. Trent, “Hydration is a major determinant of the G-quadruplex stability and conformation of the human telomere 3′ sequence of d(AG3(TTAG3)3),” Journal of the American Chemical Society, vol. 132, no. 48, pp. 17105–17107, 2010.
  13. D. Miyoshi, A. Nakao, and N. Sugimoto, “Molecular crowding regulates the structural switch of the DNA G-quadruplex,” Biochemistry, vol. 41, no. 50, pp. 15017–15024, 2002.
  14. V. Víglaský, L. Bauer, and K. Tlučková, “Structural features of intra- and intermolecular G-quadruplexes derived from telomeric repeats,” Biochemistry, vol. 49, no. 10, pp. 2110–2120, 2010.
  15. J. L. Huppert and S. Balasubramanian, “Prevalence of quadruplexes in the human genome,” Nucleic Acids Research, vol. 33, no. 9, pp. 2908–2916, 2005.
  16. A. K. Todd, M. Johnston, and S. Neidle, “Highly prevalent putative quadruplex sequence motifs in human DNA,” Nucleic Acids Research, vol. 33, no. 9, pp. 2901–2907, 2005.
  17. J. Eddy and N. Maizels, “Gene function correlates with potential for G4 DNA formation in the human genome,” Nucleic Acids Research, vol. 34, no. 14, pp. 3887–3896, 2006.
  18. A. Henderson, Y. Wu, Y. C. Huang et al., “Detection of G-quadruplex DNA in mammalian cells,” Nucleic Acids Research, vol. 42, no. 2, pp. 860–869, 2014.
  19. R. F. Hoffmann, Y. M. Moshkin, S. Mouton et al., “Guanine quadruplex structures localize to heterochromatin,” Nucleic Acids Research, vol. 44, no. 1, pp. 152–163, 2016.
  20. V. S. Chambers, G. Marsico, J. M. Boutell, M. Di Antonio, G. P. Smith, and S. Balasubramanian, “High-throughput sequencing of DNA G-quadruplex structures in the human genome,” Nature Biotechnology, vol. 33, no. 8, pp. 877–881, 2015.
  21. E. H. Blackburn, “Telomere states and cell fates,” Nature, vol. 408, no. 6808, pp. 53–56, 2000.
  22. M. J. McEachern, A. Krauskopf, and E. H. Blackburn, “Telomeres and their control,” Annual Review of Genetics, vol. 34, pp. 331–358, 2000.
  23. N. Grandin and M. Charbonneau, “Protection against chromosome degradation at the telomeres,” Biochimie, vol. 90, no. 1, pp. 41–59, 2008.
  24. W. E. Wright, V. M. Tesmer, K. E. Huffman, S. D. Levene, and J. W. Shay, “Normal human chromosomes have long G-rich telomeric overhangs at one end,” Genes & Development, vol. 11, no. 21, pp. 2801–2809, 1997.
  25. J. A. Londoño-Vallejo, “Telomere instability and cancer,” Biochimie, vol. 90, no. 1, pp. 73–82, 2008.
  26. C. B. Harley, “Telomere loss: mitotic clock or genetic time bomb?” Mutation Research DNAging, vol. 256, no. 2–6, pp. 271–282, 1991.
  27. A. J. Sfeir, W. Chai, J. W. Shay, and W. E. Wright, “Telomere-end processing: The terminal nucleotides of human chromosomes,” Molecular Cell, vol. 18, no. 1, pp. 131–138, 2005.
  28. L. M. Colgin and R. R. Reddel, “Telomere maintenance mechanisms and cellular immortalization,” Current Opinion in Genetics & Development, vol. 9, no. 1, pp. 97–103, 1999.
  29. A. M. Zahler, J. R. Williamson, T. R. Cech, and D. M. Prescott, “Inhibition of telomerase by G-quartet DNA structures,” Nature, vol. 350, no. 6320, pp. 718–720, 1991.
  30. S. Neidle and G. Parkinson, “Telomere maintenance as a target for anticancer drug discovery,” Nature Reviews Drug Discovery, vol. 1, no. 5, pp. 383–393, 2002.
  31. K. W. Lim, S. Amrane, S. Bouaziz et al., “Structure of the human telomere in K+ solution: A stable basket-type G-quadruplex with only two G-tetrad layers,” Journal of the American Chemical Society, vol. 131, no. 12, pp. 4301–4309, 2009.
  32. R. Buscaglia, M. C. Miller, W. L. Dean et al., “Polyethylene glycol binding alters human telomere G-quadruplex structure by conformational selection,” Nucleic Acids Research, vol. 41, no. 16, pp. 7934–7946, 2013.
  33. L. Petraccone, A. Malafronte, J. Amato, and C. Giancola, “G-quadruplexes from human telomeric DNA: How many conformations in PEG containing solutions?” The Journal of Physical Chemistry B, vol. 116, no. 7, pp. 2294–2305, 2012.
  34. V. Viglasky, L. Bauer, K. Tluckova, and P. Javorsky, “Evaluation of human telomeric G-quadruplexes: The influence of overhanging sequences on quadruplex stability and folding,” Journal of Nucleic Acids, vol. 2010, Article ID 820356, 2010.
  35. M. Vorlícková, M. Tomasko, A. J. Sagi, K. Bednarova, and J. Sagi, “8-Oxoguanine in a quadruplex of the human telomere DNA sequence,” FEBS Journal, vol. 279, no. 1, pp. 29–39, 2012.
  36. D. Renčiuk, I. Kejnovská, P. Školáková, K. Bednářová, J. Motlová, and M. Vorlíčková, “Arrangements of human telomere DNA quadruplex in physiologically relevant K+ solutions,” Nucleic Acids Research, vol. 37, no. 19, pp. 6625–6634, 2009.
  37. M. Tomaško, M. Vorlíčková, and J. Sagi, “Substitution of adenine for guanine in the quadruplex-forming human telomere DNA sequence G3(T2AG3)3,” Biochimie, vol. 91, no. 2, pp. 171–179, 2009.
  38. M. Vorlíčková, J. Chládková, I. Kejnovská, M. Fialová, and J. Kypr, “Guanine tetraplex topology of human telomere DNA is governed by the number of (TTAGGG) repeats,” Nucleic Acids Research, vol. 33, no. 18, pp. 5851–5860, 2005.
  39. I. Kejnovská, K. Bednářová, D. Renčiuk et al., “Clustered abasic lesions profoundly change the structure and stability of human telomeric G-quadruplexes,” Nucleic Acids Research, vol. 45, no. 8, pp. 4294–4305, 2017.
  40. H. Konvalinová, Z. Dvořáková, D. Renčiuk et al., “Diverse effects of naturally occurring base lesions on the structure and stability of the human telomere DNA quadruplex,” Biochimie, vol. 118, article no. 4773, pp. 15–25, 2015.
  41. M. Babinský, R. Fiala, I. Kejnovská et al., “Loss of loop adenines alters human telomere d(AG(3)(TTAG(3))(3)) quadruplex folding,” Nucleic Acids Research, vol. 42, no. 22, pp. 14031–14041, 2014.
  42. C. W. Greider and E. H. Blackburn, “A telomeric sequence in the RNA of Tetrahymena telomerase required for telomere repeat synthesis,” Nature, vol. 337, no. 6205, pp. 331–337, 1989.
  43. I. Lubitz, D. Zikich, and A. Kotlyar, “Specific high-affinity binding of thiazole orange to triplex and g-quadruplex DNA,” Biochemistry, vol. 49, no. 17, pp. 3567–3574, 2010.
  44. P. Krafčíková, E. Demkovičová, and V. Víglaský, “Ebola virus derived G-quadruplexes: Thiazole orange interaction,” Biochimica et Biophysica Acta (BBA) - General Subjects, vol. 1861, no. 5, pp. 1321–1328, 2016.
  45. J. Mohanty, N. Barooah, V. Dhamodharan, S. Harikrishna, P. I. Pradeepkumar, and A. C. Bhasikuttan, “Thioflavin T as an efficient inducer and selective fluorescent sensor for the human telomeric G-quadruplex DNA,” Journal of the American Chemical Society, vol. 135, no. 1, pp. 367–376, 2013.
  46. J. B. Chaires, “Human telomeric G-quadruplex: Thermodynamic and kinetic studies of telomeric quadruplex stability,” FEBS Journal, vol. 277, no. 5, pp. 1098–1106, 2010.
  47. Y. Wang and D. J. Patel, “Solution Structure of a Parallel-stranded G-Quadruplex DNA,” Journal of Molecular Biology, vol. 234, no. 4, pp. 1171–1183, 1993.
  48. Y. Wang and D. J. Patel, “Solution structure of the Tetrahymena telomeric repeat d(T2G4)4 G-tetraplex,” Structure, vol. 2, no. 12, pp. 1141–1156, 1994.
  49. J. Dai, C. Punchihewa, A. Ambrus, D. Chen, R. A. Jones, and D. Yang, “Structure of the intramolecular human telomeric G-quadruplex in potassium solution: A novel adenine triple formation,” Nucleic Acids Research, vol. 35, no. 7, pp. 2440–2450, 2007.
  50. J. Dai, M. Carver, C. Punchihewa, R. A. Jones, and D. Yang, “Structure of the hybrid-2 type intramolecular human telomeric G-quadruplex in K+ solution: Insights into structure polymorphism of the human telomeric sequence,” Nucleic Acids Research, vol. 35, no. 15, pp. 4927–4940, 2007.
  51. Y. Wang and D. J. Patel, “Solution Structure of the Oxytricha Telomeric Repeat d[G4(T4G4)3] G-tetraplex,” Journal of Molecular Biology, vol. 251, no. 1, pp. 76–94, 1995.
  52. F. W. Smith, P. Schultze, and J. Feigon, “Solution structures of unimolecular quadruplexes formed by oligonucleotides containing Oxytricha telomere repeats,” Structure, vol. 3, no. 10, pp. 997–1008, 1995.
  53. J. Brčić and J. Plavec, “Solution structure of a DNA quadruplex containing ALS and FTD related GGGGCC repeat stabilized by 8-bromodeoxyguanosine substitution,” Nucleic Acids Research, vol. 43, no. 17, pp. 8590–8600, 2015.
  54. J. Kypr, I. Kejnovská, D. Renčiuk, and M. Vorlíčková, “Circular dichroism and conformational polymorphism of DNA,” Nucleic Acids Research, vol. 37, no. 6, pp. 1713–1725, 2009.
  55. A. I. Karsisiotis, N. M. Hessari, E. Novellino, G. P. Spada, A. Randazzo, and M. Webba Da Silva, “Topological characterization of nucleic acid G-quadruplexes by UV absorption and circular dichroism,” Angewandte Chemie International Edition, vol. 50, no. 45, pp. 10645–10648, 2011.
  56. D. Miyoshi, T. Fujimoto, and N. Sugimoto, “Molecular crowding and hydration regulating of G-quadruplex formation,” Topics in Current Chemistry, vol. 330, pp. 87–110, 2013.
  57. D. M. Gray, J.-D. Wen, C. W. Gray et al., “Measured and calculated CD spectra of G-quartets stacked with the same or opposite polarities,” Chirality, vol. 20, no. 3-4, pp. 431–440, 2008.
  58. C. Allain, D. Monchaud, and M.-P. Teulade-Fichou, “FRET templated by G-quadruplex DNA: A specific ternary interaction using an original pair of donor/acceptor partners,” Journal of the American Chemical Society, vol. 128, no. 36, pp. 11890–11893, 2006.

Copyright

Copyright © 2017 Erika Demkovičová et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MATERIALS AND METHODS

Cell Culture

BJ foreskin fibroblasts were grown in a 4:1 mixture of DMEM and medium 199 containing 10% iron-supplemented calf serum (Hyclone, Logan, UT) and gentamicin (25 μg/ml; Sigma, St. Louis, MO) at 37°C in 5% CO2. Approximately 30 doublings before senescence, some cells were infected with a pLXSN retrovirus expressing HPV16 E6 and E7 proteins (Halbert et al., 1992) and selected on G418 (400 μg/ml) for 10–14 days.

Metaphase Spread Preparation and Cytogenetics Analysis

Cells were incubated with colcemid (Invitrogen, Carlsbad, CA) for 4 h, trypsinized, treated with hypotonic KCl buffer (0.075 M) for 30 min at 37°C, and washed several times with methanol-acetic acid (3:1) until a clean white cell pellet was obtained. Pellets were stored at -20°C until being dropped onto slides and GTG-banded using standard methods (Gustashaw, 1997). Metaphase images were analyzed using CytoVision software (Applied Imaging, San Jose, CA) at the cytogenetics laboratory, University of Texas Southwestern Medical Center.

Fluorescence In Situ Hybridization and Comparative Genomic Hybridization Analysis

One-day-old slides were rehydrated in phosphate-buffered saline (PBS, pH 7.5) for 15 min at room temperature, fixed in 4% formaldehyde in PBS for 2 min, and then washed in PBS three times for 5 min. Slides were treated with pepsin (1 mg/ml, pH 2.0) at 37°C for 10 min and washed twice for 2 min in PBS. The slides were again fixed in formaldehyde for 2 min, washed in PBS three times, and then dehydrated by 2-min serial incubations in 70, 90, and 100% ethanol and air-dried. The slides were denatured for 10 min at 78°C in a hybridization mixture (20 μl) containing 70% formamide, 15 ng 3′-Cy3-conjugated (CCCTAA)3 2′-deoxyoligonucleotide N3′-P5′ phosphoramidate telomeric probe, 0.25% (wt/vol) blocking reagent (Roche Molecular Biochemicals, Indianapolis, IN) and 5% MgCl2 in 10 mM Tris, pH 7.2, and then annealed for 2 h at room temperature. After two washes with 70% formamide, 0.1% bovine serum albumin, and 10 mM Tris, pH 7.2, and two washes with 0.15 M NaCl, 0.05% Tween-20, and 0.05 M Tris, the slides were dehydrated by 2-min serial incubations in ethanol and air-dried in the dark. The slides were then annealed without denaturation in the same hybridization solution as above but containing the complementary telomeric oligonucleotide (3′-Cy3-conjugated (TTAGGG)3 2′-deoxyoligonucleotide N3′-P5′ phosphoramidate telomeric probe) for 2 h at room temperature. The slides were washed again using the same wash steps as above, dehydrated by an ethanol series, and air-dried in the dark. Chromosomes were counterstained with Vectashield containing 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, 0.6 μg/ml final concentration; Vector Laboratories, Burlingame, CA) for chromosome identification.

The slides were scanned and metaphase spreads were automatically found using the Msearch mode of the Metafer software system (MetaSystem Hard and Software, Altlussheim, Germany). The metaphase images were autocaptured as 90 stacks separated by 0.2 μm by a Zeiss Axioplan 2 microscope (63×, 1.4 NA Plan-Apochromat oil immersion objective; Thornwood, NY) with Cy3 and DAPI single-pass filter sets using the AutoCapt mode of the Metafer software system. The inverted DAPI image of each metaphase spread was karyotyped using the ISIS digital image analysis system (MetaSystem), and the intensity of the telomere signal was measured by modified CGH analysis software (MetaSystem). Telomere ends were interpreted as telomere signal-free ends when the red (telomere) to blue (DAPI) ratio was zero (Figure 1).

Figure 1. Identification of the shortest telomeres. BJ metaphases at PD84 were serially hybridized to both C- and G-rich telomeric probes to maximize the detection of small telomeric regions. (A) Telomeres so short that they failed to give an observable signal (signal-free ends) were identified by CGH analysis. (B) The frequency of signal-free ends per 100 metaphases is shown for all telomeric ends from an analysis of 403 metaphases. BACs within 100 kb of the ends of the telomeres indicated by vertical arrows were used in the analysis shown in Figure 2.
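The scoring rule and the per-100-metaphase frequency reported in Figure 1 can be illustrated with a minimal Python sketch. It assumes background-corrected Cy3 and DAPI intensities are already available for each chromosome end; the function names and inputs are hypothetical and do not reproduce the modified CGH analysis software used in the study.

```python
def is_signal_free(cy3: float, dapi: float) -> bool:
    """Score a chromosome end as signal-free when the red (Cy3, telomere)
    to blue (DAPI) intensity ratio is zero, as described in the text.
    Inputs are assumed to be background-corrected intensities."""
    if dapi <= 0:
        raise ValueError("DAPI intensity must be positive")
    return cy3 / dapi == 0.0


def per_100_metaphases(signal_free_count: int, n_metaphases: int) -> float:
    """Normalize a count of signal-free ends at one chromosome end to a
    frequency per 100 metaphases."""
    if n_metaphases <= 0:
        raise ValueError("n_metaphases must be positive")
    return 100.0 * signal_free_count / n_metaphases


# Hypothetical example: 20 signal-free observations of one chromosome end
# among 403 analyzed metaphases (403 is the sample size given in Figure 1;
# the count of 20 is invented for illustration).
frequency = per_100_metaphases(20, 403)
```

Normalizing to 100 metaphases lets chromosome ends be compared directly even when the number of scorable metaphases differs between experiments.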

For dual-color fluorescence studies of telomeric variant repeats, 20 ng of the telomeric sequence variant probe (either Cy3-conjugated (TCAGGG)3 or (TGAGGG)3 2′-deoxyoligonucleotide N3′-P5′ phosphoramidate) was combined with 15 ng 3′-FITC-conjugated (TTAGGG)3 2′-deoxyoligonucleotide N3′-P5′ phosphoramidate probe and processed as above with a 10-min denaturation step and a 16-h 37°C hybridization. After washing and DAPI staining, metaphase spreads were digitally captured with precision Cy3/FITC/DAPI bandpass filter sets.

Immunofluorescence and Fluorescence In Situ Hybridization Analysis

Cells were grown on glass chamber slides for at least 96 h before immunostaining for γH2AX and 53BP1 proteins to minimize stress arising during trypsinization. Cells were briefly washed with PBS and fixed with freshly prepared 4% paraformaldehyde in PBS for 15 min at room temperature. After washing three times with PBS, cells were permeabilized for 15 min with PBS containing 0.2% Triton X-100, blocked for 1 h with 1.5% goat serum (Vector Laboratories) in PBS, and double-stained with 50 ng/μl mouse monoclonal anti-53BP1 (kindly provided by Dr. Chen, Mayo Clinic) and 10 ng/μl rabbit antiphospho-H2AX (Ser 139; Upstate Biotechnology, Lake Placid, NY) at 37°C for 1 h. Cells were subsequently washed three times for 5 min with PBS and incubated for 1 h with Alexa Fluor 568–conjugated anti-rabbit (1:200) and Alexa 488–conjugated anti-mouse antibodies (1:200; Jackson ImmunoResearch, West Grove, PA; Molecular Probes, Eugene, OR) at room temperature. After washing the cells three times for 5 min with PBS, the cells were dehydrated by an ethanol series (70, 90, and 100%) for 1 min each and counterstained with Vectashield containing DAPI for nuclear identification. The slides were scanned, and positive nuclei were automatically found and captured by a Zeiss Axioplan 2 microscope (63×, 1.4 NA Plan-Apochromat oil immersion objective) with Texas Red, FITC, and DAPI single-pass filter sets using the AutoCapt mode of the Metafer software system. The images were saved as 90 z-stacks separated by 0.2 μm.

The slides were then processed for fluorescence in situ hybridization (FISH) analysis using the BAC probes RP3-416J7 (6pter), RP11-1197K16 (17qter), RP4-764O12 (7qter), RP11-1260E13 (17pter), RP11-974F22 (9qter), and RP11-81L3 (9q21, 62 Mb from 9qter; Children's Hospital Oakland Research Institute). All of the BACs except 9q21 were within 100 kb of the telomere. BAC DNAs were extracted using a large-construct kit (Qiagen, Chatsworth, CA) and labeled with either Spectrum-orange– or Spectrum-green–conjugated dUTP following a nick translation kit protocol (Vysis, Downers Grove, IL). After washing three times for 5 min with PBS, the cells were dehydrated through an ethanol series as above and air-dried. DNA was denatured for 10 min at 70°C in hybridization buffer containing 70% formamide (Sigma) and 2× SSC (pH 7.0). After denaturation, the cells were dehydrated by a cold ethanol series and air-dried. The cells were put through another two rounds of the denaturation and dehydration steps above to fully remove the γH2AX and 53BP1 fluorescent antibodies. A mix of 100 ng of one Spectrum-orange– and one Spectrum-green–conjugated BAC probe in a hybridization buffer containing 50% formamide, 10% dextran sulfate, and 1× SSC was denatured at 78°C for 5 min and then hybridized to the cells for 16 h at 37°C in a humidified chamber. The slides were washed for 2 min with 0.4× SSC/0.3% NP-40 at 70°C and for 1 min in 2× SSC/0.1% NP-40 at room temperature. After being air-dried, the cells were counterstained with DAPI. The same nuclei identified for γH2AX/53BP1 staining above were automatically found and captured with Spectrum-orange, Spectrum-green, and DAPI single-pass filter sets using the AutoCapt mode of the Metafer software system. The images were saved as 90 z-stacks separated by 0.2 μm. The nuclei were analyzed using the ISIS digital image analysis system (MetaSystem).
The same slide was then reprobed with a different set of BAC probes after again removing the previous signals with the denaturation steps described above. The physical mapping sites of these BACs were confirmed using regular FISH analysis of at least 20 metaphase spreads of normal fibroblast cells per BAC, and no polymorphism was observed.


Electronic supplementary material

13059_2007_1632_MOESM1_ESM.pdf

Additional data file 1: The p-arm sequence as given was attached at the p-arm coordinate, and the reverse complements of the q-arm sequences were attached at the indicated q-arm coordinates (PDF 12 KB)

13059_2007_1632_MOESM2_ESM.pdf

Additional data file 2: Duplicon modules were defined by processing the results of BLAST searches of in-house curated subtelomere query sequences (see text and Materials and methods). Colinear and properly oriented pairs of BLAST matches to the query sequence were joined into a chain if they were not separated by more than 25 kb and not interrupted by other hits from the same query sequence. Groups of chained BLAST hits spanning ≥1 kb of the subject sequence were defined as duplicons. These methods were tolerant of insertions and deletions <25 kb in size (for example, of retrotransposons) but not tolerant of rearrangements. (PDF 61 KB)
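
The chaining rule described for Additional data file 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the function names, and the greedy single-pass strategy are hypothetical, all hits are assumed to come from one query against one subject sequence, and overlapping hits are simply not joined. Only the 25 kb gap limit and the 1 kb minimum subject span come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_GAP = 25_000   # bp: hits farther apart than this are not chained (from the text)
MIN_SPAN = 1_000   # bp: minimum subject span for a duplicon (from the text)


@dataclass
class Hit:
    """One BLAST match; start < end on both query (q) and subject (s)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    strand: str  # '+' or '-': orientation of the subject relative to the query


def _joinable(prev: Hit, nxt: Hit) -> bool:
    """Colinearity test for two hits that are adjacent in query order."""
    q_gap = nxt.q_start - prev.q_end
    if prev.strand == '+':
        s_gap = nxt.s_start - prev.s_end   # subject advances with the query
    else:
        s_gap = prev.s_start - nxt.s_end   # subject runs backwards
    return 0 <= q_gap <= MAX_GAP and 0 <= s_gap <= MAX_GAP


def chain_hits(hits: List[Hit]) -> List[List[Hit]]:
    """Greedily join hits of the same orientation in query order. Only
    neighbours in that order are tested, so an intervening hit that is
    not colinear interrupts the chain."""
    chains: List[List[Hit]] = []
    for strand in ('+', '-'):
        ordered = sorted((h for h in hits if h.strand == strand),
                         key=lambda h: h.q_start)
        current: List[Hit] = []
        for h in ordered:
            if current and _joinable(current[-1], h):
                current.append(h)
            else:
                if current:
                    chains.append(current)
                current = [h]
        if current:
            chains.append(current)
    return chains


def duplicons(hits: List[Hit]) -> List[Tuple[int, int]]:
    """Subject intervals of chains that span at least MIN_SPAN bp."""
    spans = []
    for chain in chain_hits(hits):
        start = min(h.s_start for h in chain)
        end = max(h.s_end for h in chain)
        if end - start >= MIN_SPAN:
            spans.append((start, end))
    return spans
```

Because gaps of up to 25 kb are bridged, an insertion such as a retrotransposon between two colinear hits does not split the duplicon, whereas a rearrangement that reverses the order or orientation of hits ends the chain, which matches the tolerance described above.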

13059_2007_1632_MOESM3_ESM.pdf

Additional data file 3: Each module is defined by a set of pairwise alignments, and each reference sequence in these sets is represented as a single row in this table. The first column (module) contains an identifier for the particular copy of the module (duplicon) indicated in the next three columns. These columns (query sequence) list the subtelomeric location of the query sequence defining the module (see Materials and methods). The 'aligned sequences' column shows the locations of other duplicons in this module, matched by the query. The coordinates in this column refer either to our published subtelomeric assemblies (designated by chromosome and arm p or q) or the human genome build 35 (all other designations). The %IDeach is percent nucleotide sequence identity across the chained pairwise alignment, excluding masked sequence. The %IDavg is the average percent identity of all pairwise alignments in the module. This was the number used for %ID in charts and analyses in this paper. The final column shows a 1 if the module contains intrachromosomal non-subtelomeric sequence matches, and 0 if it does not. (PDF 772 KB)

13059_2007_1632_MOESM4_ESM.pdf

Additional data file 4: This table shows the numbers of duplicon modules defined per subtelomere. The complete list of these modules is included in Additional data file 3. The 'subtelomeric' column shows the total number of modules for each subtelomere region (since each module is defined by a set of subtelomeric coordinates). The 'non-subtelomeric' column lists the subset of these modules with homology to duplicated regions that lie outside the subtelomeres. A comparison of these non-subtelomeric duplicons to the subtelomeric copies is included in Figure 3 and in Additional data file 5. The 'intra-chromosomal' column indicates the subset of modules with homology to a different region on the same chromosome. (PDF 27 KB)

13059_2007_1632_MOESM5_ESM.pdf

Additional data file 5: Subtelomeric regions correspond to the set of query sequences enumerated in Additional data file 1 and the average percent identity across the sequences to which each is aligned. The non-subtelomeric regions correspond to the aligned sequences that fall outside the subtelomere regions (the subset listed in Additional data file 2). (PDF 42 KB)

13059_2007_1632_MOESM6_ESM.pdf

Additional data file 6: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 846 KB)

13059_2007_1632_MOESM7_ESM.pdf

Additional data file 7: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 543 KB)

13059_2007_1632_MOESM8_ESM.pdf

Additional data file 8: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 531 KB)

13059_2007_1632_MOESM9_ESM.pdf

Additional data file 9: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 602 KB)

13059_2007_1632_MOESM10_ESM.pdf

Additional data file 10: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 474 KB)

13059_2007_1632_MOESM11_ESM.pdf

Additional data file 11: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 758 KB)

Additional data file 12: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 552 KB)

Additional data file 13: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 630 KB)

Additional data file 14: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 564 KB)

Additional data file 15: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 793 KB)

Additional data file 16: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 657 KB)

Additional data file 17: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 654 KB)

Additional data file 18: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 693 KB)

Additional data file 19: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 472 KB)

Additional data file 20: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 718 KB)

Additional data file 21: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 474 KB)

Additional data file 22: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 532 KB)

Additional data file 23: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 641 KB)

Additional data file 24: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 544 KB)

Additional data file 25: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 607 KB)

Additional data file 26: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 834 KB)

Additional data file 27: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 513 KB)

Additional data file 28: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 520 KB)

Additional data file 29: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 501 KB)

Additional data file 30: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 538 KB)

Additional data file 31: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 542 KB)

Additional data file 32: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 659 KB)

Additional data file 33: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 551 KB)

Additional data file 34: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 978 KB)

Additional data file 35: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 531 KB)

Additional data file 36: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 594 KB)

Additional data file 37: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 690 KB)

Additional data file 38: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 504 KB)

Additional data file 39: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 790 KB)

Additional data file 40: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 556 KB)

Additional data file 41: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 503 KB)

Additional data file 42: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 661 KB)

Additional data file 43: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 596 KB)

Additional data file 44: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The panel layout, arrow notation, and color scheme are the same as described for Additional data file 9. (PDF 627 KB)

13059_2007_1632_MOESM45_ESM.pdf

Additional data file 45: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 508 KB)

13059_2007_1632_MOESM46_ESM.pdf

Additional data file 46: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 548 KB)

13059_2007_1632_MOESM47_ESM.pdf

Additional data file 47: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). (PDF 562 KB)

13059_2007_1632_MOESM48_ESM.pdf

Additional data file 48: This table shows blocks of modules that occur exclusively in subtelomere regions. The first column gives an identifier for each block. The next three columns (query sequence) give the subtelomeric location that defines the block (which will consist of one or more adjacent modules). For completeness, in some cases aligned sequences have been included in these blocks even though they fell below thresholds for module definition. The percent identity of the chained alignments between the sequences is indicated (excluding masked sequence). Named genes/gene families that have transcripts matching part or all of the respective duplicon blocks are listed in the last column. Block 7 is the D4Z4 tandem repeat on the 4q and 10q subtelomeres, for which no percent identity is calculated because of the very large number and diverse percent identities of the BLAST alignments among tandem D4Z4 repeats. (PDF 22 KB)

13059_2007_1632_MOESM49_ESM.pdf

Additional data file 49: This table shows blocks of modules that are adjacent to the ends of finished telomeres (see Materials and methods). The columns describe the same categories of information as indicated in Additional data file 48. A limited set of non-subtelomeric copies of subterminal duplicons exists (Additional data file 49). Their genomic locations suggest sites of ancestral telomere-associated chromosome rearrangements, including a well-documented telomere fusion at 2q13-q14 [37] that contains representatives of subterminal duplicon families A, B, C, and D (Additional data file 49). The non-subtelomeric site of a duplicon from family D at 3p12.3 is the tip of an extended duplication region; the DNA on the centromeric flank of this site contains 4q and 10q subtelomere homology, including beta satellite repeat structure resembling part of the D4Z4 repeat. Subterminal family F contains several non-subtelomeric sites of duplicons; those on chromosomes 22q, 14q, and 12p are very close to the respective centromeres (Additional data file 49), indicating potential ancestral inversion of a chromosome arm followed by duplication of pericentromeric sequences as a mechanism for the genesis of the non-subterminal copies of this subterminal sequence family. The sequence similarity between subterminal duplicon copies within a family is mainly in the 90-96% range for subterminal blocks A, B, and D (Table 2; see Additional data file 49 for the rare exceptions). As with the subtel-only blocks, some of these duplicons correspond to only part of the subterminal block sequence. There is also some overlap in sequences occupied by subterminal duplicon blocks A, B, and D; this is reflected in their occupancy of parts of the same transcript families RPL23A7 and FAM41C (Table 2).
The cross-family homologies between subterminal blocks A, B, and D are also in the 90-96% identity range, but the positions of the duplicons within the blocks vary and are located at different distances from the (TTAGGG)n tract; also, there are several alternative organizations of high-copy repetitive elements (masked and not examined in detail in this study) within these subterminal blocks. Thus, there might be more frequent shuffling of subterminal sequences than of sequences located more centromerically, at least within a subset of subtelomere alleles; this idea is broadly consistent with an earlier model of subtelomere structure featuring compartments with distinct functional properties [9]. Further refinement of the classification of these subterminal families appears feasible and will benefit from more extensive sampling of (TTAGGG)n-adjacent sequences from additional alleles. Subterminal Block F contains one duplicon on 10p with very high similarity to the 18p query sequence, suggesting a very recent duplication event; the remaining duplicons were all in the 91-94% identity range. Block C has the highest sequence similarity among all subterminal duplicon sequence families, and has a copy at the 2q fusion locus. Block E (96-97%) is unusual in that it corresponds to a portion of subtelomere-only duplicon family 6 (Table 1), and is the only subterminal duplicon sequence family with subtel-only properties. This particular sequenced allele of 17p might have formed by the truncation of a chromosome end within this large subtelomere-only duplicon, as there is mapping evidence for several longer alleles of the 17p telomere (H Riethman, unpublished). It is interesting to note that (TTAGGG)n tracts at 17p and, indeed, on this particular allele of 17p tend to be consistently among the shortest in the human genome [19, 51]. (PDF 44 KB)

13059_2007_1632_MOESM50_ESM.pdf

Additional data file 50: Comparison of subtel-only and subterminal duplicon blocks defined in this work with the subtelomeric homology blocks reported in Linardopoulou et al. [12] (PDF 17 KB)

13059_2007_1632_MOESM51_ESM.pdf

Additional data file 51: Candidate transcripts were identified by blasting the representative subtelomere-only query sequences (Additional data file 48) against the NCBI RefSeq mRNA database (downloaded 24 July 2006) [52]. Human mRNAs with 90% or greater homology were run through Spidey [53] against the set of subtelomere-only duplicon block representatives. This table has been filtered to those hits above 95% identity according to the Spidey predictions. The first and second columns indicate the subtelomere-only block and RefSeq accession that align to each other. The third is the description line from the RefSeq database. The fourth and fifth columns are the percent identity and percent coverage of the aligned mRNA as reported by Spidey. (PDF 29 KB)
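
The two-stage selection described for Additional data file 51 (a 90% or greater threshold at the BLAST stage, then a cut at above 95% identity on the Spidey predictions) can be illustrated with a minimal sketch. This is not the authors' code: the `Hit` record, its field names, the `filter_hits` function, and the accession strings are all hypothetical placeholders, and the sketch assumes that both percentages are already available as plain numbers for each block/mRNA pair.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One block/mRNA pairing (hypothetical record layout, for illustration only)."""
    block: str              # subtelomere-only block identifier
    accession: str          # RefSeq accession (placeholder strings below)
    description: str        # description line from the database
    blast_identity: float   # percent identity at the BLAST stage
    spidey_identity: float  # percent identity reported by Spidey
    spidey_coverage: float  # percent coverage of the mRNA reported by Spidey


def filter_hits(hits, blast_min=90.0, spidey_min=95.0):
    """Keep hits with BLAST identity >= blast_min and Spidey identity > spidey_min."""
    return [
        h for h in hits
        if h.blast_identity >= blast_min and h.spidey_identity > spidey_min
    ]


# Made-up example values; none of these come from the paper.
example_hits = [
    Hit("1", "PLACEHOLDER_1", "example mRNA a", 92.0, 96.5, 80.0),
    Hit("1", "PLACEHOLDER_2", "example mRNA b", 89.9, 99.0, 90.0),  # fails BLAST stage
    Hit("2", "PLACEHOLDER_3", "example mRNA c", 95.0, 95.0, 70.0),  # not above 95%
]

kept = filter_hits(example_hits)
for h in kept:
    print(h.block, h.accession, h.spidey_identity, h.spidey_coverage)
```

The strict inequality on the Spidey identity follows the wording "above 95%", while the BLAST stage uses an inclusive bound to match "90% or greater"; whether the original analysis treated the boundary values this way is an assumption.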

13059_2007_1632_MOESM52_ESM.pdf

Additional data file 52: Candidate transcripts were identified by blasting the representative subterminal query sequences (Additional data file 49) against the NCBI RefSeq mRNA database (downloaded 24 July 2006) [52]. Human mRNAs with 90% or greater homology were run through Spidey [53] against the set of subterminal duplicon block representatives. The first and second columns indicate the subterminal block and RefSeq accession that align to each other. The third is the description line from the RefSeq database. The fourth and fifth columns are the percent identity and percent coverage of the aligned mRNA as reported by Spidey. (PDF 72 KB)