Comparative Analysis of DNA Replication Timing Reveals Conserved Large-Scale Chromosomal Architecture


Recent evidence suggests that the timing of DNA replication is coordinated across megabase-scale domains in metazoan genomes, yet the importance of this aspect of genome organization is unclear. Here we show that replication timing is remarkably conserved between human and mouse, uncovering large regions that may have been governed by similar replication dynamics since these species diverged. This conservation is both tissue-specific and independent of the conservation of genomic G+C content. Moreover, we show that the time of replication is globally conserved despite numerous large-scale genome rearrangements. We systematically identify rearrangement fusion points and demonstrate that replication time can diverge locally at these loci. Furthermore, rearrangements are shown to be correlated with early replication and physical chromosomal proximity. These results suggest that large chromosomal domains of coordinated replication are shuffled by evolution while the large-scale nuclear architecture of the genome is conserved.


Published in the journal: PLoS Genet 6(7): e1001011. doi:10.1371/journal.pgen.1001011
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1001011

Introduction

Mammalian genomes are complex and heterogeneous entities, consisting of many thousands of functional elements that are packed into chromosomes and organized in nuclear space. Our understanding of the global implications of genome organization, its effect on gene regulation and its evolutionary consequences is still quite limited. Recent advances in epigenomic profiling have begun to uncover large-scale genomic domains that are marked with specific histone modifications [1]–[5], interact with important nuclear landmarks [6] or replicate as units at specific times during S phase [7]–[14]. Data on inter-chromosomal interactions hint at how such large-scale domains may be organized within the three-dimensional structure of the nucleus [15]. Yet the origin of large-scale genome organization is unclear: How does the genome self-organize into domains? How are these domains exploited for regulation, and how can the cell propagate them to daughter cells? From an evolutionary perspective, the extent to which the genome's domain organization is conserved is unclear, as are the evolutionary mechanisms that contribute to such conservation [16], [17]. Even if domains are conserved, such conservation may have several explanations. If domains are functionally important, for example as scaffolds for gene clusters [18], we may expect genome rearrangements that break them to be selected against. On the other hand, if genome rearrangements are enriched at particular hotspots [19], [20], or are affected by various epigenetic factors, the genome may conserve domains that experience low rearrangement rates even without selection.

Genomic replication domains have been shown to exhibit particularly robust large-scale behavior. Domains ranging from tens of kilobases to megabases replicate collectively at particular times during S phase in mouse [7], [8], [21], human [9], [12]–[14], [22], [23] and fly [10], [11]. Such modular behavior was suggested to be driven by the coordinated firing of a large number of spatially clustered origins of replication. Recent studies in mouse and human cells revealed that approximately one third of the genome changes its time of replication (ToR) between tissues [8], [21], [23]. DNA replication timing was shown to be highly correlated with other genomic features, most notably the regional G+C content but also gene density, gene expression, open chromatin and mutability (reviewed in [24]). Genomic replication domains therefore describe an important type of large-scale genomic organization and are ideal markers for studying such organization from an evolutionary perspective.

In this work, we measure and compare the time of replication of the human and mouse genomes. We use the data to test whether the divergence of large-scale chromosome structure is correlated with the divergence of replication timing. We find that while chromosome structure is constantly being challenged by evolution, the genome's time of replication is remarkably conserved. Our analysis of the relationship between genome rearrangements, time of replication and chromosomal conformation suggests that the evolution of chromosome architecture may be constrained by the static and dynamic organization of the genome in the nucleus. These results put some of the open questions on chromosome structure and function in a new evolutionary perspective and suggest that additional comparative analyses may be important for their investigation.

Results

We followed the technical approach of Woodfine et al. [13], [14] and quantified the time of replication (ToR) of mammalian genomic replication domains by sorting G1 and S phase cells and measuring the ratio between their DNA contents using custom-designed two-dye microarrays (Agilent Technologies; Materials and Methods, Figure S1). We confirmed that the ToR profiles derived in this way agree well with mouse ToR profiles generated by us and others using alternative protocols (Farkash-Amar et al. [7], ρ = 0.8, Figure S2; Hiratani et al. [8], ρ = 0.72). We measured ToR profiles of human fibroblasts (FFT) and lymphoblasts (Molt-4), and of mouse embryonic fibroblasts (MEF) and lymphoblasts (L1210), using ∼105K genomic probes that were designed independently for each species (GEO database, GSE17236). The standardized experimental protocols and downstream computational analysis enabled a rigorous cross-species and cross-tissue comparative ToR analysis. The human and mouse genomes differ from each other by a large number of genome rearrangements, but over 90% of the genomes can be mapped into corresponding syntenic regions. We used the human-mouse genome alignment [25]–[28] to project the mouse ToR data onto syntenic human orthologous regions. The comparison of human and mouse ToR was done over large genomic bins (50Kb) and was based on non-orthologous probes, which ensured that our results were not affected by hybridization biases. The use of non-orthologous probes is justified by the fact that ToR changes slowly along the genome: ToR is highly autocorrelated within relatively large genomic domains (up to 4Mb, see Figure S3), so two probes from the same region should report the same ToR. The comparative ToR map of the human and mouse genomes that we computed covers 1.38 gigabases (49% of the human genome).
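The autocorrelation argument above can be checked on any binned profile. The following is a minimal Python sketch, not the authors' code: it assumes ToR values on consecutive, equally spaced bins of one chromosome, with `None` marking bins without data, and computes the Pearson correlation between bins a given number of bins apart. The function name `tor_autocorrelation` and the toy two-domain profile are hypothetical.

```python
from statistics import mean


def tor_autocorrelation(tor, max_lag):
    """Return [r(1), ..., r(max_lag)]: the Pearson correlation between ToR
    values that lie `lag` bins apart on one chromosome.

    Pairs in which either bin is missing (None) are skipped; NaN is
    returned for a lag with fewer than three usable pairs or zero variance.
    """
    result = []
    for lag in range(1, max_lag + 1):
        # pair each bin with the bin `lag` positions downstream
        pairs = [(a, b) for a, b in zip(tor, tor[lag:])
                 if a is not None and b is not None]
        if len(pairs) < 3:
            result.append(float("nan"))
            continue
        xs = [a for a, _ in pairs]
        ys = [b for _, b in pairs]
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in pairs)
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        if vx == 0 or vy == 0:
            result.append(float("nan"))
        else:
            result.append(cov / (vx * vy) ** 0.5)
    return result


# Toy profile: a 40-bin early domain followed by a 40-bin late domain.
profile = [1.0] * 40 + [-1.0] * 40
r = tor_autocorrelation(profile, 20)
print(round(r[0], 3), round(r[19], 3))
```

On this toy profile the correlation is about 0.975 at a lag of one bin and 0.5 at a lag of 20 bins, illustrating how ToR similarity between two probes decays only slowly with distance inside large domains.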

ToR is conserved between human and mouse

The human and mouse genome sequences differ locally at about 30% of nucleotide positions [29]. Furthermore, the two species are separated by hundreds of large-scale genome rearrangement events (such as fusions, translocations and inversions). Despite these differences, the correlation between human and mouse ToR is striking. As shown in Figure 1 (see also Figure S4 and Figure S5), the global replication landscape of the human genome matches that of the mapped mouse regions (overall Spearman ρ = 0.74 for fibroblasts and ρ = 0.78 for lymphoblasts, P<10⁻¹⁰⁰). The human-mouse ToR correlations are similar to those obtained by comparing the two cell types within each species (ρ = 0.7 for human and ρ = 0.83 for mouse). This correlation confirms previous observations of ToR conservation, which were based on the ToR of genes [7], and extends them to the entire genome. Our estimates of the extent of ToR conservation are higher than those proposed recently [30], probably owing to the more careful genome alignment procedure we used here. Our data show that ToR conservation is higher in gene deserts than in gene-rich domains (Figure 1C and Figure S6), suggesting that ToR conservation is not a simple consequence of the conservation of gene expression. Furthermore, the observed conservation is not likely to be a consequence of global sequence conservation, since sequence divergence and ToR divergence are uncorrelated (Spearman ρ = 0.02). The ToR landscape consists of large-scale domains, as shown before for the mouse genome [7], and below we focus on ToR evolution at these scales. We reconfirmed that our 50Kb tiling resolution captures most of the large-scale ToR structure in the genome by analyzing ToR on one human and one mouse chromosome that were densely tiled on our arrays (Figure S7).
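The genome-wide correlation and the gene-density breakdown of Figure 1C can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the inputs are assumed to be aligned lists with one entry per 50Kb human bin (human ToR, projected mouse ToR, and the number of transcription start sites in the bin); Spearman's ρ is computed as the Pearson correlation of average ranks, so ties are handled; and bins with three or more TSSs are pooled into one group, an arbitrary choice made only for illustration. All function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else float("nan")


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def spearman_by_gene_density(human, mouse, tss_counts, pool_from=3):
    """Human-vs-mouse rho within groups of bins that share a TSS count.

    Bins with `pool_from` or more TSSs are pooled into one group (an
    illustrative choice); groups with fewer than 3 bins are skipped.
    """
    groups = {}
    for h, m, c in zip(human, mouse, tss_counts):
        groups.setdefault(min(c, pool_from), []).append((h, m))
    return {c: spearman([h for h, _ in pairs], [m for _, m in pairs])
            for c, pairs in sorted(groups.items()) if len(pairs) >= 3}
```

Rank-based correlation is a natural fit here because ToR values from different arrays and species are not on a common scale, and only their ordering along the genome is compared.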

Fig. 1. Conservation of time of replication in human and mouse cells.
(A) Conservation of the replication profiles. Shown are ToR profiles for human chromosomes 1, 6, 15 and 21 for human fibroblasts (FFT, light blue) and human lymphoblasts (MOLT4, dark blue), and the orthologous time of replication profiles in mouse embryonic fibroblasts (MEF, light green) and mouse lymphoblasts (L1210, dark green). Below each chromosome we show the human-mouse synteny map, color coded according to the corresponding mouse chromosomes. (B) Cross-tissue and cross-species correlations. Pairwise comparisons among the two human and two mouse ToR profiles. Spearman correlation coefficients are specified in each scatter plot. (C) ToR conservation is higher in gene deserts than in gene-rich domains. Spearman correlation of ToR (human versus mouse) as a function of gene density, represented by the number of transcription start sites per 50kb window.

Spatial analysis confirms tissue-specific ToR conservation

To systematically characterize genomic domains with evolutionarily conserved, diverged or tissue-specific time of replication, we used a spatial clustering algorithm [31]. The algorithm works in an unsupervised fashion to identify and characterize spatial clusters. A spatial cluster is a collection of contiguous genomic regions that display a similar multivariate ToR trend (see Materials and Methods). Data that do not fit any of the clusters are attributed to a default background cluster. The algorithm thus identifies frequently recurring patterns in the data while taking into account the strong spatial coupling between adjacent genomic loci. In principle, the algorithm can discover clusters that display ToR conservation or any mixture of diverged ToR trends, allowing the above conclusion on global ToR conservation to be re-evaluated from a regional perspective. Analysis of the aligned human and mouse maps revealed that 92% of the mapped regions fell into four spatial clusters, all of which display ToR conservation (Figure S8). Interestingly, two of the inferred clusters (representing 25% and 15% of the probes, respectively) exhibit distinct tissue-specific replication patterns (clusters 3 and 4 in Figure S8). The ToR difference between the two cell types is conserved between mouse and human, suggesting that tissue-specific ToR is evolutionarily conserved. This was directly confirmed by computing profiles of the difference in ToR between fibroblasts and lymphoblasts (for both human and mouse) and measuring the correlation between these profiles across the two species (ρ = 0.22, P<10⁻²⁹²). Analysis of the four inferred clusters in light of other genomic features confirmed previous observations that late-replicating regions are significantly poor in genes and transcription, and revealed additional correlations with genomic features, including an increased frequency of interaction with the nuclear lamina and a biased distance from the telomere (Figure S9).
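The tissue-difference test in the paragraph above can be expressed compactly. In this hypothetical sketch, each species contributes a per-bin difference profile (fibroblast ToR minus lymphoblast ToR, on the same human 50Kb bins), and the two profiles are compared with Spearman's ρ. For brevity it uses the closed-form rank formula, which is exact only when there are no tied values; that assumption is reasonable for continuous ToR measurements but should be checked on real data. The sketch does not reproduce the spatial clustering algorithm of [31], and all names are illustrative.

```python
def ranks_no_ties(values):
    """1-based ranks, assuming all values are distinct."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_no_ties(xs, ys):
    """Closed-form Spearman rho, 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Exact only when neither input contains tied values.
    """
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_no_ties(xs), ranks_no_ties(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def difference_profile(fibroblast, lymphoblast):
    """Per-bin ToR difference (fibroblast minus lymphoblast) for one species."""
    return [f - l for f, l in zip(fibroblast, lymphoblast)]


# Hypothetical usage, with all four profiles on the same human 50Kb bins:
# rho = spearman_no_ties(difference_profile(human_fft, human_molt4),
#                        difference_profile(mef_projected, l1210_projected))
```

Because each difference profile cancels whatever is shared by the two cell types of a species, a positive cross-species correlation of the differences speaks specifically to conservation of the tissue-specific component.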

G+C content conservation does not explain ToR conservation

Replication time was previously shown to be correlated with the regional G+C content of the genome [7], [12]–[14]. Indeed, we observe a strong correlation between ToR and G+C content in both human and mouse (Figure 2A, Figure S10). The regional G+C content is known to be conserved between mouse and human (Figure 2B). On the other hand, ToR structure is known to affect mutability and may therefore contribute to G+C content heterogeneity [32], [33]. To test whether the conservation of ToR and that of G+C content are two aspects of the same phenomenon, we subtracted from the ToR of each probe the mean ToR of probes with similar G+C content, forming a residual ToR profile that is uncorrelated with G+C content by design (see Materials and Methods, Figure 2C). We found that the residual ToR profiles are still highly correlated between mouse and human, which demonstrates that the conservation of ToR between species is not a mere consequence of slow G+C content divergence. The independence of ToR conservation from G+C conservation is further supported by the conservation of tissue-specific differences in ToR (discussed above): because the G+C content of a locus is the same in both cell types of a species, such tissue-specific differences cannot be a direct consequence of G+C content. We did not find a significant correlation between ToR conservation and sequence conservation (see Materials and Methods), which suggests that ToR conservation does not reflect global sequence conservation but may instead reflect the conservation of short subsequences at specific regulatory elements.
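The residual-ToR construction amounts to subtracting a binned G+C trend from each ToR value. A minimal sketch, assuming per-bin lists of ToR and G+C fraction and reading the 50 "equal-sized" G+C bins between 0.2 and 0.7 from the Figure 2 legend as equal-width bins (the text does not say whether equal width or equal counts was used); G+C values outside that range are clamped into the first or last bin. The function name is hypothetical.

```python
def residual_tor(tor, gc, n_bins=50, gc_min=0.2, gc_max=0.7):
    """Subtract from each bin's ToR the mean ToR of bins in the same G+C class.

    G+C classes are `n_bins` equal-width intervals on [gc_min, gc_max];
    values outside the range are clamped into the first or last class.
    """
    width = (gc_max - gc_min) / n_bins

    def gc_class(g):
        return min(max(int((g - gc_min) / width), 0), n_bins - 1)

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, g in zip(tor, gc):
        c = gc_class(g)
        sums[c] += t
        counts[c] += 1
    # mean ToR per G+C class: the "G+C to ToR trend"
    trend = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return [t - trend[gc_class(g)] for t, g in zip(tor, gc)]


print(residual_tor([1.0, 3.0, 10.0, 14.0], [0.30, 0.30, 0.60, 0.60]))
```

By construction the residuals average to zero within every G+C class, so any remaining human-mouse correlation of the residual profiles cannot be driven by the G+C trend captured by these classes.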

Fig. 2. G+C content conservation does not explain ToR conservation.
(A) Fitting ToR to G+C content. Shown are moving averages of ToR as a function of G+C content (in 50Kb bins) for the four ToR profiles (using 50 equal-sized G+C content bins ranging between 0.2 and 0.7). (B) Conservation of G+C content between human and mouse. Shown is the G+C content of orthologous 50Kb human and mouse genome segments. The Spearman correlation is given on the plot. (C) Residual ToR is conserved. We computed the residual ToR (see Materials and Methods) for each of the experiments by subtracting the G+C to ToR trend (depicted in 2A) from the original ToR value. The residual profiles therefore lack any correlation with the regional G+C content. Shown are pairwise comparisons among the residual profiles, demonstrating highly significant ToR correlation even after the correlation with G+C content has been removed.

ToR and chromosomal interactions

The conserved large-scale genomic ToR domains we have characterized, with their correlation to different genomic features, are likely to represent physical chromosomal domains with specific nuclear preferences. We analyzed published Hi-C chromosomal interaction data [15], measured in a human lymphoblastoid cell line (GM06990), and tested the interaction preferences of four equal-sized groups of domains, each replicating in one of the quarters of S phase. We measured the number of interactions (paired-end reads) within and between groups, separately for intra- and inter-chromosomal interactions (Materials and Methods). We first found that late-replicating domains are generally underrepresented in the Hi-C dataset, either because of their relative isolation or because of technical issues with chromatin extraction and shearing (Figure 3A). After normalizing for this effect, we found that domains with similar ToR tend to trans-interact with each other more often than with domains of different ToR (Figure 3B). When examining intra-chromosomal interactions, we found that early-replicating domains have more Hi-C interactions than late-replicating domains (Figure 3C). Interestingly, the additional chromosomal interactions of early-replicating domains are primarily short-ranged (<500kb). This result agrees with the previously noted distributions of interaction distances for open (early-replicating) and closed (late-replicating) chromosomal domains [15]: the interactions of late-replicating domains are relatively more biased toward long distances, whereas a larger share of the interactions of early-replicating domains are local.
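One plausible way to normalize trans interactions by group accessibility is an observed-over-expected ratio, in which the expected count for a pair of ToR groups is the product of their accessibilities divided by the total. The exact normalization behind Figure 3B is not given in this section, so the formula below is an assumption; the sketch also splits cis read pairs at the 500kb cutoff used in Figure 3C. Inputs and function names are hypothetical.

```python
from math import log2


def trans_interaction_ratio(counts):
    """Observed/expected log2 ratios for a symmetric matrix of trans Hi-C
    read-pair counts between ToR groups (rows/columns ordered early to late).

    Accessibility of group i is taken to be its row sum; the expected count
    for groups (i, j) is accessibility[i] * accessibility[j] / total, an
    assumed normalization.
    """
    access = [sum(row) for row in counts]
    total = sum(access)
    return [[log2(counts[i][j] * total / (access[i] * access[j]))
             if counts[i][j] > 0 else float("-inf")
             for j in range(len(counts))]
            for i in range(len(counts))]


def split_cis_by_distance(pairs, cutoff=500_000):
    """Count intra-chromosomal read pairs (pos_a, pos_b) closer than
    `cutoff` bp versus at least `cutoff` bp apart."""
    close = sum(1 for a, b in pairs if abs(a - b) < cutoff)
    return close, len(pairs) - close
```

With this normalization, a group that takes part in few interactions overall (such as the late-replicating group) is not penalized in the ratio, and values above zero on the diagonal indicate a preference for self-interaction.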

Fig. 3. ToR and chromosomal interactions.
(A) Late replicating domains are less accessible. For each replication time group we show the number of inter-chromosomal (trans) interactions that involve the group; we call this measure the accessibility of the group. The late-replicating group takes part in fewer interactions. (B) Regions with similar ToR tend to trans-interact. We show for each pair of replication groups the interaction ratio (log scale), which is the number of trans interactions normalized according to the accessibility of the groups (shown in Figure 3A). There is a bias towards self-interactions within the early and the late replicating groups. (C) Early replicating DNA is more involved in close cis interactions. We show for each ToR group the number of intra-chromosomal interactions, divided into close interactions (<500kb) and far interactions (>500kb).

ToR and genome rearrangements

Since the ToR domain structure of the human and mouse genomes is highly conserved in alignable regions, we next focused on conservation and divergence patterns near rearrangement breakpoints. Rearrangements continuously reorganize the large-scale layout of the genome through translocations, inversions and duplications. Such events may shuttle a genomic region from one genomic context into an entirely different one. The dynamics of ToR divergence that follow rearrangements can hint at the mechanisms that regulate ToR. If replication initiation is mostly determined by local elements, we would expect low ToR divergence even around rearrangements, but if the chromosomal neighborhood is a major factor in ToR regulation, we would expect significant ToR divergence there. We used the inferCARs algorithm [34] to extract a collection of 1382 syntenic blocks (>50Kb) shared by human, rhesus, mouse, rat and dog, which cover over 92% of the human genome. Using dog as an outgroup, we identified 880 simple fusion events (Materials and Methods), that is, events that can be assigned to a unique branch of the phylogenetic tree. Figure 4A shows the phylogenetic tree and the number of events on each branch. Most of these events join domains of similar ToR (Figure S11); here we are interested in events that fused domains of different ToR. An example of a simple event assigned to the common mouse-rat lineage branch is shown in Figure 4B. We considered two alternative scenarios for ToR divergence following the fusion of an early replicating domain and a late replicating domain (Figure 4C). The first scenario involves an early-to-late invasion, in which the late side adapts and advances its replication. The opposite scenario involves a late-to-early invasion, in which the replication of the originally early domain is delayed following fusion. Examples of both types of invasion are given in Figure 4D.
Analysis of all simple fusion events (Figure 4E) indicated that ToR near breakpoints is more diverged than expected by chance (more cases than expected show significant divergence near fusion points; hypergeometric P<0.00015). This analysis provided a detailed list of genomic regions whose ToR diverged following a change in genomic context (Table S1), which opens new avenues for refining our understanding of ToR regulation. For example, we note that early-to-late invasion is more common than late-to-early invasion (15 versus 7 events for fibroblasts, 23 versus 14 events for lymphoblasts; see Figure S12 for more examples in both cell types). The mechanisms underlying ToR divergence can only be hypothesized given the current data (ToR measurements in a third, outgroup species are needed to reconstruct evolutionary histories with higher certainty). For a subset of the early-to-late invasions (e.g., Figure S12C), the simplest mechanism, in which a single replication fork crosses the fusion point, is a valid explanation. In other cases (e.g., Figure 4D), the divergence encompasses a territory much larger than the scope of a single fork. Importantly, despite these specific cases, genome rearrangements do not cause massive divergence of replication timing, and the overall replication structure is largely conserved between human and mouse, suggesting that evolution typically shuffles ToR domains rather than breaking and fusing them.
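The enrichment test mentioned above can be sketched with the hypergeometric upper tail. The framing below is an assumption about how such a test is typically set up, not a statement of the authors' exact procedure: the population is the union of fusion sites and random background points, "successes" are points classified as significantly diverged, and the draws are the fusion sites, so the P value is the probability of observing at least as many diverged fusion sites by chance. It uses only `math.comb` from the standard library; the function name is hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: population size (e.g. fusion sites plus random background points)
    n: number of "successes" in the population (points called diverged)
    N: number of draws (the fusion sites)
    """
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i)
               for i in range(max(k, 0), upper + 1))
    # exact integer arithmetic until this final division
    return tail / comb(M, N)
```

Working with exact integer binomial coefficients avoids floating-point underflow in the individual terms, which matters when the population includes thousands of background points.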

Fig. 4. ToR divergence at genome rearrangement sites.
(A) Phylogenetic tree. The phylogenetic tree used in our analysis, showing the number of simple fusion events on each branch. (B) Fusion event illustrated. Two syntenic blocks (colored green and orange) are adjacent in mouse and rat, and distal in human, dog and rhesus. The branch associated with the event is marked in red on the phylogenetic tree. (C) Two possible divergence patterns following a fusion event. On top we show schematically the ancestral ToR of two distal segments (early replicating and late replicating domains) prior to fusion. After the fusion event the ToR can propagate either from the early domain into the late domain (early-to-late invasion) or from the late domain into the early domain (late-to-early invasion). (D) Invasion examples. We depict the mouse lymphoblast ToR with a black line (confidence intervals are shown in grey). The human ToR as projected onto the mouse genome is depicted with blue dots. The two segments that were fused are colored green (left segment) and orange (right segment). The approximate ToR near the breakpoint prior to fusion is depicted with a colored circle (green or orange) for each segment. Known genes are depicted with green rectangles below each graph. The top panel shows an example of the more common case of early-to-late invasion, and the bottom panel shows a late-to-early invasion (see Figure S12 for data on both cell types and more details). (E) ToR divergence at distal fusion sites. Shown is a scatter plot of the ToR divergence on the late side (the segment that had later ToR prior to fusion, Y axis) versus the ToR divergence on the early side (the segment that had earlier ToR prior to fusion, X axis). The mean ToR divergence ± its standard deviation is drawn as vertical and horizontal grid lines (gray).
We classify an event as an early-to-late invasion (E2L) if the divergence on the early side is close to the mean divergence (within one standard deviation) and the divergence on the late side is greater than the mean divergence plus its standard deviation (colored red). Similarly, we classify an event as a late-to-early invasion (L2E) if the divergence on the late side is close to the mean divergence (within one standard deviation) and the divergence on the early side is smaller than the mean divergence minus its standard deviation (colored blue). For fibroblasts we counted 15 E2L events versus 7 L2E events; for lymphoblasts we counted 23 E2L events versus 14 L2E events. In all cases we computed a hypergeometric test against 10,000 random points in the genome (plotted in gray) to verify that these counts differ significantly from the background (L2E: P<10⁻⁵; E2L: P<0.025).
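The classification rule in this legend can be written as a short function. In this sketch the mean and standard deviation are computed from the background divergence values; the legend draws them as grid lines but does not state which set they are computed over, so this is an assumption. Divergence is assumed to be signed so that positive values mean a segment now replicates earlier than its inferred pre-fusion ToR, and the population standard deviation is used. Names are hypothetical.

```python
from statistics import mean, pstdev


def classify_fusion_events(events, background):
    """Label each fusion event "E2L", "L2E" or "other".

    events: (early_side_divergence, late_side_divergence) per event, signed
    so that positive values mean the segment now replicates earlier than
    its inferred pre-fusion ToR (assumed convention).
    background: divergence values at random genomic points, used here to
    set the mean and standard deviation (assumed).
    """
    mu, sd = mean(background), pstdev(background)
    low, high = mu - sd, mu + sd
    labels = []
    for early, late in events:
        if low <= early <= high and late > high:
            labels.append("E2L")  # late side advanced, early side unchanged
        elif low <= late <= high and early < low:
            labels.append("L2E")  # early side delayed, late side unchanged
        else:
            labels.append("other")
    return labels
```

For example, with background values spread symmetrically around zero, an event whose early side is unchanged but whose late side moved strongly earlier is labeled "E2L", while an event in which both sides shifted is left as "other".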

Distal rearrangements are preferentially fusing early replicating domains and are enriched for distal Hi-C interactions

We next focused on fusion events in the mouse lineage that involve segments that are distal (located more than 5Mb apart or on different chromosomes) in all of human, dog and rhesus. We observed that these events preferentially involve early replicating domains (Figure 5A). Furthermore, rearrangements preferentially bring together genomic fragments of similar ToR (Figure 5B). This may be a consequence of a preference for rearrangements that involve early replicating regions, or of a general mechanistic tendency to fuse breakpoints that occur at the same time in S phase. Another possibility is that the fusion of segments with very different ToR is more frequently deleterious, since it disrupts the overall organization of the genome. We restricted the analysis further to 55 fusion events on the mouse lineage for which the two human domains reside on different chromosomes (while their mouse orthologs are adjacent), and examined the level of interaction between the segment ends, as reflected in the Hi-C dataset. We found that the interaction probability for this specific set of pairs is above background (Figure 5C), which suggests that rearrangements tend to occur between parts of the genome that more often occupy the same nuclear compartment. Taken together, the replication timing and Hi-C data suggest that genome rearrangements are correlated with replication and nuclear architecture at an evolutionary scale, and that rearrangements generally shuffle genomic segments with similar ToR and prior chromosomal proximity. We anticipate that future data on chromosomal interactions at higher resolution and for additional species will allow a more quantitative estimation of the effect of replication timing and physical interactions on rearrangement rates.
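The same-ToR preference can be sketched as an observed-to-expected ratio per pair type. Here the random control used in Figure 5B is replaced by its analytic expectation under independent pairing of ends drawn from two equal halves of the genome (probability 1/4 for E-E, 1/2 for E-L and 1/4 for L-L); this substitution, the input format and the function name are assumptions rather than the procedure used in the paper.

```python
from collections import Counter
from math import log10


def fusion_pair_enrichment(fusions):
    """log10(observed / expected) counts of E-E, E-L and L-L fusions.

    `fusions` is a list of (group_a, group_b) labels, each "E" or "L".
    Expected counts assume each fused end is independently E or L with
    probability 1/2 (genome split into equal halves), standing in for a
    shuffled random control.
    """
    n = len(fusions)
    # sorting the two labels makes ("L", "E") and ("E", "L") the same type
    observed = Counter("-".join(sorted(pair)) for pair in fusions)
    expected = {"E-E": 0.25 * n, "E-L": 0.5 * n, "L-L": 0.25 * n}
    return {kind: log10(observed[kind] / exp) if observed[kind] else float("-inf")
            for kind, exp in expected.items()}
```

Positive values for E-E and L-L together with a negative value for E-L would correspond to the preference for fusions within the same ToR group that is reported above.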

Fig. 5. ToR and Hi-C preferences of distal rearrangements.
(A) Distal fusion events involve early replicating domains. Each distal murine fusion event (inter-chromosomal or spanning at least 5Mb in all non-rodent species) is associated with two human ToR groups (those of the two segment ends that were fused). We show the breakdown of fusion events according to ToR group. (B) Distal fusion events bring together domains with similar ToR. We split the genome into two equal groups (E: early, L: late) and counted, over all murine distal fusions, the number of fusions of each possible pair type (E-E, E-L, L-L). We show the ratio between these counts and a random control (log10 scale), revealing a significant preference for fusions within the same ToR group. (C) Distal fusion events are enriched for Hi-C interactions. We focused on murine distal fusion events that involve a pair of human sites residing on different chromosomes. For each pair we counted the number of Hi-C interactions in the human genome (between 1Mb segments centered on the breakpoints), which reflects chromosomal proximity. As a control we shuffled the pairs, obtaining random pairs of sites in the human genome that reside on different chromosomes. Using the 75th percentile as a threshold (dashed line), we tagged each pair as interacting (top quartile) or non-interacting (other quartiles). Breakpoints are enriched in the interacting group (1.6-fold enrichment, P<0.01, hypergeometric test). Shown are density plots of the number of reads between mouse fused pairs (red), between the shuffled control pairs (grey), and between a collection of random pairs selected over the whole genome (black).
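The quartile-based enrichment in panel C can be sketched as follows. The threshold is taken as the nearest-rank 75th percentile of the control pair counts (an assumption; the legend does not specify the percentile method or which distribution defines it), a pair is called interacting if its count exceeds the threshold, and enrichment is the ratio of interacting fractions among fused and control pairs. Function names are hypothetical, and the significance test itself is not reproduced here.

```python
from math import ceil


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile (0 < q <= 1) of a non-empty list."""
    ordered = sorted(values)
    return ordered[max(ceil(q * len(ordered)) - 1, 0)]


def interaction_enrichment(fused_counts, control_counts, q=0.75):
    """Return (threshold, enrichment) for Hi-C read counts between site pairs.

    A pair is "interacting" if its count exceeds the q-th percentile of the
    control counts; enrichment is the interacting fraction among fused pairs
    divided by that among control pairs.
    """
    threshold = nearest_rank_percentile(control_counts, q)
    frac_fused = sum(c > threshold for c in fused_counts) / len(fused_counts)
    frac_ctrl = sum(c > threshold for c in control_counts) / len(control_counts)
    if frac_ctrl == 0:
        return threshold, float("inf")
    return threshold, frac_fused / frac_ctrl
```

Measuring the control's own interacting fraction, rather than assuming exactly 25%, keeps the ratio honest when many control pairs share the threshold value (for example, many pairs with zero reads).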

Discussion

We have shown that the mammalian genome is subject to a conserved replication timing structure that divides the genome into megabase-scale segments of coordinately replicating sequences. As suggested before [7], [8], this structure is highly correlated with other genomic traits (G+C content, gene density), but we show that its conservation is independent of these features. The human and mouse genomes differ significantly in their chromosomal landscape, as a result of dramatic events such as chromosomal fusions, a switch to an acrocentric layout and numerous large-scale rearrangements [29]. If ToR regulation were strongly influenced by location in the one-dimensional chromosomal space, such changes should have resulted in major ToR divergence between mouse and human and in fragmentation of the ToR domain structure. As such divergence is not observed, some mechanism must prevent it from occurring.

The genomic features that regulate ToR locally are currently uncharacterized. In theory, such elements could actively regulate the ToR of their surrounding genomic domains, and their conservation may be a consequence of strong selection acting to conserve a functionally important ToR landscape. Under these assumptions, ToR would serve as a scaffold for the emergence of domains of active genes, thereby explaining the correlation between gene activity and early replication. Alternatively, local ToR regulation may be a consequence of gene activity rather than an enabler of it. According to this scenario, ToR conservation may be an indirect result of the conservation of gene activity rather than a driver of it. However, the high conservation of ToR in gene deserts suggests that this latter possibility cannot fully account for the data. The predominance of early-to-late invasions that we observed (Figure 4E) supports the notion of a simple in cis mechanism in which early-replicating domains are positively regulated to retain their ToR regardless of their genomic context, while late-replicating domains are regulated passively, by lacking a predisposition to early replication.

The conserved ToR landscape and the global map of chromosome interactions of the human genome reveal how genome rearrangements interact with the chromosomal architecture of the genome in four dimensions (three-dimensional nuclear space and time of replication). We showed that rearrangements tend to bring together domains with similar ToR, and that pairs of loci that were fused in the mouse lineage tend to trans-interact in the human genome. One explanation for these observations is that the mechanisms of DNA breakage and repair increase the likelihood of rearrangements that involve loci with similar replication timing or prior chromosomal proximity. Indeed, chromosomal proximity was recently suggested to underlie the cancer-prone TMPRSS2-ERG translocation [35]. An alternative explanation for the association between ToR, chromosomal interactions and breakpoints may be a selective constraint against rearrangements that would significantly change the nuclear architecture by moving a domain into a foreign genomic context. A large-scale and deleterious change in ToR due to a translocation has been observed in human lymphocytes at cytogenetic resolution [36]. Data from additional species would allow true phylogenetic reconstruction of the ToR history in different parts of the genome and could help to resolve and refine the above hypotheses. We expect such data to provide unique insights into the regulation of DNA replication and to significantly expand our understanding of this key aspect of genome organization.

Materials and Methods

Cell culture

Mouse L1210 lymphocytic leukemia cells (ATCC CCL-219) were grown in CO2-independent L-15 medium supplemented with 2 g/L dextrose. Human Molt-4 acute lymphoblastic leukemia cells (ATCC CRL-1582) were grown in RPMI. Mouse embryonic fibroblasts (MEF) were grown in DMEM supplemented with 0.2% beta-mercaptoethanol and 1% NEAA (non-essential amino acids). Primary foreskin fibroblasts transfected with the hTERT gene (FFT) were grown in RPMI. All media were supplemented with 10% fetal bovine serum, penicillin G and streptomycin sulfate.

Fluorescence-activated cell sorting and DNA extraction

Cells were washed twice with PBS, fixed in ethanol, stained with 5 propidium iodide and incubated with 50 RNase A. The cells were then sorted by DNA content on a FACS Vantage fluorescence-activated cell sorter. DNA was extracted from sorted S-phase and G1-phase cells by incubation in standard lysis buffer followed by proteinase K treatment, phenol-chloroform extraction and ethanol precipitation. The resulting DNA was sonicated and cleaned with the QIAGEN PCR purification kit, and its concentration was measured with a NanoDrop spectrophotometer.

DNA labeling and array hybridization

DNA (250 ng to 1 µg) isolated from G1-phase or S-phase cells was labeled with Cy3-dUTP or Cy5-dUTP, respectively, using Agilent's CGH labeling protocol (www.embl-heidelberg.de/courses/Agilent05/CGH-Protocol.pdf). Pairs of samples were co-hybridized to Agilent custom-design mouse or human microarrays according to Agilent's hybridization protocol. The arrays were scanned on an Agilent scanner, and the raw data were processed with Agilent's Feature Extraction software. We optimized the protocol to yield high-quality data from as little as 250 ng of input DNA (Figure S13). The ToR of each cell type was measured in two replicates (Figure S14).

Microarray design

Arrays were designed with Agilent's eArray web tool. The mouse array covered the entire mouse genome with an average probe spacing of 40 kb, and chromosome 19 with an average spacing of 1 kb (Agilent Microarray Design Identification (AMADID) #018925). The human array covered the entire human genome with an average spacing of 38 kb, and the entire sequenced part of chromosome 22 with an average spacing of 1 kb (AMADID #021214). All experiments were performed on Agilent 2×105K CGH arrays. To allow simple comparison between human and mouse, the data were binned at 50 kb resolution, which resulted in ∼47,000 mouse bins and ∼53,000 human bins.
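As an illustration of the binning step only, the sketch below averages probe-level ToR values into 50 kb bins. It assumes each probe arrives as a hypothetical `(chrom, position, tor_value)` tuple and that a bin's value is the plain mean of its probes; the text does not state how probes within a bin were combined, and the helper name is illustrative.

```python
# Illustrative sketch; the function name and input format are hypothetical.
from collections import defaultdict
from statistics import mean

BIN_SIZE = 50_000  # 50 kb bins, as in the text


def bin_probes(probes, bin_size=BIN_SIZE):
    """Average probe-level ToR values into fixed-size genomic bins.

    probes: iterable of (chrom, position, tor_value) tuples.
    Returns a dict mapping (chrom, bin_index) to the mean ToR of the probes in
    that bin. Using the plain mean is an assumption.
    """
    grouped = defaultdict(list)
    for chrom, pos, value in probes:
        grouped[(chrom, pos // bin_size)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


probes = [
    ("chr19", 10_000, 1.0),
    ("chr19", 30_000, 0.5),    # same 50 kb bin as the probe above
    ("chr19", 60_000, -0.25),  # second bin
]
print(bin_probes(probes))  # {('chr19', 0): 0.75, ('chr19', 1): -0.25}
```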

Smoothing ToR data

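To reduce noise at the single-probe level (which is caused by dye bias [37]), we smoothed the raw ToR data. We define the smooth ToR value of bin $i$ to be the mean of the raw values over a window of $2W+1$ bins centered on $i$:

$$\overline{ToR}_i = \frac{1}{2W+1}\sum_{j=i-W}^{i+W} ToR_j$$

Throughout the paper we use the smooth ToR values. For the breakpoint analysis we define the (smooth) right-sided ToR of bin $i$ to be

$$\overline{ToR}^{R}_i = \frac{1}{W}\sum_{j=i+1}^{i+W} ToR_j$$

(the left-sided ToR is defined in a similar fashion).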

The confidence interval around the ToR profiles is the standard deviation of the 2W+1 values that were averaged when computing the smooth ToR profiles.
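A minimal sketch of the smoothing step, assuming the centered window of 2W+1 bins described above. Truncating windows at the ends of a profile, using the population standard deviation for the confidence band, and excluding bin i itself from the one-sided averages are assumptions the text does not settle; W is left as a parameter, and the helper names are illustrative.

```python
# Illustrative sketch; function names are hypothetical.
from statistics import mean, pstdev


def smooth_tor(raw, w):
    """Centered moving average of a ToR profile over 2W+1 bins.

    Returns (smooth, spread): the mean and the population standard deviation
    of the values averaged for each bin. Windows are truncated at the ends of
    the profile (an assumption).
    """
    n = len(raw)
    smooth, spread = [], []
    for i in range(n):
        window = raw[max(0, i - w): min(n, i + w + 1)]
        smooth.append(mean(window))
        spread.append(pstdev(window))
    return smooth, spread


def right_sided_tor(raw, i, w):
    """Mean of the (up to) W bins to the right of bin i; None at the right edge."""
    window = raw[i + 1: i + 1 + w]
    return mean(window) if window else None


def left_sided_tor(raw, i, w):
    """Mean of the (up to) W bins to the left of bin i; None at the left edge."""
    window = raw[max(0, i - w): i]
    return mean(window) if window else None


raw = [1.0, 2.0, 3.0, 4.0, 5.0]
smooth, spread = smooth_tor(raw, w=1)
print(smooth)                      # [1.5, 2.0, 3.0, 4.0, 4.5]
print(left_sided_tor(raw, 2, 2))   # 1.5
print(right_sided_tor(raw, 2, 2))  # 4.5
```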

Comparison of ToR and sequence conservation

For each 50 kb bin we defined the ToR conservation as the difference between the human fibroblast (FFT) ToR and the mouse fibroblast (MEF) ToR. We defined the sequence conservation of a bin as the percentage of its nucleotides that are conserved between human and mouse (computed from MAF alignment files). Sequence conservation and ToR conservation are not correlated (Spearman rank correlation).
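The per-bin comparison can be sketched as follows. The Spearman correlation is implemented as the Pearson correlation of average ranks so that only the standard library is needed; the signed human-minus-mouse difference follows the wording "difference" in the text, and whether an absolute difference was used instead is an open assumption. Helper names and values are illustrative.

```python
# Illustrative sketch; function names are hypothetical.
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tor_conservation(human_tor, mouse_tor):
    """Per-bin difference between human (FFT) and mouse (MEF) ToR on aligned bins."""
    return [h - m for h, m in zip(human_tor, mouse_tor)]
```

In use, `spearman(tor_conservation(fft, mef), seq_conservation)` would compare the two per-bin measures, where the three input lists are hypothetical aligned per-bin values.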

G+C analysis

We used the human March 2006 assembly (hg18) and the mouse February 2006 assembly (mm8) to compute G+C content for human and mouse. To generate the G+C-to-ToR trend of Figure 2A, we divided the G+C content range from 0.2 to 0.7 (computed in 50 kb windows) into 50 equal-sized segments [0.2, 0.21], [0.21, 0.22], …, [0.69, 0.7], and computed the average ToR of each segment, which we call the GC-predicted ToR ($ToR_{GC}$). We defined the residual ToR of a bin as $ToR_{res} = ToR - ToR_{GC}$, where $ToR_{GC}$ is the GC-predicted ToR of the segment containing the bin's G+C content. The residual ToR is “G+C normalized” in the sense that it has no significant correlation with G+C content.
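The GC-predicted and residual ToR can be sketched as below. Segments are treated as half-open intervals with 0.7 itself placed in the last segment, and bins outside [0.2, 0.7] get no residual (`None`); both are assumptions, since the text lists closed intervals and does not say how out-of-range bins were handled. Helper names are illustrative.

```python
# Illustrative sketch; function names are hypothetical.
import math
from statistics import mean

GC_MIN, GC_MAX, N_SEGMENTS = 0.2, 0.7, 50  # 50 segments, as in the text
SEG_WIDTH = 0.01


def gc_segment(gc):
    """Index of the G+C segment containing gc, or None outside [0.2, 0.7].

    Segments are half-open [a, a + 0.01), with 0.7 placed in the last segment
    (assumed boundary handling). round() guards against floating-point error
    on exact boundaries.
    """
    if gc < GC_MIN or gc > GC_MAX:
        return None
    idx = math.floor(round((gc - GC_MIN) / SEG_WIDTH, 9))
    return min(idx, N_SEGMENTS - 1)


def gc_predicted_tor(gc_values, tor_values):
    """Average ToR of each G+C segment (the GC-predicted ToR)."""
    by_segment = {}
    for gc, tor in zip(gc_values, tor_values):
        seg = gc_segment(gc)
        if seg is not None:
            by_segment.setdefault(seg, []).append(tor)
    return {seg: mean(values) for seg, values in by_segment.items()}


def residual_tor(gc_values, tor_values):
    """Residual ToR = ToR minus the GC-predicted ToR of the bin's segment."""
    predicted = gc_predicted_tor(gc_values, tor_values)
    return [
        None if gc_segment(gc) is None else tor - predicted[gc_segment(gc)]
        for gc, tor in zip(gc_values, tor_values)
    ]
```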

Genomic mappings

We used the UCSC liftOver tool [25][28] to project the ToR, the residual ToR and the regional G+C content of mouse onto the human genome. For each 50 kb mouse bin we defined a 20 kb window centered on the middle of the bin and attempted to project that window onto the human genome with liftOver, requiring a match of at least 30% to obtain a high-quality mapping. If a window was projected successfully, we assigned the original bin value to the midpoint of the projected window (in human coordinates). In this manner we projected ∼30,000 bins (62%) onto the human genome, and assigned them to the human 50 kb bins to allow easy comparison. We discarded all bins with missing human data, leaving ∼27,000 bins. The result is a comparative ToR map containing four aligned ToR profiles (FFT, Molt-4, MEF, L1210).
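A sketch of the window construction and of the assignment of projected windows to human bins, under stated assumptions: windows are written as BED lines whose name field records the source mouse bin (a hypothetical naming scheme), liftOver itself runs outside Python, and projected windows that land in the same human 50 kb bin are simply collected in a list, since the text does not say how such collisions were resolved. The UCSC `liftOver` command-line tool takes a `-minMatch` option, so a 30% match requirement would typically look like `liftOver -minMatch=0.3 mouse_windows.bed mm8ToHg18.over.chain.gz lifted.bed unmapped.bed`; the file names are illustrative.

```python
# Illustrative sketch; function names and the BED naming scheme are hypothetical.
BIN_SIZE = 50_000  # 50 kb bins
WINDOW = 20_000    # 20 kb window centered on each mouse bin


def mouse_windows_bed(bins):
    """BED lines for 20 kb windows centered on the middle of 50 kb mouse bins.

    bins: iterable of (chrom, bin_start) pairs. The name field encodes the
    source bin as "chrom:start" so lifted windows can be matched back.
    """
    lines = []
    for chrom, start in bins:
        mid = start + BIN_SIZE // 2
        lines.append(f"{chrom}\t{mid - WINDOW // 2}\t{mid + WINDOW // 2}\t{chrom}:{start}")
    return lines


def assign_to_human_bins(lifted_lines, mouse_values):
    """Assign each projected window's value to the human bin containing its midpoint.

    lifted_lines: BED lines in human coordinates (liftOver output).
    mouse_values: dict mapping the "chrom:start" name to the mouse bin value.
    Returns a dict (human_chrom, human_bin_index) -> list of mouse values.
    """
    assigned = {}
    for line in lifted_lines:
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        mid = (int(start) + int(end)) // 2
        assigned.setdefault((chrom, mid // BIN_SIZE), []).append(mouse_values[name])
    return assigned
```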

Spatial clustering

The spatial clustering algorithm [31] works in an unsupervised fashion to identify contiguous genomic regions with similar ToR trends in the two species and two tissues. As input we used the four ToR profiles of the ToR map, normalized so that all had the same mean and standard deviation. We learned the most likely parameters of a star-shaped hidden Markov model with one central, non-emitting hidden state and N emitting states (the petals of the star), for N = 4, 5, …, 10. Four states appeared robustly for all values of N, and we therefore focused on the N = 4 model (data not shown).
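The normalization step and the transition structure of a star-shaped HMM can be sketched as follows. Rescaling every profile to mean 0 and standard deviation 1 is one concrete choice of "the same mean and standard deviation" (an assumption). The transition helper assumes each petal either repeats with a self-loop probability or passes through the non-emitting center, which then picks the next petal; this is one common parametrization of a star topology, not necessarily the one used in [31], and parameter learning (for example by expectation-maximization) is not shown. Helper names are illustrative.

```python
# Illustrative sketch; function names and the parametrization are assumptions.
from statistics import mean, pstdev


def normalize_profiles(profiles):
    """Rescale each ToR profile to mean 0 and (population) standard deviation 1."""
    normalized = []
    for profile in profiles:
        mu, sd = mean(profile), pstdev(profile)
        normalized.append([(v - mu) / sd for v in profile])
    return normalized


def star_transition_matrix(stay, center_to_petal):
    """Effective transitions between the N emitting states of a star-shaped HMM.

    stay[k]: probability that petal k emits again immediately (self-loop).
    center_to_petal[l]: probability that the non-emitting center moves to petal l.
    Leaving petal k (probability 1 - stay[k]) passes through the center, so
    A[k][l] = stay[k] * (k == l) + (1 - stay[k]) * center_to_petal[l].
    """
    n = len(stay)
    return [
        [(stay[k] if k == l else 0.0) + (1 - stay[k]) * center_to_petal[l] for l in range(n)]
        for k in range(n)
    ]


A = star_transition_matrix(stay=[0.9, 0.5], center_to_petal=[0.5, 0.5])
# Each row sums to 1; A is approximately [[0.95, 0.05], [0.25, 0.75]].
```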

Comparison of ToR with various genomic features

We measured several genomic features for each of the ToR spatial clusters (Figure S9). Gene expression data for FFT and Molt-4 were downloaded from UCSC [25]. As previously shown, expression decreases as replication becomes later. The Spearman correlation between the ToR difference and the gene expression difference (between FFT and Molt-4) is 0.11 (with a very small P value). ToR is also correlated with distance to telomeres (Figure S9). We validated the known correlations between ToR and transcription density (amount of transcribed sequence, according to RefSeq genes), exon density, number of transcription start sites (in 50 kb bins), amount of lamina interaction, and number of genes.

Hi-C analysis

For the Hi-C analysis we represent ToR by the Molt-4 ToR track (human lymphoblasts). We downloaded the Hi-C dataset [15], which comprises ∼60M interacting pairs. We discarded interactions between loci without ToR data, which left ∼46M interactions, about half of them inter-chromosomal and half intra-chromosomal. We split the genome into four equal parts according to the FFT ToR quartiles (E: early, EM: early-medium, LM: late-medium, L: late). To generate Figure 3B, we counted the number of inter-chromosomal interactions for each possible pair of ToR groups. We then randomly paired all the loci that participated in an interaction to generate a simulated background interaction map, and computed for each pair of groups a random count with its standard deviation. The ratio for each pair is the number of observed interactions divided by the random count; the figure displays the log10 ratio. To generate Figure 3C, we counted for each ToR group the number of close (<500 kb) and far (>500 kb) intra-chromosomal interactions. Note that although the original Hi-C study limited its analysis to 1 Mb resolution, we could draw conclusions at higher resolution (<500 kb) because we pooled interactions into four ToR categories instead of analyzing individual loci.
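The quartile split and the observed-versus-shuffled comparison behind Figure 3B can be sketched as below. Loci are hypothetical `(chrom, bin)` tuples; a larger ToR value is assumed to mean earlier replication; the background shuffles the endpoints of the inter-chromosomal interactions and re-pairs them, repeated a fixed number of times to obtain a mean and standard deviation. The number of rounds, the seed and the helper names are illustrative choices, not the original settings.

```python
# Illustrative sketch; function names, loci format and defaults are hypothetical.
import math
import random
from collections import Counter
from statistics import mean, pstdev

GROUPS = ["E", "EM", "LM", "L"]  # early, early-medium, late-medium, late


def quartile_groups(tor_by_locus):
    """Assign each locus to one of four equal-sized ToR groups (larger ToR = earlier, assumed)."""
    loci = sorted(tor_by_locus, key=tor_by_locus.get, reverse=True)
    n = len(loci)
    return {locus: GROUPS[4 * i // n] for i, locus in enumerate(loci)}


def inter_chromosomal(pairs):
    """Keep interactions whose two loci lie on different chromosomes."""
    return [(a, b) for a, b in pairs if a[0] != b[0]]


def group_pair_counts(pairs, group):
    """Count interactions for each unordered pair of ToR groups."""
    return Counter(tuple(sorted((group[a], group[b]))) for a, b in pairs)


def shuffled_background(pairs, group, n_rounds=100, seed=0):
    """Mean and SD of group-pair counts after randomly re-pairing all endpoints."""
    rng = random.Random(seed)
    endpoints = [locus for pair in pairs for locus in pair]
    rounds = []
    for _ in range(n_rounds):
        rng.shuffle(endpoints)
        rounds.append(group_pair_counts(zip(endpoints[0::2], endpoints[1::2]), group))
    keys = set().union(*rounds)
    return {k: (mean([r[k] for r in rounds]), pstdev([r[k] for r in rounds])) for k in keys}


def log10_ratios(observed, background):
    """log10(observed / mean random count) for group pairs with a non-zero background."""
    return {k: math.log10(observed[k] / background[k][0])
            for k in observed if k in background and background[k][0] > 0}
```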

Breakpoints

Using the algorithm of Ma et al. [34], we identified 1382 homologous segments that are free of chromosome fissions or fusions, as well as of inversions or translocations larger than 50 kb. These segments cover 92% of the human genome. In the phylogenetic tree of Figure 4A, we used only simple events, that is, events that we could assign with high confidence to a unique branch of the tree (Figure S15; example in Figure 4B). Each event has a set of posterior species (the leaves of the subtree rooted below the branch) and a set of prior species (the leaves of the tree above the branch). The prior distance of an event is the minimal distance between the two fused points among all prior species. Table S1 lists all events identified in this manner. We tagged all fusions with a prior distance greater than 5 Mb as distal events, and all other fusions as close-range events.
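The prior-distance rule and the 5 Mb distal/close-range split can be sketched as below. The input format, the placeholder species names in the test, and treating loci on different chromosomes as infinitely far apart are assumptions for illustration; identifying the homologous segments and assigning events to branches is not shown. Helper names are illustrative.

```python
# Illustrative sketch; function names and input format are hypothetical.
import math

DISTAL_THRESHOLD = 5_000_000  # 5 Mb, as in the text


def prior_distance(fused_positions, prior_species):
    """Minimal distance between the two fused points among the prior species.

    fused_positions: species -> (chrom_a, pos_a, chrom_b, pos_b), the locations
    in that species of the two loci that are adjacent after the fusion.
    Loci on different chromosomes are treated as infinitely far apart (assumption).
    """
    distance = math.inf
    for species in prior_species:
        chrom_a, pos_a, chrom_b, pos_b = fused_positions[species]
        if chrom_a == chrom_b:
            distance = min(distance, abs(pos_a - pos_b))
    return distance


def classify_fusion(distance):
    """Distal if the prior distance exceeds 5 Mb, otherwise close-range."""
    return "distal" if distance > DISTAL_THRESHOLD else "close-range"
```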

We focused on distal murine-lineage events (events in the branches leading from the human-mouse ancestor to mouse). We computed ToR divergence on both sides of each fusion site. For each fusion, we defined the late-side domain as the domain with the later human ToR, and the early-side domain as the domain with the earlier human ToR. We defined the late divergence and the early divergence as the difference between the mouse ToR and the human ToR of the late-side and early-side domain, respectively. These two values reflect how much the mouse ToR has diverged from the ancestral ToR during evolution. For example, a positive late divergence implies that after the fusion the ToR of the murine domain advanced in time. Figure 4E shows a scatter plot of late divergence versus early divergence. To generate Figure 5C, we counted the number of Hi-C interactions within a 1 Mb window between each pair of distal simple fusion events in the mouse lineage. As a control we shuffled all pairs, using only random pairs that reside on different chromosomes. Using the 75th percentile as a threshold, we tagged all pairs in the top quartile as interacting pairs and all other pairs as non-interacting pairs. Compared with the shuffled control pseudo-breakpoints, the original breakpoint pairs were enriched in the interacting group (1.6-fold enrichment, hypergeometric P < 0.01). The tagging threshold is shown as a dashed line in Figure 5C.
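The top-quartile tagging and the enrichment test can be sketched as follows. Computing the 75th percentile on the pooled breakpoint and control counts, calling a pair interacting only when it lies strictly above the threshold, and using a one-sided hypergeometric tail are assumptions the text does not spell out; the Hi-C counts per pair (within 1 Mb windows) are assumed to have been computed beforehand. Helper names are illustrative.

```python
# Illustrative sketch; function names and threshold details are assumptions.
from math import comb
from statistics import quantiles


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def interaction_enrichment(breakpoint_counts, control_counts):
    """Top-quartile tagging of per-pair Hi-C counts and a hypergeometric test.

    Returns (threshold, fold enrichment, one-sided P value).
    """
    pooled = breakpoint_counts + control_counts
    threshold = quantiles(pooled, n=4)[2]  # 75th percentile of pooled counts
    k = sum(c > threshold for c in breakpoint_counts)  # interacting breakpoint pairs
    K = sum(c > threshold for c in pooled)             # interacting pairs overall
    N, n = len(pooled), len(breakpoint_counts)
    if K == 0:
        raise ValueError("no pairs above the threshold")
    enrichment = (k / n) / (K / N)
    return threshold, enrichment, hypergeom_sf(k, N, K, n)
```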

URL and accession numbers

ToR data were deposited in the GEO database under accession number GSE17236.


References

1. Pauler FM, Sloane MA, Huang R, Regha K, Koerner MV (2009) H3K27me3 forms BLOCs over silent genes and intergenic regions and specifies a histone banding pattern on a mouse autosomal chromosome. Genome Res 19: 221-233.

2. Wen B, Wu H, Shinkai Y, Irizarry RA, Feinberg AP (2009) Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nat Genet 41: 246-250.

3. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E (2007) Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448: 553-560.

4. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315-326.

5. Gilbert N, Boyle S, Fiegler H, Woodfine K, Carter NP (2004) Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118: 555-566.

6. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB (2008) Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453: 948-951.

7. Farkash-Amar S, Lipson D, Polten A, Goren A, Helmstetter C (2008) Global organization of replication time zones of the mouse genome. Genome Res.

8. Hiratani I, Ryba T, Itoh M, Yokochi T, Schwaiger M (2008) Global Reorganization of Replication Domains During Embryonic Stem Cell Differentiation. PLoS Biol 6: e245. doi:10.1371/journal.pbio.0060245

9. Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S (2005) Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102: 6419-6424.

10. MacAlpine DM, Rodriguez HK, Bell SP (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094-3105.

11. Schubeler D, Scalzo D, Kooperberg C, van Steensel B, Delrow J (2002) Genome-wide DNA replication profile for Drosophila melanogaster: a link between transcription and replication timing. Nat Genet 32: 438-442.

12. White EJ, Emanuelsson O, Scalzo D, Royce T, Kosak S (2004) DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states. Proc Natl Acad Sci U S A 101: 17771-17776.

13. Woodfine K, Beare DM, Ichimura K, Debernardi S, Mungall AJ (2005) Replication timing of human chromosome 6. Cell Cycle 4: 172-176.

14. Woodfine K, Fiegler H, Beare DM, Collins JE, McCann OT (2004) Replication timing of the human genome. Hum Mol Genet 13: 191-202.

15. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326: 289-293.

16. Hurst LD, Pal C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5: 299-310.

17. Sankoff D (2003) Rearrangements and chromosomal evolution. Curr Opin Genet Dev 13: 583-587.

18. Batada NN, Hurst LD (2007) Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet 39: 945-949.

19. Alekseyev MA, Pevzner PA (2007) Are there rearrangement hotspots in the human genome? PLoS Comput Biol 3: e209. doi:10.1371/journal.pcbi.0030209

20. Sankoff D, Trinh P (2005) Chromosomal breakpoint reuse in genome sequence rearrangement. J Comput Biol 12: 812-821.

21. Hiratani I, Ryba T, Itoh M, Rathjen J, Kulik M (2010) Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res 20: 155-169.

22. Desprat R, Thierry-Mieg D, Lailler N, Lajugie J, Schildkraut C (2009) Predictable dynamic program of timing of DNA replication in human cells. Genome Res 19: 2288-2299.

23. Hansen RS, Thomas S, Sandstrom R, Canfield TK, Thurman RE (2010) Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 107: 139-144.

24. Farkash-Amar S, Simon I (2010) Genome-wide analysis of the replication program in mammals. Chromosome Res 18: 115-125.

25. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE (2009) The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 37: D755-761.

26. Chiaromonte F, Yap VB, Miller W (2002) Scoring pairwise genomic sequence alignments. Pac Symp Biocomput: 115-126.

27. Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D (2003) Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A 100: 11484-11489.

28. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R (2003) Human-mouse alignments with BLASTZ. Genome Res 13: 103-107.

29. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.

30. Ryba T, Hiratani I, Lu J, Itoh M, Kulik M (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res.

31. Jaschek S, Tanay A (2009) Spatial Clustering of Multivariate Genomic and Epigenomic Information. RECOMB.

32. Pink CJ, Hurst LD (2010) Timing of replication is a determinant of neutral substitution rates but does not explain slow Y chromosome evolution in rodents. Mol Biol Evol 27: 1077-1086.

33. Chen CL, Rappailles A, Duquenne L, Huvet M, Guilbaud G (2010) Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res 20: 447-457.

34. Ma J, Zhang L, Suh BB, Raney BJ, Burhans RC (2006) Reconstructing contiguous regions of an ancestral genome. Genome Res 16: 1557-1565.

35. Mani RS, Tomlins SA, Callahan K, Ghosh A, Nyati MK (2009) Induced chromosomal proximity and gene fusions in prostate cancer. Science 326: 1230.

36. Karube T, Watanabe S (1988) Analysis of the chromosomal DNA replication pattern using the bromodeoxyuridine labeling method. Cancer Res 48: 219-222.

37. Harvey BJ, Perez C, Levitus M (2009) DNA sequence-dependent enhancement of Cy3 fluorescence. Photochem Photobiol Sci 8: 1105-1110.

Published in PLOS Genetics, 2010, Issue 7.