TEATIME is a framework for estimating evolutionary parameters using single-timepoint WES or WGS data. In this workflow, we demonstrate the complete process using an example from MOBSTER dataset, including data preprocessing, visualization, clustering, and preparation of inputs for TEATIME analysis.
This example contains a simulated single-timepoint tumor composed of a root clone (clone0) and one expanding subclone (clone1). It includes the overall cellular composition, evolutionary parameters, and simulated sequencing reads.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
simulations<-readRDS("./exampledata.rds")
counts <- simulations$cell_counts["counts", ]
clone0_fraction <- as.numeric(counts["clone0"] / counts["total"])
clone1_fraction <- as.numeric(counts["clone1"] / counts["total"])
cat("Clone0 fraction (within tumor):", round(clone0_fraction * 100, 1), "%\n")
## Clone0 fraction (within tumor): 55.7 %
cat("Clone1 fraction (within tumor):", round(clone1_fraction * 100, 1), "%\n")
## Clone1 fraction (within tumor): 44.3 %
The two clones are roughly balanced (~56% vs ~44%) in this simulation.
Next, we examine some evolutionary parameters that describe the tumor’s growth dynamics.For example, the mutation rate (μ) and the fitness (s).
evo.parameter <- simulations$clone_parameters
mu <- evo.parameter$mutation_rate[1]
s <- (evo.parameter$birth_rate[2] - evo.parameter$death_rates[2]) / (evo.parameter$birth_rate[1] - evo.parameter$death_rates[1])-1
cat("Mutation rate:", mu, "\n")
## Mutation rate: 16
cat("s (clone1 vs clone0):", round(s, 2), "\n")
## s (clone1 vs clone0): 0.5
Next, we examine the sequencing data, which provide the variant allele frequencies (VAFs) used to infer evolutionary parameters. Since this simulation assumes a diploid genome and 100% purity, no additional filtering is required. In real datasets, it is generally recommended to retain only mutations located in copy-neutral (CN = 2) regions because of TEATIME’s assumption.
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
data = simulations$sequencing
head(data) %>% print
## # A tibble: 6 × 6
## VAF ALT DP CLONE TRUE_VAF TRUE_CLUSTER
## <dbl> <int> <int> <int> <dbl> <fct>
## 1 0.0676 5 74 1 0.0278 19
## 2 0.0526 6 114 1 0.0226 1c
## 3 0.0530 7 132 1 0.0568 15
## 4 0.0609 7 115 1 0.0568 15
## 5 0.0568 5 88 1 0.0568 15
## 6 0.0588 6 102 1 0.0568 15
data <- as.data.table(data)
data[, CN := 2]
TEATIME performs inference based on clustering results.
We have two options:
1. Directly input the sequencing data, or
2. Run MAGOS for clustering before input.
Other clustering methods can also be used — see option 3 on the TEATIME GitHub page for details.
data[, t_ref_count := DP - ALT]
data[, t_alt_count := ALT]
input.data <- data[, .(t_ref_count, t_alt_count)]
setnames(input.data, c("t_ref_count", "t_alt_count"), c("REF", "ALT"))
head(input.data)
## REF ALT
## <int> <int>
## 1: 69 5
## 2: 108 6
## 3: 125 7
## 4: 108 7
## 5: 83 5
## 6: 96 6
library(MAGOS)
run <- mag.single.run(input.data = input.data,fold=T)
saveRDS(run,"MAGOS.rds") #save it for TEATIME
input.file <- readRDS("MAGOS.rds")
plot(input.file$results$vaf.1, input.file$results$depth.1, col= input.file$results$colors, xlab= 'VAF', ylab='Depth')
As shown, MAGOS identified three clusters, indicating the presence of subclonal. This provides TEATIME with the necessary input to estimate evolutionary parameters.
After obtaining the clustering results, they can be passed directly
to TEATIME without additional processing.
Users may optionally specify the sequencing depth (a rounded integer),
but this is not required — depth is optional and can be
omitted.
library(TEATIME)
samplename <- "test" # unique id for each analysis
teapath <- "./" # Path to save TEATIME output (current directory)
output.folder <- file.path(teapath, samplename)
dir.create(output.folder, showWarnings = FALSE)
output.prefix <- "TEATIME" # Prefix for output files
depth=round(mean(input.file$results$depth.1))
### Run TEATIME ----
# Optional: check average sequencing depth (not required)
depth <- round(mean(input.file$results$depth.1))
# Run TEATIME using the MAGOS clustering result
TEATIME.result <- TEATIME.run(
input.file,
magos_object = TRUE,
steps = 1:5,
output.folder = output.folder,
output.prefix=output.prefix,
id = samplename,
depth = depth # Optional; TEATIME can run without this
)
In most cases, TEATIME generates an output file
named <output.prefix>.final.txt in the folder.
TEATIME.result <- read.table("TEATIME.final.txt",header=T)
print(TEATIME.result)
## name mu s t1 tend p
## 1 test 17.9 0.440502 10 31.31972 0.4062049
The TEATIME output file contains several key
inferred parameters: μ mutation rate,
s relative growth advantage of the subclone,
p cellular frequency of the subclone,
t1 relative emergence time of
subclone,tend relative sampling time. We can now
compare the estimated parameters with the true values from the
simulation.
For example, the relative error of the mutation rate, fitness, and
subclone frequency are calculated as follows:
mu_error <- (TEATIME.result$mu - mu) / mu
s_error <- (TEATIME.result$s - s) / s
p_error <- (TEATIME.result$p - clone1_fraction) / clone1_fraction
cat(sprintf("Relative error of mutation rate: %.2f%%\n", mu_error * 100))
## Relative error of mutation rate: 11.87%
cat(sprintf("Relative error of fitness: %.2f%%\n", s_error * 100))
## Relative error of fitness: -11.90%
cat(sprintf("Relative error of frequency: %.2f%%\n", p_error * 100))
## Relative error of frequency: -8.30%
Finally, if users prefer not to perform clustering by themselves, they can directly input the sequencing data into TEATIME.
input.data <- data[, .(t_ref_count, t_alt_count,CN)]
setnames(input.data, c("t_ref_count", "t_alt_count","CN"), c("REF", "ALT","CN"))
TEATIME.result <- TEATIME.run(input.data,steps = 0:5,output.folder = output.folder,id = samplename)
If users wish to customize the clustering process (tools such as PyClone), this is also supported — for example
dt <- as.data.table(data)
#pseudo-clustering using quantile bins
K <- 3
brks <- quantile(dt$VAF, probs = seq(0, 1, length.out = K + 1), na.rm = TRUE)
pseudo_clusters <- paste0("c", cut(dt$VAF, breaks = brks, include.lowest = TRUE, labels = FALSE))
# Assemble TEATIME input for Option 3:
# Column names must be exactly vaf.1, depth.1, colors
input.data <- data.frame(
vaf.1 = dt$VAF,
depth.1 = dt$DP,
colors = pseudo_clusters,
check.names = FALSE
)
TEATIME.run(input.data,magos_object = FALSE,steps=1:5,output.folder = output.folder,id = samplename)
When mutations in diploid regions are too few to support analysis, users may,as a last resort,consider using tools such as PureCN to obtain adjusted VAFs. The mean sequencing depth can then be used to simulate reference and alternate read counts as TEATIME inputs. However, because adjusted VAFs can vary across methods and may be inaccurate in some cases, this approach should be regarded as a fallback option only, and the resulting TEATIME estimates should be interpreted with caution.
Caravagna, G., Heide, T., Williams, M.J. et al. Subclonal reconstruction of tumors by using machine learning and population genetics. Nat Genet 52, 898–907 (2020). https://doi.org/10.1038/s41588-020-0675-5
Navid Ahmadinejad, Shayna Troftgruben, Junwen Wang, Pramod B Chandrashekar, Valentin Dinu, Carlo Maley, Li Liu, Accurate Identification of Subclones in Tumor Genomes, Molecular Biology and Evolution, Volume 39, Issue 7, July 2022, msac136, https://doi.org/10.1093/molbev/msac136
Riester, M., Singh, A.P., Brannon, A.R. et al. PureCN: copy number calling and SNV classification using targeted short read sequencing. Source Code Biol Med 11, 13 (2016). https://doi.org/10.1186/s13029-016-0060-z
Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883