TEATIME Example Workflow

TEATIME WORKFLOW

TEATIME is a framework for estimating evolutionary parameters using single-timepoint WES or WGS data. In this workflow, we demonstrate the complete process using an example from MOBSTER dataset, including data preprocessing, visualization, clustering, and preparation of inputs for TEATIME analysis.

Example case

This example contains a simulated single-timepoint tumor composed of a root clone (clone0) and one expanding subclone (clone1). It includes the overall cellular composition, evolutionary parameters, and simulated sequencing reads.

Clonal composition

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

simulations<-readRDS("./exampledata.rds")
counts <- simulations$cell_counts["counts", ]
clone0_fraction <- as.numeric(counts["clone0"] / counts["total"])
clone1_fraction <- as.numeric(counts["clone1"] / counts["total"])

cat("Clone0 fraction (within tumor):", round(clone0_fraction * 100, 1), "%\n")

## Clone0 fraction (within tumor): 55.7 %

cat("Clone1 fraction (within tumor):", round(clone1_fraction * 100, 1), "%\n")

## Clone1 fraction (within tumor): 44.3 %

The two clones are roughly balanced (~56% vs ~44%) in this simulation.

Evolutionary Parameters

Next, we examine some evolutionary parameters that describe the tumor’s growth dynamics.For example, the mutation rate (μ) and the fitness (s).

evo.parameter <- simulations$clone_parameters
mu <- evo.parameter$mutation_rate[1]
s <- (evo.parameter$birth_rate[2] - evo.parameter$death_rates[2]) / (evo.parameter$birth_rate[1] - evo.parameter$death_rates[1])-1

cat("Mutation rate:", mu, "\n")

## Mutation rate: 16

cat("s (clone1 vs clone0):", round(s, 2), "\n")

## s (clone1 vs clone0): 0.5

Sequencing Data

Next, we examine the sequencing data, which provide the variant allele frequencies (VAFs) used to infer evolutionary parameters. Since this simulation assumes a diploid genome and 100% purity, no additional filtering is required. In real datasets, it is generally recommended to retain only mutations located in copy-neutral (CN = 2) regions because of TEATIME’s assumption.

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

data = simulations$sequencing
head(data) %>% print

## # A tibble: 6 × 6
##      VAF   ALT    DP CLONE TRUE_VAF TRUE_CLUSTER
##    <dbl> <int> <int> <int>    <dbl> <fct>       
## 1 0.0676     5    74     1   0.0278 19          
## 2 0.0526     6   114     1   0.0226 1c          
## 3 0.0530     7   132     1   0.0568 15          
## 4 0.0609     7   115     1   0.0568 15          
## 5 0.0568     5    88     1   0.0568 15          
## 6 0.0588     6   102     1   0.0568 15

data <- as.data.table(data)
data[, CN := 2]

Analysis

Input options

TEATIME performs inference based on clustering results.
We have two options:
1. Directly input the sequencing data, or
2. Run MAGOS for clustering before input.

Other clustering methods can also be used — see option 3 on the TEATIME GitHub page for details.

Process seq data (Option 2)

data[, t_ref_count := DP - ALT]
data[, t_alt_count := ALT]

input.data <- data[, .(t_ref_count, t_alt_count)]
setnames(input.data, c("t_ref_count", "t_alt_count"), c("REF", "ALT"))
  

head(input.data)

##      REF   ALT
##    <int> <int>
## 1:    69     5
## 2:   108     6
## 3:   125     7
## 4:   108     7
## 5:    83     5
## 6:    96     6

Run clustering

library(MAGOS)
run <- mag.single.run(input.data = input.data,fold=T)
saveRDS(run,"MAGOS.rds") #save it for TEATIME

input.file <- readRDS("MAGOS.rds")
plot(input.file$results$vaf.1, input.file$results$depth.1, col= input.file$results$colors, xlab= 'VAF', ylab='Depth')

As shown, MAGOS identified three clusters, indicating the presence of subclonal. This provides TEATIME with the necessary input to estimate evolutionary parameters.

Run TEATIME

After obtaining the clustering results, they can be passed directly to TEATIME without additional processing.
Users may optionally specify the sequencing depth (a rounded integer), but this is not required — depth is optional and can be omitted.

library(TEATIME)

samplename <- "test"        # unique id for each analysis
teapath <- "./"             # Path to save TEATIME output (current directory)
output.folder <- file.path(teapath, samplename) 
dir.create(output.folder, showWarnings = FALSE)
output.prefix <- "TEATIME"  # Prefix for output files

depth=round(mean(input.file$results$depth.1))
   
### Run TEATIME ----

# Optional: check average sequencing depth (not required)
depth <- round(mean(input.file$results$depth.1))  

# Run TEATIME using the MAGOS clustering result
TEATIME.result <- TEATIME.run(
  input.file,
  magos_object = TRUE,        
  steps = 1:5,                
  output.folder = output.folder,
  output.prefix=output.prefix,
  id = samplename,
  depth = depth                 # Optional; TEATIME can run without this
)

In most cases, TEATIME generates an output file named <output.prefix>.final.txt in the folder.

TEATIME.result <- read.table("TEATIME.final.txt",header=T)
print(TEATIME.result)

##   name   mu        s t1     tend         p
## 1 test 17.9 0.440502 10 31.31972 0.4062049

The TEATIME output file contains several key inferred parameters: μ mutation rate, s relative growth advantage of the subclone, p cellular frequency of the subclone, t1 relative emergence time of subclone,tend relative sampling time. We can now compare the estimated parameters with the true values from the simulation.
For example, the relative error of the mutation rate, fitness, and subclone frequency are calculated as follows:

mu_error <- (TEATIME.result$mu - mu) / mu
s_error  <- (TEATIME.result$s - s) / s
p_error  <- (TEATIME.result$p - clone1_fraction) / clone1_fraction
cat(sprintf("Relative error of mutation rate: %.2f%%\n", mu_error * 100))

## Relative error of mutation rate: 11.87%

cat(sprintf("Relative error of fitness: %.2f%%\n", s_error * 100))

## Relative error of fitness: -11.90%

cat(sprintf("Relative error of frequency: %.2f%%\n", p_error * 100))

## Relative error of frequency: -8.30%

Alternative input options

Finally, if users prefer not to perform clustering by themselves, they can directly input the sequencing data into TEATIME.

input.data <- data[, .(t_ref_count, t_alt_count,CN)]
setnames(input.data, c("t_ref_count", "t_alt_count","CN"), c("REF", "ALT","CN"))
TEATIME.result <- TEATIME.run(input.data,steps = 0:5,output.folder = output.folder,id = samplename)

If users wish to customize the clustering process (tools such as PyClone), this is also supported — for example

dt <- as.data.table(data)
#pseudo-clustering using quantile bins
K <- 3
brks <- quantile(dt$VAF, probs = seq(0, 1, length.out = K + 1), na.rm = TRUE)
pseudo_clusters <- paste0("c", cut(dt$VAF, breaks = brks, include.lowest = TRUE, labels = FALSE))

# Assemble TEATIME input for Option 3:
# Column names must be exactly vaf.1, depth.1, colors
input.data <- data.frame(
  vaf.1   = dt$VAF,
  depth.1 = dt$DP,
  colors  = pseudo_clusters,
  check.names = FALSE
)

TEATIME.run(input.data,magos_object = FALSE,steps=1:5,output.folder = output.folder,id = samplename)

When mutations in diploid regions are too few to support analysis, users may,as a last resort,consider using tools such as PureCN to obtain adjusted VAFs. The mean sequencing depth can then be used to simulate reference and alternate read counts as TEATIME inputs. However, because adjusted VAFs can vary across methods and may be inaccurate in some cases, this approach should be regarded as a fallback option only, and the resulting TEATIME estimates should be interpreted with caution.

Reference

Caravagna, G., Heide, T., Williams, M.J. et al. Subclonal reconstruction of tumors by using machine learning and population genetics. Nat Genet 52, 898–907 (2020). https://doi.org/10.1038/s41588-020-0675-5

Navid Ahmadinejad, Shayna Troftgruben, Junwen Wang, Pramod B Chandrashekar, Valentin Dinu, Carlo Maley, Li Liu, Accurate Identification of Subclones in Tumor Genomes, Molecular Biology and Evolution, Volume 39, Issue 7, July 2022, msac136, https://doi.org/10.1093/molbev/msac136

Riester, M., Singh, A.P., Brannon, A.R. et al. PureCN: copy number calling and SNV classification using targeted short read sequencing. Source Code Biol Med 11, 13 (2016). https://doi.org/10.1186/s13029-016-0060-z

Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883