Author(s): Zongcheng Li; Yasi Zhang
Reviewer(s): Ying Ge
Date: 2025-05-20
If you use this code in your work or research, we kindly request that you cite our publication:
Xiaofan Lu, et al. (2025). FigureYa: A Standardized Visualization Framework for Enhancing Biomedical Data Interpretation and Research Efficiency. iMetaMed. https://doi.org/10.1002/imm3.70005
这个图信息量很大,非常高大上,热图和气泡图合用,展示差异富集信号通路信息。
This diagram is packed with information and looks very high-end. The combination of a heatmap and bubble chart effectively displays differential enrichment signaling pathways.
出自:https://linkinghub.elsevier.com/retrieve/pii/S0092867420300568
图1. 猕猴卵巢单细胞RNA测序分析揭示具有独特转录特征的卵巢细胞亚群。 (H) 左图:热图展示各细胞类型中前50个特异性表达基因的表达特征,每个基因的值为行标准化Z分数;右图:代表性GO条目。
Source: https://linkinghub.elsevier.com/retrieve/pii/S0092867420300568
Figure 1. Distinct Ovarian Cell Subpopulations with Transcriptional Signatures Determined by Single-Cell RNA-Seq Analysis of Monkey Ovaries. (H) Left: heatmap showing expression signatures of top 50 specifically expressed genes in each cell type; the value for each gene is row-scaled Z score. Right: representative GO terms.
单细胞RNA-seq的表达谱和基因功能富集分析结果的展示。
左侧表达谱热图跟右侧富集分析结果对应,更直观。
Single-cell RNA-seq expression profiles and gene functional enrichment analysis results.
The expression heatmap on the left corresponds to the enrichment analysis results on the right, providing a more intuitive visualization.
source("install_dependencies.R")
## Starting R package installation...
## ===========================================
##
## Installing CRAN packages...
## Package already installed: Hmisc
## Package already installed: RColorBrewer
## Package already installed: Seurat
## Package already installed: aplot
## Package already installed: batchelor
## Package already installed: magrittr
## Package already installed: patchwork
## Package already installed: pheatmap
## Package already installed: tidyverse
##
## ===========================================
## Package installation completed!
## You can now run your R scripts in this directory.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(magrittr)
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
library(Seurat) # v3.1.5
## Loading required package: SeuratObject
## Loading required package: sp
## 'SeuratObject' was built with package 'Matrix' 1.6.5 but the current
## version is 1.7.3; it is recomended that you reinstall 'SeuratObject' as
## the ABI for 'Matrix' may have changed
##
## Attaching package: 'SeuratObject'
##
## The following objects are masked from 'package:base':
##
## intersect, t
library(pheatmap)
library(RColorBrewer)
library(aplot)
library(patchwork)
# 显示英文报错信息
# Show English error messages
Sys.setenv(LANGUAGE = "en")
# 禁止chr转成factor
# Prevent character-to-factor conversion
options(stringsAsFactors = FALSE)
UMI count,从NCBIGSE130664下载:GSE130664_merge_UMI_count.txt.gz文件。
UMI counts: Download the
GSE130664_merge_UMI_count.txt.gz file from NCBIGSE130664
metadata: 从例文的Supplementary
Tables获得:1-s2.0-S0092867420300568-mmc1.xlsx
metadata: Obtain from the supplementary tables of the example paper:
1-s2.0-S0092867420300568-mmc1.xlsx
# 读取UMI计数矩阵
# Read UMI count matrix
umi <- read.table(file = gzfile("GSE130664_merge_UMI_count.txt.gz"), header = T, row.names = 1, sep = "\t")
# 读取质控指标(Excel第二工作表)
# Read QC metrics (Excel sheet 2)
qc <- readxl::read_excel("1-s2.0-S0092867420300568-mmc1.xlsx", sheet = 2)
# 读取元数据(Excel第三工作表)并设置细胞ID为行名
# Read metadata (Excel sheet 3) with cell IDs as rownames
meta <- readxl::read_excel("1-s2.0-S0092867420300568-mmc1.xlsx", 3) %>%
column_to_rownames("cell")
看方法学部分:QUANTIFICATION AND STATISTICAL ANALYSIS -> Single-Cell RNA-Seq Data Processing
See Methods: QUANTIFICATION AND STATISTICAL ANALYSIS -> Single-Cell RNA-Seq Data Processing
# 细胞质量控制
# Cell Quality Control
cells <- qc %>%
filter(`Mapping rate` >= 0.2 &
`Gene number` >= 700 &
UMI >= 3000) %>%
pull(Rename)
# 创建Seurat对象
# Create Seurat Object
sc <- CreateSeuratObject(counts = umi[,cells], meta.data = meta)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning: Data is of class data.frame. Coercing to dgCMatrix.
# 表达量转换
# Expression Transformation
counts_matrix <- LayerData(sc, layer = "counts")
normalized_data <- apply(counts_matrix, 2, function(x){
log2(10^5 * x / sum(x) + 1)
})
sc <- SetAssayData(sc, layer = "data", new.data = normalized_data)
# 细胞过滤
# Cell Filtering
sc <- sc[,sc$cluster != "other"]
# 细胞群重命名
# Cluster Renaming
sc$cluster_short <- factor(
plyr::mapvalues(sc$cluster,
c("Oocyte", "Natural killer T cell", "Macrophage",
"Granulosa cell", "Endothelial cell",
"Smooth muscle cell", "Stromal cell"),
c("OO", "NKT", "M", "GC", "EC", "SMC", "SC")),
levels = c("OO", "NKT", "M", "GC", "EC", "SMC", "SC"))
# 配色方案
# Color Scheme
cluster_colors <- setNames(brewer.pal(7, "Set1"), levels(sc$cluster_short))
# 数据保存
# Data Saving
#save(sc, cluster_colors, file = "sc.seurat.Rdata")
# 表达矩阵导出
# Expression Matrix Export
#write.csv(sc@assays$RNA@data, "easy_input_expr.csv", quote = F)
看方法学部分:
QUANTIFICATION AND STATISTICAL ANALYSIS -> Identification of Cell
Types and Cell Type-Specific Markers
原文:我们使用标准曲线下面积(AUC)分类器,通过R包Seurat(Satija等,2015)中的FindAllMarkers函数鉴定细胞类型特异性标记(不同细胞类型间的差异表达基因)。只有当log2转换后的TPM平均差异大于0.5、对应功效值大于0.25且表达细胞比例(TPM ≥ 0)大于30%时,才被选为标记基因。
See Methods:
QUANTIFICATION AND STATISTICAL ANALYSIS -> Identification of Cell
Types and Cell Type-Specific Markers
Original text: We used a standard area under the curve (AUC) classifier to identify cell type-specific markers (DEGs among different cell types) with the function FindAllMarkers in the R package Seurat (Satija et al., 2015). Markers were selected only if the average difference of the log2-transformed TPM was greater than 0.5, with a corresponding power value greater than 0.25 and a percentage of expressed cells (TPM s 0) greater than 30%.
#load(file = "sc.seurat.Rdata")
# 设置细胞标识
# Set Cell Identity
Idents(sc) <- "cluster_short"
# 差异表达分析
# Differential Expression Analysis
degs <- FindAllMarkers(sc, logfc.threshold = 0.5,
test.use = "roc",
return.thresh = 0.25,
min.pct = 0.3, only.pos = T)
## Calculating cluster OO
## Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
## Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Calculating cluster NKT
## Calculating cluster M
## Calculating cluster GC
## Calculating cluster EC
## Calculating cluster SMC
## Calculating cluster SC
# 注意:实际过滤时似乎只有logfc.threshold参数有效
# NOTE: seem like only parameter logfc.threshold is valid when do filter
# 筛选显著差异基因
# Filter Significant DEGs
degs_sig <- degs %>%
filter(pct.1 > 0.3 &
power > 0.25) %>%
filter(cluster != "other") %>%
arrange(cluster, -power)
# 结果输出
# Results Export
write.csv(degs_sig, file = "output_degs_sig.csv")
# 分群组输出基因列表
# Export Gene Lists by Cluster
for (clustername in levels(sc$cluster_short)){
write.table(rownames(degs_sig[degs_sig$cluster == clustername, ]), paste0("output_deg_", clustername, ".txt"), row.names = F, col.names = "Gene", quote = F)
}
为重复原文,这里也用ToppGene做富集分析(一个简单易用的在线工具)。
原文:GO analysis of cell type-specific markers was performed with ToppGene (Chen et al., 2009). We selected GO terms representing the function of each cell type with P value < 0.05 among top 30 terms (Figure 1H).
用除了Excel以外的文本编辑器,每次打开一个output_deg_*.txt文件,复制其中的基因名,粘贴到Enrichment Gene Set框,Background Gene Set框可以留空。
To replicate the original paper, we also used ToppGene for enrichment analysis (an easy-to-use webserver).
Original text: GO analysis of cell type-specific markers was performed with ToppGene (Chen et al., 2009). We selected GO terms representing the function of each cell type with P value < 0.05 among top 30 terms (Figure 1H).
Use a text editor other than Excel to open each
output_deg_*.txt file, copy the gene names, and paste them
into the Enrichment Gene Set box. The
Background Gene Set box can be left empty.
例文选了GO的MF、BP和CC,你也可以尝试其他注释
The example paper selected GO terms for MF, BP, and CC. You may also
try other annotations.
点击Download下载,把下载到的文件命名为ouput_enrich_[cluster的名字].txt,用于画dotplot。
例如:打开output_deg_OO.txt,复制基因名,用ToppGene富集分析获得的文件重命名为output_enrich_OO.txt。
Click Download to save the file, and rename it as
ouput_enrich_[cluster name].txt for dotplot
visualization.
For example: Open output_deg_OO.txt, copy the gene
names, use ToppGene for enrichment analysis, and rename the obtained
file as output_enrich_OO.txt.
所有cluster都按上述方法操作和命名,并保存在当前文件夹下。后面画图时会根据文件名批量读入富集分析结果文件。
另外,还可以用clusterprofiler等命令行式的工具,就可以批量做,更方便。后面画图用到的这几列在所有富集分析结果中都有:Name(term), p-value, Hit Count in Query List, Hit Count in Genome(背景)
Process and name all clusters according to the above method, and save them in the current folder. The enrichment analysis result files will be batch imported based on filenames during subsequent plotting.
Alternatively, command-line tools like clusterProfiler can be used for batch processing, which is more convenient. The following columns required for subsequent plotting are present in all enrichment analysis results: Name (term), p-value, Hit Count in Query List, Hit Count in Genome (background).
分别画热图和点图,然后拼起来。输出的pdf文件是矢量图,可以用Adobe Illustrator等矢量图编辑器进一步修饰图和文字。
Generate heatmaps and dot plots separately, then combine them. The output PDF files are vector graphics that can be further edited (e.g., adjusting figures and text) using vector graphic editors like Adobe Illustrator.
# 加载前面预处理好的scRNA-Seq数据,要用到表达矩阵、cluster及配色
#(load(file = "sc.seurat.Rdata"))
# 筛选热图用差异基因
# Select DEGs for Heatmap
degs_top50 <- degs_sig %>%
# filter(cluster!="other") %>%
group_by(cluster) %>%
top_n(50, power) %>%
top_n(50, avg_diff) %>%
arrange(cluster, -power)
# 获取标准化表达数据
# Get Normalized Expression Data
norm_data <- GetAssayData(sc, layer = "data")
# 基因存在性检查
# Gene Presence Check
missing_genes <- setdiff(degs_top50$gene, rownames(norm_data))
if (length(missing_genes) > 0) {
warning("The following genes are not present in the expression matrix and have been automatically excluded: ", paste(missing_genes, collapse = ", "))
degs_top50 <- degs_top50[degs_top50$gene %in% rownames(norm_data), ]
}
# 计算cluster平均表达量
# Calculate Cluster-wise Mean Expression
avgData <- norm_data[degs_top50$gene, ] %>%
apply(1, function(x) {
tapply(x, sc$cluster_short, mean)
}) %>%
t()
# 数据标准化处理
# Data Normalization
phData <- MinMax(scale(avgData), -2, 2)
rownames(phData) <- 1:nrow(phData)
# 绘制热图
# Generate Heatmap
phres <- pheatmap(
phData,
color = colorRampPalette(c("darkblue", "white", "red3"))(99),
scale = "row",
cluster_rows = F,
cluster_cols = T,
clustering_method = "complete",
show_rownames = F,
annotation_row = data.frame(cluster = degs_top50$cluster),
annotation_colors = list(cluster = cluster_colors))
原文:We selected GO terms representing the function of each cell type with P value < 0.05 among top 30 terms (Figure 1H).
原图每个cluster各画了三个GO
term。实际使用时,结合背景知识,把想展示的term写在go_used里。
Original text: We selected GO terms representing the function of each cell type with P value < 0.05 among top 30 terms (Figure 1H).
The original figure displayed three GO terms for each cluster. For
practical application, incorporate domain knowledge and specify the
terms you wish to present in the go_used list.
# 定义要展示的GO terms
# Define GO terms to display
go_used <- list(
OO = c("meiotic cell cycle", "gamete generation", "sexual reproduction"),
NKT = c("regulation of immune response", "regulation of immune system process", "T cell activation"),
M = c("immune effector process", "leukocyte activation", "inflammatory response"),
GC = c("anti-Mullerian hormone signaling pathway", "reproductive structure development", "reproductive system development"),
EC = c("angiogenesis", "blood vessel development", "vasculature development"),
SMC = c("muscle system process", "muscle contraction", "smooth muscle contraction"),
SC = c("extracellular matrix organization", "blood vessel development", "response to hormone")
)
# 统计各cluster差异基因数量
# Count DEGs per cluster
degs_n <- table(degs_sig$cluster)
# 批量读取富集结果
# Batch load enrichment results
lapply(levels(sc$cluster_short), function(i){
# 这里用文章附件里的富集分析结果画图
# Using supplementary data from paper
tmp <- readxl::read_excel("1-s2.0-S0092867420300568-mmc1.xlsx", paste0("GO_", i))
# 实际分析时使用以下代码读取本地结果
# For actual analysis use this to load local results:
#tmp <- read.table(paste0("output_enrich_", i, ".txt"), sep = "\t", header = T, check.names = F)
# 提取指定GO term的结果
# Extract results for specified GO terms
cbind(tmp[match(go_used[[i]], tmp$Name),
c("Name", "p-value", "Hit Count in Query List", "Hit Count in Genome")],
cluster = i, n_deg = degs_n[[i]])
}) %>% do.call(rbind, .) -> dotData
# 预览数据
# Preview data
head(dotData)
## Name p-value Hit Count in Query List
## 1 meiotic cell cycle 1.748e-14 74
## 2 gamete generation 3.358e-12 150
## 3 sexual reproduction 1.072e-11 175
## 4 regulation of immune response 3.910e-31 50
## 5 regulation of immune system process 5.583e-31 61
## 6 T cell activation 2.416e-22 33
## Hit Count in Genome cluster n_deg
## 1 237 OO 4261
## 2 692 OO 4261
## 3 857 OO 4261
## 4 899 NKT 212
## 5 1506 NKT 212
## 6 506 NKT 212
# 计算x轴范围
# Calculate x-axis limit
(xmax <- round(max(-log10(dotData$`p-value`))))
## [1] 30
# 绘制点图
# Create dot plot
dotplot <- ggplot(cbind(dotData, Order = nrow(dotData):1)) +
geom_point(mapping = aes(x = -log10(`p-value`),
y = Order,
size = `Hit Count in Query List`,
fill = `Hit Count in Query List`/n_deg),
shape = 21) +
scale_fill_gradientn(colours = c("grey", "gold", "red")) +
scale_y_continuous(position = "right",
breaks = 1:nrow(dotData),
labels = Hmisc::capitalize(rev(dotData$Name))) +
scale_x_continuous(breaks = seq(0, xmax, 10),
expand = expansion(mult = c(.15, .1))) +
labs(x = "-log10(P-value)", y = NULL) +
guides(size = guide_legend(title = "Gene count"),
fill = guide_colorbar(title = "GeneRatio")) +
#theme_classic() +
theme_bw() +
theme(panel.grid =element_blank())
dotplot
你可以使用Adobe Illustrator对以下图表进行修饰完善
You can go to Adobe Illustrator to polish the following figures
# 提取热图图形对象
# Extract heatmap graphical object
gh <- phres$gtable
# 转换ggplot点图为图形对象
# Convert ggplot dotplot to graphical object
gd <- ggplotGrob(dotplot)
# 组合图形
# Combine plots
combined_plot <- wrap_elements(gh) + wrap_elements(gd) +
plot_layout(widths = c(1.2, 1.6))
# 保存组合图
# Save combined plot
ggsave("scHeatmap.pdf", combined_plot, width = 12, height = 8)
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Asia/Shanghai
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.3.0 aplot_0.2.5 RColorBrewer_1.1-3 pheatmap_1.0.13
## [5] Seurat_5.3.0 SeuratObject_5.1.0 sp_2.2-0 magrittr_2.0.3
## [9] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [13] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1 tibble_3.3.0
## [17] ggplot2_3.5.2 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.17.1 jsonlite_2.0.0 spatstat.utils_3.1-5
## [4] farver_2.1.2 rmarkdown_2.29 ragg_1.4.0
## [7] fs_1.6.6 vctrs_0.6.5 ROCR_1.0-11
## [10] spatstat.explore_3.5-2 base64enc_0.1-3 htmltools_0.5.8.1
## [13] cellranger_1.1.0 Formula_1.2-5 gridGraphics_0.5-1
## [16] sass_0.4.10 sctransform_0.4.2 parallelly_1.45.0
## [19] KernSmooth_2.23-24 bslib_0.9.0 htmlwidgets_1.6.4
## [22] ica_1.0-3 plyr_1.8.9 plotly_4.10.4
## [25] zoo_1.8-14 cachem_1.1.0 igraph_2.0.3
## [28] mime_0.13 lifecycle_1.0.4 pkgconfig_2.0.3
## [31] Matrix_1.7-3 R6_2.6.1 fastmap_1.2.0
## [34] fitdistrplus_1.2-2 future_1.58.0 shiny_1.10.0
## [37] digest_0.6.37 colorspace_2.1-1 tensor_1.5
## [40] RSpectra_0.16-2 irlba_2.3.5.1 textshaping_1.0.1
## [43] Hmisc_5.2-3 labeling_0.4.3 progressr_0.15.1
## [46] spatstat.sparse_3.1-0 timechange_0.3.0 httr_1.4.7
## [49] polyclip_1.10-7 abind_1.4-8 compiler_4.4.1
## [52] withr_3.0.2 backports_1.5.0 htmlTable_2.4.3
## [55] fastDummies_1.7.5 MASS_7.3-61 tools_4.4.1
## [58] foreign_0.8-86 lmtest_0.9-40 httpuv_1.6.16
## [61] future.apply_1.20.0 nnet_7.3-19 goftest_1.2-3
## [64] glue_1.8.0 nlme_3.1-165 promises_1.3.3
## [67] grid_4.4.1 checkmate_2.3.2 Rtsne_0.17
## [70] cluster_2.1.6 reshape2_1.4.4 generics_0.1.4
## [73] gtable_0.3.6 spatstat.data_3.1-6 tzdb_0.5.0
## [76] data.table_1.17.4 hms_1.1.3 spatstat.geom_3.5-0
## [79] RcppAnnoy_0.0.22 ggrepel_0.9.6 RANN_2.6.2
## [82] pillar_1.10.2 yulab.utils_0.2.0 spam_2.11-1
## [85] RcppHNSW_0.6.0 later_1.4.2 splines_4.4.1
## [88] lattice_0.22-5 survival_3.7-0 deldir_2.0-4
## [91] tidyselect_1.2.1 miniUI_0.1.2 pbapply_1.7-2
## [94] knitr_1.50 gridExtra_2.3 scattermore_1.2
## [97] xfun_0.52 matrixStats_1.5.0 stringi_1.8.7
## [100] lazyeval_0.2.2 ggfun_0.1.8 yaml_2.3.10
## [103] evaluate_1.0.3 codetools_0.2-19 ggplotify_0.1.2
## [106] cli_3.6.5 rpart_4.1.23 uwot_0.2.3
## [109] systemfonts_1.2.3 xtable_1.8-4 reticulate_1.42.0
## [112] jquerylib_0.1.4 dichromat_2.0-0.1 Rcpp_1.0.14
## [115] readxl_1.4.5 globals_0.18.0 spatstat.random_3.4-1
## [118] png_0.1-8 spatstat.univar_3.1-4 parallel_4.4.1
## [121] dotCall64_1.2 listenv_0.9.1 viridisLite_0.4.2
## [124] scales_1.4.0 ggridges_0.5.6 crayon_1.5.3
## [127] rlang_1.1.6 cowplot_1.1.3