CHIPseek笔记记录
作者:互联网
有人在进行reads比对的时候,参考基因组用的是RefSeq的基因组(如斑马鱼的基因组),可以发现比对出来,染色体的编号是如图所示:但据我了解(知道的朋友可以交流),CHIPseeker仅识别染色体编号为chrX(X代表数字,且字母要严格小写),因此要顺利分析进行,必须进行编号改变
由下图可以发现上染色体编号的对应关系(所以直接替换就行)
# linux 批量修改文件的对应内容
for i in `ls *.fas`; do echo "sed 's/NC_007112.7/chr1/g' $i |sed 's/NC_007113.7/chr2/g' |sed 's/NC_007114.7/chr3/g' |sed 's/NC_007115.7/chr4/g' |sed 's/NC_007116.7/chr5/g' |sed 's/NC_007118.7/chr7/g' |sed 's/NC_007119.7/chr8/g' |sed 's/NC_007120.7/chr9/g' |sed 's/NC_007121.7/chr10/g' |sed 's/NC_007122.7/chr11/g' |sed 's/NC_007123.7/chr12/g' |sed 's/NC_007124.7/chr13/g' |sed 's/NC_007125.7/chr14/g' |sed 's/NC_007126.7/chr15/g' |sed 's/NC_007127.7/chr16/g' |sed 's/NC_007128.7/chr17/g' |sed 's/NC_007129.7/chr18/g' |sed 's/NC_007130.7/chr19/g' |sed 's/NC_007131.7/chr20/g' |sed 's/NC_007132.7/chr21/g' |sed 's/NC_007133.7/chr22/g' |sed 's/NC_007134.7/chr23/g' |sed 's/NC_007135.7/chr24/g' |sed 's/NC_007136.7/chr25/g' |sed 's/NC_002333.2/chrMT/g' > $i.re"; done >> run.sh
结果:
对不同物种进行分析,需要用到不同的注释库,如下方式可以得到对应的库(OrgDb和TxDb库)
分析代码:
# CHIPseeker_zebrafish # ly-2021-07-29 # org.Xx.eg.db系列包概述: http://www.bio-info-trainee.com/788.html cran_packages <- c("BiocManager", "reshape", "Vennerable", "RMariaDB", "UpSetR", "optparse", "stringr", "ggplot2") Biocdouctor_packages <- c("RBGL", "graph",
"TxDb.Hsapiens.UCSC.hg19.knownGene", #CHIPseeker的依赖
"TxDb.Hsapiens.UCSC.hg38.knownGene",
"CHIPseeker",
"TxDb.Drerio.UCSC.danRer11.refGene",
"clusterProfiler",
"org.Dr.eg.db")
# 加载或安装必须包 for(pkg in cran_packages){ if(!require(pkg, character.only = T)){ install.packages(pkg, ask=F, update=F) require(pkg, character.only = T) } } for(pkg in Biocdouctor_packages){ if(!require(pkg, character.only = T)){ BiocManager::install(pkg, ask = F, update = F) require(pkg, character.only = T) } } # 加载org.Dr.eg.db包可能会报错:$ operator is invalid for atomic vectors options(connectionObserver = NULL) library("org.Dr.eg.db") #不同物种对应的库有差别
# 安装Vennerable可能报错
install.packages("Vennerable", repos="http:/ /R-Forge.R-project.org")
library("Vennerable")
# 确定是否能加载所有包 for (pkg in c(cran_packages, Biocdouctor_packages)) { require(pkg, character.only = T) } library(CHIPseeker) #前面加在可能会出错
search() #[43]
# 开始分析 files <- getSampleFiles() # 读取例子文件 print(files) peak <- readPeakFile(files[[4]]) # peak calling peak
## 自己的bed文件(注意染色体名字要求)
#peak <- readPeakFile("ctrl_1_summits.bed")
# 首先可视化自己的bed文件(支持多个bed文件同时可视化:扩展阅读2)
# ?covplot covplot(peak, weightCol = "V5", chrs = c("chr17", "chr18"), # 默认整个 xlim = c(4.5e7, 5e7)) #weightCol展现表达量
# peak的注释
TxDb <- TxDb.Hsapiens.UCSC.hg38.knownGene #这里究竟该用什么版本,一定要跟上有分析对应
promter <- getPromoters(TxDb = TxDb, upstream = 3000, downstream = 3000) # 默认的是TxDb.Hsapiens.UCSC.hg19.knownGene
# filelist <- c("test")
# temp = get(filelist)
PeakAnno <- annotatePeak(peak,tssRegion = c(-3000, 3000),annoDb = "org.Hs.eg.db",TxDb=TxDb) #一定要在bed文件的染色体编号前加chr,否则会出现如下报错: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘NSBS’ for signature ‘"SortedByQueryHits"’
#自己定义promoter区域,上下游3000bp promoter <- getPromoters(upstream=3000, downstream=3000) #不理解这个函数也没关系,是为下一步做热图提供matrix tagMatrix <- getTagMatrix(peak, windows=promoter) tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color="red") #一键绘图,效果同上 # peakHeatmap(peak, TxDb=txdb, upstream=3000, downstream=3000, color="red") plotAvgProf(tagMatrix, xlim=c(-3000, 3000), xlab="Genomic Region (5'->3')", ylab = "Read Count Frequency") #加一个置信区间 plotAvgProf(tagMatrix, xlim=c(-3000, 3000), conf = 0.95, resample = 1000) peak #共计1331个peak peakAnno <- annotatePeak(files[[4]], tssRegion=c(-3000, 3000)) peakAnno class(peakAnno) peakAnno.df <- as.data.frame(peakAnno) peakAnno.gr <- as.GRanges(peakAnno) head(peakAnno.gr, 3) plotAnnoPie(peakAnno) plotAnnoBar(peakAnno) vennpie(peakAnno) install.packages("magick") install.packages("Rcpp") library(Rcpp) library(magick) ChIPseeker::upsetplot(peakAnno, vennpie=TRUE) upsetplot(peakAnno, vennpie=TRUE) #也行
其他都没问题,就在最后一步用upsetplot(peakAnno, vennpie=TRUE)画图时有点问题(运行就一直卡着不动,但又不报错):使用该函数时,需要依赖包magick和Rcpp,第一次安装之后,发现也不能加载,但不会报错(一直卡着不动),最后的解决办法是重新安装这两个包
在linux上安装CHIPseeker软件报错:
安装CHIPseeker时,会依赖TxDb.Hsapiens.UCSC.hg19.knownGene包,但安装时出现如图报错。起初以为是网络的问题,其实不是。解决办法为:Win+R,输入inetcpl.cpl 直接打开Internet选项
。打开后,在高级中勾选使用TSL 3.0、使用TLS 1.0、使用TLS 1.1、使用TLS 1.2。
扩展阅读:
标签:记录,CHIPseek,NC,笔记,sed,pkg,packages,CHIPseeker,TxDb 来源: https://www.cnblogs.com/ly-zy/p/15077007.html