java – How to plot a quantile band (in R)


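I have a CSV file that contains one row for each (Java GC) event I am interested in. Each row consists of a sub-second timestamp (not equidistant) and a few variables. The object looks like this: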

gcdata <- read.table("http://bernd.eckenfels.net/view/gc1001.ygc.csv",header=TRUE,sep=",", dec=".")
start = as.POSIXct(strptime("2012-01-01 00:00:00", format="%Y-%m-%d %H:%M:%S"))
gcdata.date = gcdata$Timestamp + start
gcdata = gcdata[,2:7] # remove old date col
gcdata=data.frame(date=gcdata.date,gcdata)
str(gcdata)

The result is

'data.frame':   2997 obs. of  7 variables:
 $date           : POSIXct, format: "2012-01-01 00:00:06" "2012-01-01 00:00:06" "2012-01-01 00:00:18" ...
 $Distance.s.    : num  0 0.165 11.289 9.029 11.161 ...
 $YGUsedBefore.K.: int  1610619 20140726 20148325 20213304 20310849 20404772 20561918 21115577 21479211 21544930 ...
 $YGUsedAfter.K. : int  7990 15589 80568 178113 272036 429182 982841 1346475 1412181 1355412 ...
 $Promoted.K.    : int  0 0 0 0 8226 937 65429 71166 62548 143638 ...
 $YGCapacity.K.  : int  22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 22649280 ...
 $Pause.s.       : num  0.0379 0.022 0.0287 0.0509 0.109 ...

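In this case I care about the pause time (in seconds). I would like to plot a chart that shows, for each (wall-clock) hour, the mean as a line, the 2% and 98% quantiles as a grey corridor, and the maximum (within each hour) as a red line.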

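I have got some of this working, but the q98-style helper functions are ugly, having to use several lines() statements seems wasteful, and I do not know how to get a grey area between q02 and q98: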

q02 <- function(x, ...) quantile(x, probs=0.02)  # 2% quantile
q98 <- function(x, ...) quantile(x, probs=0.98)  # 98% quantile
hours = droplevels(cut(gcdata$date, breaks="hours")) # can I have 2 hours?
plot(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=max),ylim=c(0,2), col="red", ylab="Pause(s)", xlab="Days") # Is always black?
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=q98),ylim=c(0,2), col="green")
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=q02),ylim=c(0,2), col="green")
lines(aggregate(gcdata$Pause.s. ~ hours, data=gcdata, FUN=mean),ylim=c(0,2), col="blue")

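This now gives a chart with black dots for the maximum, a blue line for the hourly mean, and green lines for the lower and upper bounds (the 2% and 98% quantiles). I think it would be easier to read with a grey corridor, perhaps a dashed (red) line for the maximum, and the axis labels fixed somehow.

Any suggestions? (The file is available at the URL above.)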

Solution:

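Nice to see an old Debian friend here :) Your answer is already quite good. As I happen to work a lot with time series, I thought I would throw in a variant using the excellent zoo and xts packages. The latter builds on the former and provides, among other things, the period.apply() function, which we can use here together with endpoints() to get two-hourly aggregates.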
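As a minimal sketch of how endpoints() and period.apply() fit together, the following uses synthetic data instead of the GC log. The names pauses, ep, probs and agg, the random data, and the chosen quantile levels are assumptions made for illustration; it also assumes the xts package is installed.

```r
library(xts)  # attaching xts also attaches zoo

# Synthetic stand-in for the GC log (assumption): 600 events at
# irregular times spread over roughly ten hours.
set.seed(42)
start  <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
times  <- start + sort(runif(600, min = 0, max = 10 * 3600))
pauses <- xts(rexp(600, rate = 20), order.by = times)

# endpoints() returns the row number of the last observation in each
# two-hour block, preceded by a leading 0.
ep <- endpoints(pauses, on = "hours", k = 2)

# period.apply() calls FUN on the rows between consecutive endpoints.
# Because FUN returns a vector, the result has one column per quantile.
probs <- c(0.02, 0.5, 0.98, 1)
agg <- period.apply(pauses, INDEX = ep,
                    FUN = function(x) quantile(as.numeric(x), probs = probs))
print(agg)
```

Each row of agg is stamped with the time of the last observation in its block, so index(agg) can be used directly as the x axis of a plot.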

So at the top, I would use

library(zoo)                                # for zoo objects
library(xts)                                # for period.apply

gcdata <- read.table("http://bernd.eckenfels.net/view/gc1001.ygc.csv",
                     header=TRUE, sep=",", dec=".")
timestamps <- gcdata$Timestamp + 
              as.POSIXct(strptime("2012-01-01 00:00:00", 
                         format="%Y-%m-%d %H:%M:%S"))
gcdatazoo <- zoo(gcdata[-1], order.by=timestamps)    # as zoo object

to create a zoo object. Your function stays as it is:

plotAreaCorridor <- function(x, y, col.poly1="lightgray", col.poly2="gray",...) {
    x.pol <- c(x, rev(x), x[1])
    y.pol <- c(y[,1], rev(y[,5]),y[,1][1])
    plot(x, y[,6]+1, type="n", ...) 
    polygon(x.pol, y.pol, col=col.poly1, lty=0)

    x.pol <- c(x, rev(x), x[1])
    y.pol <- c(y[,2], rev(y[,4]), y[,2][1])
    polygon(x.pol, y.pol, col=col.poly2, lty=0)

    lines(x, y[,3], col="blue") # median
    lines(x, y[,6], col="red")  # max

    invisible(NULL)
}
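The shading relies on how the polygon outline is assembled: forward along one curve, then backward along the other. A minimal sketch of just that step, using base graphics and made-up numbers (x, lower, upper and mid are illustrative names, not part of the GC data):

```r
# Synthetic data (assumption): ten steps with a lower bound,
# an upper bound and a centre line between them.
x     <- 1:10
lower <- c(0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 0.45, 0.5, 0.4, 0.3)
upper <- lower + 1
mid   <- (lower + upper) / 2

# Polygon outline: left to right along the lower curve,
# then right to left along the upper curve.
x.pol <- c(x, rev(x))
y.pol <- c(lower, rev(upper))

# Set up an empty plot first so the corridor is drawn beneath the lines.
plot(x, mid, type = "n", ylim = range(y.pol), xlab = "step", ylab = "value")
polygon(x.pol, y.pol, col = "lightgray", border = NA)  # shaded corridor
lines(x, mid, col = "blue")                             # centre line on top
lines(x, upper, col = "red", lty = 2)                   # dashed upper bound
```

polygon() closes the outline itself, so repeating the first vertex at the end is optional.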

And then we can simplify a little:

agg <- period.apply(gcdatazoo[,"Pause.s."],               # to which data
                    INDEX=endpoints(gcdatazoo, "hours", k=2), # every 2 hours
                    FUN=function(x) quantile(x,               # what fun.
                                             probs=c(5,20,50,80,95,100)/100)) 

#v99 = q99(gcdata$Pause.s.)        # what is q99 ?
v99 <- mean(agg[,5])                  # mean of 95-th percentile?
plotAreaCorridor(index(agg),          # use time index as x axis
                 coredata(agg),       # and matrix part of zoo object as data
                 ylim=c(0,max(agg[,5])*1.5),
                 ylab="Quantiles of GC events",
                 main="NewPar Collection Activity")
abline(h=median(gcdatazoo[,"Pause.s."]), col="lightblue")
abline(h=v99, col="grey")
labeltxt <- paste("mean q95=",round(v99,digits=3),"s n=", nrow(gcdatazoo),sep="")
text(x=index(agg)[20], y=1.5*v99, labeltxt, col="grey", pos=3)  # or legend()

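In the resulting plot, the axis is now automatic and shows only weekdays, because the span is less than a week; this can be overridden as needed.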
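One way to override the automatic axis, sketched here with base graphics and a synthetic series, is to suppress it with xaxt="n" and draw it explicitly with axis.POSIXct(). The tick spacing and label format below are arbitrary choices for illustration, and times, values and ticks are made-up names.

```r
# Synthetic series (assumption): one point every two hours for two days.
times  <- seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
              by = "2 hours", length.out = 24)
values <- sin(seq_along(times) / 4)

# Suppress the default x axis, then draw one with explicit
# tick positions and a date-time label format.
plot(times, values, type = "l", xaxt = "n", xlab = "Time", ylab = "Value")
ticks <- seq(times[1], times[length(times)], by = "6 hours")
axis.POSIXct(side = 1, at = ticks, format = "%d %b %H:%M")
```

Because plotAreaCorridor() forwards its extra arguments to plot(), passing xaxt="n" to it and then calling axis.POSIXct() should work the same way.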

Tags: java, r, plot, quantile
Source: https://codeday.me/bug/20190529/1181389.html