A possible solution is the following (the code is fairly self-explanatory):
text="

1
00:00:19,000 --> 00:00:21,989
I'm Annita McVeigh and welcome to Election Today where we'll bring you

2
00:00:22,000 --> 00:00:23,989
the latest from the campaign trail,
plus debate
and analysis.



3
00:00:24,000 --> 00:00:28,989
The Liberal Democrats promise to protect
the pay of millions"
con <- textConnection(text)
lines <- readLines(con)
close(con)
# the previous lines of code just replicate your case;
# in the real case, replace them with this single line:
# lines <- readLines(srtFileName)
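# split the lines into blocks at each blank line, then parse each block
listOfEntries <-
  lapply(split(seq_along(lines),cumsum(grepl("^\\s*$",lines))),function(blockIdx){
    block <- lines[blockIdx]
    # drop the blank separator line(s) belonging to this block
    block <- block[!grepl("^\\s*$",block)]
    if(length(block) == 0){
      return(NULL)
    }
    if(length(block) < 3){
      warning("a block not respecting srt standards has been found")
    }
    # line 1 = id, line 2 = times, remaining lines = subtitle text
    # (block[-(1:2)] is empty, not NA, when the text lines are missing)
    return(data.frame(id=block[1],
                      times=block[2],
                      textString=paste0(block[-(1:2)],collapse="\n"),
                      stringsAsFactors = FALSE))
  })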
m <- do.call(rbind,listOfEntries)
# split start and end times
tmp <- do.call(rbind,strsplit(m[,'times'],' --> '))
m$startTime <- tmp[,1]
m$endTime <- tmp[,2]
# parse times
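# each row of tmp is (hours, minutes, seconds, milliseconds);
# drop() turns the resulting one-column matrix into a plain numeric vector
tmp <- do.call(rbind,lapply(strsplit(m$startTime,':|,'),as.numeric))
m$fromSeconds <- drop(tmp %*% c(60*60,60,1,1/1000))
tmp <- do.call(rbind,lapply(strsplit(m$endTime,':|,'),as.numeric))
m$toSeconds <- drop(tmp %*% c(60*60,60,1,1/1000))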
# compute time difference in seconds
m$timeDiffInSecs <- m$toSeconds - m$fromSeconds
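# word count: number of runs of non-word characters, plus one
# (note: trailing punctuation, as in "analysis.", adds one to the count)
m$wordCount <- vapply(gregexpr("\\W+",m$textString),length,0) + 1
# or, if you consider "I'm" a single word, you can remove the occurrences of ' first, e.g.:
#m$wordCount <- vapply(gregexpr("\\W+",gsub("'","",m$textString)),length,0) + 1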
m$millisecsPerWord <- m$timeDiffInSecs * 1000 / m$wordCount
Result:
> m
id times textString
2 1 00:00:19,000 --> 00:00:21,989 I'm Annita McVeigh and welcome to Election Today where we'll bring you
3 2 00:00:22,000 --> 00:00:23,989 the latest from the campaign trail, \nplus debate \nand analysis.
6 3 00:00:24,000 --> 00:00:28,989 The Liberal Democrats promise to protect \nthe pay of millions
startTime endTime fromSeconds toSeconds timeDiffInSecs wordCount millisecsPerWord
2 00:00:19,000 00:00:21,989 19 21.989 2.989 14 213.5000
3 00:00:22,000 00:00:23,989 22 23.989 1.989 11 180.8182
6 00:00:24,000 00:00:28,989 24 28.989 4.989 10 498.9000
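
Note that the word count above counts runs of non-word characters, so entry 2, which ends in "analysis.", is reported as 11 words although it contains 10. If that matters for your case, one possible alternative is to count whitespace-delimited tokens instead. The following is only a minimal sketch: the helper name countWords is hypothetical (not part of the code above), and it assumes that words are separated by spaces or newlines only.

```r
# Hypothetical helper (an assumption, not part of the original code):
# counts whitespace-delimited tokens, so "I'm" and "analysis." each count once.
countWords <- function(x) {
  # trim leading/trailing whitespace, then split on runs of spaces/newlines
  tokens <- strsplit(trimws(x), "\\s+")
  # number of tokens per input string
  lengths(tokens)
}

subtitles <- c("the latest from the campaign trail, \nplus debate \nand analysis.",
               "I'm Annita McVeigh")
counts <- countWords(subtitles)
print(counts)
# [1] 10  3
```

With the data frame above this would be used as m$wordCount <- countWords(m$textString), before computing m$millisecsPerWord.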