image - Figure sizes with pandoc conversion from markdown to docx

Question

I write a report with Rmarkdown in Rstudio. When I convert it to html with knitr, knitr also produces a markdown file. I convert this file with pandoc as follows:

pandoc -f markdown -t docx input.md -o output.docx

The resulting output.docx file is fine except for one problem: the figure sizes are altered, so I have to resize the figures manually in Word. Is there something I can do, perhaps a pandoc option, to get the right figure sizes?

score 8 · Accepted Answer

An easy way is to include a scale factor k in the individual chunk options:

{r, fig.width=8*k, fig.height=6*k}

and a variable dpi in the global chunk options:

opts_chunk$set(dpi = dpi)

Then you can set the values of dpi and k in the global environment before knitting the Rmd file:

dpi <<- 96    
k <<- 1

Or you can set them in a chunk inside the Rmd file (for example, set k in the first chunk).
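To see what k and dpi do to the rendered image, here is a minimal sketch. It assumes a raster device such as png, where the pixel size is the size in inches times the dpi. figure_pixels is a hypothetical helper written for illustration, not a knitr function.

```r
# Hypothetical helper (not part of knitr): pixel size of a raster figure
# given its chunk options, assuming pixels = inches * dpi.
figure_pixels <- function(fig.width, fig.height, dpi, k = 1) {
  c(width = fig.width * k * dpi, height = fig.height * k * dpi)
}

# The chunk header {r, fig.width=8*k, fig.height=6*k} with two settings:
full <- figure_pixels(8, 6, dpi = 96, k = 1)
half <- figure_pixels(8, 6, dpi = 96, k = 0.5)
print(full)
print(half)
```

Changing k changes the physical size of the figure, while changing dpi only changes how many pixels are used for the same physical size.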

score 3 · Accepted Answer

Here is a solution that resizes the figures with ImageMagick from an R script. A ratio of 70% seems to be a good choice.

# the path containing the Rmd file :
wd <- "..."
setwd(wd)

# the folder containing the figures :
fig.path <- paste0(wd, "/figure")
# all png figures :
figures <- list.files(fig.path, pattern="\\.png$", all.files=TRUE)

# (safety) create copies of the original files
copy.path <- paste0(fig.path, "_copy")
dir.create(copy.path)
for(i in seq_along(figures)){
  fig <- paste0(fig.path, "/", figures[i])
  file.copy(fig, copy.path)
}

# resize all figures in place
# (shell() is Windows-only; use system() on other platforms)
for(i in seq_along(figures)){
    fig <- paste0(fig.path, "/", figures[i])
    comm <- paste("convert -resize 70%", fig, fig)
    shell(comm)
}

# then run pandoc from a command line  
# or from the pandoc() function :
library(knitr)
pandoc("MyReport.md", "docx")

More information about the resize feature of ImageMagick: www.perturb.org
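A possible variation on the script above, sketched under assumptions: instead of overwriting the figures, write the resized copies to a separate folder. resize_commands and the folder name figure_resized are hypothetical names chosen for illustration. The sketch only builds the command strings, so it runs without ImageMagick installed.

```r
# Hypothetical helper: build one ImageMagick command per file, writing the
# resized image into out.dir and leaving the original untouched.
resize_commands <- function(files, out.dir, ratio = "70%") {
  out <- file.path(out.dir, basename(files))
  # shQuote() protects paths that contain spaces
  paste("convert", shQuote(files), "-resize", ratio, shQuote(out))
}

cmds <- resize_commands(c("figure/plot-1.png", "figure/plot-2.png"),
                        out.dir = "figure_resized")
print(cmds)

# To actually run them (requires ImageMagick on the PATH and an existing
# out.dir):
# for (cmd in cmds) system(cmd)
```

Note that pandoc picks up the image paths written in the md file, so with this variation the resized folder would have to replace the original one before the conversion.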

score 3 · Accepted Answer

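I also want to convert R markdown to both html and .docx/.odt with figures of good size and resolution. So far, the best way I have found is to define the resolution and size of the graphs explicitly in the .md document (the dpi, fig.width and fig.height options). If you do this, you get good graphs that are usable for publication, and the odt/docx is fine. The problem with a dpi much higher than the default 72 dpi is that the graphs look too big in the html file. Here are three approaches I have used to handle this (NB: I use R scripts with spin() syntax):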

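1) Use out.extra ='WIDTH="75%"' in the knitr options. This forces every graph in the html to occupy 75% of the window width. It is a quick solution, but not optimal if your plots have very different sizes. (NB: I prefer to work in centimeters rather than inches, hence the /2.54 everywhere.)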

library(knitr)
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
               fig.width = 8/2.54, fig.height = 8/2.54,
               out.extra ='WIDTH="75%"'
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

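2) Use out.width and out.height to specify the size of the graphs in pixels in the html file. I use a constant "sc" to scale down the size of the plots in the html output. This is the more precise approach, but the problem is that for each graph you have to define both fig.width/height and out.width/height, which is really tedious. Ideally you should be able to specify in the global options something like out.width = 150 * fig.width (where fig.width changes from chunk to chunk). Maybe something like that is possible, but I don't know how.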

#+ echo = FALSE
library(knitr)
sc <- 150
opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), dpi = 400,
                fig.width = 8/2.54, fig.height = 8/2.54,
                out.width = sc*8/2.54, out.height = sc*8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54, out.width= sc * 14/2.54, out.height= sc * 10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])
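Regarding the wish in approach 2 to derive out.width from fig.width globally: one possibility may be a knitr option hook (knitr::opts_hooks), which receives the chunk options as a list and returns a modified list. The following is a minimal sketch, not tested against every knitr version. scale_out_size is a hypothetical helper, and the factor 150 is the "sc" constant from the script above.

```r
# Hypothetical helper (not part of knitr): derive the display size in pixels
# from the chunk's fig.width/fig.height (in inches) and a scale factor sc.
scale_out_size <- function(options, sc = 150) {
  # Keep out.width/out.height if the chunk sets them explicitly.
  if (is.null(options$out.width))  options$out.width  <- round(sc * options$fig.width)
  if (is.null(options$out.height)) options$out.height <- round(sc * options$fig.height)
  options
}

# Stand-alone check, with a plain list standing in for the chunk options:
opts <- scale_out_size(list(fig.width = 14/2.54, fig.height = 10/2.54))
print(opts$out.width)
print(opts$out.height)

# Assumed registration: an option hook keyed on fig.width runs for every
# chunk where fig.width is set (it always has a default value).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_hooks$set(fig.width = scale_out_size)
}
```

With the hook registered, each chunk would only need fig.width and fig.height, and the out.width/out.height pair would follow from them.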

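With these first two solutions, I think you cannot convert the md file directly to odt with pandoc (the figures are not included). I convert md to html and then html to odt (not tried for docx). Something like this (if the previous R script is named "figsize1.R"):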

library(knitr)
setwd("/home/gilles/")
spin("figsize1.R")

system("pandoc figsize1.md -o figsize1.html")
system("pandoc figsize1.html -o figsize1.odt")

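3) Simply compile your document twice: once with a low dpi value (~96) for the html output and once with a high resolution (~300) for the odt/docx output. This is now my preferred way. The main drawback is that you have to compile twice, but that is not really a problem for me, since I usually need the odt file only at the very end of the job, to hand over to end users. While working, I regularly compile the html with the html notebook button in Rstudio.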

#+ echo = FALSE
library(knitr)

opts_chunk$set(echo = FALSE, dev = c("png", "pdf"), 
               fig.width = 8/2.54, fig.height = 8/2.54
)

data(iris)

#' # Iris dataset
summary(iris)
boxplot(iris[,1:4])

#+ fig.width=14/2.54, fig.height=10/2.54
par(mar = c(2,2,2,2))
pairs(iris[,-5])

Then compile the two outputs with the following script (NB: here the md file can be converted directly to html):

library(knitr)
setwd("/home/gilles")

opts_chunk$set(dpi=96)
spin("figsize3.R", knit=FALSE)
knit2html("figsize3.Rmd")

opts_chunk$set(dpi=400)
spin("figsize3.R")
system("pandoc figsize3.md -o figsize3.odt")

score 2 · Accepted Answer

Here is my solution: hack the docx converted by Pandoc, since a docx is simply a bundle of xml files and adjusting the figure sizes is quite straightforward.

This is what a figure looks like in word/document.xml extracted from a converted docx:

<w:p>
  <w:r>
    <w:drawing>
      <wp:inline>
        <wp:extent cx="1524000" cy="1524000" />
        ...
        <a:graphic>
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic>
              ...
              <pic:blipFill>
                <a:blip r:embed="rId23" />
                ...
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="1524000" cy="1524000" />
                </a:xfrm>
                ...
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>

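So replacing the cx and cy attributes of the wp:extent and a:ext nodes with the desired values does the resizing job. The following R code works for me. The widest figure takes up the whole width of a line, as specified by the variable out.width, and the rest are resized proportionally.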

require(XML)

## default linewidth (inch) for Word 2003
out.width <- 5.77
docx.file <- "report.docx"

## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
wp.extent <- getNodeSet(xmlRoot(doc), "//wp:extent")
a.blip <- getNodeSet(xmlRoot(doc), "//a:blip")
a.ext <- getNodeSet(xmlRoot(doc), "//a:ext")

figid <- sapply(a.blip, xmlGetAttr, "r:embed")
figname <- dir("temp_dir/word/media/")
stopifnot(length(figid) == length(figname))
pdffig <- paste("temp_dir/word/media/",
                ## in case figure ids in docx are not in dir'ed order
                sort(figname)[match(figid, substr(figname, 1, nchar(figname) - 4))], sep="")

## get dimension info of included pdf figures
pdfsize <- do.call(rbind, lapply(pdffig, function (x) {
    fig.ext <- substr(x, nchar(x) - 2, nchar(x))
    pp <- pipe(paste(ifelse(fig.ext == 'pdf', "pdfinfo", "file"), x, sep=" "))
    pdfinfo <- readLines(pp); close(pp)
    sizestr <- unlist(regmatches(pdfinfo, gregexpr("[[:digit:].]+ X [[:digit:].]+", pdfinfo, ignore.case=T)))
    as.numeric(strsplit(sizestr, split=" x ")[[1]])
}))

## resizing pdf figures in xml DOM, with the widest figure taking up a line's width
wp.cx <- round(out.width*914400*pdfsize[,1]/max(pdfsize[,1]))
wp.cy <- round(wp.cx*pdfsize[, 2]/pdfsize[, 1])
wp.cx <- as.character(wp.cx)
wp.cy <- as.character(wp.cy)
sapply(1:length(wp.extent), function (i)
       xmlAttrs(wp.extent[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));
sapply(1:length(a.ext), function (i)
       xmlAttrs(a.ext[[i]]) <- c(cx = wp.cx[i], cy = wp.cy[i]));

## save hacked xml back to docx
saveXML(doc, document.xml, indent = F)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
system("rm -fr temp_dir")
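The size arithmetic in the script above can be isolated as a small sketch. The cx and cy attributes are expressed in EMU (English Metric Units), with 914400 EMU per inch, which is the constant the script multiplies by. docx_extents is a hypothetical helper and the sample sizes are made-up values for illustration.

```r
emu_per_inch <- 914400  # EMU per inch, as used in the script above

# Hypothetical helper: scale figures so the widest one fills out.width
# inches, keeping each figure's aspect ratio.
# `size` is a matrix with one row per figure: width, height (any unit).
docx_extents <- function(size, out.width = 5.77) {
  cx <- round(out.width * emu_per_inch * size[, 1] / max(size[, 1]))
  cy <- round(cx * size[, 2] / size[, 1])
  cbind(cx = cx, cy = cy)
}

# Made-up example: an 800 x 600 figure and a 400 x 400 figure.
size <- rbind(c(800, 600), c(400, 400))
ext <- docx_extents(size)
print(ext)

# The extent in the XML sample (cx = 1524000), converted back to inches:
print(1524000 / emu_per_inch)
```

The second figure is half as wide as the first, so it gets half the line width, and its cx and cy stay equal because it is square.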
