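I'm trying to come up with a reproducible example (RE) for this question: Errors related to data frame columns during merging. The only thing that question still lacks to qualify as having an RE is reproducible data. However, when I try the more or less standard approach of dput(head(myDataObj)), the output is a 14MB file. The problem is that my data object is a list of data frames, so the head() limit does not appear to apply recursively.

I haven't found any options for dput() or head() that would let me control the data size recursively for complex objects. Unless I'm wrong about the above, what other approaches would you recommend for creating a minimal RE data set in this situation?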

1 Answer

Along the lines of @MrFlick's comment about using lapply, you can use any of the apply family of functions to apply head or sample, depending on your needs, and so reduce the size of the data for both REs and testing. (I've found that working with subsets or subsamples of large data sets is preferable for debugging and even charting.)
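
As a minimal sketch of the lapply approach for the case in the question (a list of data frames): the object my_data below is a hypothetical stand-in built from R's built-in mtcars and iris data sets, not the asker's actual data.

```r
# Hypothetical stand-in for the large object: a named list of data frames.
my_data <- list(
  cars = mtcars,
  iris = iris
)

# head() on the list itself only limits the number of list elements,
# so every data frame is still included in full.
nrow(head(my_data)$cars) == nrow(mtcars)

# Applying head() to each element keeps the first 3 rows of each data frame.
small <- lapply(my_data, head, n = 3)

# The reduced object is now small enough to paste into a question.
dput(small)
```

The n = 3 argument is passed through lapply's "..." to head, so the same pattern works with tail or any other subsetting function.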

It should be noted that head and tail provide the first or last bits of a structure, but sometimes these don't have sufficient variance in them for RE purposes, and are certainly not random, which is where sample may become more useful.
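
For a list of data frames, a random subset of rows can be taken in much the same way. This is only a sketch: sample_rows is a hypothetical helper written for illustration, and my_data is again a stand-in built from built-in data sets.

```r
set.seed(1)  # make the random subsets repeatable

# Hypothetical stand-in for the large object.
my_data <- list(cars = mtcars, iris = iris)

# Hypothetical helper: keep at most n randomly chosen rows of a data frame.
sample_rows <- function(df, n = 5) {
  # min() guards against data frames with fewer than n rows.
  idx <- sample.int(nrow(df), size = min(nrow(df), n))
  df[idx, , drop = FALSE]
}

small <- lapply(my_data, sample_rows, n = 5)
str(small, max.level = 1)
```

Setting the seed first means anyone who runs the example gets the same rows, which matters if the bug only shows up for particular values.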

Suppose we have a hierarchical tree structure (list of lists of...) and we want to subset each "leaf" while preserving the structure and labels in the tree.

x <- list( 
    a=1:10, 
    b=list( ba=1:10, bb=1:10 ), 
    c=list( ca=list( caa=1:10, cab=letters[1:10], cac="hello" ), cb=toupper( letters[1:10] ) ) )

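NOTE: In the following, how="replace" and how="list" give the same result, because the default classes="ANY" matches every leaf. They differ only when classes restricts which leaves are processed: "replace" keeps the other leaves unchanged, while "list" replaces them with deflt.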

ALSO NOTE: This won't work well for data.frame leaf nodes, because rapply treats a data frame as a list and descends into its columns.

# Set seed so the example is reproducible with randomized methods:
set.seed(1)

You can use the default head in a recursive apply in this way:

rapply( x, head, how="replace" )

Or pass an anonymous function that modifies the behavior:

# Complete anonymous function
rapply( x, function(y){ head(y,2) }, how="replace" )
# Same behavior, but using the rapply "..." argument to pass the n=2 to head.
rapply( x, head, how="replace", n=2 )

The following takes a random sample of up to two items from each leaf:

# This works because we use minimum in case leaves are shorter
# than the requested maximum length.
rapply( x, function(y){ sample(y, size=min(length(y),2) ) }, how="replace" )

# Less efficient, but maybe easier to read. Note that head() defaults
# to n=6, so this keeps up to 6 items per leaf rather than 2:
rapply( x, function(y){ head(sample(y)) }, how="replace" )

# XXX: The following does NOT work, because `sample` fails when
# `size` is greater than the length of the item being sampled
# (when sampling without replacement). Here the leaf "hello"
# has length 1, so size=2 raises an error.
rapply( x, function(y){ sample(y, size=2) }, how="replace" )
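
Regarding the note above about data.frame leaf nodes, one possible workaround is a small hand-written recursion instead of rapply. The function shrink below is a hypothetical helper, not part of base R; it is a sketch that treats data frames as leaves and descends only into other lists.

```r
# Hypothetical helper: recursively truncate every leaf to at most n items.
# Data frames are treated as leaves (first n rows kept); any other list
# is descended into; atomic vectors are truncated with head().
shrink <- function(obj, n = 2) {
  if (is.data.frame(obj)) {
    head(obj, n)
  } else if (is.list(obj)) {
    lapply(obj, shrink, n = n)
  } else {
    head(obj, n)
  }
}

# Illustrative nested structure with a data frame as one of the leaves.
nested <- list(
  a = 1:10,
  b = list(df = mtcars, v = letters),
  c = "hello"
)

small <- shrink(nested, n = 2)
str(small)
```

The is.data.frame check has to come before the is.list check, since a data frame is also a list.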
Answered 2015-04-06T21:01:30.157