How can I repeat the content of a file n times?

Problem description

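I am trying to benchmark two different ways of processing a file. I have only a small amount of input data, but to get a meaningful comparison I need to repeat the tests many times.

Rather than just rerunning the tests, I would like to duplicate the input data a number of times (e.g. 1000), so that a 3-line file becomes 3000 lines and I can run a much more meaningful test.

I pass the input data to the command by file name: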

mycommand input-data.txt

Best approach

You do not need input-duplicated.txt.

Try:

mycommand <(perl -0777pe '$_=$_ x 1000' input-data.txt)

Explanation

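0777: -0 sets the input record separator (the Perl special variable $/, which defaults to a newline). Setting it to 0400 or above makes Perl slurp the entire input file into memory as a single record; 0777 is the conventional value.
pe: -p means "print each input record after applying the script given by -e".
$_=$_ x 1000: $_ is the current input record. Because -0777 reads the whole file at once, $_ holds the entire file, and x 1000 repeats it so that 1000 copies of the file are printed.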

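Second-best approach

I originally thought I would have to generate a secondary file, but I can simply loop over the original file in Bash and use some redirection to make the result appear as a file.

There are probably a dozen different ways to write the loop, but here are four: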

mycommand <( seq 1000 | xargs -i -- cat input-data.txt )
mycommand <( for _ in {1..1000}; do cat input-data.txt; done )
mycommand <((for _ in {1..1000}; do echo input-data.txt; done) | xargs cat )
mycommand <(awk '{for(i=0; i<1000; i++)print}' input-data.txt)  #*

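The third form is improvised from maru's comment below and builds a big list of input file names for cat. xargs splits this list into as many arguments per invocation as the system allows. It is much faster than n separate cat invocations.

The awk form (inspired by terdon's answer) is probably the most optimized, but it duplicates one line at a time: each line is printed 1000 times before the next line is read, so the output is not the whole file repeated in order. That may or may not suit a particular application, but it is fast and efficient.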
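
The ordering difference is easiest to see with a tiny input and only two copies. The following is a minimal sketch; the three-line sample file and the variable names are invented for the demonstration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical three-line input.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/input-data.txt"
printf 'a\nb\nc\n' > "$sample"

# awk form: each line is printed twice before moving on to the next line.
per_line=$(awk '{ for (i = 0; i < 2; i++) print }' "$sample" | paste -s -d ' ' -)

# cat form: the whole file is emitted, then emitted again.
whole_file=$(cat "$sample" "$sample" | paste -s -d ' ' -)

echo "awk, line by line: $per_line"    # a a b b c c
echo "cat, whole file:   $whole_file"  # a b c a b c
```

If the command under test cares about line order (for example, it expects a header on the first line of each copy), the whole-file forms are the safer choice.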

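But all of these generate the data on the fly. Bash may well produce output much more slowly than your command can read it, so for testing you should generate a new file. Thankfully, that is only a very simple extension: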

(for _ in {1..1000}; do echo input-data.txt; done) | xargs cat > input-duplicated.txt
mycommand input-duplicated.txt
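
Before benchmarking against the generated file, it may be worth checking that it really contains the expected number of lines. A minimal sketch follows, assuming a hypothetical three-line sample and a scratch directory whose path contains no whitespace or quotes (xargs would split on them); a plain counter loop is used in place of {1..1000} so that it also runs under shells without brace expansion.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/input-data.txt"
duplicated="$tmpdir/input-duplicated.txt"
copies=1000
printf 'one\ntwo\nthree\n' > "$sample"

# Emit the file name once per copy; xargs batches the names into as few cat calls as it can.
i=0
while [ "$i" -lt "$copies" ]; do
    echo "$sample"
    i=$((i + 1))
done | xargs cat > "$duplicated"

# The duplicated file should have exactly copies x original line count.
expected=$(( $(awk 'END { print NR }' "$sample") * copies ))
actual=$(awk 'END { print NR }' "$duplicated")
if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual lines"
else
    echo "mismatch: expected $expected lines, got $actual" >&2
fi
```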

Third approach

I would just use a text editor.

vi input-data.txt
gg (move cursor to the beginning of the file)
yG (yank til the end of the file)
G (move the cursor to the last line of the file)
999p (paste the yanked text 999 times)
:wq (save the file and exit)

If you absolutely need to do this from the command line (this requires vim to be installed, because vi does not have the :normal command), you can use:

vim -es -u NONE "+normal ggyGG999p" +wq input-data.txt

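Here, -es (or -e -s) runs vim in silent Ex mode, so it should not take over your terminal window, and -u NONE stops it from reading your vimrc, which should make it start a little faster than it otherwise would (possibly much faster if you use a lot of vim plugins). Note that this rewrites input-data.txt in place.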

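Fourth approach

Here is an awk solution. The inner loop walks the indices 1 to NR explicitly, because awk does not specify the order in which for (k in a) visits array elements, so that form is not guaranteed to keep the lines in their original order:

awk '{a[NR]=$0} END{for (i=0; i<1000; i++) for (k=1; k<=NR; k++) print a[k]}' file

It is essentially as fast as @Gnuc's Perl. I ran each command 1000 times and took the average time; the awk runs below use the for (k in a) form of the loop: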

$ for i in {1..1000}; do 
 (time awk '{a[NR]=$0}END{for (i=0;i<1000;i++){for(k in a){print a[k]}}}' file > a) 2>&1 | 
    grep -oP 'real.*?m\K[\d\.]+'; done | awk '{k+=$1}END{print k/1000}'; 
0.00426

$ for i in {1..1000}; do 
  (time perl -0777pe '$_=$_ x 1000' file > a ) 2>&1 | 
    grep -oP 'real.*?m\K[\d\.]+'; done | awk '{k+=$1}END{print k/1000}'; 
0.004076

Fifth approach

Here is a simple one-liner that involves no scripting:

mycommand <(cat `yes input-data.txt | head -1000 | paste -s`)

Explanation

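`yes input-data.txt | head -1000 | paste -s` produces the text input-data.txt 1000 times on a single line, separated by tabs (the default delimiter of paste -s).
The shell then splits that text on whitespace and passes it to cat as a list of file names.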
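
The two steps can be checked separately on a small scale. The sketch below uses 5 repetitions instead of 1000 and a hypothetical three-line input-data.txt created in a scratch directory; head -n 5 is the portable spelling of head -5.

```shell
#!/usr/bin/env bash
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"
printf 'one\ntwo\nthree\n' > input-data.txt

# paste -s joins its input lines into one line; the default delimiter is a tab.
name_list=$(yes input-data.txt | head -n 5 | paste -s -)
field_count=$(printf '%s\n' "$name_list" | awk -F '\t' '{ print NF }')
echo "file names on one line: $field_count"

# Left unquoted on purpose: the shell splits the list on whitespace into 5 arguments.
# shellcheck disable=SC2086
line_count=$(cat $name_list | awk 'END { print NR }')
echo "lines produced by cat: $line_count"
```

Because the list is split on whitespace, this approach only works for file names that contain no spaces, tabs, or glob characters.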

References

How can I repeat the content of a file n times?