为什么ls -R称为“recursive”列表？

问题描述

据我所知，ls -R显示目录列表。但为什么它是递归的？如何在过程中使用递归？

最佳解决方法

首先，让我们定义一个任意的文件夹结构：

.
├── a1 [D]
│   ├── b1 [D]
│   │   ├── c1
│   │   ├── c2 [D]
│   │   │   ├── d1
│   │   │   ├── d2
│   │   │   └── d3
│   │   ├── c3
│   │   ├── c4
│   │   └── c5
│   └── b2 [D]
│       ├── c6
│       └── c7
├── a2 [D]
│   ├── b3 [D]
│   │   ├── c8
│   │   └── c9
│   └── b4 [D]
│       ├── c10 
│       └── c11
├── a3 [D]
│   ├── b5
│   ├── b6
│   └── b7
└── a4 [D]

当我们执行ls时，我们只得到基本文件夹的输出：

a1 a2 a3 a4

但是，当我们调用ls -R时，我们会得到一些不同的东西：

.:
a1  a2  a3  a4

./a1:
b1  b2

./a1/b1:
c1  c2  c3  c4  c5

./a1/b1/c2:
d1  d2  d3

./a1/b2:
c6  c7

./a2:
b3  b4

./a2/b3:
c8  c9

./a2/b4:
c10  c11

./a3:
b5  b6  b7

./a4:

如您所见，它在基本文件夹上运行ls，然后运行所有子文件夹。和所有孙子文件夹，无限广告。实际上，命令以递归方式遍历每个文件夹，直到它到达目录树的末尾。此时，它会返回树中的一个分支，并为任何sub-folders执行相同的操作(如果有的话)。

或者，在pseudo-code中：

recursivelyList(directory) {
    files[] = listDirectory(directory)              // Get all files in the directory
    print(directory.name + ":\n" + files.join(" ")) // Print the "ls" output
    for (file : files) {                            // Go through all the files in the directory
        if (file.isDirectory()) {                   // Check if the current file being looked at is a directory
            recursivelyList(directory)              // If so, recursively list that directory
        }
    }
}

而且因为我可以，同样的reference Java implementation。

次佳解决方法

实际上，您可能会问两个紧密耦合的问题。

为什么走到文件系统层次结构中的每个条目的过程都是一个固有的递归过程？其他答案解决了这个问题，例如Zanna’s和Kaz Wolfe’s。
如何在ls的实现中使用递归技术？从你的措辞(“过程中如何使用递归？”)，我认为这是你想知道的一部分。这个答案解决了这个问题。

为什么使用递归技术实现`ls`是有意义的：

FOLDOC将recursion定义为：

When a function (or procedure) calls itself. Such a function is called “recursive”. If the call is via one or more other functions then this group of functions are called “mutually recursive”.

实现ls的自然方法是编写一个函数来构造要显示的文件系统条目列表，以及其他代码来处理路径和选项参数并根据需要显示条目。该函数极有可能以递归方式实现。

在选项处理期间，ls将确定是否已要求它以递归方式操作(通过使用-R标志调用)。如果是这样，构建要显示的条目列表的函数将为其列出的每个目录调用一次，但.和..除外。此函数可能有单独的递归和非递归版本，或者函数可能每次检查是否应该递归操作。

Ubuntu的/bin/ls是运行ls时运行的可执行文件，由GNU Coreutils提供，它具有许多功能。因此，its code比您预期的更长，更复杂。但是Ubuntu还包含一个更简单的ls版本，由BusyBox提供。您可以通过键入busybox ls来运行此操作。

`busybox ls`如何使用递归：

BusyBox中的ls在coreutils/ls.c中实现。它包含一个scan_and_display_dirs_recur()函数，该函数被调用以递归方式打印目录树：

static void scan_and_display_dirs_recur(struct dnode **dn, int first)
{
    unsigned nfiles;
    struct dnode **subdnp;

    for (; *dn; dn++) {
        if (G.all_fmt & (DISP_DIRNAME | DISP_RECURSIVE)) {
            if (!first)
                bb_putchar('\n');
            first = 0;
            printf("%s:\n", (*dn)->fullname);
        }
        subdnp = scan_one_dir((*dn)->fullname, &nfiles);
#if ENABLE_DESKTOP
        if ((G.all_fmt & STYLE_MASK) == STYLE_LONG || (G.all_fmt & LIST_BLOCKS))
            printf("total %"OFF_FMT"u\n", calculate_blocks(subdnp));
#endif
        if (nfiles > 0) {
            /* list all files at this level */
            sort_and_display_files(subdnp, nfiles);

            if (ENABLE_FEATURE_LS_RECURSIVE
             && (G.all_fmt & DISP_RECURSIVE)
            ) {
                struct dnode **dnd;
                unsigned dndirs;
                /* recursive - list the sub-dirs */
                dnd = splitdnarray(subdnp, SPLIT_SUBDIR);
                dndirs = count_dirs(subdnp, SPLIT_SUBDIR);
                if (dndirs > 0) {
                    dnsort(dnd, dndirs);
                    scan_and_display_dirs_recur(dnd, 0);
                    /* free the array of dnode pointers to the dirs */
                    free(dnd);
                }
            }
            /* free the dnodes and the fullname mem */
            dfree(subdnp);
        }
    }
}

递归函数调用发生的行是：

                    scan_and_display_dirs_recur(dnd, 0);

看到递归函数调用发生时：

如果在调试器中运行busybox ls，则可以在运行中看到此信息。首先通过enabling -dbgsym.ddeb packages安装debug symbols，然后安装busybox-static-dbgsym软件包。同样安装gdb(这是调试器)。

sudo apt-get update
sudo apt-get install gdb busybox-static-dbgsym

我建议在一个简单的目录树上调试coreutils ls。

如果你没有一个方便，那就做一个(这与WinEunuuchs2Unix’s answer中的mkdir -p命令的工作方式相同)：

mkdir -pv foo/{bar/foobar,baz/quux}

并用一些文件填充它：

(shopt -s globstar; for d in foo/**; do touch "$d/file$((i++))"; done)

您可以验证busybox ls -R foo是否按预期工作，产生以下输出：

foo:
bar    baz    file0

foo/bar:
file1   foobar

foo/bar/foobar:
file2

foo/baz:
file3  quux

foo/baz/quux:
file4

在调试器中打开busybox：

gdb busybox

GDB将打印一些有关自身的信息。然后应该说：

Reading symbols from busybox...Reading symbols from /usr/lib/debug/.build-id/5c/e436575b628a8f54c2a346cc6e758d494c33fe.debug...done.
done.
(gdb)

(gdb)是调试器中的提示。你要告诉GDB在这个提示上要做的第一件事就是在scan_and_display_dirs_recur()函数的开头设置一个断点：

b scan_and_display_dirs_recur

当你运行它时，GDB应该告诉你类似的东西：

Breakpoint 1 at 0x5545b4: file coreutils/ls.c, line 1026.

现在告诉GDB使用ls -R foo(或您想要的任何目录名)运行busybox作为其参数：

run ls -R foo

你可能会看到这样的事情：

Starting program: /bin/busybox ls -R foo

Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6c60, first=1) at coreutils/ls.c:1026
1026    coreutils/ls.c: No such file or directory.

如果你看到No such file or directory，如上所述，那没关系。本演示的目的只是查看何时调用scan_and_display_dirs_recur()函数，因此GDB不需要检查实际的源代码。

请注意，调试器甚至在打印任何目录条目之前都会遇到断点。它打破了该函数的入口，但该函数中的代码必须为要枚举的任何目录运行以进行打印。

要告诉GDB继续，请运行：

每次调用scan_and_display_dirs_recur()时，断点都会再次被击中，因此您将看到递归操作。它看起来像这样(包括(gdb)提示和你的命令)：

(gdb) c
Continuing.
foo:
bar    baz    file0

Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cb0, first=first@entry=0) at coreutils/ls.c:1026
1026    in coreutils/ls.c
(gdb) c
Continuing.

foo/bar:
file1   foobar

Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cf0, first=first@entry=0) at coreutils/ls.c:1026
1026    in coreutils/ls.c
(gdb) c
Continuing.

foo/bar/foobar:
file2

foo/baz:
file3  quux

Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6cf0, first=first@entry=0) at coreutils/ls.c:1026
1026    in coreutils/ls.c
(gdb) c
Continuing.

foo/baz/quux:
file4
[Inferior 1 (process 2321) exited normally]

该函数的名称中有recur …当给出-R标志时，BusyBox只使用它吗？在调试器中，这很容易找到：

(gdb) run ls foo
Starting program: /bin/busybox ls foo

Breakpoint 1, scan_and_display_dirs_recur (dn=dn@entry=0x7e6c60, first=1) at coreutils/ls.c:1026
1026    in coreutils/ls.c
(gdb) c
Continuing.
bar    baz    file0
[Inferior 1 (process 2327) exited normally]

即使没有-R，ls的这个特定实现也使用相同的函数来找出存在的文件系统条目并显示它们。

如果要退出调试器，只需告诉它：

`scan_and_display_dirs_recur()`如何知道它是否应该自行调用：

具体来说，当-R标志通过时，它的工作方式有何不同？检查source code(可能不是Ubuntu系统上的确切版本)显示它检查其内部数据结构G.all_fmt，它存储调用的选项：

            if (ENABLE_FEATURE_LS_RECURSIVE
             && (G.all_fmt & DISP_RECURSIVE)

(如果编译BusyBox时不支持-R，那么它也不会尝试递归显示文件系统条目;这就是ENABLE_FEATURE_LS_RECURSIVE部分的内容。)

仅当G.all_fmt & DISP_RECURSIVE为true时，才会运行包含递归函数调用的代码。

                struct dnode **dnd;
                unsigned dndirs;
                /* recursive - list the sub-dirs */
                dnd = splitdnarray(subdnp, SPLIT_SUBDIR);
                dndirs = count_dirs(subdnp, SPLIT_SUBDIR);
                if (dndirs > 0) {
                    dnsort(dnd, dndirs);
                    scan_and_display_dirs_recur(dnd, 0);
                    /* free the array of dnode pointers to the dirs */
                    free(dnd);
                }

否则，该函数只运行一次(在命令行上指定的每个目录)。

第三种解决方法

当你考虑它时，”recursive”对于作用于目录及其文件和目录及其文件和目录及其文件和目录及其文件的命令是有意义的………

….直到整个树从指定的点向下被命令操作，在这种情况下列出任何子目录的任何子目录的任何子目录的内容……….存在于命令的参数

第四种方法

-R用于递归，可以松散地称为”repeatedly”。

以此代码为例：

───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/a
───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/b/1
───────────────────────────────────────────────────────────────────────────────
$ mkdir -p temp/c/1/2
───────────────────────────────────────────────────────────────────────────────
$ ls -R temp
temp:
a  b  c

temp/a:

temp/b:
1

temp/b/1:

temp/c:
1

temp/c/1:
2

temp/c/1/2:

制作目录中的-p允许您使用单个命令批量创建目录。如果一个或多个top-middle目录已经存在，则不是错误，并且创建了middle-lower目录。

然后，ls -R以递归方式列出以temp开头的每个目录，并沿着树向下运行到所有分支。

现在让我们看一下ls -R命令的补充，即tree命令：

$ tree temp
temp
├── a
├── b
│   └── 1
└── c
    └── 1
        └── 2

6 directories, 0 files

正如您所见，tree完成与ls -R相同的工作，除了更简洁，我敢说”prettier”。

现在让我们看看如何以一个简单的命令递归删除我们刚创建的目录：

$ rm -r temp

这递归地删除了temp和它下面的所有sub-directories。即temp/a，temp/b/1和temp/c/1/2以及其间的中间目录。

第五种方法

这是一个简单的解释，它是有道理的，因为当涉及到显示子目录的内容时，同一个函数已经知道如何处理目录。因此，它只需要在每个子目录上调用自己来获得该结果！

在伪代码中，它看起来像这样：

recursive_ls(dir)
    print(files and directories)
    foreach (directoriy in dir)
        recursive_ls(directory)
    end foreach
end recursive_ls

参考资料

Why is ls -R called “recursive” listing?