shell - Using wget to recursively fetch a directory with arbitrary files in it

Question

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory looks like:

http://mysite.com/configs/.vim/

.vim holds multiple files and directories. I want to replicate that on the client using wget. I can't seem to find the right combination of wget flags to get this done. Any ideas?

score 1103 · Accepted Answer

You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course), otherwise it will follow the link in the directory index on my site to the parent directory. So the command would look like this:

wget --recursive --no-parent http://example.com/configs/.vim/
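The trailing slash on the URL matters here. The wget manual notes that for HTTP, `http://foo/bar/` makes wget treat `bar` as a directory, while with `http://foo/bar` it treats `bar` as a file name, so `--no-parent` would only keep wget below `/`. A rough model of that rule is sketched below; `below_start_dir` is a hypothetical helper for illustration, not wget's actual check (which also deals with hosts, redirects and URL normalization).

```sh
# Hypothetical helper: a rough model of wget's --no-parent rule for HTTP.
#   $1 = start URL, $2 = candidate URL found while recursing.
# The "start directory" is everything up to and including the last '/'
# of the start URL, so a missing trailing slash widens it by one level.
below_start_dir() {
  local start_dir
  start_dir=${1%/*}/            # strip the last path component, keep the '/'
  case $2 in
    "$start_dir"*) return 0 ;;  # inside the start directory: would be followed
    *) return 1 ;;              # parent or sibling: skipped with --no-parent
  esac
}
```

With the start URL written as `http://example.com/configs/.vim` (no trailing slash), the start directory becomes `/configs/`, so anything else under `configs` would be followed too.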

To avoid downloading the auto-generated index.html files, use the -R/--reject option:

wget -r -np -R "index.html*" http://example.com/configs/.vim/
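The quotes around `index.html*` keep the shell from expanding the `*` itself. Per the wget manual, an element of the reject list that contains `*`, `?`, `[` or `]` is treated as a pattern; otherwise it is a file-name suffix. The trailing `*` is what also catches sort-order variants such as `index.html?C=N;O=D` that Apache-style listings typically link to. The sketch below models just that matching rule; `rejected_by` is a hypothetical name, not part of wget.

```sh
# Hypothetical helper: a rough model of how one -R/--reject element is
# matched against a downloaded file's name (not wget's actual code).
#   $1 = reject-list element, $2 = file name
rejected_by() {
  local element=$1 name=$2
  case $element in
    *'*'* | *'?'* | *'['* | *']'*)
      # Wildcard present: the element is a shell-style pattern
      # that has to match the whole file name.
      case $name in
        $element) return 0 ;;
      esac ;;
    *)
      # No wildcard: the element is a plain file-name suffix.
      case $name in
        *"$element") return 0 ;;
      esac ;;
  esac
  return 1
}
```

In practice wget may still fetch rejected listing pages to harvest their links and then delete them, so they can show up in the output even though they don't stay on disk.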

score 137 · Answer

To download a directory recursively while rejecting index.html* files, and without creating the hostname directory, the parent directories, or the full remote directory structure:

wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://mysite.com/dir1/dir2/data
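With `-nH` the `mysite.com/` directory is not created, and `--cut-dirs=2` drops the first two remote directory levels, so `http://mysite.com/dir1/dir2/data/file.txt` would be saved as `data/file.txt`. The mapping can be modelled roughly as follows; `local_path` is a hypothetical helper, and it ignores query strings, ports, `-P` and wget's file-name escaping.

```sh
# Hypothetical helper: a rough model of where `wget -r` saves a URL locally.
#   $1 = URL, $2 = 1 if -nH is given (else 0), $3 = --cut-dirs value
local_path() {
  local rest host path file dirs out n
  rest=${1#*://}                 # mysite.com/dir1/dir2/data/file.txt
  host=${rest%%/*}               # mysite.com
  path=${rest#"$host"}           # /dir1/dir2/data/file.txt
  path=${path#/}                 # dir1/dir2/data/file.txt
  file=${path##*/}               # file.txt ('' when the URL ends in '/')
  case $path in
    */*) dirs=${path%/*}/ ;;     # dir1/dir2/data/
    *) dirs='' ;;
  esac
  # --cut-dirs: drop the first N directory levels (or all, if fewer exist)
  n=$3
  while [ "$n" -gt 0 ] && [ -n "$dirs" ]; do
    dirs=${dirs#*/}
    n=$((n - 1))
  done
  # Without -nH the host name becomes the top-level directory
  if [ "$2" = 1 ]; then out=''; else out=$host/; fi
  # Directory URLs are saved as index.html
  printf '%s\n' "$out$dirs${file:-index.html}"
}
```

For example, `local_path http://mysite.com/dir1/dir2/data/file.txt 1 2` prints `data/file.txt`, while `local_path http://mysite.com/dir1/dir2/data/file.txt 0 0` prints `mysite.com/dir1/dir2/data/file.txt`.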

score 123 · Answer

For anyone else who has similar issues: wget follows robots.txt, which might not allow you to grab the site. No worries, you can turn it off:

wget -e robots=off http://www.example.com/

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html
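`-e` runs its argument as if it were a line of `.wgetrc`, so `-e robots=off` is the same as a `robots = off` line in a startup file. If you need it on several calls, one option is a separate startup file selected through the `WGETRC` environment variable, which wget reads instead of `~/.wgetrc` (the system-wide wgetrc is still read). `make_wgetrc` below is a hypothetical helper sketching that.

```sh
# Hypothetical helper: write a throwaway wgetrc that disables robots.txt
# handling (same effect as -e robots=off) and print its path.
make_wgetrc() {
  local rc
  rc=$(mktemp) || return 1
  printf '%s\n' 'robots = off' > "$rc"
  printf '%s\n' "$rc"
}
```

Usage would then be `WGETRC="$(make_wgetrc)" wget -r --no-parent http://www.example.com/`; the temporary file is left behind and can be removed afterwards.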

score 42 · Answer

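You should use the -m (mirror) flag, as it takes care not to mess with timestamps and recurses indefinitely (-m is currently equivalent to -r -N -l inf --no-remove-listing).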

wget -m http://example.com/configs/.vim/

Adding the points mentioned by others in this thread, it becomes:

wget -m -e robots=off --no-parent http://example.com/configs/.vim/
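If you run this often, a small wrapper saves retyping. The sketch below, with the hypothetical name `mirror_dir`, builds the same command and appends a trailing slash when it is missing, so that `--no-parent` treats the last path component as a directory. The `DRY_RUN` variable is an assumption of this sketch (not a wget option) that prints the command instead of running it.

```sh
# Hypothetical wrapper around: wget -m -e robots=off --no-parent URL
# Set DRY_RUN=1 to print the command instead of running it.
mirror_dir() {
  local url=$1
  case $url in
    */) ;;              # already ends in '/': treated as a directory
    *) url=$url/ ;;     # add it so --no-parent stays inside this directory
  esac
  set -- wget -m -e robots=off --no-parent "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"  # show the command that would run
  else
    "$@"
  fi
}
```

For example, `mirror_dir http://example.com/configs/.vim` runs `wget -m -e robots=off --no-parent http://example.com/configs/.vim/`.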

score 40 · Answer

This is the complete wget command that worked for me to download files from a server's directory (ignoring robots.txt):

wget -e robots=off --cut-dirs=3 --user-agent=Mozilla/5.0 --reject="index.html*" --no-parent --recursive --relative --level=1 --no-directories http://www.example.com/archive/example/5.3.0/

score 8 · Answer

If --no-parent doesn't help, you might use the --include option.

Directory structure:

http://<host>/downloads/good
http://<host>/downloads/bad

And you want to download downloads/good but not the downloads/bad directory:

wget --include downloads/good --mirror --execute robots=off --no-host-directories --cut-dirs=1 --reject="index.html*" --continue http://<host>/downloads/good

score 7 · Answer

wget -r http://mysite.com/configs/.vim/

Works for me.

Perhaps you have a .wgetrc that is interfering with it?
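A quick way to check is to print the active lines of the startup files wget reads: the system-wide wgetrc (commonly `/etc/wgetrc` or `/usr/local/etc/wgetrc`) and the user's `~/.wgetrc`, or the file named by `WGETRC` in place of the latter. `show_wgetrc` is a hypothetical helper, and the paths are common defaults that may differ on your system. Settings such as `recursive`, `reclevel`, `robots` or `no_parent` are the ones most likely to change how a recursive fetch behaves.

```sh
# Hypothetical helper: print the non-comment, non-blank lines of the
# wgetrc files wget commonly reads, each under a "== path" header.
show_wgetrc() {
  local f
  for f in /etc/wgetrc /usr/local/etc/wgetrc "${WGETRC:-$HOME/.wgetrc}"; do
    [ -r "$f" ] || continue
    printf '== %s\n' "$f"
    # grep exits 1 when a file has no active lines; that is not an error here
    grep -Ev '^[[:space:]]*(#|$)' "$f" || true
  done
}
```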

score 4 · Answer

This version downloads recursively and doesn't create parent directories.

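wgetod() {
    # Count the slashes in the URL path (scheme, host and trailing slash stripped)
    local nslash ncut
    nslash="$(echo "$1" | perl -pe 's|.*://[^/]+(.*?)/?$|$1|' | grep -o / | wc -l)"
    # Cut every directory level except the last, so only the target directory is created
    ncut=$((nslash > 0 ? nslash - 1 : 0))
    wget -r -nH --user-agent=Mozilla/5.0 --cut-dirs="$ncut" --no-parent --reject="index.html*" "$1"
}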

Usage:

Add it to ~/.bashrc or paste it into the terminal, then run:
wgetod "http://example.com/x/"
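The perl/grep/wc pipeline in `wgetod` only counts the directory levels in the URL path. If perl isn't available, the same count can be done with plain parameter expansion; `cut_dirs_for` is a hypothetical sketch that prints the value for `--cut-dirs` (the number of path levels minus one, never below zero).

```sh
# Hypothetical perl-free variant of wgetod's --cut-dirs calculation.
#   $1 = URL such as http://example.com/a/b/c/
cut_dirs_for() {
  local rest path n
  rest=${1#*://}                 # drop the scheme
  case $rest in
    */*) path=${rest#*/} ;;      # drop the host: a/b/c/
    *) path='' ;;                # no path at all
  esac
  path=${path%/}                 # drop one trailing slash: a/b/c
  n=0
  while [ -n "$path" ]; do       # count the path levels
    n=$((n + 1))
    case $path in
      */*) path=${path#*/} ;;
      *) path='' ;;
    esac
  done
  # Keep only the last level
  if [ "$n" -gt 0 ]; then n=$((n - 1)); fi
  echo "$n"
}
```

For example, `cut_dirs_for "http://example.com/x/"` prints `0` and `cut_dirs_for "http://mysite.com/dir1/dir2/data/"` prints `2`, matching what the perl pipeline computes for those URLs; the value could be passed straight to `--cut-dirs` in its place.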

score 2 · Answer

Wget 1.18 may work better. For example, I got bitten by a bug in version 1.12 where...

wget --recursive (...)

...retrieves only index.html instead of all the files.

The workaround was to notice some 301 redirects and try the new location. Given the new URL, wget got all the files in the directory.
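To spot such redirects before recursing, you can look at the `Location:` lines wget prints; with `--spider -S` it checks the URL without downloading it and also shows the server response headers. The hypothetical `last_location` helper below reads that output on stdin and prints the last redirect target. It is only a sketch: a relative `Location` value is printed as-is and would still need resolving against the original URL.

```sh
# Hypothetical helper: read wget output (e.g. `wget --spider -S URL 2>&1`)
# on stdin and print the target of the last "Location:" line, if any.
last_location() {
  tr -d '\r' |                   # tolerate CRLF line endings in raw headers
    sed -n 's/^[[:space:]]*[Ll]ocation:[[:space:]]*\([^[:space:]]*\).*$/\1/p' |
    tail -n 1                    # the last redirect is the final location
}
```

For example: `new=$(wget --spider -S "$url" 2>&1 | last_location)`, then pass `$new` to `wget --recursive` if it is non-empty.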

score -1 · Answer

You should be able to do it simply by adding -r:

wget -r http://stackoverflow.com/

answered 2008-11-07T21:50:44.320
