rsync alternative? (too many files)

Tony Godshall togo at of.net
Sun Mar 6 13:17:02 PST 2011


Hi Rick.  Thanks for responding.  Comments inline:

On Fri, Mar 4, 2011 at 17:23, Rick Moen <rick at linuxmafia.com> wrote:
> Quoting Tony Godshall (tony at godshall.org):
>
>> Anyone know of an rsync alternative or workaround for huge
>> batches of files?
>
> rsync's FAQ suggests ameliorating measures:
>
>  out of memory
>
>  The usual reason for "out of memory" when running rsync is that you are
>  transferring a _very_ large number of files. The size of the files
>  doesn't matter, only the total number of files. If memory is a problem,
>  first try to use the incremental recursion mode: upgrade both sides to
>  rsync 3.0.0 or newer and avoid options that disable incremental
>  recursion (e.g., use --delete-delay instead of --delete-after). If this
>  is not possible, you can break the rsync run into smaller chunks
>  operating on individual subdirectories using --relative and/or exclude
>  rules.

Yes.  Still not enough.
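
For reference, the per-subdirectory chunking the FAQ describes would
look something like this (the paths are placeholders).  Aside from
still needing enough memory for the biggest chunk, chunking defeats
-H, since rsync only preserves hard links among files that are part
of the same run:

  # hypothetical: one rsync per top-level subdirectory; --relative
  # recreates the given source path under the destination
  for d in /src/*/ ; do
      rsync -aH --relative "$d" /mnt/backup/
  done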

> Or, you could tweak the size of the array of pointers to the file-list
> entries (8 MB, last I heard) in rsync to a larger value and recompile.
>
> But maybe incremental recursion is simply getting switched off?  Quoting
> the rsync manpage:
>
>  Some options require rsync to know the full file list, so these
>  options disable the incremental recursion mode.  These include:
>  --delete-before, --delete-after, --prune-empty-dirs, and
>  --delay-updates.  Because of this, the default delete mode when you
>  specify --delete is now --delete-during when both ends of the
>  connection are at least 3.0.0 (use --del or --delete-during to
>  request this improved deletion mode explicitly).  See also the
>  --delete-delay option that is a better choice than using
>  --delete-after.

Yes.  Still not enough.
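
Concretely, keeping the option set minimal, the first command below
should leave incremental recursion enabled, while the second forces
rsync to build the full file list up front (both assume rsync >=
3.0.0 on each end; paths are placeholders):

  # deletions are worked out during the transfer and applied at the
  # end, so incremental recursion stays on
  rsync -aH --delete-delay /src/ /mnt/backup/

  # --delete-after is one of the options that disables incremental
  # recursion, so memory grows with the total file count
  rsync -aH --delete-after /src/ /mnt/backup/

Even then, as far as I can tell -H has to remember device/inode pairs
for every file with a link count above one, so a heavily hardlinked
tree still costs memory.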

>> In particular I'm looking for the ability to do the
>> hardlink-a-tree-then-rsync way of making copies of a complete
>> filesystem without duplicating files and without rsync crashing on me
>> when the number of files to be transferred gets too big.
>
> I'm not sure I followed the first half of that sentence, so apologies if
> I don't 'get' your desired scenario.  Speaking generically, if too many
> files are making rsync hit 'out of memory in flist_expand' or 'out of
> memory in glob_expand' or such, you _could_ switch to caveman methods
> for finding then copying files, such as
>
> find . -xdev -type f -print0 | xargs -0
> ...running cp piped into ssh, or whatever.  'Ware slowness.

Yeah.  I looked at doing a find ... -type d -print0 | xargs -0 mkdir
to recreate the directory tree, followed by a second pass that runs
the rsyncs without the recursion[1], so that each rsync has only one
file to handle.  But unless I'm missing something, that doesn't
preserve the hardlinks, which is pretty important: I've got something
like 2.5TB residing in about 1.5TB after file-level deduplication,
and I'm trying to copy it to a 2TB removable volume.
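
A caveman method that does keep the hard links would be a single tar
pipe, since GNU tar stores later occurrences of an inode it has
already archived as hard links.  A sketch, with /src and /mnt/backup
standing in for the real paths:

  # -C changes into the directory on each side; -p preserves
  # permissions on extraction; hard links within /src survive
  tar -C /src -cf - . | tar -C /mnt/backup -xpf -

No restartability or delta transfer, of course, and tar still has to
remember every file whose link count is above one, but it doesn't
build a full file list in memory.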

Tony

[1]  Per the rsync manpage, -a is -rlptgoD, so the flags would be -lptgoD.
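
In other words, roughly the following (untested; paths are
placeholders).  It keeps memory low because each rsync handles a
single file, but as noted above it can't preserve hard links, since
-H only links files that travel in the same transfer:

  # pass 1: recreate the directory tree on the destination
  cd /src && find . -type d -print0 | (cd /mnt/backup && xargs -0 mkdir -p)

  # pass 2: one non-recursive rsync per file (-lptgoD is -a minus -r);
  # xargs -I{} substitutes each null-terminated path into both spots
  cd /src && find . -type f -print0 | \
      xargs -0 -I{} rsync -lptgoD {} /mnt/backup/{}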

