rsync alternative? (too many files)
Rick Moen
rick at linuxmafia.com
Mon Mar 7 15:57:49 PST 2011
Quoting Tony Godshall (togo at of.net):
> Hi Seth.
>
> I must not have expressed myself clearly.
>
> There are an excessive number of unique files, not duplicate entries in a list of files.
>
> The files have already been deduplicated in the sense that entries to
> files containing the same content are hardlinks.
>
> If I were to copy the files to new media without retaining the
> hardlinks, they would take up way more space.
I'm afraid I initially didn't quite understand your phrase 'copy the
files without retaining the hardlinks' in this context -- though I now
have a hunch about what you're talking about. (It's entirely possible I
need more caffeine.)
I mean, files _are_ hardlinks. They're data structures that dereference
to inodes. If you copy a file, you by implication copy the hardlink in
the process -- or, at least, copy all the hardlinks you care about.
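To illustrate what I mean (hypothetical filenames; the inode number
will of course vary on your system):

  $ touch foo
  $ ln foo bar
  $ ls -li foo bar
  1182735 -rw-r--r-- 2 rick rick 0 Mar  7 15:00 bar
  1182735 -rw-r--r-- 2 rick rick 0 Mar  7 15:00 foo

Same inode number, link count of 2: one file's worth of data,
reachable through two directory entries.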
But you mean 'preserving the hard links as being multiple maps to shared
inodes rather than maps to individual, hardlink-specific inodes', right?
Apologies for not having grasped your meaning.
Also, you never really clarified whether you were talking about copying
files within a host or across a network between hosts. I now strongly
suspect you meant the _former_ (and thus the hardlinks you wish to
'preserve' are between source and destination, i.e., not needing to
create new inodes for the destination copy). Most of us think about the
copying problem, especially when hauling out rsync, within the context
of inter-host file copying.
Here's a page that says cpio is the right tool in this context:
http://jeremy.zawodny.com/blog/archives/010037.html
(You'll reportedly need to run it as root if you want to preserve access
times.)
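Untested sketch of what that page's recipe boils down to (the paths
here are placeholders, adjust to taste); cpio's copy-pass mode notices
files with a link count above one and recreates them as hard links on
the destination:

  # cd /source/tree
  # find . -depth -print0 | cpio --null -pdm /destination/tree

-p is copy-pass mode, -d creates leading directories as needed, -m
preserves modification times, and --null pairs with find's -print0 so
odd characters in filenames don't bite you. (The '#' prompts reflect
the run-it-as-root note above.)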
I'll bet GNU tar would also work.
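GNU tar stores multiply-linked files as hard links and recreates them
on extraction, so an equivalent pipeline (again untested, same
placeholder paths) would be:

  # tar -C /source/tree -cf - . | tar -C /destination/tree -xpf -

-C changes directory before operating, -f - uses stdout/stdin as the
archive, and -p restores permissions on extraction (the default when
running as root anyway).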
By the way, you _did_ attempt the rsync copy with the -H (--hard-links)
flag, right? You probably did: I am not bothering to check the upthread
posts. It's necessary for the 'preserve hard links' behaviour you want,
although running out of RAM on huge copies can still be a problem.
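For the record, that invocation would be something like (placeholder
paths again):

  $ rsync -aH /source/tree/ /destination/tree/

-a is the usual archive bundle and -H makes rsync detect and reproduce
hard links. On the memory point: rsync 3.0 and later build the file
list incrementally, which helps enormously on trees with millions of
entries, but -H still has to hold dev/inode data for every
multiply-linked file it sees.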