rsync alternative? (too many files)

Tony Godshall togo at of.net
Tue Mar 8 12:36:21 PST 2011


Best Regards.
This is unedited.
P-)



On Mon, Mar 7, 2011 at 18:49, Seth David Schoen <schoen at loyalty.org> wrote:
> Tony Godshall writes:
>
>> On Sun, Mar 6, 2011 at 15:19, Seth David Schoen <schoen at loyalty.org> wrote:
>>
>> [Seth]
>> > If you're sure that the filenames don't contain tabs, you can...
>>
>> Hi Seth.
>>
>> I must not have expressed myself clearly.
>>
>> There are too many unique files; the problem isn't duplicate entries
>> in a list of files.
>>
>> The files have already been deduplicated in the sense that directory
>> entries for files with the same content are hardlinks.
>>
>> If I were to copy the files to new media without retaining the
>> hardlinks, they would take up way more space.
>
> Hi Tony,
>
> My approach copies "unique" files in the sense of inode uniqueness
> (not content uniqueness or name uniqueness) so I think it does
> address what you want.  I was using find to print inode numbers
> as the basis of a solution that preserves hard link structure,
> since two files are hard links if and only if they have the same
> inode number.
>
> The only reason that my advice worries about whitespace in the
> filenames is that it will confuse some of the programs that are
> being asked to process those filenames, not that the filenames
> themselves are being used to identify or distinguish "unique"
> files.

Ah, I misunderstood.

I'll look further at your strategy of explicitly detecting/using inodes.

>> Yes, what I need is for the destination to have the same hardlink
>> structure as the source.  It's a file-by-file backup of a bunch of
>> machines; many of the files were identical, will never be written to,
>> and we don't care about mtimes, so they've been hardlinked.  That's
>> what I mean when I say the amount of storage will go way up: if I
>> rsync directory by directory without -H, each directory entry will
>> become a separate copy of the file.

> So, duplicating inode uniqueness/nonuniqueness structure will
> accomplish this, which is why you can use find -printf with %i
> and %p together with sort and uniq to get information sufficient
> to recreate the hardlink structure. :-)

And doing so very explicitly.  Very nitty gritty.
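[A sketch of the %i/%p recipe, with a throwaway demo tree of my own; two paths on one filesystem are hard links if and only if they share an inode number, so grouping paths by inode recovers the link structure.]

```shell
# Assumes GNU find/sort; all paths here are illustrative.
set -e
src=$(mktemp -d)             # throwaway demo tree
echo data > "$src/a"
ln "$src/a" "$src/b"         # b is a hard link to a (same inode)
echo other > "$src/c"

# %i = inode number, %p = path; sorting groups hardlinked paths together.
find "$src" -type f -printf '%i\t%p\n' | sort -n

# One representative path per inode -- the set of "unique" files to copy.
# The remaining paths can then be recreated with ln on the destination.
find "$src" -type f -printf '%i\t%p\n' | sort -n -k1,1 -u

rm -rf "$src"
```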

Grazie.  Ciao.

...
> http://vitanuova.loyalty.org/

(taking introductory Italian, BTW)


More information about the bad mailing list