rsync alternative? (too many files)

Michael Cheselka cheselka at gmail.com
Thu Mar 10 21:58:59 PST 2011


Hello All,

Might I suggest "pax" instead of "cpio"?  It's a POSIX standard, so
it's available on Windows and many other operating systems as well.
It's basically a rewrite of "cpio" that also takes some features from
tar.

Regards,
Michael Cheselka
650-488-4820




On Tue, Mar 8, 2011 at 12:36, Tony Godshall <togo at of.net> wrote:
> Best Regards.
> This is unedited.
> P-)
>
>
>
> On Mon, Mar 7, 2011 at 18:49, Seth David Schoen <schoen at loyalty.org> wrote:
>> Tony Godshall writes:
>>
>>> On Sun, Mar 6, 2011 at 15:19, Seth David Schoen <schoen at loyalty.org> wrote:
>>>
>>> [Seth]
>>> > If you're sure that the filenames don't contain tabs, you can...
>>>
>>> Hi Seth.
>>>
>>> I must not have expressed myself clearly.
>>>
>>> The problem is the sheer number of unique files, not duplicate
>>> entries in a list of files.
>>>
>>> The files have already been deduplicated in the sense that files
>>> containing the same content are hardlinks to a single inode.
>>>
>>> If I were to copy the files to new media without retaining the
>>> hardlinks, they would take up far more space.
>>
>> Hi Tony,
>>
>> My approach copies "unique" files in the sense of inode uniqueness
>> (not content uniqueness or name uniqueness) so I think it does
>> address what you want.  I was using find to print inode numbers
>> as the basis of a solution that preserves hard link structure,
>> since two files are hard links if and only if they have the same
>> inode number.
>>
>> The only reason that my advice worries about whitespace in the
>> filenames is that it will confuse some of the programs that are
>> being asked to process those filenames, not that the filenames
>> themselves are being used to identify or distinguish "unique"
>> files.
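Seth's inode listing can be sketched like this (GNU find; the demo tree
is made up, and filenames are assumed free of tabs and newlines, per the
caveat above):

```shell
# Demo tree (hypothetical); in the real case this would be the backup root.
src=$(mktemp -d)
echo same > "$src/one"
ln "$src/one" "$src/two"     # hard link: same inode as "one"
echo other > "$src/three"

# Emit "inode<TAB>path" for every regular file, sorted numerically so
# hard-linked files (identical inode numbers) land on adjacent lines,
# where sort/uniq/awk can group them.
find "$src" -type f -printf '%i\t%p\n' | sort -n
```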
>
> Ah, I misunderstood.
>
> Will look at your strategy of explicitly detecting/using inodes further.
>
>>> Yes, what I need is for the destination to have the same hardlink structure
>>> as the source.  It's a file-by-file backup of a bunch of machines; many of
>>> the files were identical, will never be written to, and we don't care about
>>> mtimes, so they've been hardlinked.  That's what I mean when I say the
>>> amount of storage will go way up: if I rsync directory by directory without
>>> -H, each directory's entry will end up as a separate copy of the file.
>
>> So, duplicating inode uniqueness/nonuniqueness structure will
>> accomplish this, which is why you can use find -printf with %i
>> and %p together with sort and uniq to get information sufficient
>> to recreate the hardlink structure. :-)
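Putting the whole technique together — copy one representative file per
inode, then recreate the remaining names with ln — might look like this
sketch (the helper name is made up; assumes GNU find and tab-free
filenames):

```shell
# copy_with_links SRC DST -- hypothetical helper, not a standard tool.
copy_with_links() {
    SRC=$1; DST=$2
    tab=$(printf '\t')
    # Sort by inode so all names for one file arrive consecutively.
    find "$SRC" -type f -printf "%i\t%P\n" | sort -n |
    while IFS=$tab read -r inode rel; do
        mkdir -p "$DST/$(dirname "$rel")"
        if [ "$inode" = "${prev:-}" ]; then
            ln "$DST/$first" "$DST/$rel"   # same inode: recreate the link
        else
            cp -p "$SRC/$rel" "$DST/$rel"  # new inode: real copy
            prev=$inode; first=$rel
        fi
    done
}
```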
>
> And doing so very explicitly.  Very nitty gritty.
>
> Grazie.  Ciao.
>
> ...
>> http://vitanuova.loyalty.org/
>
> (taking introductory Italian, BTW)
> --
> bad mailing list
> bad at bad.debian.net
> http://bad.debian.net/cgi-bin/mailman/listinfo/bad
>
