Date: Sat, 24 Mar 2007 23:47:21 -0700
From: James Long <list@museum.rain.com>
To: freebsd-questions@freebsd.org, Wojciech Puchar <wojtek@tensor.gdynia.pl>
Subject: Re: mkisofs,cd9660 and hard links
Message-ID: <20070325064721.GA26420@ns.umpquanet.com>
In-Reply-To: <20070324191600.0673616A4CC@hub.freebsd.org>
References: <20070324191600.0673616A4CC@hub.freebsd.org>
> Date: Sat, 24 Mar 2007 20:15:50 +0100 (CET)
> From: Wojciech Puchar <wojtek@tensor.gdynia.pl>
> Subject: mkisofs,cd9660 and hard links
> To: freebsd-questions@freebsd.org
> Message-ID: <20070324201201.D6725@chylonia.3miasto.net>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> i did a copy of a small server (taking about 3GB of space) to DVD with
> growisofs -R, using --exclude to skip /dev etc.
>
> worked fine.
>
> and recovered fine, but takes much more space, because all hardlinks
> are now separate files.
>
> it looks like the cd9660 filesystem doesn't "see" hardlinked files as
> hardlinked, but as separate ones.
>
> is there any program to fix it, like comparing all very similar files on
> disk and hardlinking them?

My brief analysis of this is that there's only so much that can be done, at
least programmatically. Your DVD copy apparently does not contain enough
information to distinguish hard links, and may not let you determine where
soft links used to exist, either. And there may be some files that were
simply two copies of the same content, and should not be construed as
linked files at all.

That said, I have done similar tasks (like deleting duplicate copies of
files stored on two machines) by writing a shell script to calculate a
checksum of each file on disk, then sorting the output by checksum. Where
you find duplicate checksum values, you likely have files that could be
hard-linked to each other. It would require some manual vetting of the
identified duplicates to determine whether the files are supposed to be
hard links, symlinks, or simply two discrete files with the same content.
This can be time-consuming for large filesystems, but for 3 GB you can
just start it and walk away until it's done.

This example is rather clumsy, and if someone can show me how to do this
without having to pipe the output into sh, I'd be edified to know that.
On the other hand, I often like to construct xargs lines like this so that
I can see and inspect the commands that will be executed before actually
committing to piping them into the shell.

find / -type f -print0 | xargs -0 -Ixx -n1 echo echo \$\(sha256 -q \"xx\"\) \"xx\" | sh > md5-list.out

Then use awk/sort/uniq/grep to find duplicate checksums, and determine
which files have identical checksum values. Manually examine those files
to determine whether they should be hard links, symlinks, or remain as
separate files.

Note that this necessarily excludes directories, which could be symlinks
to other directories, such as /etc/namedb vs. /var/named/etc/namedb.

Jim
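The checksum-then-compare approach above can be sketched end to end as a
small script. This is only an illustration under assumptions not in the
original post: it uses GNU coreutils sha256sum (the FreeBSD equivalent is
`sha256 -q` as in the one-liner above), and the demo files and directory
names are made up.

```shell
#!/bin/sh
# Sketch of: checksum every file, sort by checksum, list duplicate
# candidates for manual vetting. Demo file names are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/data"
printf 'same content\n' > "$dir/data/a"
printf 'same content\n' > "$dir/data/b"   # duplicate of a
printf 'different\n'    > "$dir/data/c"

# 1. One "checksum  path" line per regular file (sha256sum on GNU
#    systems; the FreeBSD command above uses sha256 -q instead).
find "$dir/data" -type f -print0 | xargs -0 sha256sum > "$dir/list.out"

# 2. Checksums that occur more than once mark hard-link candidates.
awk '{print $1}' "$dir/list.out" | sort | uniq -d > "$dir/dupes.txt"

# 3. Pull out the files sharing each duplicated checksum for vetting.
grep -F -f "$dir/dupes.txt" "$dir/list.out" | sort > "$dir/candidates.txt"
cat "$dir/candidates.txt"
```

On this demo data, candidates.txt ends up with exactly the two lines for
a and b; c, being unique content, never appears.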
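For the vetting step itself, once a pair has been confirmed as a true
duplicate, re-creating the hard link is a one-command fix. A hedged
sketch, with made-up file names, using GNU `stat -c` to verify the result
(FreeBSD's stat(1) uses -f with different format letters):

```shell
#!/bin/sh
# Replace a vetted duplicate with a hard link and confirm both names
# now share one inode. File names here are purely for the demo.
dir=$(mktemp -d)
printf 'payload\n' > "$dir/orig"
cp "$dir/orig" "$dir/copy"        # restored from DVD as a separate file

ln -f "$dir/orig" "$dir/copy"     # overwrite the copy with a hard link

# Both directory entries should now report the same inode number, and
# the link count should be 2.
inode_a=$(stat -c %i "$dir/orig")
inode_b=$(stat -c %i "$dir/copy")
links=$(stat -c %h "$dir/orig")
echo "inode $inode_a = $inode_b, link count $links"
```

The disk space for the second copy is reclaimed as soon as the extra
directory entry is replaced, which is exactly what was lost in the
cd9660 round trip.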