Date: Tue, 19 Jul 2011 04:51:14 -0500 (CDT)
From: Lars Eighner
To: "C. P. Ghost"
Cc: Frank Bonnet, Damien Fleuriot, "freebsd-questions@freebsd.org"
Subject: Re: Tools to find "unlegal" files ( videos , music etc )
References: <201107190549.p6J5n6sP028960@mail.r-bonomi.com> <4E252119.3030208@esiee.fr> <89EB5E14-AA8E-4265-9C5D-22641ECC1C37@my.gd>

On Tue, 19 Jul 2011, C. P. Ghost wrote:

> Speaking with my university sysadmin hat on: you're NOT allowed to
> peek inside personal files of your users, UNLESS the user has waived
> his/her rights to privacy by explicitly agreeing to the TOS and
> there's legal language in the TOS that allows staff to inspect files
> (and then staff needs to abide by those rules in a very strict and
> cautious manner). So unless the TOS are very explicit, a sysadmin or
> an IT head can get in deep trouble w.r.t. privacy laws.

Yes, but I am not an expert on privacy laws in France, and I suspect you
are not either. Whether examining the magic number (first four bytes) of
a file constitutes a breach of privacy is a matter for legal advice
applicable to the particular jurisdiction. You certainly can look at the
external package: file size and name.

>> You may want to look for files that are unusually large.
>> They could possibly be ISOs, dvdrips, HD movie dumps...
>
> Not to forget encrypted RAR files (which btw. could contain anything,
> including legitimate content, so be careful here).
>
>> We have the same problem here with users sharing movies on the file
>> servers, and what makes it worse is some of their movie files are
>> legit because they're, for example, official trailers that are
>> reworked and redistributed to our customers.
>>
>> You won't win this, tell your boss it can not be done.
>
> What can technically be done is that the copyright owner provides a
> list of hashes for his files, and requests that you traverse your
> filesystems, looking for files that match those hashes. AND, even
> then, all you can do is flag the files, and you'll have to check with
> the user that he/she doesn't own a license permitting him/her to own
> that file!

You cannot generate a hash without opening the file, at least at some
automated level. If you can do that, couldn't you generate a hash of the
first four bytes to match with hashes of known magic numbers? If you can
"look" at the whole file, surely you can "look" at just the first four
bytes.

Of course software cannot determine legal issues, such as whether works
are properly licensed or are pornographic according to local
legislation, etc.

> However, even that isn't foolproof: nothing prevents a user from
> flipping a bit or two, rescaling, resampling, splitting the files into
> multiple files in a non-obvious manner, adding random bytes at the end
> etc...: the result would still be infringing, but can't be detected
> automatically (at least not in a reasonable amount of time).

This is a bit like security. There is no absolute that can be achieved.
You don't have to be smarter than God, you just have to be smarter than
the users.

Now the whole point of infringing schemes is that most dumb users have
to be able to use the files they download.
They can reasonably do things like rename the files or pass them through
a commonly available decoder. No point in trying to "file share" if
users have to be the NSA to play the music.

You can scan (where legal) for the common stuff. You can't find stuff
encoded by Dr. Evil Genius Hacker -- but neither can the party claiming
to be infringed and neither can Suzie Shebop who just wants free music.

--
Lars Eighner
http://www.larseighner.com/index.html
8800 N IH35 APT 1191 AUSTIN TX 78753-5266
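
The magic-number check discussed above could be sketched roughly as
follows. This is not code from the thread, only a minimal Python
illustration: it reads just the first four bytes of each file under a
directory and compares them with a small table of well-known signatures.
It compares the bytes directly instead of hashing them, which gives the
same answer for a fixed four-byte prefix. The names sniff and scan, the
min_size parameter, and the choice of table entries are assumptions made
for the example, and the table is far from complete (MP4, for instance,
keeps its marker at offset 4 and would be missed).

```python
import os

# Leading byte signatures ("magic numbers") for a few common formats.
# Illustrative only; a real scan would need a much larger table.
MAGIC = {
    b"RIFF": "RIFF container (AVI, WAV)",
    b"OggS": "Ogg",
    b"fLaC": "FLAC",
    b"Rar!": "RAR archive",
    b"\x1a\x45\xdf\xa3": "Matroska/WebM (EBML)",
    b"ID3": "MP3 with ID3v2 tag",
}


def sniff(path):
    """Return a label for the file's first four bytes, or None."""
    try:
        with open(path, "rb") as f:
            head = f.read(4)  # nothing beyond the first four bytes is read
    except OSError:
        return None
    for signature, label in MAGIC.items():
        if head.startswith(signature):
            return label
    return None


def scan(root, min_size=0):
    """Yield (path, size, label) for files under root with a known signature.

    min_size (hypothetical parameter) skips files smaller than that many
    bytes, so only "unusually large" files are opened at all.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue
            if size < min_size:
                continue
            label = sniff(path)
            if label is not None:
                yield path, size, label
```

Something like `for hit in scan("/home", min_size=100 * 1024 * 1024): print(hit)`
would list only large files with a recognized signature. A match flags a
file for review and says nothing about whether it is licensed.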