Date: Mon, 18 Jul 2011 11:46:40 +0200
From: Frank Bonnet <f.bonnet@esiee.fr>
To: freebsd-questions@freebsd.org
Subject: Re: Tools to find "unlegal" files ( videos , music etc )
Message-ID: <4E240100.5070506@esiee.fr>
In-Reply-To: <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>
References: <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>

On 07/18/2011 11:44 AM, Robert Bonomi wrote:
>> From owner-freebsd-questions@freebsd.org Mon Jul 18 03:55:59 2011
>> Date: Mon, 18 Jul 2011 10:55:58 +0200
>> From: Frank Bonnet<f.bonnet@esiee.fr>
>> To: freebsd-questions@freebsd.org
>> Subject: Re: Tools to find "unlegal" files ( videos , music etc )
>>
>> On 07/18/2011 10:45 AM, Polytropon wrote:
>>> On Mon, 18 Jul 2011 10:38:22 +0200, Frank Bonnet wrote:
>>>> On 07/18/2011 10:10 AM, Polytropon wrote:
>>>>> On Mon, 18 Jul 2011 09:55:09 +0200, Frank Bonnet wrote:
>>>>>> Hello
>>>>>>
>>>>>> Anyone knows an utility that I could pipe to the "find" command
>>>>>> in order to detect video, music, games ... etc files ?
>>>>>>
>>>>>> I need a tool that could "inspect" inside files because many users
>>>>>> rename those filename to "inoffensive" ones :-)
>>>>> One way could be to define a list of file extensions that
>>>>> commonly matches the content you want to track. Of course,
>>>>> the file name does not directly correspond to the content,
>>>>> but it often gives a good hint to search for *.wmv, *.flv,
>>>>> *.avi, *.mp(e)g, *.mp3, *.wma, *.exe - and of course all
>>>>> the variations of the extensions with uppercase letters.
>>>>> Also consider *.rar and maybe *.zip for compressed content.
>>>>>
>>>>> If file extensions have been manipulated (rare case), the
>>>>> "file" command can still identify the correct file type.
>>>>>
>>>> yes thanks , gonna try with the file command
>>> You could make a simple script that lists "file" output for
>>> all files (just to be sure because of possible suffix renaming)
>>> for further inspection. Sometimes, you can also run "strings"
>>> for a given file - maybe that can be used to identify typical
>>> suspicious string patterns for a "strings + grep" combination
>>> so less manual identification has to be done.
>>>
>> yes , my main problem is the huge number of files
>> but anyway I'm gonna first check files greater than 500 Mb
>> it could be a good start
> That's what 'find(1)' is for. Something like (run as superuser):
>
>    find / -exec ./inspect {} \; >> /tmp/suspects
>
> with './inspect' being a trivial (executable!) shell-script:
>
>    #!/bin/sh
>    file "$1" | awk -f ./inspect.awk
>
> and './inspect.awk' is:
>
>    {file = $1 ; $1 = "";}
>    /regex1/ {printf("%s %s\n",file,$0); next}
>    /regex2/ {printf("%s %s\n",file,$0); next}
>    /regex3/ {printf("%s %s\n",file,$0); next}
>    ... ...
>    ... ...
>    {next;}
>
> where 'regex1', 'regex2', etc. are things to select 'files' of interest,
> based on what 'file' reports. The awk code strips out the file name, so
> that the regex will match only against the 'file' output, with no false
> positives against a substring in the file name itself.
>
> See the find(1) manpage for things you can put before the '-exec' param,
> to filter by size, etc. You can also limit the search to a specific
> part of the filesystem tree, by replacing '/' with the name of the directory
> hierarchy you want to search -- e.g. '/home' (if that's where all 'user'
> files are) -- although, 'for completeness' (given the 'legal' issues) you
> may well want to run it over 'everything'.
>
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

Thanks a lot for your help !
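
The size filter and the content check discussed in this thread can be
combined in one pass. What follows is only a minimal sketch, assuming a
file(1) that understands --mime-type and a find(1) that accepts size
suffixes such as 500M. The function name list_big_media and its default
threshold are hypothetical, not something given in the thread.

```shell
#!/bin/sh
# list_big_media DIR [SIZE]
# Hypothetical helper: print the file(1) line ("path: mime/type") for every
# regular file under DIR that matches the find(1) -size argument SIZE
# (default +500M, i.e. larger than 500 MB) and whose content is reported as
# audio or video, whatever the file is named.
list_big_media() {
    dir=$1
    size=${2:-+500M}
    # "-exec ... {} +" hands many paths to a single file(1) invocation,
    # which matters when the tree holds a huge number of files.
    find "$dir" -type f -size "$size" -exec file --mime-type {} + |
        # A MIME type contains no spaces, so it is always the last field.
        awk '$NF ~ /^(audio|video)\// { print }'
}
```

It could be used as 'list_big_media /home >> /tmp/suspects', or with a
lower threshold as 'list_big_media /home +100M'. Matching on the MIME type
rather than on the descriptive text keeps the pattern short, but archives
(*.rar, *.zip) and games would need additional patterns.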
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E240100.5070506>