Date:      Mon, 8 Dec 2014 09:39:24 -0800
From:      Kevin Oberman <rkoberman@gmail.com>
To:        "Julian H. Stacey" <jhs@berklix.com>
Cc:        "ports@FreeBSD.org" <ports@freebsd.org>
Subject:   Re: Any tool known to demangle special chars in MS tree path names ?
Message-ID:  <CAN6yY1vsSnreUJnZ4qh9t=ib7scrpiB9dEJNW8vFmMF7Vkh3AA@mail.gmail.com>
In-Reply-To: <201412081619.sB8GIpas073368@fire.js.berklix.net>
References:  <201412081619.sB8GIpas073368@fire.js.berklix.net>

On Mon, Dec 8, 2014 at 8:18 AM, Julian H. Stacey <jhs@berklix.com> wrote:

> Hi ports@
> Is there a utility in ports/ to automatically clean disgusting path
> names in big trees one acquires from Microsoft users ?
>
> Trees with in both directories & filenames, masses of meta characters
> such as ' ` . * | \ & space (& accents & high parity bit national
> extensions eg german umlauts) etc.
>
> It's known:
>         tr exists,
>         find has -X
>         xargs has -0
>         One can delimit path names on command line,
>         One could reinvent the wheel, writing & improving a scary
>         shell script (that would probably break while debugging &
>         trash some data)
>
> But I'm looking for something better, that probably already exists, that
> will permanently clean trees, to no longer need to delimit various
> utilities against nasty names, each time tree is accessed.
>
> Either:
>         An existing tool (preferably C) one can run automatically
>         to forcibly rename dirs & files in a Unix friendly manner ?
>
> Or if none exists I'll write a C program to run from find.
>         If so, I'll probably just map nasty chars inc. any high bit
>         parity (accents umlauts & other noise) to eg "0xAB" expansion.
>         ( I don't care about national accents & char sets. )
>
> PS Assume files are big, copying not viable, rename/mv via link & unlink
> best.
>
> Any tools (URLs) known ?  Or should I write my own ?
>
> Cheers,
> Julian
> --
> Julian Stacey, BSD Linux Unix'78 C Sys Eng Consultant Munich
> http://berklix.com
>  Indent previous with "> ".  Interleave reply paragraphs like a play
> script.
>  Send plain text, not quoted-printable, HTML, base64, or
> multipart/alternative.
>
>
Around a decade or so ago I was looking for a tool to clean up all of the
Microsoft "special" characters in web pages. I found "demoroniser", a
public domain tool written by John Walker. As is, it does not meet your
needs, but it goes a long way in the right direction. It expects to work
with files, not directory trees, but modifying it to do so would be quite
trivial, mostly wrapping it in a recursive loop that uses opendir, readdir,
and closedir to walk the tree and feed it the directory names. (No, I am
not volunteering.)
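
A minimal sketch of such a recursive loop, written in C since that was the
stated preference. None of it comes from demoroniser: clean_name and
clean_tree are hypothetical names, and the policy (keep ASCII letters,
digits, '.', '_' and '-', expand every other byte to "0xNN") is only an
assumption based on the mapping proposed in the original question. It uses
rename() within the same directory, so no file data is copied.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose lstat() and strdup() in strict -std= modes */
#endif
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: copy src to dst, keeping ASCII letters, digits,
 * '.', '_' and '-', and expanding every other byte to "0xNN".
 * Returns 0 on success, -1 if dst is too small. */
int clean_name(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    if (dstlen == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int safe = (c < 0x80 && isalnum(c)) || c == '.' || c == '_' || c == '-';
        size_t need = safe ? 1 : 4;

        if (used + need + 1 > dstlen)
            return -1;
        if (safe)
            dst[used] = (char)c;
        else
            snprintf(dst + used, 5, "0x%02X", (unsigned)c);
        used += need;
    }
    dst[used] = '\0';
    return 0;
}

/* Hypothetical driver: rename everything below dir in place, children
 * before their parent so that paths stay valid. Names are collected first
 * so the directory is not modified while readdir() is still scanning it.
 * Returns the number of entries renamed. */
int clean_tree(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    char **names = NULL;
    size_t n = 0, cap = 0, i;
    int renamed = 0;

    if (d == NULL) {
        perror(dir);
        return 0;
    }
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (n == cap) {                     /* grow the name list */
            size_t ncap = cap ? cap * 2 : 64;
            char **tmp = realloc(names, ncap * sizeof *tmp);
            if (tmp == NULL)
                break;
            names = tmp;
            cap = ncap;
        }
        if ((names[n] = strdup(e->d_name)) != NULL)
            n++;
    }
    closedir(d);

    for (i = 0; i < n; i++) {
        char from[PATH_MAX], to[PATH_MAX], clean[4 * NAME_MAX + 1];
        struct stat st;

        if ((size_t)snprintf(from, sizeof from, "%s/%s", dir, names[i]) >= sizeof from) {
            fprintf(stderr, "skip (path too long): %s/%s\n", dir, names[i]);
        } else {
            /* lstat() so that symlinks to directories are not followed */
            if (lstat(from, &st) == 0 && S_ISDIR(st.st_mode))
                renamed += clean_tree(from);
            if (clean_name(names[i], clean, sizeof clean) == 0 &&
                strcmp(clean, names[i]) != 0 &&
                (size_t)snprintf(to, sizeof to, "%s/%s", dir, clean) < sizeof to) {
                if (lstat(to, &st) == 0)    /* never clobber an existing entry */
                    fprintf(stderr, "skip %s: %s already exists\n", from, to);
                else if (rename(from, to) != 0)
                    perror(from);
                else
                    renamed++;
            }
        }
        free(names[i]);
    }
    free(names);
    return renamed;
}
```

Called as clean_tree(".") from a small main(), this would turn
"My File (1).txt" into "My0x20File0x200x2810x29.txt". The expansion is not
reversible when a name already contains a literal "0x20", and a leading '-'
is left alone; both are simplifications of the sketch, so try it on a
scratch copy of a small tree first.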

Most notably, it is written in Perl, not C. Perl is now very out of fashion.

In any case, it is available at:
http://www.fourmilab.ch/webtools/demoroniser/
--
Kevin Oberman, Network Engineer, Retired


