Date: Thu, 31 Jul 2014 14:09:37 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Phil Shafer <phil@juniper.net>
Cc: sjg@freebsd.org, arch@freebsd.org, marcel@freebsd.org
Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Message-ID: <20140731210937.GV43962@funkthat.com>
In-Reply-To: <201407311839.s6VIdlMK096434@idle.juniper.net>
References: <20140731175547.GO43962@funkthat.com> <201407311839.s6VIdlMK096434@idle.juniper.net>
Phil Shafer wrote this message on Thu, Jul 31, 2014 at 14:39 -0400:
> John-Mark Gurney writes:
> >Return an error? printf can return an error, yet most people don't
> >check it.. so no real difference in API/bugs...
>
> My concern is emitting half a string, where the half we don't emit
> is something important. I don't want to make the opposite of an
> injection attack, where arranging some daemon to call xo_emit with
> a broken UTF-8 string allows an evil-doer to fix their evil content
> into the other half of the string.
>
> I'm escaping XML, JSON, and HTML content already, so the simplest
> scheme is to:
>
> a) UTF-8 check the format string;
>    if it fails, nothing is emitted
> b) for each format descriptor, check the content generated;
>    if it fails, nothing is emitted from the xo_emit call;
>    anything already generated is discarded
>
> Simple and easy. Seem reasonable? The other option would be to
> discard only that specific format descriptor or only that field
> description.
>
> xo_emit("{:good/%d}{:bad/%d%s}{:ugly}", 0, 55, "\xff\x01\xff", "cat");
>
> Does the "<ugly>cat</ugly>" get emitted? Is "<bad>55</bad>" emitted?
>
> If "ugly" was <run-this-command-as-user>phil</...>, and the bogus
> string blocked the generation of that vital bit of info, life could
> be bad.

I agree...

> Unfortunately, even this isn't a simple fix for "w", which wants
> to call wcsftime() to get wide values for month and day-of-the-week
> names. Does wcsrtombs() convert this to UTF-8? Is there a locale
> for UTF-8?

Well, from my understanding there can't be a "locale" that is just
UTF-8, as a locale contains more than the character encoding... It
also includes month/day names, sorting rules, etc...

I think you can get a C locale (the default) with UTF-8 by setting the
correct environment variables, but I don't know them well enough to
say... Should we add a locale that does this? There is a UTF-8 entry in
/usr/share/locale, but if you set LANG to it, things don't work.
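Phil's steps (a) and (b) amount to all-or-nothing emission: validate each piece, and if any piece fails, discard the whole call's output rather than emitting a partial record. Below is a minimal C sketch of that idea, assuming each field's content has already been formatted into a plain string. The names utf8_valid, struct field and emit_all_or_nothing are hypothetical and not part of libxo, and the XML/JSON/HTML escaping libxo already performs is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* True if s[0..len) is well-formed UTF-8 per RFC 3629: rejects stray
 * continuation bytes, overlong forms, truncated sequences, UTF-16
 * surrogates and code points above U+10FFFF. */
bool
utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;               /* continuation bytes that must follow */
        unsigned long cp;       /* code point being decoded */

        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;       /* 0x80-0xC1 or 0xF5-0xFF as a lead byte */
        }
        if (len - i <= n)
            return false;       /* sequence runs past the end */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;       /* overlong, surrogate or out of range */
        i += n + 1;
    }
    return true;
}

/* Hypothetical field record: a name plus its already-formatted content. */
struct field {
    const char *name;
    const char *value;
};

/* Render each field as <name>value</name> into out (escaping omitted).
 * If any name or value fails the UTF-8 check, or the output does not fit,
 * out is emptied and -1 is returned: nothing from the call is emitted. */
int
emit_all_or_nothing(char *out, size_t outsz, const struct field *f, size_t nf)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < nf; i++) {
        if (!utf8_valid((const unsigned char *)f[i].name, strlen(f[i].name)) ||
            !utf8_valid((const unsigned char *)f[i].value, strlen(f[i].value)))
            goto discard;
        int w = snprintf(out + used, outsz - used, "<%s>%s</%s>",
                         f[i].name, f[i].value, f[i].name);
        if (w < 0 || (size_t)w >= outsz - used)
            goto discard;
        used += (size_t)w;
    }
    return (int)used;
discard:
    out[0] = '\0';              /* drop everything generated so far */
    return -1;
}
```

With the fields from the xo_emit example above, where "bad" carries "55\xff\x01\xff", the call returns -1 and leaves the buffer empty, so neither "<bad>" nor "<ugly>cat</ugly>" is emitted. Discarding only the offending field would be the alternative Phil describes, with the risk he points out of silently losing a vital field.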
-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."