From owner-freebsd-sparc64@freebsd.org Mon Apr 11 21:57:38 2016 Return-Path: Delivered-To: freebsd-sparc64@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F241EB0C425 for ; Mon, 11 Apr 2016 21:57:38 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "alchemy.franken.de", Issuer "alchemy.franken.de" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 9D11A1E4D; Mon, 11 Apr 2016 21:57:38 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.15.2/8.15.2/ALCHEMY.FRANKEN.DE) with ESMTPS id u3BLvZUA002405 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 11 Apr 2016 23:57:35 +0200 (CEST) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.15.2/8.15.2/Submit) id u3BLvYXi002404; Mon, 11 Apr 2016 23:57:34 +0200 (CEST) (envelope-from marius) Date: Mon, 11 Apr 2016 23:57:34 +0200 From: Marius Strobl To: Shigeharu TAKENO , gabor@FreeBSD.org Cc: freebsd-sparc64@freebsd.org, Joerg Wunsch Subject: Re: /usr/bin/sort may be incorrect Message-ID: <20160411215734.GA2101@alchemy.franken.de> References: <201603250229.u2P2TVLp003567@pc98tak.iee.niit.ac.jp> <201603310446.u2V4kLGM003303@pc98tak.iee.niit.ac.jp> <20160331072303.GP53011@uriah.heep.sax.de> <201603311122.u2VBMPam007017@pc98tak.iee.niit.ac.jp> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="/NkBOFFp2J2Af1nK" Content-Disposition: inline In-Reply-To: <201603311122.u2VBMPam007017@pc98tak.iee.niit.ac.jp> User-Agent: Mutt/1.5.24 (2015-08-30) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (alchemy.franken.de [0.0.0.0]); Mon, 11 Apr 2016 23:57:35 +0200 (CEST) X-BeenThere: freebsd-sparc64@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Porting FreeBSD to the Sparc List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Apr 2016 21:57:39 -0000 --/NkBOFFp2J2Af1nK Content-Type: multipart/mixed; boundary="qMm9M+Fa2AknHoGS" Content-Disposition: inline --qMm9M+Fa2AknHoGS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 31, 2016 at 08:22:25PM +0900, Shigeharu TAKENO wrote: > shige 03/31 2016 > ---------------- >=20 > Thank you for your reply. >=20 > Joerg Wunsch wrote: >=20 > | struct key_value > | { > | struct bwstring *k; > | struct key_hint hint[]; > | }; > |=20 > | If that works for you, too, I think it would be the preferrable way to > | write it. >=20 > Unfortunately this does not fix the problem. >=20 >=20 > | > The k field of key_value may be overwritten by the hint field > | > in numcoll_impl(), gnumcoll() and monthcoll() (coll.c), and the > | > pointer value of k may change to incorrect value. > |=20 > | Are you saying that something like > |=20 > | struct key_value *kw; > |=20 > | ... > |=20 > | kw->hint[-1] =3D something; > |=20 > | happens? That would certainly be a bug in the code then that ought to > | be fixed, rather than worked around. >=20 > I tested under your suggestion "struct key_hint hint[]", which=20 > behaves as the same of default sort command. >=20 > % ( echo 2 5 8 ; echo 2 6 5 ) | sort -n +0 -1 +1 -2 +2 -3 >=20 >=20 > In key_coll(struct keys_array *ps1, struct keys_array *ps2,=20 > size_t offset) (in coll.c), initial pointer values are the > followings: >=20 > &(ps1->key[0]) =3D 0x40c140f8 > &(ps1->key[1]) =3D 0x40c14100 > &(ps1->key[2]) =3D 0x40c14108 > &(ps2->key[0]) =3D 0x40c14088 > &(ps2->key[1]) =3D 0x40c14090 > &(ps2->key[2]) =3D 0x40c14198 > (the pointer repeat is only 8 byte.) >=20 > ps1->key[0].k =3D 0x40c060e0 > ps1->key[1].k =3D 0x40c060f0 > ps1->key[2].k =3D 0x40c06100 > ps2->key[0].k =3D 0x40c060a0 > ps2->key[1].k =3D 0x40c060b0 > ps2->key[2].k =3D 0x40c060c0 >=20 > key_coll() calls sm->func() =3D numcoll(), and it uses > numcoll_impl(struct key_value *kv1, struct key_value *kv2) with > ps1->key[i] and ps2->key[i]. The function numcoll_impl() uses k > field and hint field of struct key_value. >=20 >=20 > For i =3D 0, the k field pointers of arguments kv1 and kv2 of=20 > numcoll_impl() are correct: >=20 > kv1->k =3D 0x40c060e0, kv2->k =3D 0x40c060a0 >=20 > but the hint field pointers of kv1, kv2 are doughtful: >=20 > &(kv1->hint) =3D 0x40c14100, &(kv2->hint) =3D 0x40c14090 >=20 > which are the same value of &(ps1->key[1]) and &(ps2->key[1]). >=20 >=20 > And for i =3D 1, the k field pointers of arguments kv1 and kv2=20 > become incorrect: >=20 > kv1->k =3D 0x140c060f0, kv2->k =3D 0x140c060b0 >=20 > which are added 0x100000000 to the original pointer value.=20 > The sort command stops where it uses the value. >=20 >=20 > If we use the definition "struct key_hint hint[1]", the repeat > of pointers of ps1->key[i] becomes 32 byte, and incorrect changes=20 > of pointers do not occur. >=20 > &(ps1->key[0]) =3D 0x40c08208 > &(ps1->key[1]) =3D 0x40c08228 > &(ps1->key[2]) =3D 0x40c08248 >=20 AFAICT is sort(1) relying on undefined behavior. It builds up an array of struct keys_array, which in turn has a zero length array (apparently unnecessary) of struct key_value, which in turn has a zero length array of struct key_hint. While sort(1) takes care to allocate space for the latter based on need_hint, it relies on the compiler to determine the correct offsets of the struct key_value elements within the array of struct keys_array. Given that the compiler isn't aware of the actual size - for essentially the same reason zero length arrays also don't contribute to the sizeof() of such a struct - of struct key_value, that doesn't work, i. e. struct key_hint isn't accounted for in the offset calculation leading to the corruption described above. I'm not exactly sure why this bug apparently doesn't affect x86 but I think that's due to its little-endian addressing avoiding corruption in practice; that's somewhat hard to think through given that struct bwstring and the use thereof are a similarly =2E.. interesting constructs. Anyway, using an accessor function for determining the correct offsets of struct key_value elements within the array of struct keys_array as in the attached patch fixes the problem for me. However, I suspect that the space savings by only allocating memory for struct key_hint when needed don't really justify all the hackery involved in that approach. At least, changing struct key_hint to be a fixed member of struct key_value and removing all parts hanging off from need_hint would make the code a lot cleaner and readable. Gabor, what do you think? Marius --qMm9M+Fa2AknHoGS Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="sort.diff" Content-Transfer-Encoding: quoted-printable Index: coll.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- coll.c (revision 297803) +++ coll.c (working copy) @@ -105,14 +105,29 @@ { =20 if (ka) { - for (size_t i =3D 0; i < keys_num; ++i) - if (ka->key[i].k && ka->key[i].k !=3D s) - bwsfree(ka->key[i].k); + for (size_t i =3D 0; i < keys_num; ++i) { + const struct key_value *kv; + + kv =3D get_key_from_keys_array(ka, i); + if (kv->k && kv->k !=3D s) + bwsfree(kv->k); + } memset(ka, 0, keys_array_size()); } } =20 /* + * Get pointer to a key value in the keys set + */ +struct key_value * +get_key_from_keys_array(struct keys_array *ka, size_t ind) +{ + + return ((struct key_value *)((caddr_t)&(ka->key[ind]) + + (key_hint_size() * ind))); +} + +/* * Set value of a key in the keys set */ void @@ -122,7 +137,7 @@ if (ka && keys_num > ind) { struct key_value *kv; =20 - kv =3D &(ka->key[ind]); + kv =3D get_key_from_keys_array(ka, ind); =20 if (kv->k && kv->k !=3D s) bwsfree(kv->k); @@ -156,9 +171,9 @@ if (si->str) ret +=3D bws_memsize(si->str); for (size_t i =3D 0; i < keys_num; ++i) { - struct key_value *kv; + const struct key_value *kv; =20 - kv =3D &(si->ka.key[i]); + kv =3D get_key_from_keys_array(&si->ka, i); =20 if (kv->k !=3D si->str) ret +=3D bws_memsize(kv->k); @@ -475,16 +490,19 @@ int key_coll(struct keys_array *ps1, struct keys_array *ps2, size_t offset) { + struct key_value *kv1, *kv2; struct sort_mods *sm; int res =3D 0; =20 for (size_t i =3D 0; i < keys_num; ++i) { + kv1 =3D get_key_from_keys_array(ps1, i); + kv2 =3D get_key_from_keys_array(ps2, i); sm =3D &(keys[i].sm); =20 if (sm->rflag) - res =3D sm->func(&(ps2->key[i]), &(ps1->key[i]), offset); + res =3D sm->func(kv2, kv1, offset); else - res =3D sm->func(&(ps1->key[i]), &(ps2->key[i]), offset); + res =3D sm->func(kv1, kv2, offset); =20 if (res) break; Index: coll.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- coll.h (revision 297803) +++ coll.h (working copy) @@ -91,7 +91,7 @@ { struct bwstring *k; /* key string */ struct key_hint hint[0]; /* key sort hint */ -}; +} __packed; =20 /* * Set of keys container object. @@ -146,6 +146,7 @@ =20 struct keys_array *keys_array_alloc(void); size_t keys_array_size(void); +struct key_value *get_key_from_keys_array(struct keys_array *ka, size_t in= d); void set_key_on_keys_array(struct keys_array *ka, struct bwstring *s, size= _t ind); void clean_keys_array(const struct bwstring *s, struct keys_array *ka); =20 Index: radixsort.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- radixsort.c (revision 297803) +++ radixsort.c (working copy) @@ -243,9 +243,11 @@ static inline int get_wc_index(struct sort_list_item *sli, size_t level) { + const struct key_value *kv; const struct bwstring *bws; =20 - bws =3D sli->ka.key[0].k; + kv =3D get_key_from_keys_array(&sli->ka, 0); + bws =3D kv->k; =20 if ((BWSLEN(bws) > level)) return (unsigned char) BWS_GET(bws,level); --qMm9M+Fa2AknHoGS-- --/NkBOFFp2J2Af1nK Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQJ8BAEBCgBmBQJXDB3JXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ1M0Q5QjQzNTVGOTU5ODBGQzVENzZCMDIy MEI3MERFMTNGMUQxRTRGAAoJECC3DeE/HR5PQYgP/jfxi7sCgIH7Nnn2ylnM3Ixx lLAYZxlqsdMp/cUAtzP1MMWtYjdobhgiXDY9+289Azf2I3S8bC6lh6STqRsP2dp3 Ouh/YkSc0sG+TeuBo28wv7EnGk4ubEDyhMLBZg/J6wouTo4QI2ioj3Lq77AJHdmo 4+UO5OC5npBRM37ICOirhSdqbEkjIdYgvdY1ZgZ2TpY8u0P/oJ0Prra1imBJ6dpL nTC6ZykAyoNvwOc9mWxjXmMQcCIDYdeljCjtSSZhGXqXiqy+FDiFVvKtDWngPcRo sfpVEVcgmjto7Tn7QncVAz7n9JYPGyeZaxLFyGGn3nDTpR8Eu4X9okpVcbQi67Hq +/3ZoB8Hm3burBENNo0h860PUymC+8an24iHfmZ/S1nOoVkgEvCddrFSCZNbtVdR 04CeBzE+Qlyx0s+0YXovfuvZZ1Mzz4GAo8z6Psj3FgJ0Nij4y0P9cnP04B1phYFA MkmOgdJu0br3iz4RkAzdgL+TN/6z9nSJUBpnADukWTizglf0PhgCj7ztUgH7A05H YKD5sB5bYp3dhrTM1kGqawDljq3u9dqIEoQfglpjiStBePAwOo45RTrx4Zh16b62 toqrwn4hZnXlWkwPUpU4alPM4/jJYACqOhoAoR/qHP0d9Vc94c6kSIUoJKeDtDRV vlpPci8CMMdMg9HehT8z =umiB -----END PGP SIGNATURE----- --/NkBOFFp2J2Af1nK--