Date: Tue, 28 Feb 2017 13:27:52 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Conrad Meyer <cem@freebsd.org>
Cc: Bruce Evans <brde@optusnet.com.au>, Konstantin Belousov <kostikbel@gmail.com>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r313006 - in head: sys/conf sys/libkern sys/libkern/x86 sys/sys tests/sys/kern
Message-ID: <20170228121335.Q2733@besplex.bde.org>
In-Reply-To: <CAG6CVpV8fqMd82hjYoyDfO3f5P-x6%2B0OJDoQHtqXqY_tfWtZsA@mail.gmail.com>
References: <201701310326.v0V3QW30024375@repo.freebsd.org> <20170202184819.GP2092@kib.kiev.ua> <20170203062806.A2690@besplex.bde.org> <CAG6CVpV8fqMd82hjYoyDfO3f5P-x6%2B0OJDoQHtqXqY_tfWtZsA@mail.gmail.com>

On Mon, 27 Feb 2017, Conrad Meyer wrote:

> On Thu, Feb 2, 2017 at 12:29 PM, Bruce Evans <brde@optusnet.com.au> wrote:
>> I've almost finished fixing and optimizing this.  I didn't manage to fix
>> all the compiler pessimizations, but the result is within 5% of optimal
>> for buffers larger than a few K.
>
> Did you ever get to a final patch that you are satisfied with?  It
> would be good to get this improvement into the tree.

I'm happy with this version (attached and partly enclosed).  You need to
test it in the kernel and commit it (I only did simple correctness tests
in userland).

X Index: conf/files.amd64
X ===================================================================
X --- conf/files.amd64	(revision 314363)
X +++ conf/files.amd64	(working copy)
X @@ -545,6 +545,9 @@
X  isa/vga_isa.c			optional	vga
X  kern/kern_clocksource.c		standard
X  kern/link_elf_obj.c		standard
X +libkern/x86/crc32_sse42.c	standard
X +libkern/memmove.c		standard
X +libkern/memset.c		standard

Also fix some nearby disorder.

X ...
X Index: libkern/x86/crc32_sse42.c
X ===================================================================
X --- libkern/x86/crc32_sse42.c	(revision 314363)
X +++ libkern/x86/crc32_sse42.c	(working copy)
X @@ -31,15 +31,41 @@
X   */
X  #ifdef USERSPACE_TESTING
X  #include <stdint.h>
X +#include <stdlib.h>
X  #else
X  #include <sys/param.h>
X +#include <sys/systm.h>
X  #include <sys/kernel.h>
X -#include <sys/libkern.h>
X -#include <sys/systm.h>
X  #endif

Also fix minor #include errors.

X
X -#include <nmmintrin.h>
X +static __inline uint32_t
X +_mm_crc32_u8(uint32_t x, uint8_t y)
X +{
X +	/*
X +	 * clang (at least 3.9.[0-1]) pessimizes "rm" (y) and "m" (y)
X +	 * significantly and "r" (y) a lot by copying y to a different
X +	 * local variable (on the stack or in a register), so only use
X +	 * the latter.  This costs a register and an instruction but
X +	 * not a uop.
X +	 */
X +	__asm("crc32b %1,%0" : "+r" (x) : "r" (y));
X +	return (x);
X +}

Using intrinsics avoids the silly copying via the stack, and allows more
unrolling.  Old gcc does more unrolling with just asms.  Unrolling is
almost useless (some details below).

X @@ -47,12 +73,14 @@
X   * Block sizes for three-way parallel crc computation.  LONG and SHORT must
X   * both be powers of two.
X   */
X -#define LONG	8192
X -#define SHORT	256
X +#define LONG	128
X +#define SHORT	64

These are aggressively low.

Note that small buffers aren't handled very well.  SHORT = 64 means that
a buffer of size 3 * 64 = 192 is handled entirely by the "SHORT" loop.
192 is not very small, but any smaller than that and overheads for
adjustment at the end of the loop are too large for the "SHORT" loop
to be worth doing.  Almost any value of LONG larger than 128 works OK
now, but if LONG is large then it gives too much work for the "SHORT"
loop, since normal buffer sizes are not a multiple of 3.  E.g., with
the old LONG and SHORT, a buffer of size 128K was decomposed as 5 * 24K
(done almost optimally by the "LONG" loop) + 10 * 768 (done a bit less
optimally by the "SHORT" loop) + 64 * 8 (done pessimally).

I didn't get around to ifdefing this for i386.  On i386, the loops take
twice as many crc32 instructions for a given byte count, so the timing
is satisfied by a byte count half as large, so SHORT and LONG can be
reduced by a factor of 2 to give faster handling for small buffers
without affecting the speed for large buffers significantly.

X
X  /* Tables for hardware crc that shift a crc by LONG and SHORT zeros. */
X  static uint32_t crc32c_long[4][256];
X +static uint32_t crc32c_2long[4][256];
X  static uint32_t crc32c_short[4][256];
X +static uint32_t crc32c_2short[4][256];

I didn't get around to updating the comment.  2long shifts by 2*LONG zeros,
etc.

Shifts by 3N are done by adding shifts by 1N and 2N in parallel.  I couldn't
get the direct 3N shift to run any faster.

X @@ -190,7 +220,11 @@
X  	const size_t align = 4;
X  #endif
X  	const unsigned char *next, *end;
X -	uint64_t crc0, crc1, crc2;      /* need to be 64 bits for crc32q */
X +#ifdef __amd64__
X +	uint64_t crc0, crc1, crc2;
X +#else
X +	uint32_t crc0, crc1, crc2;
X +#endif
X
X  	next = buf;
X  	crc0 = crc;

64 bits of course isn't needed for i386.  It isn't needed for amd64 either.
I think the crc32 instruction zeros the top 32 bits so they can be ignored.
However, when I modified the asm to return 32 bits to tell the compiler about
this (which the intrinsic wouldn't be able to do) and used 32 bits here,
this was just slightly slower.

For some intermediate crc calculations, only 32 bits are used, and the
compiler can see this.  clang on amd64 optimizes this better than gcc,
starting with all the intermediate crc variables declared as 64 bits.
gcc generates worst code when some of the intermediates are declared as
32 bits.  So keep using 64 bits on amd64 here.

X @@ -202,6 +236,7 @@
X  		len--;
X  	}
X
X +#if LONG > SHORT
X  	/*
X  	 * Compute the crc on sets of LONG*3 bytes, executing three independent
X  	 * crc instructions, each on LONG bytes -- this is optimized for the

LONG = SHORT = 64 works OK on Haswell, but I suspect that slower CPUs
benefit from larger values and I want to keep SHORT as small as possible
for the fastest CPUs.

X @@ -209,6 +244,7 @@
X  	 * have a throughput of one crc per cycle, but a latency of three
X  	 * cycles.
X  	 */
X +	crc = 0;
X  	while (len >= LONG * 3) {
X  		crc1 = 0;
X  		crc2 = 0;
X @@ -229,16 +265,64 @@
X  #endif
X  			next += align;
X  		} while (next < end);
X -		crc0 = crc32c_shift(crc32c_long, crc0) ^ crc1;
X -		crc0 = crc32c_shift(crc32c_long, crc0) ^ crc2;
X +		/*-
X +		 * Update the crc.  Try to do it in parallel with the inner
X +		 * loop.  'crc' is used to accumulate crc0 and crc1
X +		 * produced by the inner loop so that the next iteration
X +		 * of the loop doesn't depend on anything except crc2.
X +		 *
X +		 * The full expression for the update is:
X +		 *     crc = S*S*S*crc + S*S*crc0 + S*crc1
X +		 * where the terms are polynomials modulo the CRC polynomial.
X +		 * We regroup this subtly as:
X +		 *     crc = S*S * (S*crc + crc0) + S*crc1.
X +		 * This has an extra dependency which reduces possible
X +		 * parallelism for the expression, but it turns out to be
X +		 * best to intentionally delay evaluation of this expression
X +		 * so that it competes less with the inner loop.
X +		 *
X +		 * We also intentionally reduce parallelism by feeding back
X +		 * crc2 to the inner loop as crc0 instead of accumulating
X +		 * it in crc.  This synchronizes the loop with crc update.
X +		 * CPU and/or compiler schedulers produced bad order without
X +		 * this.
X +		 *
X +		 * Shifts take about 12 cycles each, so 3 here with 2
X +		 * parallelizable take about 24 cycles and the crc update
X +		 * takes slightly longer.  8 dependent crc32 instructions
X +		 * can run in 24 cycles, so the 3-way blocking is worse
X +		 * than useless for sizes less than 8 * <word size> = 64
X +		 * on amd64.  In practice, SHORT = 32 confirms these
X +		 * timing calculations by giving a small improvement
X +		 * starting at size 96.  Then the inner loop takes about
X +		 * 12 cycles and the crc update about 24, but these are
X +		 * partly in parallel so the total time is less than the
X +		 * 36 cycles that 12 dependent crc32 instructions would
X +		 * take.
X +		 *
X +		 * To have a chance of completely hiding the overhead for
X +		 * the crc update, the inner loop must take considerably
X +		 * longer than 24 cycles.  LONG = 64 makes the inner loop
X +		 * take about 24 cycles, so is not quite large enough.
X +		 * LONG = 128 works OK.  Unhideable overheads are about
X +		 * 12 cycles per inner loop.  All assuming timing like
X +		 * Haswell.
X +		 */
X +		crc = crc32c_shift(crc32c_long, crc) ^ crc0;
X +		crc1 = crc32c_shift(crc32c_long, crc1);
X +		crc = crc32c_shift(crc32c_2long, crc) ^ crc1;
X +		crc0 = crc2;
X  		next += LONG * 2;
X  		len -= LONG * 3;
X  	}
X +	crc0 ^= crc;
X +#endif /* LONG > SHORT */
X
X  	/*
X  	 * Do the same thing, but now on SHORT*3 blocks for the remaining data
X  	 * less than a LONG*3 block
X  	 */
X +	crc = 0;
X  	while (len >= SHORT * 3) {
X  		crc1 = 0;
X  		crc2 = 0;

See the comment.

X @@ -259,11 +343,14 @@
X  #endif
X  			next += align;

When SHORT is about what it is (64), on amd64 the "SHORT" loop has 24 crc32
instructions and compilers sometimes unroll them all.  This makes little
difference.

X  		} while (next < end);
X -		crc0 = crc32c_shift(crc32c_short, crc0) ^ crc1;
X -		crc0 = crc32c_shift(crc32c_short, crc0) ^ crc2;
X +		crc = crc32c_shift(crc32c_short, crc) ^ crc0;
X +		crc1 = crc32c_shift(crc32c_short, crc1);
X +		crc = crc32c_shift(crc32c_2short, crc) ^ crc1;
X +		crc0 = crc2;
X  		next += SHORT * 2;
X  		len -= SHORT * 3;
X  	}
X +	crc0 ^= crc;

The change is perhaps easier to understand without looking at the comment.
We accumulate changes in crc instead of into crc0, so that the next iteration
can start without waiting for accumulation.  This requires more shifting
steps, and we try to arrange these optimally.

X
X  	/* Compute the crc on the remaining bytes at native word size. */
X  	end = next + (len - (len & (align - 1)));

The adjustments for alignment are slow if they are not null, and wasteful
if they are null, but have relatively little cost for the non-small buffers
that are handled well, so I didn't remove them.

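The regrouped update crc = S*S * (S*crc + crc0) + S*crc1, with crc2 fed back
as the next crc0, can be checked in userland with a slow bitwise CRC-32C.
The following is only a sketch for illustration and is not part of the patch:
crc_update(), crc_shift(), crc_3way(), crc_3way_matches() and BLK are
hypothetical names, crc_shift() steps over zero bytes one at a time where the
patch uses the precomputed crc32c_long/crc32c_2long tables, and the three
block crcs are computed one after another instead of by interleaved crc32
instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY	0x82f63b78u	/* CRC-32C polynomial, reversed bit order */
#define BLK	64		/* hypothetical block size, standing in for SHORT */

/* Raw bitwise crc register update, without pre- or post-inversion. */
static uint32_t
crc_update(uint32_t crc, const unsigned char *p, size_t n)
{
	int k;

	while (n-- > 0) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (POLY & (0u - (crc & 1)));
	}
	return (crc);
}

/* Multiply by S: advance the register over n zero bytes (slow stand-in). */
static uint32_t
crc_shift(uint32_t crc, size_t n)
{
	static const unsigned char zero[1] = { 0 };

	while (n-- > 0)
		crc = crc_update(crc, zero, 1);
	return (crc);
}

/* Same accumulation order as the patched loop, on 3 * BLK byte groups. */
static uint32_t
crc_3way(uint32_t crc0, const unsigned char *buf, size_t len)
{
	uint32_t crc, crc1, crc2;

	crc = 0;
	while (len >= 3 * BLK) {
		crc0 = crc_update(crc0, buf, BLK);
		crc1 = crc_update(0, buf + BLK, BLK);
		crc2 = crc_update(0, buf + 2 * BLK, BLK);
		crc = crc_shift(crc, BLK) ^ crc0;	/* S*crc + crc0 */
		crc1 = crc_shift(crc1, BLK);		/* S*crc1 */
		crc = crc_shift(crc, 2 * BLK) ^ crc1;	/* S*S*(...) + S*crc1 */
		crc0 = crc2;				/* feed crc2 back */
		buf += 3 * BLK;
		len -= 3 * BLK;
	}
	crc0 ^= crc;
	return (crc_update(crc0, buf, len));	/* tail, done serially */
}

/* Returns 1 if the 3-way result equals the plain serial crc. */
static int
crc_3way_matches(void)
{
	unsigned char buf[1000];
	size_t i;

	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (unsigned char)(i * 31 + 7);
	return (crc_3way(0xffffffffu, buf, sizeof(buf)) ==
	    crc_update(0xffffffffu, buf, sizeof(buf)));
}
```

Calling crc_3way_matches() from a small test program should return 1.  The
check relies only on the crc register update being linear over GF(2), so the
same comparison should hold for other values of BLK and other buffer
contents.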
Bruce

[Attachment: crc32.dif (TEXT/PLAIN, base64-encoded)]