From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 20 10:35:26 2014
From: Tiwei Bie
To: kostikbel@gmail.com
Subject: Re: Re: Re: Fwd: Questions with the in_cksumdata() function in sys/amd64/amd64/in_cksum.c
Date: Mon, 20 Oct 2014 18:27:07 +0800
Message-Id: <1413800827-78140-1-git-send-email-btw@mail.ustc.edu.cn>
Cc: freebsd-hackers@freebsd.org

> > > I would not be surprised if this manual prefetching by explicit reads
> > > causes a slowdown of the function. I suspect it could confuse the hardware
> > > prefetcher by breaking the linear pattern, or the patch could break
> > > the logic of the limited forward-looking oracle by reading too far
> > > from the current linear read tip.
> > >
> > > Also, it could confuse the data flow engine if the register allocator
> > > is unable to see that the read value is not needed right now, and cause
> > > an unneeded stall while the next cache line is fetched.
> > >
> > > Sure, all my speculations are pure garbage until confirmed by
> > > measurements with pmc, but I think that the patch below must be
> > > benchmarked to confirm any value it provides as well. My opinion is,
> > > we should either remove the manual prefetch, or do it with PREFETCHLX
> > > instructions only, instead of a real read.
> >
> > I have done a rather simple test. The results are listed as follows:
> >
> Yes, too simple to draw conclusions, IMO.
>
> Please look at ministat(1). I think that the test run length
> is too short to come to any decision. The length x 3 runs does not
> give enough confidence; but ministat would provide the numbers to judge.

I have run ministat(1) on these results. With the first data set, x
(32BYTES_WITH_PRE_READ), as the baseline, ministat reports a difference
at 99.5% confidence for each of the other four variants:
$ ministat -w 80 -s -c 99.5 32BYTES_WITH_PRE_READ \
    32BYTES_WITHOUT_MANUAL_PREFETCH 64BYTES_WITH_PRE_READ \
    64BYTES_WITHOUT_MANUAL_PREFETCH 64BYTES_WITH_PREFETCH_INSTRUCTION
x 32BYTES_WITH_PRE_READ
+ 32BYTES_WITHOUT_MANUAL_PREFETCH
* 64BYTES_WITH_PRE_READ
% 64BYTES_WITHOUT_MANUAL_PREFETCH
# 64BYTES_WITH_PREFETCH_INSTRUCTION
    N           Min           Max        Median           Avg        Stddev
x   3      0.768854      0.773803      0.770332    0.77099633  0.0025405028
+   3      0.725956      0.728841      0.726651    0.72714933  0.0015056754
Difference at 99.5% confidence
        -0.043847 +/- 0.0122301
        -5.68706% +/- 1.58627%
        (Student's t, pooled s = 0.00208821)
*   3      0.702416      0.704498      0.703648    0.70352067  0.0010468244
Difference at 99.5% confidence
        -0.0674757 +/- 0.0113792
        -8.75175% +/- 1.47591%
        (Student's t, pooled s = 0.00194294)
%   3      0.697971      0.698314      0.698071    0.69811867 0.00017639822
Difference at 99.5% confidence
        -0.0728777 +/- 0.0105464
        -9.4524% +/- 1.36789%
        (Student's t, pooled s = 0.00180073)
#   3      0.711962      0.715883      0.712118      0.713321  0.0022201277
Difference at 99.5% confidence
        -0.0576753 +/- 0.0139724
        -7.48062% +/- 1.81225%
        (Student's t, pooled s = 0.0023857)

> > #1. Read 32 bytes with manual pre-read in each loop:
> >
> > $ cc main.c -D_32BYTES_WITH_PRE_READ
> > $ for i in `seq 3`; do ./a.out; done
> > 0.768854
> > 0.770332
> > 0.773803
> >
> > #2. Read 64 bytes with manual pre-read in each loop:
> >
> > $ cc main.c -D_64BYTES_WITH_PRE_READ
> > $ for i in `seq 3`; do ./a.out; done
> > 0.702416
> > 0.703648
> > 0.704498
> >
> > #3. Read 32 bytes without manual prefetch in each loop:
> >
> > $ cc main.c -D_32BYTES_WITHOUT_MANUAL_PREFETCH
> > $ for i in `seq 3`; do ./a.out; done
> > 0.726651
> > 0.728841
> > 0.725956
> >
> > #4. Read 64 bytes without manual prefetch in each loop:
> >
> > $ cc main.c -D_64BYTES_WITHOUT_MANUAL_PREFETCH
> > $ for i in `seq 3`; do ./a.out; done
> > 0.698071
> > 0.697971
> > 0.698314
> >
> > #5. Read 64 bytes with PREFETCH instruction:
> >
> > $ cc main.c -D_64BYTES_WITH_PREFETCH_INSTRUCTION
> > $ for i in `seq 3`; do ./a.out; done
> > 0.715883
> > 0.712118
> > 0.711962
> >
> > The test is very simple: I just run in_cksumdata() over 2^20 (about
> > one million) 1500-byte packets, and the result is the time, in
> > seconds, spent calculating these checksums.
> >
> > As the results show, reading 64 bytes per iteration without any
> > manual prefetch is the fastest. So I think reading a whole cache line
> > in each iteration is helpful.
> >
> > ---
> >
> > The computer that I run the test program on:
> >
> > $ dmesg | grep CPU:
> > CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (3093.03-MHz K8-class CPU)
> >
> > ---
> >
> > The test program:
> >
> > #include <sys/types.h>	/* u_int32_t, u_int64_t */
> > #include <sys/time.h>	/* gettimeofday(), timersub() */
> > #include <stdio.h>	/* printf() */
> >
> > /* ------------------------------------------------------------------------ */
> >
> > #define PACKET_SIZE 1500
> > #define BUFFER_SIZE ((PACKET_SIZE) << 20)	/* 2^20 packets */
> > static unsigned char buffer[BUFFER_SIZE];
> >
> > /* ------------------------------------------------------------------------ */
> >
> > /*
> >  * Checksum routine for Internet Protocol family headers
> >  * (Portable Alpha version).
> >  *
> >  * This routine is very heavily used in the network
> >  * code and should be modified for each CPU to be as fast as possible.
> >  */
> >
> > #define ADDCARRY(x) (x > 65535 ? x -= 65535 : x)
> > #define REDUCE32 \
> > 	{ \
> > 	q_util.q = sum; \
> > 	sum = q_util.s[0] + q_util.s[1] + q_util.s[2] + q_util.s[3]; \
> > 	}
> > #define REDUCE16 \
> > 	{ \
> > 	q_util.q = sum; \
> > 	l_util.l = q_util.s[0] + q_util.s[1] + q_util.s[2] + q_util.s[3]; \
> > 	sum = l_util.s[0] + l_util.s[1]; \
> > 	ADDCARRY(sum); \
> > 	}
> >
> > static const u_int32_t in_masks[] = {
> > 	/*0 bytes*/ /*1 byte*/	/*2 bytes*/ /*3 bytes*/
> > 	0x00000000, 0x000000FF, 0x0000FFFF, 0x00FFFFFF,	/* offset 0 */
> > 	0x00000000, 0x0000FF00, 0x00FFFF00, 0xFFFFFF00,	/* offset 1 */
> > 	0x00000000, 0x00FF0000, 0xFFFF0000, 0xFFFF0000,	/* offset 2 */
> > 	0x00000000, 0xFF000000, 0xFF000000, 0xFF000000,	/* offset 3 */
> > };
> >
> > union l_util {
> > 	u_int16_t s[2];
> > 	u_int32_t l;
> > };
> > union q_util {
> > 	u_int16_t s[4];
> > 	u_int32_t l[2];
> > 	u_int64_t q;
> > };
> >
> > /* ------------------------------------------------------------------------ */
> >
> > //#define _32BYTES_WITH_PRE_READ
> > //#define _64BYTES_WITH_PRE_READ
> > //#define _32BYTES_WITHOUT_MANUAL_PREFETCH
> > //#define _64BYTES_WITHOUT_MANUAL_PREFETCH
> > //#define _64BYTES_WITH_PREFETCH_INSTRUCTION
> >
> > #ifdef _32BYTES_WITH_PRE_READ
> > static u_int64_t
> > in_cksumdata(const void *buf, int len)
> > {
> > 	const u_int32_t *lw = (const u_int32_t *) buf;
> > 	u_int64_t sum = 0;
> > 	u_int64_t prefilled;
> > 	int offset;
> > 	union q_util q_util;
> >
> > 	if ((3 & (long) lw) == 0 && len == 20) {
> > 		sum = (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3] + lw[4];
> > 		REDUCE32;
> > 		return sum;
> > 	}
> >
> > 	if ((offset = 3 & (long) lw) != 0) {
> > 		const u_int32_t *masks = in_masks + (offset << 2);
> > 		lw = (u_int32_t *) (((long) lw) - offset);
> > 		sum = *lw++ & masks[len >= 3 ? 3 : len];
> > 		len -= 4 - offset;
> > 		if (len <= 0) {
> > 			REDUCE32;
> > 			return sum;
> > 		}
> > 	}
> > #if 0
> > 	/*
> > 	 * Force to cache line boundary.
> > 	 */
> > 	offset = 32 - (0x1f & (long) lw);
> > 	if (offset < 32 && len > offset) {
> > 		len -= offset;
> > 		if (4 & offset) {
> > 			sum += (u_int64_t) lw[0];
> > 			lw += 1;
> > 		}
> > 		if (8 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1];
> > 			lw += 2;
> > 		}
> > 		if (16 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 			lw += 4;
> > 		}
> > 	}
> > #endif
> > 	/*
> > 	 * access prefilling to start load of next cache line.
> > 	 * then add current cache line
> > 	 * save result of prefilling for loop iteration.
> > 	 */
> > 	prefilled = lw[0];
> > 	while ((len -= 32) >= 4) {
> > 		u_int64_t prefilling = lw[8];
> > 		sum += prefilled + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7];
> > 		lw += 8;
> > 		prefilled = prefilling;
> > 	}
> > 	if (len >= 0) {
> > 		sum += prefilled + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7];
> > 		lw += 8;
> > 	} else {
> > 		len += 32;
> > 	}
> > 	while ((len -= 16) >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 		lw += 4;
> > 	}
> > 	len += 16;
> > 	while ((len -= 4) >= 0) {
> > 		sum += (u_int64_t) *lw++;
> > 	}
> > 	len += 4;
> > 	if (len > 0)
> > 		sum += (u_int64_t) (in_masks[len] & *lw);
> > 	REDUCE32;
> > 	return sum;
> > }
> > #endif
> >
> > #ifdef _64BYTES_WITH_PRE_READ
> > static u_int64_t
> > in_cksumdata(const void *buf, int len)
> > {
> > 	const u_int32_t *lw = (const u_int32_t *) buf;
> > 	u_int64_t sum = 0;
> > 	u_int64_t prefilled;
> > 	int offset;
> > 	union q_util q_util;
> >
> > 	if ((3 & (long) lw) == 0 && len == 20) {
> > 		sum = (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3] + lw[4];
> > 		REDUCE32;
> > 		return sum;
> > 	}
> >
> > 	if ((offset = 3 & (long) lw) != 0) {
> > 		const u_int32_t *masks = in_masks + (offset << 2);
> > 		lw = (u_int32_t *) (((long) lw) - offset);
> > 		sum = *lw++ & masks[len >= 3 ? 3 : len];
> > 		len -= 4 - offset;
> > 		if (len <= 0) {
> > 			REDUCE32;
> > 			return sum;
> > 		}
> > 	}
> > #if 0
> > 	/*
> > 	 * Force to cache line boundary.
> > 	 */
> > 	offset = 32 - (0x1f & (long) lw);
> > 	if (offset < 32 && len > offset) {
> > 		len -= offset;
> > 		if (4 & offset) {
> > 			sum += (u_int64_t) lw[0];
> > 			lw += 1;
> > 		}
> > 		if (8 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1];
> > 			lw += 2;
> > 		}
> > 		if (16 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 			lw += 4;
> > 		}
> > 	}
> > #endif
> > 	/*
> > 	 * access prefilling to start load of next cache line.
> > 	 * then add current cache line
> > 	 * save result of prefilling for loop iteration.
> > 	 */
> > 	prefilled = lw[0];
> > 	while ((len -= 64) >= 4) {
> > 		u_int64_t prefilling = lw[16];
> > 		sum += prefilled + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 		prefilled = prefilling;
> > 	}
> > 	if (len >= 0) {
> > 		sum += prefilled + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 	} else {
> > 		len += 64;
> > 	}
> > 	while ((len -= 16) >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 		lw += 4;
> > 	}
> > 	len += 16;
> > 	while ((len -= 4) >= 0) {
> > 		sum += (u_int64_t) *lw++;
> > 	}
> > 	len += 4;
> > 	if (len > 0)
> > 		sum += (u_int64_t) (in_masks[len] & *lw);
> > 	REDUCE32;
> > 	return sum;
> > }
> > #endif
> >
> > #ifdef _32BYTES_WITHOUT_MANUAL_PREFETCH
> > static u_int64_t
> > in_cksumdata(const void *buf, int len)
> > {
> > 	const u_int32_t *lw = (const u_int32_t *) buf;
> > 	u_int64_t sum = 0;
> > 	int offset;
> > 	union q_util q_util;
> >
> > 	if ((3 & (long) lw) == 0 && len == 20) {
> > 		sum = (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3] + lw[4];
> > 		REDUCE32;
> > 		return sum;
> > 	}
> >
> > 	if ((offset = 3 & (long) lw) != 0) {
> > 		const u_int32_t *masks = in_masks + (offset << 2);
> > 		lw = (u_int32_t *) (((long) lw) - offset);
> > 		sum = *lw++ & masks[len >= 3 ? 3 : len];
> > 		len -= 4 - offset;
> > 		if (len <= 0) {
> > 			REDUCE32;
> > 			return sum;
> > 		}
> > 	}
> > #if 0
> > 	/*
> > 	 * Force to cache line boundary.
> > 	 */
> > 	offset = 32 - (0x1f & (long) lw);
> > 	if (offset < 32 && len > offset) {
> > 		len -= offset;
> > 		if (4 & offset) {
> > 			sum += (u_int64_t) lw[0];
> > 			lw += 1;
> > 		}
> > 		if (8 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1];
> > 			lw += 2;
> > 		}
> > 		if (16 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 			lw += 4;
> > 		}
> > 	}
> > #endif
> > 	/*
> > 	 * Add 32 bytes (8 words) per iteration; no manual prefetch is
> > 	 * issued.  The first operand is widened so that the additions
> > 	 * are done in 64 bits and no carry out of bit 31 is lost.
> > 	 */
> > 	while ((len -= 32) >= 4) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7];
> > 		lw += 8;
> > 	}
> > 	if (len >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7];
> > 		lw += 8;
> > 	} else {
> > 		len += 32;
> > 	}
> > 	while ((len -= 16) >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 		lw += 4;
> > 	}
> > 	len += 16;
> > 	while ((len -= 4) >= 0) {
> > 		sum += (u_int64_t) *lw++;
> > 	}
> > 	len += 4;
> > 	if (len > 0)
> > 		sum += (u_int64_t) (in_masks[len] & *lw);
> > 	REDUCE32;
> > 	return sum;
> > }
> > #endif
> >
> > #ifdef _64BYTES_WITHOUT_MANUAL_PREFETCH
> > static u_int64_t
> > in_cksumdata(const void *buf, int len)
> > {
> > 	const u_int32_t *lw = (const u_int32_t *) buf;
> > 	u_int64_t sum = 0;
> > 	int offset;
> > 	union q_util q_util;
> >
> > 	if ((3 & (long) lw) == 0 && len == 20) {
> > 		sum = (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3] + lw[4];
> > 		REDUCE32;
> > 		return sum;
> > 	}
> >
> > 	if ((offset = 3 & (long) lw) != 0) {
> > 		const u_int32_t *masks = in_masks + (offset << 2);
> > 		lw = (u_int32_t *) (((long) lw) - offset);
> > 		sum = *lw++ & masks[len >= 3 ? 3 : len];
> > 		len -= 4 - offset;
> > 		if (len <= 0) {
> > 			REDUCE32;
> > 			return sum;
> > 		}
> > 	}
> > #if 0
> > 	/*
> > 	 * Force to cache line boundary.
> > 	 */
> > 	offset = 32 - (0x1f & (long) lw);
> > 	if (offset < 32 && len > offset) {
> > 		len -= offset;
> > 		if (4 & offset) {
> > 			sum += (u_int64_t) lw[0];
> > 			lw += 1;
> > 		}
> > 		if (8 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1];
> > 			lw += 2;
> > 		}
> > 		if (16 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 			lw += 4;
> > 		}
> > 	}
> > #endif
> > 	/*
> > 	 * Add 64 bytes (one cache line, 16 words) per iteration; no
> > 	 * manual prefetch is issued.  The first operand is widened so
> > 	 * that the additions are done in 64 bits and no carry out of
> > 	 * bit 31 is lost.
> > 	 */
> > 	while ((len -= 64) >= 4) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 	}
> > 	if (len >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 	} else {
> > 		len += 64;
> > 	}
> > 	while ((len -= 16) >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 		lw += 4;
> > 	}
> > 	len += 16;
> > 	while ((len -= 4) >= 0) {
> > 		sum += (u_int64_t) *lw++;
> > 	}
> > 	len += 4;
> > 	if (len > 0)
> > 		sum += (u_int64_t) (in_masks[len] & *lw);
> > 	REDUCE32;
> > 	return sum;
> > }
> > #endif
> >
> > #ifdef _64BYTES_WITH_PREFETCH_INSTRUCTION
> > static u_int64_t
> > in_cksumdata(const void *buf, int len)
> > {
> > 	const u_int32_t *lw = (const u_int32_t *) buf;
> > 	u_int64_t sum = 0;
> > 	int offset;
> > 	union q_util q_util;
> >
> > 	if ((3 & (long) lw) == 0 && len == 20) {
> > 		sum = (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3] + lw[4];
> > 		REDUCE32;
> > 		return sum;
> > 	}
> >
> > 	if ((offset = 3 & (long) lw) != 0) {
> > 		const u_int32_t *masks = in_masks + (offset << 2);
> > 		lw = (u_int32_t *) (((long) lw) - offset);
> > 		sum = *lw++ & masks[len >= 3 ? 3 : len];
> > 		len -= 4 - offset;
> > 		if (len <= 0) {
> > 			REDUCE32;
> > 			return sum;
> > 		}
> > 	}
> > #if 0
> > 	/*
> > 	 * Force to cache line boundary.
> > 	 */
> > 	offset = 32 - (0x1f & (long) lw);
> > 	if (offset < 32 && len > offset) {
> > 		len -= offset;
> > 		if (4 & offset) {
> > 			sum += (u_int64_t) lw[0];
> > 			lw += 1;
> > 		}
> > 		if (8 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1];
> > 			lw += 2;
> > 		}
> > 		if (16 & offset) {
> > 			sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 			lw += 4;
> > 		}
> > 	}
> > #endif
> > 	/*
> > 	 * Prefetch the next 64-byte block with a prefetch instruction
> > 	 * (no real read), then add the current one.  The first operand
> > 	 * is widened so that the additions are done in 64 bits.
> > 	 */
> > 	__builtin_prefetch(&lw[0]);
> > 	while ((len -= 64) >= 4) {
> > 		__builtin_prefetch(&lw[16]);
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 	}
> > 	if (len >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3]
> > 		    + lw[4] + lw[5] + lw[6] + lw[7]
> > 		    + lw[8] + lw[9] + lw[10] + lw[11]
> > 		    + lw[12] + lw[13] + lw[14] + lw[15];
> > 		lw += 16;
> > 	} else {
> > 		len += 64;
> > 	}
> > 	while ((len -= 16) >= 0) {
> > 		sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> > 		lw += 4;
> > 	}
> > 	len += 16;
> > 	while ((len -= 4) >= 0) {
> > 		sum += (u_int64_t) *lw++;
> > 	}
> > 	len += 4;
> > 	if (len > 0)
> > 		sum += (u_int64_t) (in_masks[len] & *lw);
> > 	REDUCE32;
> > 	return sum;
> > }
> > #endif
> >
> > /* ------------------------------------------------------------------------ */
> >
> > int main(void)
> > {
> > 	int i;
> > 	/* volatile: keeps an optimizing compiler from dropping the calls */
> > 	volatile u_int64_t sum;
> > 	struct timeval tv1, tv2, res;
> >
> > 	gettimeofday(&tv1, NULL);
> > 	for (i = 0; i < BUFFER_SIZE; i += PACKET_SIZE)
> > 		sum = in_cksumdata(&buffer[i], PACKET_SIZE);
> > 	gettimeofday(&tv2, NULL);
> >
> > 	timersub(&tv2, &tv1, &res);
> > 	/* %06ld: zero-pad the microseconds so "0.012345" prints correctly */
> > 	printf("%ld.%06ld\n", (long)res.tv_sec, (long)res.tv_usec);
> >
> > 	return (0);
> > }
> >