Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 18 Oct 2014 19:53:32 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>, btw@mail.ustc.edu.cn
Subject:   Re: Fwd: Questions with the in_cksumdata() function in sys/amd64/amd64/in_cksum.c
Message-ID:  <20141018165332.GK2153@kib.kiev.ua>
In-Reply-To: <5764827.IHU6o1xrAW@ralph.baldwin.cx>
References:  <5764827.IHU6o1xrAW@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Oct 18, 2014 at 07:29:12AM -0400, John Baldwin wrote:
> Do you have any thoughts on this?  On modern Intel CPUs with the hardware 
> prefetcher this would seem to be a waste, so I wonder how useful this is
> in practice.
I would be not surprised if this manual prefetching by explicit reads
causes slowdown of the function.  I suspect it could confuse hardware
prefetcher by breaking the linear pattern, or the patch could break
the logic of the limited forward-looking oracle by reading too far
from the current linear read tip.

Also, it could confuse the data flow engine if the register allocator
is unable to see that the read value is needed not right now, and cause
unneeded stall while next cache line is fetched.

Sure, all my speculations are pure garbage until confirmed by
measurements with pmc, but I think that the patch below must be
benchmarked to confirm any value it provides as well. My opinion is,
we should either remove the manual prefetch, or do it with PREFETCHLX
instructions only, instead of real read.

> 
> ----------  Forwarded Message  ----------
> 
> Subject: Questions with the in_cksumdata() function in 
> sys/amd64/amd64/in_cksum.c
> Date: Friday, September 26, 2014, 10:51:36 PM
> From: btw@mail.ustc.edu.cn
> To: freebsd-hackers <freebsd-hackers@freebsd.org>
> 
> Hi All,
> 
> I'm reading the in_cksumdata() function in sys/amd64/amd64/in_cksum.c, and
> I have some questions with the following comment and code:
> 
> static u_int64_t
> in_cksumdata(const void *buf, int len)
> {
>         ......
> 
>         /*
>          * access prefilling to start load of next cache line.
>          * then add current cache line
>          * save result of prefilling for loop iteration.
>          */
>         prefilled = lw[0];
>         while ((len -= 32) >= 4) {
>                 u_int64_t prefilling = lw[8];
>                 sum += prefilled + lw[1] + lw[2] + lw[3]
>                         + lw[4] + lw[5] + lw[6] + lw[7];
>                 lw += 8;
>                 prefilled = prefilling;
>         }
> 
>         ......
> }
> 
> It said that it adds the current cache line, and it adds 32 bytes actually,
> while on amd64 platform, the size of each cache line is 64 bytes. So I think
> the correct code should be something like this:
> 
> static u_int64_t
> in_cksumdata(const void *buf, int len)
> {
>         ......
> 
>         /*
>          * access prefilling to start load of next cache line.
>          * then add current cache line
>          * save result of prefilling for loop iteration.
>          */
>         prefilled = lw[0];
>         while ((len -= 64) >= 4) {
>                 u_int64_t prefilling = lw[16];
>                 sum += prefilled + lw[1] + lw[2] + lw[3]
>                         + lw[4] + lw[5] + lw[6] + lw[7]
>                         + lw[8] + lw[9] + lw[10] + lw[11]
>                         + lw[12] + lw[13] + lw[14] + lw[15];
>                 lw += 16;
>                 prefilled = prefilling;
>         }
> 
>         ......
> }
> 
> The full patch is:
> 
> diff --git a/in_cksum.c b/in_cksum.c
> index 2ae3a0c..4f141f8 100644
> --- a/in_cksum.c
> +++ b/in_cksum.c
> @@ -140,19 +140,23 @@ in_cksumdata(const void *buf, int len)
>          * save result of prefilling for loop iteration.
>          */
>         prefilled = lw[0];
> -       while ((len -= 32) >= 4) {
> -               u_int64_t prefilling = lw[8];
> +       while ((len -= 64) >= 4) {
> +               u_int64_t prefilling = lw[16];
>                 sum += prefilled + lw[1] + lw[2] + lw[3]
> -                       + lw[4] + lw[5] + lw[6] + lw[7];
> -               lw += 8;
> +                       + lw[4] + lw[5] + lw[6] + lw[7]
> +                       + lw[8] + lw[9] + lw[10] + lw[11]
> +                       + lw[12] + lw[13] + lw[14] + lw[15];
> +               lw += 16;
>                 prefilled = prefilling;
>         }
>         if (len >= 0) {
>                 sum += prefilled + lw[1] + lw[2] + lw[3]
> -                       + lw[4] + lw[5] + lw[6] + lw[7];
> -               lw += 8;
> +                       + lw[4] + lw[5] + lw[6] + lw[7]
> +                       + lw[8] + lw[9] + lw[10] + lw[11]
> +                       + lw[12] + lw[13] + lw[14] + lw[15];
> +               lw += 16;
>         } else {
> -               len += 32;
> +               len += 64;
>         }
>         while ((len -= 16) >= 0) {
>                 sum += (u_int64_t) lw[0] + lw[1] + lw[2] + lw[3];
> 
> Sorry about the confusion if I did something wrong.
> 
> - twb
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
> -----------------------------------------
> 
> -- 
> John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141018165332.GK2153>