From: Adrian Chadd
To: Alan Somers
Cc: Eric van Gyzen, "current@freebsd.org", Jilles Tjoelker
Date: Fri, 27 Mar 2015 17:43:14 -0700
Subject: Re: SSE in libthr
References: <5515AED9.8040408@FreeBSD.org> <20150327214057.GA3766@stack.nl>
List-Id: Discussions about the use of FreeBSD-current

On 27 March 2015 at 16:03, Alan Somers wrote:
> On Fri, Mar 27, 2015 at 4:36 PM, Adrian Chadd wrote:
>> hi,
>>
>> please don't try to microoptimise crap like strlen().
>>
>> The TL;DR for performant high-throughput code is: if strlen() or
>> memcpy() is the thing that's costing you the most, you're doing it
>> wrong.
>>
>>
>>
>> -adrian
>
> I respectfully disagree. A well-optimized libc will benefit
> _every_single_program_ that uses strlen. That includes Apache, Samba,
> Memcached, Quake, and basically every single program that every single
> FreeBSD user uses. There's no reason that 3rd party software
> maintainers should have to rewrite basic libc functions in order to
> get decent performance on FreeBSD. And the downsides are so small!
> In 2015, we should assume by default that most userland software is
> using SIMD instructions. As Eric noticed, Clang emits them freely.
> What's the point of lazily saving the SSE registers on context
> switches if essentially all programs compiled from Ports will be using
> those registers anyway? I agree with Jilles; I think we should always
> save the SSE registers for userland programs.

That's fine, but those benchmarks and improvements also have to take
into account the environment that these programs are running in, and
all of the other things that are going on with it.

Fixing strlen() to use SSE2 is great, but if the gains are offset by
fpu save/restore when doing fine-grained locking that's blocking under
real-world workloads, what's the benefit? What about if the system is
context switching over a million times a second? These are real life
things I see servers running all of the above software /do/.

One only knows with benchmarking, not microbenchmarking.

Microbenchmarks are great. They serve a purpose, which is "how the
heck does the current silicon I'm running on run some code that I've
cleverly crafted to hopefully run well."

I'm totally for saving/restoring SSE registers for userland programs.
But that's not where that kind of "make stuff fast" work should stop.
If it does, and that's where your benchmarking for the real world
stops, then you're doing it wrong.

Everything is a toss-up. For this userland based netmap packet pushing
app, SSE may be nice for some instructions, but know what else screws
things? The fact that the default scheduler policy is terrible and
crap gets scheduled /everywhere/ under any appreciable amount of load.
That the context switch rate is high, the interrupt rate is also high,
and with a little locking going on, I see fpu save/restore account for
a non-insignificant fraction of CPU.

Optimising strlen() or memcpy() is great, but when my system context
switches a million times a second, we're never going to reach the
steady state at which these CPUs can really crank out real work.

So, cool. Please keep poking at that stuff. But if you stop short of
making the system actually /be able to take advantage of them under
load/, I respectfully ask for a nice knob I can use to turn them off. :)


-adrian

(Know where the slowdowns for memcached are? Hint - not strlen or
memcpy. Yes, I've been down that rabbit hole recently. Know what /I/
have? 1 million UDP transactions a second working on 16 core
sandybridge systems. Know what I didn't optimise? memcpy or strlen.
The network stack locking and pthreads overhead is what sucks.)