From owner-freebsd-performance@FreeBSD.ORG Thu May 22 06:07:07 2014
Date: Wed, 21 May 2014 23:07:04 -0700
From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: freebsd-performance@freebsd.org, Bryan Drewery, Florian Smeets
Subject: FreeBSD 10 and PostgreSQL 9.3 scalability issues
[ Please CC me as I'm not subscribed to FreeBSD mailing lists ]

Recently Bryan Drewery and I have been looking at this issue, in particular after some people pointed us at DragonFlyBSD / Linux benchmarks. DB workloads are interesting mostly because they can expose real performance problems in kernel-intensive workloads (scalability of the scheduler, VM, VFS, network stack, etc.). More generally, however, some extra attention must be paid to how the test is performed, especially by avoiding I/O (to increase predictability and avoid latency fluctuations).

We have run tests similar to what Florian Smeets has been doing on netperf's cluster giant-ape1 (a XEON E7 4(nodes)x10(cores) machine), and I've come to the conclusion that the tests comparing DragonFlyBSD, Linux and FreeBSD have intrinsic problems. Essentially, having the client and the backend of PGSQL on the same machine makes them share data, which leads to much faster results; the more cache levels they share, the faster the results will be. When the client becomes heavily multithreaded, in particular, the data becomes spread so unpredictably that it is difficult to say how much cache-sharing/thrashing is coming into play.

An example explains this well: with the full DB in memory, writes going to tmpfs (so no real I/O) and a *single client* configuration, we were getting around +20% if the client and the backend were running on the same chip (so sharing the L2 cache) rather than on 2 different domains. Once you consider multiple clients, all touching the same data, the behaviour becomes pretty unpredictable.
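The same-chip vs. cross-domain comparison above can be reproduced with cpuset(1). This is only a sketch: the core ranges assume a topology like giant-ape1's (10 cores per package), and the postgres/pgbench invocations are illustrative, not the exact ones we used:

```shell
# Check the actual CPU topology first; the 0-9 / 10-19 ranges below are
# an assumption for a machine with 10 cores per package.
sysctl kern.sched.topology_spec

# Pin the PostgreSQL backend to the cores of one package.
cpuset -l 0-9 /usr/local/bin/postgres -D /var/db/postgres &

# Same-chip run: the single client shares caches with the backend
# (faster numbers, but biased by cache sharing).
cpuset -l 0-9 pgbench -c 1 -T 60 bench

# Cross-domain run: client on a different package, no shared cache.
cpuset -l 10-19 pgbench -c 1 -T 60 bench
```

Comparing the two pgbench transaction rates shows roughly how much of the result is cache-sharing rather than backend performance.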
I can also say that Florian has tried to benchmark on the same machine in the past and got very unstable numbers when using all 40 available cores, with fluctuations in the range of +/-10% across roughly 10 runs. I think the above explains why that was the case.

I'm not going to claim that FreeBSD will be kick-ass on this type of workload, but I think the results reported so far are biased, and a more realistic setup (which I hope to start exploring soon) would have the PGSQL clients run on separate machines, so that we benchmark only the backend behaviour. After all, the PGSQL people recommend this as well: http://www.postgresql.org/docs/devel/static/pgbench.html

"A limitation of pgbench is that it can itself become the bottleneck when trying to test a large number of client sessions. This can be alleviated by running pgbench on a different machine from the database server, although low network latency will be essential. It might even be useful to run several pgbench instances concurrently, on several client machines, against the same database server."

To be honest, I'm a bit worried that with a realistic/physical test FreeBSD is going to be limited more by NIC / TCP stack bottlenecks than by real CPU / memory ones (the ones really interesting for analyzing kernel scalability further), but there is no way to know other than to try it on performing hardware and see where we stand. A fairer approach might be to stick all the clients into a single, separate domain and the backend into another; however, there would still be some data sharing between them, effectively invalidating the test, and the clients and backend would still compete for the same resources.

While looking into this, however, I noticed something interesting: the EST / cpufreq driver is essentially broken. It will not attach to the newest Intel microarchitectures (Nehalem, SB, etc.).
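A quick way to check whether est attached and whether frequency control is available on a given machine is to look at the standard cpufreq(4)/est sysctls (a sketch; the exact output obviously varies per machine):

```shell
# If the est driver did not attach, this tree is simply absent.
sysctl dev.est

# Frequency levels exported by cpufreq(4); with est working, turbo boost
# typically shows up as an extra level just above the nominal frequency.
sysctl dev.cpu.0.freq_levels

# Current frequency; pinning it to the nominal (non-turbo) level gives
# more stable benchmark numbers.
sysctl dev.cpu.0.freq
```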
In my experience, enabling EST and possibly disabling turbo-boost makes a nice difference. Without these capabilities controlled by the est driver we can end up with sub-optimal performance on Intel CPUs (for giant-ape1 this wasn't the case, as everything was already set up properly, but I don't think we can assume that for all machines booting FreeBSD). Some time should probably be spent on making this part of the code properly available again.

Attilio

--
Peace can only be achieved by understanding - A. Einstein