Date: Wed, 21 May 2014 23:07:04 -0700
From: Attilio Rao <attilio@freebsd.org>
To: freebsd-performance@freebsd.org, Bryan Drewery <bdrewery@freebsd.org>, Florian Smeets <flo@freebsd.org>
Subject: FreeBSD 10 and PostgreSQL 9.3 scalability issues
Message-ID: <CAJ-FndBYbvB50p%2BFEqyuGhk4m-4GPgOH4%2BA9SQHBN-urz9P%2BzA@mail.gmail.com>
[ Please CC me as I'm not subscribed to FreeBSD mailing lists ]

Recently Bryan Drewery and I have been looking at this issue, in particular after some people pointed us to DragonflyBSD / Linux benchmarks. DB workloads are interesting mostly because they can expose real performance problems in kernel-intensive paths (scalability of the scheduler, VM, VFS, network stack, etc.). More generally, however, some extra attention must be paid to how the test is performed, especially to avoiding I/O (to increase predictability and avoid latency fluctuations).

We have done tests similar to what Florian Smeets has been doing on netperf's cluster giant-ape1 (a XEON E7 4(nodes)x10(cores) machine) and I've come to the conclusion that the tests comparing DragonflyBSD, Linux and FreeBSD have intrinsic problems. Essentially, having the client and the backend of PGSQL on the same machine makes them share data, which leads to much faster results; the more cache levels they share, the faster the results will be. When the client becomes heavily multithreaded, in particular, the data gets spread around so unpredictably that it is difficult to say how much cache-sharing/thrashing effects come into play.

One example explains it well: with the full DB in memory and writes going to tmpfs (so no real I/O) and a *single client* configuration, we were getting around +20% when the client and the backend were running on the same chip (so sharing the L2 cache) rather than on two different domains. Once multiple clients come into play, all touching the same data, the behaviour becomes quite unpredictable. I can also tell you that Florian has tried to benchmark on the same machine in the past and got very unstable numbers when using all 40 available cores, with fluctuations in the range of +/-10% over around 10 runs. I think this explains why that was the case.
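To make the single-client experiment above reproducible, something like the following cpuset(1) invocations can force the backend and the client onto the same or different packages. This is only a sketch: the core ranges assume giant-ape1's 4x10-core layout, and the data directory and database name are placeholders.

```shell
# Same-chip case: backend and client share the cores (and caches) of domain 0.
cpuset -l 0-9 pg_ctl -D /usr/local/pgsql/data start
cpuset -l 0-9 pgbench -c 1 -T 300 bench

# Cross-domain case: backend on domain 0, client on domain 1,
# so they share no on-chip cache levels.
cpuset -l 0-9  pg_ctl -D /usr/local/pgsql/data start
cpuset -l 10-19 pgbench -c 1 -T 300 bench
```

Comparing the transactions-per-second output of the two cases should show the cache-sharing delta directly, without the scheduler's placement decisions adding noise.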
I'm not going to claim that FreeBSD will be kick-ass on this type of workload, but I think the results reported so far are biased, and a more realistic setup (which I hope to start exploring soon) would involve running the PGSQL clients on separate machines so that we benchmark only the backend's behaviour. After all, the PGSQL people recommend this as well: http://www.postgresql.org/docs/devel/static/pgbench.html

"A limitation of pgbench is that it can itself become the bottleneck when trying to test a large number of client sessions. This can be alleviated by running pgbench on a different machine from the database server, although low network latency will be essential. It might even be useful to run several pgbench instances concurrently, on several client machines, against the same database server."

To be honest, I'm a bit worried that with a realistic/physical test FreeBSD is going to be limited more by NIC / TCP stack bottlenecks than by real CPU / memory ones (the ones really interesting for analyzing kernel scalability further), but there is no way to know other than trying it on capable hardware and seeing where we stand. A fairer single-machine approach might be to stick all the clients into one domain and the backend into another; however, there would still be some data sharing between them, which invalidates the test for all practical purposes, and the clients and backend would still compete for the same resources.

While looking into this, however, I noticed something interesting: the EST / cpufreq driver is essentially broken. It does not attach on the newest Intel microarchitectures (Nehalem, SB, etc.). In my experience, enabling EST and possibly disabling turbo-boost makes a nice difference.
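The multi-machine setup the pgbench documentation recommends would look roughly like this; host names, database name, scale and durations are all placeholders, and the server side additionally needs listen_addresses and pg_hba.conf opened up for the clients' subnet.

```shell
# On the server machine (once): create and populate the test database.
createdb bench
pgbench -i -s 100 bench

# On each separate client machine: drive the backend over the network.
# Several such instances can run concurrently against the same server.
pgbench -h db-server -p 5432 -c 32 -j 4 -T 600 bench
```

With this arrangement the client's own CPU cost and cache footprint stay off the machine under test, so only the backend (and the NIC / TCP stack) is being measured.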
Without those capabilities being controlled by the est driver we can end up with sub-optimal performance on Intel CPUs (for giant-ape1 this wasn't the case, as everything was already set up properly, but we cannot assume that for every machine booting FreeBSD). It may be worth spending some time on this part of the code to make it properly available.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
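A quick way to check whether cpufreq/est attached on a given box is to look for the per-CPU frequency sysctls; if they are missing on a modern Intel machine, the driver most likely failed to attach as described above. (The exact output is machine-dependent, of course.)

```shell
# Current frequency and the levels the driver exposes; absent sysctls
# here usually mean est/cpufreq did not attach to this CPU.
sysctl dev.cpu.0.freq
sysctl dev.cpu.0.freq_levels

# Confirm whether the est device attached at boot.
grep -i est /var/run/dmesg.boot
```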