Date: Wed, 3 Oct 2018 01:01:03 -0400
From: Allan Jude <allanjude@freebsd.org>
To: freebsd-hackers@freebsd.org, Thomas Munro <munro@ip9.org>
Subject: Re: Regression when trying to replace poll() with kqueue()
Message-ID: <09f0dce2-7899-b839-e70e-79be43a0fa6b@freebsd.org>
In-Reply-To: <CADLWmXXcdbL6wyLUktGzp=41zmbRjxw30FU=Ait-jfd8NcQSyQ@mail.gmail.com>
References: <CADLWmXXcdbL6wyLUktGzp=41zmbRjxw30FU=Ait-jfd8NcQSyQ@mail.gmail.com>

On 2018-10-01 22:24, Thomas Munro wrote:
> Hello FreeBSD hackers,
>
> (CCing mjg and a list of other FreeBSD hackers he suggested)
>
> In a fit of enthusiasm for FreeBSD, a couple of years ago I wrote a
> patch to teach PostgreSQL to use kqueue(2). That was after we
> switched over to epoll(2) on Linux for performance reasons. Our
> default is to use poll(2) unless we have something better. The most
> common usage pattern is simply waiting for read/write readiness on the
> socket that is connected to the client + a pipe connected to the
> parent supervisor process ("postmaster"), but we have plans for more
> interesting kinds of multiplexing involving many more descriptors, and
> in general this sits behind our very thin abstraction called
> WaitEventSet (see latch.c in the PostgreSQL source tree) that can be
> used for many things.
>
> We did some testing using "pgbench" (instructions below) on various
> platforms that have kqueue(2), and we got some conflicting results
> from FreeBSD. When the system is heavily overloaded (a scenario we
> want to work well, or at least not get worse under kqueue, even if
> it's not the ideal way to run your database server), mjg reported that
> with the kqueue patch performance was way better than unpatched when
> the pgbench test client was running on a different host. Huzzah!
>
> Unfortunately, another tester reported the performance was worse when
> running pgbench from the same host (originally he complained about
> NetBSD performance and then we realised FreeBSD was the same under
> those conditions), and I confirmed that was the case for both Unix
> sockets and TCP sockets. In one 96 (!) thread test, the TPS reported
> by pgbench dropped from 70k to 50k queries per second on an 8 CPU
> system. As crazy as those test conditions may seem, that is not a
> good result.
>
> Curiously, when truss'd, in the overloaded scenario that performs
> worse, we very rarely seem to actually reach kevent(2). It seems like
> there is some kind of scheduling difference producing the change.
> Each PostgreSQL server process looks like this over ~10 seconds:
>
> syscall                     seconds   calls  errors
> sendto                  0.396840146    3452       0
> recvfrom                0.415802029    3443       6
> kevent                  0.000626393       6       0
> gettimeofday            2.723923249   24053       0
>                       ------------- ------- -------
>                         3.537191817   30954       6
>
> (That was captured on a virtualised system which had gettimeofday as a
> syscall, but the effect has been reported on bare metal too and there
> no gettimeofday calls show up; I don't believe that is a factor).
>
> The pgbench client looks like this:
>
> syscall                     seconds   calls  errors
> ppoll                   0.002773195       1       0
> sendto                 16.597880468    7217       0
> recvfrom               25.646406008    7238       0
>                       ------------- ------- -------
>                        42.247059671   14456       0
>
> (For whatever reason pgbench uses ppoll() instead, but I assume that's
> irrelevant here; it's also multi-threaded, unlike the server.) The
> truss -c results for the server are not much different when using
> poll(2) instead of kevent(2), although recvfrom in the pgbench client
> seems to show a few seconds less total time, which is curious. You
> can see that we're mostly able to do sendto() and recvfrom() without
> seeing EWOULDBLOCK. So it's not direct access to the kqueue that is
> affecting performance. It's something else, something caused by the
> mere existence of the kqueue object holding the descriptor.
>
> That led several people to speculate that there may be a difference in
> the wakeup logic, when one end of a descriptor is in a kqueue (mjg
> speculated wake-up-one vs broadcast could be a factor), and that may
> be leading to worse scheduling behaviour.
>
> To be clear, nobody thinks that 96 client threads talking to 96
> processes on a single 8 CPU box is a great way to run a system in real
> life! But it's still surprising that we lose performance when using
> kqueue, and it'd be great to understand why, and hopefully improve it.
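
In the server-side truss summary above, the 6 recvfrom errors line up with the 6 kevent calls. That is what one would expect if the server attempts a non-blocking read first and only waits on the event queue when the read would block. The following is a minimal sketch of that pattern, written in Python rather than PostgreSQL's C and not taken from the patch. The names serve and run are hypothetical, and selectors.DefaultSelector is used as a stand-in for WaitEventSet (it selects kqueue on FreeBSD and epoll on Linux).

```python
import selectors
import socket
import threading


def serve(conn, sel, n_requests):
    """Echo n_requests one-byte requests; return (recv_calls, wait_calls)."""
    recv_calls = wait_calls = handled = 0
    while handled < n_requests:
        recv_calls += 1
        try:
            data = conn.recv(1)       # optimistic non-blocking read
        except BlockingIOError:       # EWOULDBLOCK/EAGAIN: nothing queued yet
            wait_calls += 1
            sel.select(timeout=5)     # kevent(2) on FreeBSD, epoll_wait(2) on Linux
            continue
        if not data:                  # peer closed the connection
            break
        conn.send(data)               # reply to the client
        handled += 1
    return recv_calls, wait_calls


def run(n_requests, delay=None):
    """Hypothetical driver: queue requests up front, or send them after `delay` seconds."""
    client, conn = socket.socketpair()
    conn.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(conn, selectors.EVENT_READ)   # descriptor is held by the event queue
    payload = b"x" * n_requests
    timer = None
    try:
        if delay is None:
            client.sendall(payload)            # busy client: data is always waiting
        else:
            timer = threading.Timer(delay, client.sendall, args=(payload,))
            timer.start()                      # idle client: data arrives late
        return serve(conn, sel, n_requests)
    finally:
        if timer is not None:
            timer.join()
        sel.close()
        client.close()
        conn.close()


if __name__ == "__main__":
    print(run(3))             # requests already queued
    print(run(1, delay=0.1))  # request arrives late
```

When the requests are already queued, the wait counter stays at zero even though the descriptor is registered with the selector, which resembles the overloaded same-host case where kevent is almost never reached. The delayed run takes the wait path once before the read succeeds.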
>
> The complete discussion on pgsql-hackers is here:
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com
>
> Any ideas would be most welcome.
>
> Thanks for reading!
>
> ====
>
> Reproduction steps (assuming you have git, gmake, flex, bison,
> readline, curl, ccache):
>
> # grab postgres
> git clone https://github.com/postgres/postgres.git
> cd postgres
>
> # grab kqueue patch
> curl -O https://www.postgresql.org/message-id/attachment/65098/0001-Add-kqueue-2-support-for-WaitEventSet-v11.patch
> git checkout -b kqueue
> git am 0001-Add-kqueue-2-support-for-WaitEventSet-v11.patch
>
> # build
> ./configure --prefix=$HOME/install --with-includes=/usr/local/include \
>     --with-libs=/usr/local/lib CC="ccache cc"
> gmake -s -j8
> gmake -s install
> gmake -C contrib/pg_prewarm install
>
> # create a db cluster and set it to use 2GB of shmem so we can hold
> # whole dataset
> ~/install/bin/initdb -D ~/pgdata
> echo "shared_buffers = '2GB'" >> ~/pgdata/postgresql.conf
>
> # you can either start (and later stop) postgres in the background with pg_ctl:
> ~/install/bin/pg_ctl start -D ~/pgdata
> # ... or just run it in the foreground and hit ^C to stop it:
> # ~/install/bin/postgres -D ~/pgdata
>
> # this should produce about 1.1GB of data under ~/pgdata
> ~/install/bin/pgbench -s 10 -i postgres
>
> # install the prewarm extension, so we can run the test without doing
> # any file IO
> ~/install/bin/psql postgres -c "create extension pg_prewarm"
>
> # after that, after any server restart, prewarm like so:
> ~/install/bin/psql postgres -c "select pg_prewarm(c.oid::regclass)
> from pg_class c where relkind in ('r', 'i')" | cat
>
> # then 60 second pgbench runs are simply:
> ~/install/bin/pgbench -c 96 -j 96 -M prepared -S -T 60 postgres
>
> # to make pgbench use TCP instead of Unix sockets, add -h localhost;
> # to allow connection from another host, update ~/pgdata/postgresql.conf's
> # listen_addresses
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

I have started to look into this a bit. I have not really gotten
anywhere yet, but I have produced a graph comparing the performance of
vanilla postgres vs your patch.

https://imgur.com/a/gKycGxW

They scale identically up to the 20 threads of hardware on my test
machine, and then kqueue falls off much more quickly.

Hopefully I'll have more useful findings tomorrow.

-- 
Allan Jude