Date: Sat, 25 Jun 2016 09:16:01 -0700
From: Maxim Sobolev <sobomax@freebsd.org>
To: Sean Chittenden <sean@chittenden.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@freebsd.org>, performance@freebsd.org, John Baldwin <jhb@freebsd.org>, Alan Somers <asomers@freebsd.org>, Alan Cox <alc@rice.edu>, Alan Cox <alc@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>, "current@freebsd.org" <current@freebsd.org>
Subject: Re: PostgreSQL performance on FreeBSD
Message-ID: <CAH7qZfuHa-ZcZzspvsjhFASTQ841rfUu1gFV7KESBustMQD70Q@mail.gmail.com>
In-Reply-To: <C06B11C3-5D40-43AF-8975-880F272933C5@chittenden.org>
References: <20140627125613.GT93733@kib.kiev.ua> <CAJ-Vmom-M=R=FaBfHE5c2%2BYxW0SLmJTdFJD8tW4_aOD7MDNwzA@mail.gmail.com> <CAJ-Vmomt=WYjct%2BzsTbHuryxqYp7ELyS52LOb4NEsfENQ1yj1w@mail.gmail.com> <1603235.2ShtoCfSqO@ralph.baldwin.cx> <CAH7qZfuAtHtUG92wEjPhOZ=BGgyFS728uigjJoD0pG%2B-mtUSww@mail.gmail.com> <20160622100241.GM38613@kib.kiev.ua> <CAH7qZfvy46wWcrjz-ihA%2B%2BEYktm7PqGoJhj1a7hdYWssiEXFuA@mail.gmail.com> <C06B11C3-5D40-43AF-8975-880F272933C5@chittenden.org>
Sean,

For the issue that you are describing, it might also be possible to do it some other way. One option, perhaps more portable, is to share a connected socketpair between the two communicating processes, so that you can do a non-blocking read on one of its ends from time to time and check whether it returns EOF, which would be the case if whatever process holds the other end is no longer there. So instead of a shared memory segment, you can have a pool of descriptors, one for each worker that you care about. Polling on those would be trivial with just regular poll(2).

The only issue might be that postgres forks a lot, so we would probably need to implement FD_CLOFORK to avoid copying those extra fds into every child.

This is something akin to a solution that I recently posted to work around the problem that you cannot really waitpid() on a grandchild; see PG BUG #14199 for details & patch.

But yes, it would be really nice to get rid of SYSV shared memory use in PG completely at some point, one way or another.

-Max

On Thu, Jun 23, 2016 at 3:42 PM, Sean Chittenden <sean@chittenden.org> wrote:

> Small nit:
>
> PostgreSQL used SYSV because it allowed for the detection of dead
> processes. If you `kill -9`’ed a process, PostgreSQL can detect that and
> then shut down and perform an automatic recovery. In this regard, sysv is
> pretty clever. The move to POSIX shared mem was done for a host of
> reasons, but it means that you don’t have to adjust your SYSV limits. My
> understanding from a few years ago is that there is still a ~64KB SYSV
> memory segment that is still used to act as the latch to signal if a
> process was killed, but all of the shared buffers are stored in posix
> mmap’ed regions.
>
> At this point in time this could be replaced with kqueue(2) EVFILT_PROC,
> but no one has done that yet.
>
> -sc
>
> --
> Sean Chittenden
> sean@chittenden.org
>
>
> On Jun 22, 2016, at 07:26 , Maxim Sobolev <sobomax@freebsd.org> wrote:
> >
> > Konstantin,
> >
> > Not if you do sem_unlink() immediately, AFAIK. And that's what PG does. So
> > the window of opportunity for the leakage is quite small, much smaller than
> > for SYSV primitives. Sorry for missing your status update message, I've
> > missed it somehow.
> >
> > ----
> >         mySem = sem_open(semname, O_CREAT | O_EXCL,
> >                          (mode_t) IPCProtection,
> >                          (unsigned) 1);
> >
> > #ifdef SEM_FAILED
> >         if (mySem != (sem_t *) SEM_FAILED)
> >             break;
> > #else
> >         if (mySem != (sem_t *) (-1))
> >             break;
> > #endif
> >
> >         /* Loop if error indicates a collision */
> >         if (errno == EEXIST || errno == EACCES || errno == EINTR)
> >             continue;
> >
> >         /*
> >          * Else complain and abort
> >          */
> >         elog(FATAL, "sem_open(\"%s\") failed: %m", semname);
> >     }
> >
> >     /*
> >      * Unlink the semaphore immediately, so it can't be accessed externally.
> >      * This also ensures that it will go away if we crash.
> >      */
> >     sem_unlink(semname);
> >
> >     return mySem;
> > ----
> >
> > -Max
> >
> > On Wed, Jun 22, 2016 at 3:02 AM, Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> >
> >> On Tue, Jun 21, 2016 at 12:48:00PM -0700, Maxim Sobolev wrote:
> >>> Thanks, Konstantin for the great work, we are definitely looking forward to
> >>> get all those improvements to be part of the default FreeBSD kernel/port.
> >>> Would be nice if you can post an update some day later as to what's
> >>> integrated and what's not.
> >> I did posted the update several days earlier. Since you replying to this
> >> thread, it would be not unreasonable to read recent messages that were
> >> sent.
> >>
> >>>
> >>> Just in case, I've opened #14206 with PG to switch us to using POSIX
> >>> semaphores by default. Apart from the mentioned performance benefits, SYSV
> >>> semaphores are PITA to deal with as they come in very limited quantities by
> >>> default. Also they might stay around if PG dies/gets nuked and prevent it
> >>> from starting again due to overflow. We've got some quite ugly code to
> >>> clean up those using ipcrm(1) in our build scripts to deal with just that.
> >>> I am happy that code could be retired now.
> >>
> >> Named semaphores also stuck around if processes are killed without cleanup.
> >>
> >>
> > _______________________________________________
> > freebsd-performance@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-performance
> > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"
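For illustration, the socketpair idea from the top of this message could be sketched roughly as follows. This is only a sketch under assumptions, not code from PostgreSQL or from the patch attached to BUG #14199: the function names peer_is_gone() and demo_detect_killed_worker() are made up here, and a single worker stands in for the pool of descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the process holding the other end
 * of a connected socketpair descriptor has gone away.
 * Returns 1 if EOF (or a reset) was seen, 0 if the peer still appears
 * to be there, and -1 on an unexpected error.
 */
int
peer_is_gone(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[16];
	ssize_t n;
	int rc;

	assert(fd >= 0);

	/* Wait for readability; retry if a signal interrupts poll(2). */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;	/* timed out: nothing to read, no EOF yet */

	/* Non-blocking read: 0 bytes means every copy of the other end is closed. */
	n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
	if (n == 0)
		return 1;
	if (n < 0) {
		if (errno == ECONNRESET)
			return 1;
		if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
			return 0;
		return -1;
	}
	return 0;		/* the peer sent data, so it is alive */
}

/*
 * Hypothetical demo: fork one worker, kill -9 it, and check that the
 * supervisor notices through the socketpair alone.
 * Returns 1 if the worker looked alive before the kill and gone after it.
 */
int
demo_detect_killed_worker(void)
{
	int sv[2], before, after;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		/* Worker: keep sv[1], drop sv[0], idle until killed. */
		close(sv[0]);
		for (;;)
			pause();
	}

	/*
	 * Supervisor: keep sv[0] only. Our own copy of sv[1] must be closed,
	 * otherwise it keeps the connection open and EOF never arrives.
	 */
	close(sv[1]);

	before = peer_is_gone(sv[0], 0);
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	after = peer_is_gone(sv[0], 1000);

	close(sv[0]);
	return (before == 0 && after == 1) ? 1 : 0;
}
```

Calling demo_detect_killed_worker() from a small main() is enough to try it out. The explicit close(sv[1]) in the supervisor is the important detail: any process that inherits a copy of the worker's end keeps the connection open, which is why a fork-heavy process tree would want something like FD_CLOFORK.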