Date: Tue, 6 Dec 2016 17:58:32 +0100
From: Dimitri Staessens <dimitri.staessens@intec.ugent.be>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-threads@freebsd.org
Subject: Re: Unlocking a robust mutex in a cleanup handler
Message-ID: <1c235c8f-b1db-f107-63e2-28e099e17667@intec.ugent.be>
In-Reply-To: <20161206163807.GT54029@kib.kiev.ua>
References: <119e59d4-6125-f313-e6e6-67055a15d224@intec.ugent.be> <20161206112558.GN54029@kib.kiev.ua> <6a7139cd-b6db-d078-ee5e-b7c590eb13d1@intec.ugent.be> <20161206144812.GS54029@kib.kiev.ua> <35726dbb-75f7-682d-ad41-c78b96675485@intec.ugent.be> <20161206163807.GT54029@kib.kiev.ua>

Hi Konstantin,

you're right. I moved home from work, so the machine was rebooted. Your
patch works. As you suggest, the test must have choked on the already
initialized mutex in the shm segment; the small test wasn't handling that
case. I can't reproduce the situation by killing the process while it is
running (the lock-up was probably due to the specific location of the
PANIC, which left that mutex in a specific state).

Patch confirmed as working, thanks for your help!

Dimitri

On 12/06/16 17:38, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 04:30:14PM +0100, Dimitri Staessens wrote:
>> Dear Konstantin,
>>
>> I didn't get the error, but on my machine the thread never exits the
>> condwait when the pthread_cancel is called.
>>
>> gdb output:
>>
>> $ sudo gdb ./robust_test 3246
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...(no debugging
>> symbols found)...
>> Attaching to program: /usr/home/dstaesse/robust_test, process 3246
>> Reading symbols from /lib/libthr.so.3...Reading symbols from
>> /usr/lib/debug//lib/libthr.so.3.debug...done.
>> [New Thread 801416500 (LWP 100404/robust_test)]
>> [New Thread 801416000 (LWP 100313/robust_test)]
>> done.
>> Loaded symbols for /lib/libthr.so.3
>> Reading symbols from /usr/lib/librt.so.1...done.
>> Loaded symbols for /usr/lib/librt.so.1
>> Reading symbols from /lib/libc.so.7...done.
>> Loaded symbols for /lib/libc.so.7
>> Reading symbols from /libexec/ld-elf.so.1...done.
>> Loaded symbols for /libexec/ld-elf.so.1
>> [Switching to Thread 801416000 (LWP 100313/robust_test)]
>> 0x00000008008386ac in _umtx_op_err () from /lib/libthr.so.3
>> (gdb) info threads
>> * 2 Thread 801416000 (LWP 100313/robust_test) 0x00000008008386ac in
>> _umtx_op_err () from /lib/libthr.so.3
>> 1 Thread 801416500 (LWP 100404/robust_test) _thr_ast
>> (curthread=0x801416500) at /usr/src/lib/libthr/thread/thr_sig.c:271
>> Current language: auto; currently minimal
>> (gdb) bt
>> #0 0x00000008008386ac in _umtx_op_err () from /lib/libthr.so.3
>> #1 0x0000000800834df6 in join_common (pthread=<value optimized out>,
>> thread_return=<value optimized out>, abstime=<value optimized out>)
>> at /usr/src/lib/libthr/thread/thr_join.c:125
>> #2 0x0000000000401186 in main ()
>> (gdb) thread 1
>> [Switching to thread 1 (Thread 801416500 (LWP 100404/robust_test))]#0
>> _thr_ast (curthread=0x801416500)
>> at /usr/src/lib/libthr/thread/thr_sig.c:271
>> 271 check_suspend(curthread);
>> (gdb) bt
>> #0 _thr_ast (curthread=0x801416500) at
>> /usr/src/lib/libthr/thread/thr_sig.c:271
>> #1 0x0000000800837a5b in __thr_pshared_offpage (key=<value optimized
>> out>, doalloc=<value optimized out>)
>> at /usr/src/lib/libthr/thread/thr_pshared.c:86
>> #2 0x00000008008363cb in cond_wait_common (cond=<value optimized out>,
>> mutex=0x800643004, abstime=0x0, cancel=1)
>> at /usr/src/lib/libthr/thread/thr_cond.c:349
>> #3 0x0000000000400ff2 in blockfunc ()
>> #4 0x000000080082ab55 in thread_start (curthread=<value optimized out>)
>> at /usr/src/lib/libthr/thread/thr_create.c:289
>> #5 0x0000000000000000 in ?? ()
>> (gdb)
>>
> I suspect that there is an issue with the test program itself.
>
> If you terminate your program, e.g. with SIGINT/Ctrl-C, then shm_unlink()
> call is not performed at the end, and orphaned locked robust mutex is kept
> associated with that memory segment.
> Then, since you have the loop around
> pthread_cond_wait() call, it seems feasible to assume that the next
> instance of the program gets ignored errors from pthread_mutex_lock()
> and pthread_cond_wait().
>
> This is explicitly allowed by POSIX, which states that "Attempting to
> initialize an already initialized mutex results in undefined behavior."
>
> Can you try the following modification of your test program, without
> rebooting the machine, so that the shared segment and mutex were kept
> around?
>
> #define _POSIX_C_SOURCE 200809L
> #define __XSI_VISIBLE 500
>
> #include <pthread.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <string.h>
> #include <fcntl.h>
>
> #define FN "/robust"
> #define FS (sizeof(int) + sizeof(pthread_mutex_t) + sizeof(pthread_cond_t))
>
> /* contents of the shm segment */
> int * shm_int;
> pthread_mutex_t * shm_mtx;
> pthread_cond_t * shm_cnd;
>
> /* function for thread */
> void * blockfunc(void * o)
> {
>         int error;
>
>         printf("Thread started...\n");
>
>         error = pthread_mutex_lock(shm_mtx);
>         if (error != 0)
>                 printf("mutex_lock err %d %s\n", error, strerror(error));
>
>         pthread_cleanup_push((void (*)(void *)) pthread_mutex_unlock,
>                              (void *) shm_mtx);
>
>         error = 0;
>         while (*shm_int == 0 && error == 0)
>                 error = pthread_cond_wait(shm_cnd, shm_mtx);
>         if (error != 0)
>                 printf("cond_wait err %d %s\n", error, strerror(error));
>
>         pthread_cleanup_pop(1);
>
>         return (void *) 0;
> }
>
> int
> main(void)
> {
>         /* file descriptor for shm_open */
>         int fd;
>
>         /* mutex and condvar attributes */
>         pthread_mutexattr_t mattr;
>         pthread_condattr_t cattr;
>
>         /* thread that will block on the condvar in shm */
>         pthread_t thr;
>
>         printf("Initializing...\n");
>
>         /* create shm segment containing an int, a mutex, and a condvar */
>         fd = shm_open(FN, O_CREAT | O_RDWR, 0666);
>         ftruncate(fd, FS - 1);
>         shm_int = mmap(NULL, FS, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>         shm_mtx = (pthread_mutex_t *) (shm_int + 1);
>         shm_cnd = (pthread_cond_t *) (shm_mtx + 1);
>
>         close(fd);
>
>         /* initialize the contents */
>
>         pthread_mutexattr_init(&mattr);
>         pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
>         pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
>
>         pthread_condattr_init(&cattr);
>         pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
>
>         pthread_mutex_init(shm_mtx, &mattr);
>         pthread_cond_init(shm_cnd, &cattr);
>
>         *shm_int = 0;
>
>         /* start the thread */
>         printf("Starting thread...\n");
>         pthread_create(&thr, NULL, blockfunc, NULL);
>
>         /* sleep for a second */
>         printf("Sleeping for one second...\n");
>         sleep(1);
>
>         /* cancel the thread */
>         printf("Cancelling thread...\n");
>         pthread_cancel(thr);
>
>         /* wait for the thread to join */
>         pthread_join(thr, NULL);
>
>         printf("Thread finished.\n");
>         pthread_mutex_destroy(shm_mtx);
>         pthread_cond_destroy(shm_cnd);
>
>         /* cleanup shared memory */
>         munmap(shm_int, FS);
>         shm_unlink(FN);
>
>         printf("Bye.\n");
>         return (0);
> }

--
Dimitri Staessens
Ghent University - imec
Dept. of Information Technology (INTEC)
Internet Based Communication Networks and Services
Technologiepark 15
9052 Zwijnaarde
T: +32 9 331 48 70
F: +32 9 331 48 99
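
The failure mode discussed in this thread (a rerun of the test calling
pthread_mutex_init() on a mutex that a killed instance left behind in the
shm segment) can be avoided by initializing the segment only when it is
newly created. The following is a minimal sketch of one way to do that, not
code from the thread: struct shm_state, shm_state_open() and
shm_state_lock() are hypothetical names. It assumes that O_EXCL is used to
detect a leftover segment and that EOWNERDEAD from a robust mutex is
answered with pthread_mutex_consistent(). It does not guard against a
second process attaching before the creator has finished initializing.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout of the shared segment (not taken from the thread). */
struct shm_state {
        pthread_mutex_t mtx;
        pthread_cond_t  cnd;
        int             value;
};

/*
 * Map the segment `name`. Returns the mapping, or NULL on failure.
 * *created is 1 if this call created and initialized the segment,
 * 0 if it attached to a segment that already existed.
 */
struct shm_state *shm_state_open(const char *name, int *created)
{
        pthread_mutexattr_t mattr;
        pthread_condattr_t cattr;
        struct shm_state *s;
        int fd;

        assert(name != NULL && created != NULL);

        /* With O_EXCL, shm_open() fails with EEXIST on a leftover segment. */
        fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        *created = (fd >= 0);
        if (fd < 0 && errno == EEXIST)
                fd = shm_open(name, O_RDWR, 0600);
        if (fd < 0)
                return NULL;

        /* Only the creator sizes the segment. */
        if (*created && ftruncate(fd, sizeof(*s)) != 0) {
                close(fd);
                shm_unlink(name);
                return NULL;
        }

        s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        if (s == MAP_FAILED) {
                if (*created)
                        shm_unlink(name);
                return NULL;
        }

        if (*created) {
                /* Initialize exactly once; re-initializing is undefined. */
                pthread_mutexattr_init(&mattr);
                pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
                pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
                pthread_mutex_init(&s->mtx, &mattr);
                pthread_mutexattr_destroy(&mattr);

                pthread_condattr_init(&cattr);
                pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
                pthread_cond_init(&s->cnd, &cattr);
                pthread_condattr_destroy(&cattr);

                s->value = 0;
        }
        return s;
}

/*
 * Lock the robust mutex. If the previous owner died while holding it,
 * the lock is still acquired; mark the mutex consistent so that it stays
 * usable. Real code would repair the protected data before doing so.
 */
int shm_state_lock(struct shm_state *s)
{
        int error = pthread_mutex_lock(&s->mtx);

        if (error == EOWNERDEAD)
                error = pthread_mutex_consistent(&s->mtx);
        return error;
}
```

A caller would use shm_state_open("/robust", &created) in place of the
unconditional shm_open()/pthread_mutex_init() sequence, and
shm_state_lock() in place of the bare pthread_mutex_lock() before waiting
on the condition variable.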