Date: Wed, 23 Dec 2015 19:25:28 +0200
From: Konstantin Belousov
To: threads@freebsd.org, arch@freebsd.org
Subject: libthr shared locks
Message-ID: <20151223172528.GT3625@kib.kiev.ua>
List-Id: Discussion related to FreeBSD architecture

A well-known limitation of the FreeBSD libthr implementation of the pthread locking objects is the missing support for process-shared locks. The hardest part of implementing that support is the necessity of preserving ABI compatibility with the current implementation.

Right now, the ABI-visible handle to a lock is a single pointer word. To make the description less vague, let us consider pthread_mutex_t. It is defined in sys/sys/_pthreadtypes.h as

    typedef struct pthread_mutex *pthread_mutex_t;

After pthread_mutex_init(3) is called, the pointer points to the following structure:

    struct pthread_mutex {
        struct umutex               m_lock;
        int                         m_flags;
        struct pthread              *m_owner;
        int                         m_count;
        int                         m_spinloops;
        int                         m_yieldloops;
        TAILQ_ENTRY(pthread_mutex)  m_qe;
    };
    struct umutex {
        volatile __lwpid_t  m_owner;        /* Owner of the mutex */
        __uint32_t          m_flags;        /* Flags of the mutex */
        __uint32_t          m_ceilings[2];  /* Priority protect ceiling */
        __uint32_t          m_spare[4];
    };

Were the ABI modified to make pthread_mutex_t large enough to hold struct pthread_mutex, the rest of the implementation of the shared mutex would be relatively trivial, if not already done.

Changing this ABI is very hard. libthr provides symbol versioning, which makes it possible to provide compatible shims for the previous ABI variant.
But since userspace tends to embed the pthread objects in the layouts of library objects, this causes serious ABI issues when mixing libraries built against different default versions of libthr.

My idea for providing shared locks without changing the libthr ABI is to use marker pointers to indicate shared objects. The real struct pthread_mutex, which carries the locking information, is allocated on an off-page taken from an anonymous posix shared memory object.

The marker is defined as

    #define THR_PSHARED_PTR \
        ((void *)(uintptr_t)((1ULL << (NBBY * sizeof(long) - 1)) | 1))

The bit pattern is 1000....00001. Two tricks are used:
1. All correctly allocated objects in all supported ABIs are at least word-aligned, so the least-significant bit cannot be set. This should make the THR_PSHARED_PTR pattern unique against non-shared allocations.
2. The high bit is set, which makes the address non-canonical on amd64, so that attempts to dereference the pointer are guaranteed to segfault, instead of relying on the corresponding page not being mapped on the weakly-aligned arches.

The majority of the libthr modifications follow an easy pattern: upon initialization of a shared object, the library must store THR_PSHARED_PTR, allocate the off-page and initialize the lock there. If a call assumes that the object is already initialized, then we must not instantiate the off-page.

To speed up the lookup, a cache is kept in userspace which translates the addresses of locks to their off-pages. Note that we can safely ignore possible unmapping of the locks, since correct pthread_* API use assumes a call to pthread_*_destroy() at the end of the object lifecycle. If the lock is remapped in usermode, then the userspace off-page translation cache lookup fails, but the kernel returns the same shm for the lookup, and we end up with two off-page mappings, which is acceptable.
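
To make the marker scheme concrete, here is a minimal sketch of the lookup path in C. It is not the code from the patch: fake_mutex, fake_mutex_t, PSHARED_MARKER, cache_insert, cache_lookup and resolve are hypothetical names, CHAR_BIT stands in for NBBY, and a small fixed table stands in for the userspace cache (the kernel lookup behind it is not modelled at all).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit pattern as THR_PSHARED_PTR; CHAR_BIT is used in place of NBBY. */
#define PSHARED_MARKER \
    ((void *)(uintptr_t)((1ULL << (CHAR_BIT * sizeof(long) - 1)) | 1))

/* Hypothetical stand-ins for struct pthread_mutex and pthread_mutex_t. */
struct fake_mutex { int locked; };
typedef struct fake_mutex *fake_mutex_t;

/* Toy translation cache: address of the handle -> off-page lock state. */
#define CACHE_SLOTS 8
static struct { void *key; struct fake_mutex *offpage; } cache[CACHE_SLOTS];

/* Record the off-page for the handle stored at address 'key'. */
static bool
cache_insert(void *key, struct fake_mutex *offpage)
{
    assert(key != NULL);        /* NULL marks an empty slot */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].key == NULL) {
            cache[i].key = key;
            cache[i].offpage = offpage;
            return (true);
        }
    }
    return (false);             /* table full */
}

/* Find the off-page for the handle stored at address 'key'. */
static struct fake_mutex *
cache_lookup(void *key)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].key == key)
            return (cache[i].offpage);
    }
    /* Per the description above, a miss would fall back to the kernel. */
    return (NULL);
}

/* Translate an ABI-visible handle into the structure holding the lock. */
static struct fake_mutex *
resolve(fake_mutex_t *handle)
{
    /* Private lock: the handle is the pointer, one load and one compare. */
    if (*handle != PSHARED_MARKER)
        return (*handle);
    /* Shared lock: the handle holds no address, look up by its location. */
    return (cache_lookup(handle));
}
```

A shared handle carries no address at all, so the off-page can only be found from the location of the handle itself, which is why the sketch keys the cache by &handle rather than by the handle value.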

The kernel holds a lookup table which translates the (vm_object, offset) pair, obtained by dereferencing the user-space address, into the posix shared memory object. The lifecycle of the shm objects is bound to the existence of the corresponding vm object.

Note that the lifecycle of the kernel objects does not correspond well to the lifecycle of the vnode vm object. A closed vnode could be recycled by VFS for whatever reason, and then we would lose the entry in the registry. I am not sure whether this is a very serious issue, since I suppose that the typical use case assumes anonymous shared memory backing. Right now the kernel drops the off-page shm object on the last vnode unmap.

Due to the backing by kernel objects, the implementation imposes per-uid limits on the number of shared objects created. An issue is that there are no such limits in other implementations.

The overhead of the implementation, compared with non-process-shared locks, is due to the mandatory off-page lookup, which is mostly amortized by the (read-locked) userspace cache. Also, each shared lock takes an additional page of memory, which works fine assuming that applications use a limited number of shared locks. The cost for non-shared locks is a single memory load for each pthread_* call.

Below are the timing results of my implementation on the 4-core sandy against the Fedora 22 glibc, done with the same program on the same hardware (https://www.kib.kiev.ua/kib/pshared/pthread_shared_mtx1.c).

[FreeBSD]
    # time /root/pthread_shared_mtx1-64
    iter1 10000000 aiter1 10000000 iter2 10000000 aiter2 10000000
    ./pthread_shared_mtx1-64  2.47s user 3.27s system 166% cpu 3.443 total

[Fedora]
    [kostik@sandy tests]$ /usr/bin/time ./pthread_shared_mtx1-linux64
    iter1 10000000 aiter1 10000000 iter2 10000000 aiter2 10000000
    1.38user 2.46system 0:01.95elapsed 196%CPU (0avgtext+0avgdata 1576maxresident)k
    0inputs+0outputs (0major+142minor)pagefaults 0swaps

The implementation in the patch
https://www.kib.kiev.ua/kib/pshared/pshared.1.patch
provides shared mutexes, condvars, rwlocks and barriers. I did some smoke-testing, only on amd64. Robust mutexes are not implemented. I want to finalize this part of the work before implementing robustness, but some restructuring in the patch which may seem arbitrary, like the rework of the normal/pp queues to live in arrays, is preparation for the robustness feature.

The work was sponsored by The FreeBSD Foundation. Previous and current versions of the idea and the previous patch were discussed with John Baldwin and Ed Maste.