From: Terry Lambert
Message-Id: <199906280503.WAA26806@usr04.primenet.com>
Subject: Re: high-efficiency SMP locks - submission for review
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Mon, 28 Jun 1999 05:03:54 +0000 (GMT)
Cc: freebsd-smp@FreeBSD.ORG
In-Reply-To: <199906272024.NAA15634@apollo.backplane.com> from "Matthew Dillon" at Jun 27, 99 01:24:01 pm

> I would like to know what the SMP gurus think of this code.  I have
> not posted the complete patch, but have included the meat of the code.
> I am interested in comments relating to the implementation & performance
> aspects of the code.  I would also appreciate it if an inline assembly
> expert could look at my inline assembly.  I have tested it well and
> I believe it to be correct, but I don't write much inline assembly so...
>
> This code is designed to implement extremely low overhead SMP-compatible
> locks.  It will probably be used instead of lockmgr() locks for the
> buffer cache, and I hope to use it as a basis to replace lockmgr() locks
> in other modules later on.  The same structure can be used to implement
> both spin locks and normal blocking locks, and can handle shared,
> exclusive, and recursive-exclusive lock types.  The critical path has
> been optimized down to 20 instructions or so (compared to the hundreds
> in lockmgr()).
>
> As many of you may already know, lockmgr() locks are extremely expensive
> and require interrupt synchronization for their internal simplelocks to
> operate properly in a mixed process/interrupt environment.  The result
> is a horrendous amount of overhead for very little gain.  I am
> attempting to come up with a general locking module that can eventually
> replace lockmgr() locks.

The lockmgr() locks are ill suited to anything other than
advisory/mandatory file locking, IMO.  Any move away from them is a
good thing.  However, I have some comments on the implementation.

1)	SMP locking should be done utilizing intention modes.

	I had a nice discussion with Mike Smith and Julian about the
	proposed locking mechanism for non-intention mode multiple
	reader, single writer locks for the buffer cache.  This boiled
	down to:

	A)	Use of non-intention mode locks has a serious
		serialization penalty.

	B)	Use of locks where the write queue holds a shared
		intention exclusive (SIX) lock, implicit in the
		enqueueing of synchronization point data (e.g. a soft
		dependency on a directory entry block with one backed
		out operation), _significantly_ reduces the
		serialization overhead.

	C)	There's really no reason that the blocks that are not
		locked exclusive on the list can't be updated by user
		processes.
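For anyone who hasn't run into intention modes, the sketch below shows
the classic five-mode compatibility test from the database literature
(Gray et al.).  The names and table are illustrative only; they are
not taken from Matt's patch or from any existing FreeBSD code.

/*
 * Sketch: intention-mode lock compatibility.
 */
enum lkmode {
	LK_IS,			/* intention shared */
	LK_IX,			/* intention exclusive */
	LK_S,			/* shared */
	LK_SIX,			/* shared intention exclusive */
	LK_X,			/* exclusive */
	LK_NMODES
};

/*
 * compat[held][wanted]: nonzero if a lock in mode 'wanted' can be
 * granted on a node while another entity holds it in mode 'held'.
 */
static const char compat[LK_NMODES][LK_NMODES] = {
	/*		  IS  IX  S  SIX  X */
	/* IS  */	{  1,  1,  1,  1,  0 },
	/* IX  */	{  1,  1,  0,  0,  0 },
	/* S   */	{  1,  0,  1,  0,  0 },
	/* SIX */	{  1,  0,  0,  0,  0 },
	/* X   */	{  0,  0,  0,  0,  0 }
};

The point of (B) is visible in the SIX row: a SIX holder still admits
IS readers into the subtree while excluding other writers, so readers
are not serialized against the queued write.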
This resolves the Ziff-Davis Labs benchmark issue more cleanly than
the way that Mike stated Kirk proposed to solve the problem.

2)	In general, as regards SMP locking, I think that your lock
	structure is too abbreviated:

> struct qlock {
>	volatile int count;	/* lock refs: neg=exclusive, pos=shared */
>	int waiting;		/* process(s) are waiting on lock */
>	struct proc *holder;	/* holder of an exclusive lock */
> };

	In particular, the use of a single wait count means that you
	will not have an ordered list for the equivalent of a "wake
	one".  This is a real problem, since it allows a deadly
	embrace deadlock to occur when two kernel contexts each hold
	one lock and want to acquire the other context's lock.  I
	believe that resolving this as an EWOULDBLOCK, and then
	backtracking the stack and any partially completed operations,
	is prohibitively expensive.

	The use of a count, rather than a relationship between the
	lock holder(s) and the things being locked, is similarly
	problematic.

	Finally, there is insufficient mechanism to avoid competition
	starvation, where a writer blocks indefinitely as multiple
	readers pass the resource between each other.

I believe the following is the minimal set of structures required to
resolve blocking operation inheritance and deadlock detection:

struct lockable;		typedef struct lockable LKA;
struct lockingentity;		typedef struct lockingentity LKE;
struct locklistentry;		typedef struct locklistentry LKLE;
struct lockentitylistentry;	typedef struct lockentitylistentry LKEE;

struct lockentitylistentry {
	LKEE	*next;		/* next list entry */
	LKEE	*inherit;	/* who inherits this dependency */
	LKE	*entity;	/* entity waiting on us */
};

struct locklistentry {
	LKLE	*next;		/* next lock in list */
	LKA	*held;		/* a held lock */
};

struct lockingentity {
	LKLE	*holding;	/* locks that entity is holding */
	LKA	*waiton;	/* lock that entity is waiting on */
};

struct lockable {
	LKEE	*holds;		/* the locking entities */
	LKEE	*waits;		/* entities waiting for the lock */
};

This presumes a model where each context that wishes to acquire a lock
is considered a lockingentity (this works equally well for kernel
threads vs. processes vs. async call gates vs. pseudo-async call gates
a la lazy thread context creation for kernel sleeps, as in BSDI).
Each object that can be locked is considered a lockable, and the
relationship between a locking entity and a lockable is a
locklistentry.

The lockentitylistentry is used to inherit blocked, pending operations
to the root of the lock tree (graph).  This allows near instantaneous
deadlock detection, and the inheritance list combined with the graph
represents the transitive closure path between any lock you propose
and the locks which already exist.

Finally, one could envision a hierarchical relationship between
lockingentities, e.g. multiple threads within a process, so as to
avoid self-deadlock.  This really depends on your threads
implementation; obviously, a user space call conversion mechanism
backed by multiple kernel threads, and implemented wholly using
asynchronous kernel entries (e.g. an async call gate), would be
immune to the requirement.
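To make the "near instantaneous deadlock detection" claim concrete,
here is a sketch of the check against the structures above, under my
reading of them (the holds list entries naming the current holders);
would_deadlock() is a hypothetical helper of mine, not code from any
patch:

/*
 * Sketch only: returns nonzero if granting lockable 'want' to
 * entity 'e' would close a cycle in the wait-for graph.  Assumes
 * the existing graph is cycle free; a production version would
 * mark visited nodes to guarantee termination, and would special
 * case recursive-exclusive reacquisition by the same entity.
 */
static int
would_deadlock(LKE *e, LKA *want)
{
	LKEE *ee;

	for (ee = want->holds; ee != NULL; ee = ee->next) {
		if (ee->entity == e)
			return (1);	/* we already hold it */
		if (ee->entity->waiton != NULL &&
		    would_deadlock(e, ee->entity->waiton))
			return (1);	/* holder transitively waits on us */
	}
	return (0);
}

The walk is bounded by the depth of the wait chain rather than the
total number of locks in the system, which is what makes the check
cheap enough to run at acquisition time.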
Thread implementations without such an asynchronous entry mechanism
would need something like:

/*
 * iml.c
 *
 * An intention mode based lock manager
 *
 * This software uses graph theory.  A graph is a mathematical object
 * that can accurately model a problem formulated in terms of objects
 * and the relationships between them.  Lock management is one such
 * problem.  I recommend:
 *
 *	Algorithms in C++
 *	Robert Sedgewick
 *	Addison-Wesley Publishing Company
 *	ISBN 0-201-51059-6
 *
 *	The Art of Computer Programming
 *	_Volume 1: Fundamental Algorithms_
 *	Donald Knuth
 *	Addison-Wesley Publishing Company
 *	ISBN 0-201-03809-9
 *
 *	UNIX Systems for Modern Architectures
 *	_Symmetric Multiprocessing and Caching for Kernel Programmers_
 *	Curt Schimmel
 *	Addison-Wesley Publishing Company
 *	ISBN 0-201-63338-8
 */

struct lockable;	typedef struct lockable IML_L;
struct lockentity;	typedef struct lockentity IML_E;
struct locklock;	typedef struct locklock IML_LOCK;

/*
 * A node that can have a lock applied against it
 */
struct lockable {
	IML_L	*parent;	/* our parent lockable */
	IML_L	*sibling;	/* sibling lockable(s), if any */
	IML_L	*child;		/* child lockable(s), if any */
	IML_LOCK *locks;	/* locks that are held */
	IML_LOCK *waits;	/* locks that are waiting */
	unsigned long nesting_lvl;
};

/*
 * A lock entity node for holding locks and entity relationships
 */
struct lockentity {
	IML_E	*sibling;	/* entities with a relationship to us */
	IML_LOCK *locks;	/* the list of locks we hold */
};

/*
 * A lock instance
 */
struct locklock {
	IML_E	*entity;	/* the entity that applied the lock */
	IML_L	*lockable;	/* where the lock was applied */
	IML_LOCK *enextlock;	/* the next lock by the entity */
	IML_LOCK *lnextlock;	/* the next lock on the lockable */
	unsigned long mode;	/* the mode of this lock */
};

Note: this doesn't deal with the inheritance issue for waiters, which
would use a structure similar to the one from the previous example.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message