Date:      Thu, 04 Oct 2001 07:29:44 -0400
From:      "Brian F. Feldman" <green@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        Brian Fundakowski Feldman <green@FreeBSD.org>, arch@FreeBSD.org
Subject:   Re: sx-ifying src/sys/vm 
Message-ID:  <200110041129.f94BTjl31274@green.bikeshed.org>
In-Reply-To: Your message of "Thu, 04 Oct 2001 02:31:06 PDT." <XFMail.011004023106.jhb@FreeBSD.org> 

John Baldwin <jhb@FreeBSD.org> wrote:
> 
> On 04-Oct-01 Brian Fundakowski Feldman wrote:
> > I've taken a shot at changing the vm_map locking from using lockmgr() locks 
> > (EAAARGHH...) to sx locks.  Already I can tell that it 1. seems to work and 
> > 2. seems to not give me bogus lock ordering messages like lockmgr() used to.
> > I'd appreciate if people took a look at it and told me of any concerns, 
> > whether it also looks fine to more than just myself, etc.
> > 
> > I added a few more vm_map locking operations than were there already for the 
> > purpose of removing the hard-coded lockmgr() ops that were in a few places 
> > previously.  Instead of having a sx_locked() operation I used 
> > sx_try_xlock(), and instead of recursion I saved the state of the one case 
> > where a lock was attempted to be "recursed" upon, and removed the actual 
> > recursion on the lock.
> 
> Note that recursion of shared locks is allowed.  Also, the name
> vm_map_lookup2() is really gross.  However, if you only recurse on a read lock,
> then you can get rid of lockheld it seems and remove the vm_map_lookup()
> ugliness.
> 
> > Index: vm_fault.c
> > ===================================================================
> > RCS file: /home/freebsd/ncvs/src/sys/vm/vm_fault.c,v
> > retrieving revision 1.125
> > diff -u -r1.125 vm_fault.c
> > --- vm_fault.c        2001/09/12 08:38:11     1.125
> > +++ vm_fault.c        2001/10/04 00:41:36
> > @@ -213,12 +214,16 @@
> >       int hardfault;
> >       int faultcount;
> >       struct faultstate fs;
> > +     boolean_t user_wire;
> > 
> >       GIANT_REQUIRED;
> > 
> >       cnt.v_vm_faults++;
> >       hardfault = 0;
> > +     user_wire = (fault_flags & VM_FAULT_WIRE_MASK) == VM_FAULT_USER_WIRE;
> > +     fs.lockheld = user_wire;
> >  
> > +
> >  RetryFault:;
> >  
> >       /*
> 
> Extra blank line?

Oops :)

> > Index: vm_glue.c
> > ===================================================================
> > RCS file: /home/freebsd/ncvs/src/sys/vm/vm_glue.c,v
> > retrieving revision 1.119
> > diff -u -r1.119 vm_glue.c
> > --- vm_glue.c 2001/09/12 08:38:11     1.119
> > +++ vm_glue.c 2001/10/04 00:41:37
> > @@ -577,9 +577,7 @@
> >                        * data structures there is a
> >                        * possible deadlock.
> >                        */
> > -                     if (lockmgr(&vm->vm_map.lock,
> > -                                     LK_EXCLUSIVE | LK_NOWAIT,
> > -                                     NULL, curthread)) {
> > +                     if (vm_map_trylock(&vm->vm_map)) {
> >                               vmspace_free(vm);
> >                               PROC_UNLOCK(p);
> >                               goto nextproc;
> 
> Hrm, this reads weird.  It should be 'if (!vm_map_trylock()) {'.  I would make
> vm_map_trylock() return true on success and false on failure.  Same for
> vm_map_lock_upgrade().

I suppose, but errno-style return values are consistent with lockmgr.  I also 
expect them a hell of a lot more than some other kind of int return value.

> > Index: vm_map.c
> > ===================================================================
> > RCS file: /home/freebsd/ncvs/src/sys/vm/vm_map.c,v
> > retrieving revision 1.207
> > diff -u -r1.207 vm_map.c
> > --- vm_map.c  2001/09/12 08:38:11     1.207
> > +++ vm_map.c  2001/10/04 
> > @@ -403,7 +392,7 @@
> >       map->first_free = &map->header;
> >       map->hint = &map->header;
> >       map->timestamp = 0;
> > -     lockinit(&map->lock, PVM, "thrd_sleep", 0, LK_NOPAUSE);
> > +     sx_init(&map->lock, "thrd_sleep");
> >  }
> >  
> >  void
> 
> Why not a more recognizable name such as "vm_map"?

I was wondering that myself.

> > @@ -1490,7 +1479,6 @@
> >                       estart = entry->start;
> >  
> >                       /* First we need to allow map modifications */
> > -                     vm_map_set_recursive(map);
> >                       vm_map_lock_downgrade(map);
> >                       map->timestamp++;
> >  
> 
> Old bug: why is the timestamp updated while not holding an exclusive lock?  If
> anything, it should be bumped before the downgrade.  Otherwise there isn't a
> memory barrier there to ensure other slock holders will see the right timestamp
> on other CPU's.

I don't think the code was designed to make sense in an SMP world.  With that 
in mind, I'll go through all vm_map locking-related calls and look for this 
kind of inconsistency.

> > @@ -1500,14 +1488,12 @@
> >                               entry->wired_count--;
> >                               entry->eflags &= ~MAP_ENTRY_USER_WIRED;
> >  
> > -                             vm_map_clear_recursive(map);
> > -                             vm_map_unlock(map);
> > +                             vm_map_unlock_read(map);
> >                               
> >                               (void) vm_map_user_pageable(map, start,
> entry->start, TRUE);
> >                               return rv;
> >                       }
> >  
> > -                     vm_map_clear_recursive(map);
> >                       if (vm_map_lock_upgrade(map)) {
> >                               vm_map_lock(map);
> >                               if (vm_map_lookup_entry(map, estart, &entry) 
> 
> Another old bug: why is the lock being modified w/o holding an exclusive lock? 
> Looks like this one is now moot, however.

What do you mean exactly?  The old way was to get the lockmgr's simplelock, 
modify the lockmgr itself, then release it.

The vm_map's entries are being modified with an exclusive lock there, if I'm 
not mistaken...  That may be an implicit dependency on Giant (I doubt it), or 
just a bug the author didn't catch, since the following vm_map_unlock() 
doesn't explicitly document whether the map was supposed to be read-locked or 
write-locked at that point.

> > @@ -2678,6 +2670,22 @@
> >             vm_prot_t *out_prot,              /* OUT */
> >             boolean_t *wired)                 /* OUT */
> >  {
> > +     
> > +     return (vm_map_lookup2(var_map, vaddr, fault_typea, out_entry,
> > +         object, pindex, out_prot, wired, 0));
> > +}
> > +
> > +int
> > +vm_map_lookup2(vm_map_t *var_map,            /* IN/OUT */
> > +           vm_offset_t vaddr,
> > +           vm_prot_t fault_typea,
> > +           vm_map_entry_t *out_entry,        /* OUT */
> > +           vm_object_t *object,              /* OUT */
> > +           vm_pindex_t *pindex,              /* OUT */
> > +           vm_prot_t *out_prot,              /* OUT */
> > +           boolean_t *wired,                 /* OUT */
> > +           boolean_t lockheld)
> > +{
> >       vm_map_entry_t entry;
> >       vm_map_t map = *var_map;
> >       vm_prot_t prot;
> > @@ -2690,11 +2698,13 @@
> >        * Lookup the faulting address.
> >        */
> >  
> > -     vm_map_lock_read(map);
> > +     if (!lockheld)
> > +             vm_map_lock_read(map);
> >  
> >  #define      RETURN(why) \
> >               { \
> > -             vm_map_unlock_read(map); \
> > +             if (!lockheld) \
> > +                     vm_map_unlock_read(map); \
> >               return(why); \
> >               }
> >  
> > @@ -2730,7 +2740,8 @@
> >               vm_map_t old_map = map;
> >  
> >               *var_map = map = entry->object.sub_map;
> > -             vm_map_unlock_read(old_map);
> > +             if (!lockheld)
> > +                     vm_map_unlock_read(old_map);
> >               goto RetryLookup;
> >       }
> >  
> 
> Ok, I don't like the lockheld parameter here. :)  It looks like the recursive
> locks are read locks.  Is the problem that you can call this function with an
> exclusive lock held?  If so, I would rather that you add an assertion
> SX_ASSERT_LOCKED(&map->lock) that the map is locked (either read or write) by
> the caller.  Then vm_map_lookup() can simply assume the map is already locked.

vm_map_lookup() calls vm_map_lock_upgrade(), and in this case it would do so 
while already holding two shared locks.  The sx lock would then have an 
sx_cnt of 2, so vm_map_lock_upgrade() would always fail in sx_try_upgrade(), 
would it not?
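The failure mode can be modeled with a toy shared-owner count (the names here 
are illustrative stand-ins, not the real sys/sx.h implementation): an upgrade 
can only be granted when the caller is the sole shared holder.

```c
/*
 * Toy model of an sx lock's state: sx_cnt > 0 counts shared holders,
 * -1 means exclusively held.  Not the real sys/sx.h implementation.
 */
struct toy_sx {
	int	sx_cnt;
};

/*
 * An upgrade can only be granted when exactly one shared reference
 * exists; with a recursed shared lock (sx_cnt == 2) it must fail,
 * because the second shared reference can never be released by a
 * thread that is blocked waiting for the upgrade.
 */
static int
toy_sx_try_upgrade(struct toy_sx *sx)
{

	if (sx->sx_cnt != 1)
		return (0);	/* other (or recursive) shared holders */
	sx->sx_cnt = -1;	/* now exclusive */
	return (1);
}
```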

> > Index: vm_zone.c
> > ===================================================================
> > RCS file: /home/freebsd/ncvs/src/sys/vm/vm_zone.c,v
> > retrieving revision 1.48
> > diff -u -r1.48 vm_zone.c
> > --- vm_zone.c 2001/08/05 03:55:02     1.48
> > +++ vm_zone.c 2001/10/04 00:41:48
> > @@ -409,11 +409,12 @@
> >                * map.
> >                */
> >               mtx_unlock(&z->zmtx);
> > -             if (lockstatus(&kernel_map->lock, NULL)) {
> > +             if (vm_map_trylock(kernel_map)) {
> >                       item = (void *) kmem_malloc(kmem_map, nbytes, M_WAITOK);
> >                       if (item != NULL)
> >                               atomic_add_int(&zone_kmem_pages, z->zalloc);
> >               } else {
> > +                     vm_map_unlock(kernel_map);
> >                       item = (void *) kmem_alloc(kernel_map, nbytes);
> >                       if (item != NULL)
> >                               atomic_add_int(&zone_kern_pages, z->zalloc);
> 
> Question: why does the zone allocator care if the map is locked or not?  Also,
> this has a race: some other thread could easily come in and lock the map in
> between the vm_map_unlock() and the kmem_alloc(), thus rendering this test
> useless.  This entire test here is not thread safe and needs to be rethought.

I believe the code means to check whether lockstatus() returns LK_EXCLOTHER 
or LK_SHARED, and it would be broken if LK_EXCLUSIVE were returned here :)

I really was wondering about this.  If you can't have faults in an interrupt 
context (well, you can't use lockmgr from an interrupt context either, as far 
as I know...), then in a non-SMP world this would be just fine, wouldn't it?

Anyway, it would be easy to fix if kmem_alloc() didn't xlock the map itself, 
which it does.  That would be easy to arrange by adding, say, a 
kmem_alloc2() ;)

-- 
 Brian Fundakowski Feldman           \  FreeBSD: The Power to Serve!  /
 green@FreeBSD.org                    `------------------------------'


