Date:      Thu, 10 Dec 2020 15:36:06 -0800
From:      Bryan Drewery <bryan-lists@shatow.net>
To:        Mark Johnston <markj@freebsd.org>, "Leverett, Bruce" <bleverett@panasas.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: kernel: page fault in unp_pcb_owned_lock2_slowpath
Message-ID:  <593fd094-42fe-719c-63e5-d1fbbf422563@shatow.net>
In-Reply-To: <20201009124933.GB29607@raichu>
References:  <BN7PR08MB56681AE7A0C9A73ED2D908B2A50B0@BN7PR08MB5668.namprd08.prod.outlook.com> <20201009124933.GB29607@raichu>

On 10/9/2020 5:49 AM, Mark Johnston wrote:
> On Thu, Oct 08, 2020 at 09:58:09PM +0000, Leverett, Bruce wrote:
>> In 12.1, we are seeing a page fault in unp_pcb_owned_lock2_slowpath, while trying to lock unp2.  Examination of the crash dump shows that unp2's reference count is down to zero, which it shouldn't be, since the function took a reference on it before unlocking unp.
>>
>> Could this be a bug that has been fixed in recent versions?  I would look into upgrading, or back-porting the fix, if a fix is known.
> 
> I recently fixed a few issues with the unix domain socket locking code.
> The commits were merged to stable/12 in r366488.  There's a few earlier
> fixes in uipc_usrreq.c that were merged after 12.1, so you might have
> luck backporting those as well.  I'm not sure what the specific bug is
> in your case; a backtrace at least might be enough to pinpoint it.

I'm replying for Google's sake, so that anyone searching for this panic
finds the details.

Probably this:

kernel:trap_fatal+0x96
kernel:trap+0x76
kernel:__mtx_lock_sleep+0xe7
kernel:__mtx_lock_flags+0x100
kernel:unp_pcb_owned_lock2_slowpath+0x66
kernel:uipc_send+0x105e
kernel:sosend_generic+0x4ae
kernel:kern_sendit+0x1a7
kernel:sendit+0x260
kernel:sys_sendto+0x4c
kernel:amd64_syscall+0x327

From my limited triage, it appeared that the thread was trying to lock
unp2 (taken from unp->unp_conn) while unp->unp_conn was being
nulled/disconnected elsewhere. It did seem to be a race in the
ref/locking code.

>          if ((unp2 = unp->unp_conn)  == NULL) {
>                  UNP_PCB_UNLOCK(unp);          
>                  error = ENOTCONN;             
>                  break;                        
>          }                                     
>  }                                             
>  if (__predict_false(unp == unp2)) {           
>          if (unp->unp_socket == NULL) {        
>                  error = ENOTCONN;             
>                  break;                        
>          }                                     
>          goto connect_self;                    
>  }                                             
>  unp_pcb_owned_lock2(unp, unp2, freed);

unp2 was set here, but unp->unp_conn was already NULL by the time the
code tried to lock unp2. The old locking was strange and seemed to
assume some invariant: it set unp2 from unp->unp_conn, took a ref on
unp2, unlocked unp, then locked unp2. But I didn't see how unp2 could be
kept alive between the NULL check above and the ref taken in
unp_pcb_owned_lock2_slowpath. Anyway, I figured those commits were the
fix too.
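
For illustration only, here is a minimal userspace sketch of the "take a
ref, unlock unp, lock unp2" sequence described above, with a recheck of
the connection pointer after relocking. It is not the FreeBSD code:
struct pcb, pcb_init, pcb_disconnect and pcb_lock_pair_slow are
hypothetical names, pthread mutexes stand in for the kernel mutexes, the
"racing thread" is simulated by a callback run in the unlocked window,
and lock ordering between the two PCBs is ignored.

```c
/* Userspace model only; every name below is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct pcb {
	pthread_mutex_t	lock;
	atomic_int	refs;	/* keeps the structure alive while unlocked */
	struct pcb	*conn;	/* connected peer, protected by lock */
};

static void
pcb_init(struct pcb *p)
{
	pthread_mutex_init(&p->lock, NULL);
	atomic_init(&p->refs, 1);
	p->conn = NULL;
}

/* Stand-in for another thread disconnecting the pair. */
static void
pcb_disconnect(struct pcb *p)
{
	struct pcb *peer;

	pthread_mutex_lock(&p->lock);
	peer = p->conn;
	p->conn = NULL;
	pthread_mutex_unlock(&p->lock);
	if (peer != NULL) {
		pthread_mutex_lock(&peer->lock);
		peer->conn = NULL;
		pthread_mutex_unlock(&peer->lock);
	}
}

/*
 * Called with p locked and p->conn == peer.  Returns true with both
 * locked if the pair is still connected, or false with only p locked.
 * window (may be NULL) runs while p is unlocked to simulate a race.
 */
static bool
pcb_lock_pair_slow(struct pcb *p, struct pcb *peer,
    void (*window)(struct pcb *))
{
	/* The ref must be taken while p's lock still pins p->conn. */
	atomic_fetch_add(&peer->refs, 1);
	pthread_mutex_unlock(&p->lock);
	if (window != NULL)
		window(p);
	pthread_mutex_lock(&peer->lock);
	pthread_mutex_lock(&p->lock);
	atomic_fetch_sub(&peer->refs, 1);	/* drop the temporary ref */
	if (p->conn != peer) {
		/* Disconnected in the window: peer is no longer ours. */
		pthread_mutex_unlock(&peer->lock);
		return (false);
	}
	return (true);
}
```

In this model the temporary ref only keeps the peer's memory valid while
unp is unlocked; whether the connection still exists has to be checked
again once both locks are held.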


#3  0xffffffff806e2fbf in uipc_send (so=0xfffffe8cc082a6d0,
flags=<optimized out>, m=0xffffffbf000007b4, nam=0x0, control=<optimized
out>, td=<optimized out>) at /b/mnt/src/sys/kern/uipc_usrreq.c:1095
1095                    unp_pcb_owned_lock2(unp, unp2, freed);
(gdb) p nam
$11 = (struct sockaddr *) 0x0
(gdb) set $unp = ((struct unpcb *)((so)->so_pcb))
(gdb) p *$unp
$14 = {
  unp_mtx = {
    lock_object = {
      lo_name = 0xffffffff80ae79ba "unp",
      lo_flags = 21168128,
      lo_data = 0,
      lo_witness = 0xfffff80431cd9b00
    },
    mtx_lock = 0
  },
  unp_conn = 0x0,
  unp_refcount = 1,
  unp_flags = 0,
  unp_gcflag = 4,
  unp_addr = 0x0,
  unp_socket = 0xfffffe8cc082a6d0,
  unp_vnode = 0x0,
  unp_peercred = 0x0,
  unp_reflink = {
    le_next = 0xffffffffffffffff,
    le_prev = 0xffffffffffffffff
  },
  unp_link = {
    le_next = 0xfffff801b9e68c00,
    le_prev = 0xfffff801b9b3e120
  },
  unp_refs = {
    lh_first = 0x0
  },
  unp_gencnt = 935,
  unp_file = 0x0,
  unp_msgcount = 0,
  unp_ino = 0
}
(gdb) set $unp2 = $unp->unp_conn
(gdb) p $unp2
$15 = (struct unpcb *) 0x0
...
#2  0xffffffff806e4017 in unp_pcb_owned_lock2_slowpath (unp=<optimized
out>, unp2p=0xfffffe8872867830, freed=0xfffffe887286784c) at
/b/mnt/src/sys/kern/uipc_usrreq.c:372
372             UNP_PCB_LOCK(unp2);
(gdb) p unp2
$23 = (struct unpcb *) 0xfffff801b9baa900
(gdb) p freed
$24 = (int *) 0xfffffe887286784c
(gdb) p *freed
$25 = 0


-- 
Regards,
Bryan Drewery
bdrewery@freenode/EFNet


