Date: Fri, 13 Mar 2020 12:28:50 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 244792] [iscsi] ctladm islist leads to kernel panic if target ctl(4) port is disabled
Message-ID: <bug-244792-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244792

            Bug ID: 244792
           Summary: [iscsi] ctladm islist leads to kernel panic if target
                    ctl(4) port is disabled
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: aleksandr.fedorov@itglobal.com

Created attachment 212383
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=212383&action=edit
iscsi_ioctl_list panic + debug info

I found an issue that leads to a kernel panic.

Test setup:
Machine 1 - iSCSI target.
Machine 2 - iSCSI initiator.

Disable the ctl(4) port on the target:

Machine 1# ctladm port -o off -p 3
Front End Ports disabled

After that, the initiator keeps trying to reconnect:

Machine 2# dmesg
...
(da23:iscsi4:0:0:5): Periph destroyed
(da22:iscsi4:0:0:4): Periph destroyed
(da19:iscsi4:0:0:3): Periph destroyed
(da17:iscsi4:0:0:2): Periph destroyed
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
...

If I then try to list the iSCSI sessions on the target side, the kernel panics.

Machine 1# ctladm islist

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 11
fault virtual address   = 0x17c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff831bb8c3
stack pointer           = 0x28:0xfffffe01c358f780
frame pointer           = 0x28:0xfffffe01c358f810
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27739 (ctladm)
trap number             = 12
panic: page fault
cpuid = 11
time = 1583839216
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c358f3e0
vpanic() at vpanic+0x185/frame 0xfffffe01c358f440
panic() at panic+0x43/frame 0xfffffe01c358f4a0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe01c358f500
trap_pfault() at trap_pfault+0x99/frame 0xfffffe01c358f580
trap() at trap+0x2a7/frame 0xfffffe01c358f6b0
calltrap() at calltrap+0x8/frame 0xfffffe01c358f6b0
--- trap 0xc, rip = 0xffffffff831bb8c3, rsp = 0xfffffe01c358f780, rbp = 0xfffffe01c358f810 ---
cfiscsi_ioctl() at cfiscsi_ioctl+0x753/frame 0xfffffe01c358f810
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe01c358f860
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe01c358f970
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe01c358f990
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe01c358f9f0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe01c358fac0
amd64_syscall() at amd64_syscall+0x168/frame 0xfffffe01c358fbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01c358fbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c19ba, rsp = 0x7fffffffe448, rbp = 0x7fffffffeab0 ---
KDB: enter: panic

You can see the full output with debug info in the attachment.

The panic occurs in the function cfiscsi_ioctl_list():
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1718
because the cs->cs_target pointer is NULL (see attachment).

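For illustration, a fault address as low as 0x17c is what one would expect when a field is read at a small offset from a NULL pointer. The kind of guard that keeps a listing routine from following a missing target pointer can be sketched in plain userland C. This is only a sketch under assumptions: the struct layouts and the function names session_target_name() and format_session() are hypothetical simplifications, not the kernel's definitions, and this is not the check from the attachment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct target {
	const char *name;
};

struct session {
	unsigned int	 id;
	struct target	*target;	/* NULL for a partially initialized session */
};

/* Return the target name, or "none" when no target is attached. */
static const char *
session_target_name(const struct session *cs)
{
	assert(cs != NULL);
	if (cs->target == NULL)
		return ("none");
	return (cs->target->name);
}

/* Format one listing row into buf; returns the snprintf() result. */
static int
format_session(const struct session *cs, char *buf, size_t len)
{
	return (snprintf(buf, len, "%u %s", cs->id, session_target_name(cs)));
}
```

A guard like this only hides the symptom in the listing; the sessions without a target still exist.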
I added some checks to prevent the panic:

Machine 1# ctladm islist
 ID Portal        Initiator name                              Target name
  1 192.168.101.5 iqn.1994-09.org.freebsd:q1u005.z.vstack.com iqn.2018-11.com.vstack:target4
  3 192.168.101.4 iqn.1994-09.org.freebsd:q1u004.z.vstack.com iqn.2018-11.com.vstack:target3
  4 192.168.101.3 iqn.1994-09.org.freebsd:q1u003.z.vstack.com iqn.2018-11.com.vstack:target2
 74 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
106 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
124 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
130 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
147 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
259 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
330 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none

root@q1u001:~ # ps -l -p 0 -HSwww | grep cfiscsimt
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt]

As you can see, there are many partially initialized sessions and corresponding
maintenance threads:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1161

After some investigation I found that cs->cs_target == NULL because the session
was not terminated correctly.

A new session is created in the cfiscsi_ioctl_handoff() function:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1490

Find the target:
...
        ct = cfiscsi_target_find(softc, cihp->target_name,
            cihp->portal_group_tag);
        if (ct == NULL) {
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),
                    "%s: target not found", __func__);
                return;
        }
...

Create a new session: allocate struct 'cs', start the maintenance thread.

                cs = cfiscsi_session_new(softc, cihp->offload);
                if (cs == NULL) {
                        ci->status = CTL_ISCSI_ERROR;
                        snprintf(ci->error_str, sizeof(ci->error_str),
                            "%s: cfiscsi_session_new failed", __func__);
                        cfiscsi_target_release(ct);
                        return;
                }
...

Check whether the target port is online. In our case the target port is offline
(ct->ct_online == 0).

        if (ct->ct_online == 0) {
                mtx_unlock(&softc->lock);
                cs->cs_handoff_in_progress = false;

Terminate the session: send cv_signal() to the maintenance thread, deallocate
the struct, etc.

                cfiscsi_session_terminate(cs);
                cfiscsi_target_release(ct);
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),
                    "%s: port offline", __func__);
                return;
        }

The main problem is that the maintenance thread does not always receive the
cv_signal() and stays stuck in cv_wait(). So we end up with many partially
initialized sessions and maintenance threads.

I see the following problems:

1. The flags cs->cs_handoff_in_progress and cs->cs_terminating, which are used
by the maintenance thread, are changed without holding the lock.

2. As I understand it, cv_signal() must be called with the lock held:

/*
 * Signal a condition variable, wakes up one waiting thread.  Will also wakeup
 * the swapper if the process is not in memory, so that it can bring the
 * sleeping process in.  Note that this may also result in additional threads
 * being made runnable.  Should be called with the same mutex as was passed to
 * cv_wait held.
 */
void
cv_signal(struct cv *cvp)

With the following patch I can't reproduce the panic:

diff --git a/sys/cam/ctl/ctl_frontend_iscsi.c b/sys/cam/ctl/ctl_frontend_iscsi.c
index d5be20c2a215..1b7837aa8355 100644
--- a/sys/cam/ctl/ctl_frontend_iscsi.c
+++ b/sys/cam/ctl/ctl_frontend_iscsi.c
@@ -1582,8 +1582,10 @@ cfiscsi_ioctl_handoff(struct ctl_iscsi *ci)
        mtx_lock(&softc->lock);
        if (ct->ct_online == 0) {
                mtx_unlock(&softc->lock);
+               CFISCSI_SESSION_LOCK(cs);
                cs->cs_handoff_in_progress = false;
                cfiscsi_session_terminate(cs);
+               CFISCSI_SESSION_UNLOCK(cs);
                cfiscsi_target_release(ct);
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),

3. Why do we need to start the maintenance thread at all, only to destroy it
immediately if ct->ct_online == 0? Can we check ct->ct_online earlier?
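
The locking concern in points 1 and 2 can be illustrated outside the kernel with POSIX threads. The following is a minimal userland sketch, not the CTL code: struct session, maintenance_thread(), session_terminate() and run_demo() are hypothetical names chosen to mirror the report, and pthread mutexes and condition variables stand in for mtx(9) and condvar(9). It shows the flags being changed and the signal being sent while the same mutex that the waiter passes to the wait call is held, so the waiter cannot test the flags and then miss the wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state shared with the maintenance thread. */
struct session {
	pthread_mutex_t	lock;
	pthread_cond_t	maintenance_cv;
	bool		terminating;
	bool		handoff_in_progress;
	bool		torn_down;
};

static void *
maintenance_thread(void *arg)
{
	struct session *s = arg;

	pthread_mutex_lock(&s->lock);
	/* Test the flags with the lock held and re-test after every wakeup. */
	while (!s->terminating || s->handoff_in_progress)
		pthread_cond_wait(&s->maintenance_cv, &s->lock);
	s->torn_down = true;
	pthread_mutex_unlock(&s->lock);
	return (NULL);
}

static void
session_terminate(struct session *s)
{
	/* Change the flags and signal under the waiter's mutex. */
	pthread_mutex_lock(&s->lock);
	s->handoff_in_progress = false;
	s->terminating = true;
	pthread_cond_signal(&s->maintenance_cv);
	pthread_mutex_unlock(&s->lock);
}

/* Start a waiter, terminate the session, report whether the waiter exited. */
static bool
run_demo(void)
{
	struct session s = { .handoff_in_progress = true };
	pthread_t td;
	int error;

	error = pthread_mutex_init(&s.lock, NULL);
	assert(error == 0);
	error = pthread_cond_init(&s.maintenance_cv, NULL);
	assert(error == 0);
	error = pthread_create(&td, NULL, maintenance_thread, &s);
	assert(error == 0);

	session_terminate(&s);
	pthread_join(td, NULL);	/* returns only once the waiter has finished */

	pthread_cond_destroy(&s.maintenance_cv);
	pthread_mutex_destroy(&s.lock);
	return (s.torn_down);
}
```

If the two flag assignments were made without the mutex, the waiter could test the flags, be preempted before it blocks, and then sleep through the only signal. When built with cc -pthread, run_demo() returns true whichever thread runs first.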