Date:      Fri, 13 Mar 2020 12:28:50 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 244792] [iscsi] ctladm islist leads to kernel panic if target ctl(4) port is disabled
Message-ID:  <bug-244792-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244792

            Bug ID: 244792
           Summary: [iscsi] ctladm islist leads to kernel panic if target
                    ctl(4) port is disabled
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: aleksandr.fedorov@itglobal.com

Created attachment 212383
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=212383&action=edit
iscsi_ioctl_list panic + debug info

I found an issue that leads to a kernel panic.

Test setup:

Machine 1 - iSCSI target.
Machine 2 - iSCSI initiator.

Disable the ctl(4) port on the target:
Machine 1# ctladm port -o off -p 3
Front End Ports disabled

After that, the initiator keeps trying to reconnect:

Machine 2# dmesg
...
(da23:iscsi4:0:0:5): Periph destroyed
(da22:iscsi4:0:0:4): Periph destroyed
(da19:iscsi4:0:0:3): Periph destroyed
(da17:iscsi4:0:0:2): Periph destroyed
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
...

If I then try to list the iSCSI sessions on the target side, the kernel panics.

Machine 1# ctladm islist

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 11
fault virtual address   = 0x17c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff831bb8c3
stack pointer           = 0x28:0xfffffe01c358f780
frame pointer           = 0x28:0xfffffe01c358f810
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27739 (ctladm)
trap number             = 12
panic: page fault
cpuid = 11
time = 1583839216
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c358f3e0
vpanic() at vpanic+0x185/frame 0xfffffe01c358f440
panic() at panic+0x43/frame 0xfffffe01c358f4a0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe01c358f500
trap_pfault() at trap_pfault+0x99/frame 0xfffffe01c358f580
trap() at trap+0x2a7/frame 0xfffffe01c358f6b0
calltrap() at calltrap+0x8/frame 0xfffffe01c358f6b0
--- trap 0xc, rip = 0xffffffff831bb8c3, rsp = 0xfffffe01c358f780, rbp = 0xfffffe01c358f810 ---
cfiscsi_ioctl() at cfiscsi_ioctl+0x753/frame 0xfffffe01c358f810
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe01c358f860
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe01c358f970
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe01c358f990
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe01c358f9f0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe01c358fac0
amd64_syscall() at amd64_syscall+0x168/frame 0xfffffe01c358fbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01c358fbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c19ba, rsp = 0x7fffffffe448, rbp = 0x7fffffffeab0 ---
KDB: enter: panic

The full output, including debug information, is in the attachment.
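
A fault virtual address as small as 0x17c usually means that the kernel read a
structure member through a NULL pointer, so the faulting address is simply the
member's offset. The sketch below illustrates the arithmetic only. The struct
layout, the names example_target, et_padding, et_tag and
example_member_address() are all hypothetical, and the padding was chosen so
that the member lands at 0x17c; it says nothing about which real member was
read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real kernel structure: the padding exists
 * only so that the member of interest sits at offset 0x17c.
 */
struct example_target {
	char		et_padding[0x17c];	/* stand-in for earlier members */
	uint16_t	et_tag;			/* member the code tries to read */
};

/* Compile-time check that the illustrative layout has the intended offset. */
static_assert(offsetof(struct example_target, et_tag) == 0x17c,
    "et_tag is expected at offset 0x17c");

/*
 * Address the CPU would access for base->et_tag, computed without
 * dereferencing anything: base address plus member offset.
 */
static uintptr_t
example_member_address(uintptr_t base)
{
	return (base + offsetof(struct example_target, et_tag));
}
```

With a base of 0 (a NULL pointer) example_member_address() returns 0x17c, which
lies in the unmapped first page and is therefore reported as "page not present".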

The panic occurs in the function cfiscsi_ioctl_list():
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1718

It happens because the cs->cs_target pointer is NULL (see attachment).
I added some checks to prevent the panic:

Machine 1# ctladm islist
  ID Portal           Initiator name                       Target name
   1 192.168.101.5    iqn.1994-09.org.freebsd:q1u005.z.vstack.com iqn.2018-11.com.vstack:target4
   3 192.168.101.4    iqn.1994-09.org.freebsd:q1u004.z.vstack.com iqn.2018-11.com.vstack:target3
   4 192.168.101.3    iqn.1994-09.org.freebsd:q1u003.z.vstack.com iqn.2018-11.com.vstack:target2
  74 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 106 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 124 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 130 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 147 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 259 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 330 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
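
The "none" entries in the listing are what one would expect from a guard that
refuses to dereference a NULL target pointer. The actual checks are not quoted
in this report, so the following is only a minimal user-space sketch of that
kind of guard. The types example_target and example_session and the function
example_session_target_name() are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct example_target {
	const char	*et_name;
};

struct example_session {
	unsigned int		 es_id;
	struct example_target	*es_target;	/* NULL if never fully set up */
};

/*
 * Return the target name to print for a session, falling back to "none"
 * instead of dereferencing a NULL target pointer.
 */
static const char *
example_session_target_name(const struct example_session *es)
{
	assert(es != NULL);
	if (es->es_target == NULL)
		return ("none");
	return (es->es_target->et_name);
}
```

A guard like this only hides the symptom in the listing; the half-initialized
sessions themselves still exist.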

root@q1u001:~ # ps -l -p 0 -HSwww | grep cfiscsimt
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]

As you can see, there are many partially initialized sessions and corresponding
maintenance threads:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1161

After some investigation I found that cs->cs_target == NULL because the session
is not terminated correctly.

A new session is created in the cfiscsi_ioctl_handoff() function:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1490

Find the target:
...
        ct = cfiscsi_target_find(softc, cihp->target_name,
            cihp->portal_group_tag);
        if (ct == NULL) {
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),
                    "%s: target not found", __func__);
                return;
        }
...

Create a new session: allocate struct 'cs' and start the maintenance thread.

                cs = cfiscsi_session_new(softc, cihp->offload);
                if (cs == NULL) {
                        ci->status = CTL_ISCSI_ERROR;
                        snprintf(ci->error_str, sizeof(ci->error_str),
                            "%s: cfiscsi_session_new failed", __func__);
                        cfiscsi_target_release(ct);
                        return;
                }
...

Check whether the target port is online. In our case the target port is offline
(ct->ct_online == 0).

        if (ct->ct_online == 0) {
                mtx_unlock(&softc->lock);
                cs->cs_handoff_in_progress = false;

Terminate the session: send cv_signal() to the maintenance thread, deallocate
the struct, etc.

                cfiscsi_session_terminate(cs);
                cfiscsi_target_release(ct);
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),
                    "%s: port offline", __func__);
                return;
        }

The main problem is that the maintenance thread does not always receive the
cv_signal() and gets stuck in cv_wait().
So we end up with many partially initialized sessions and maintenance threads.
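
The lost-wakeup scenario described above can be modelled in user space. The
following is a sketch using POSIX threads rather than the kernel cv(9) and
mtx(9) primitives. The struct and function names are hypothetical, and the wait
predicate is an assumption based on the two flags named in this report. It
shows the safe pattern: the flags are changed and the condition variable is
signalled while holding the same mutex that the waiter holds when it checks
them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of a session; not the kernel structure. */
struct example_session {
	pthread_mutex_t	es_lock;
	pthread_cond_t	es_cv;
	bool		es_handoff_in_progress;
	bool		es_terminating;
	bool		es_thread_exited;
};

/* Maintenance thread: sleep until termination is requested and no handoff is pending. */
static void *
example_maintenance_thread(void *arg)
{
	struct example_session *es = arg;

	pthread_mutex_lock(&es->es_lock);
	/* Re-check the predicate in a loop, always with the lock held. */
	while (!es->es_terminating || es->es_handoff_in_progress)
		pthread_cond_wait(&es->es_cv, &es->es_lock);
	es->es_thread_exited = true;
	pthread_mutex_unlock(&es->es_lock);
	return (NULL);
}

/* Request termination: change the flags and signal under the same lock. */
static void
example_session_terminate(struct example_session *es)
{
	pthread_mutex_lock(&es->es_lock);
	es->es_handoff_in_progress = false;
	es->es_terminating = true;
	pthread_cond_signal(&es->es_cv);
	pthread_mutex_unlock(&es->es_lock);
}

/* Create a session with a handoff pending, terminate it, and join the thread. */
static bool
example_run(void)
{
	struct example_session es = {
		.es_handoff_in_progress = true,
	};
	pthread_t td;
	int error;

	error = pthread_mutex_init(&es.es_lock, NULL);
	assert(error == 0);
	error = pthread_cond_init(&es.es_cv, NULL);
	assert(error == 0);
	error = pthread_create(&td, NULL, example_maintenance_thread, &es);
	assert(error == 0);

	example_session_terminate(&es);

	error = pthread_join(td, NULL);
	assert(error == 0);
	pthread_cond_destroy(&es.es_cv);
	pthread_mutex_destroy(&es.es_lock);
	return (es.es_thread_exited);
}
```

Because the flags change only under the lock, the waiter either sees the new
values before it sleeps or is already sleeping when the signal arrives. If the
flags were changed and the signal sent without the lock, both could happen
between the waiter's predicate check and its call to wait, and the wakeup
would be lost.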

I see the following problems:

1. The flags cs->cs_handoff_in_progress and cs->cs_terminating, which are used
by the maintenance thread, are changed without holding the lock.

2. As I understand it, cv_signal() must be called with the lock held:

/*
 * Signal a condition variable, wakes up one waiting thread.  Will also wakeup
 * the swapper if the process is not in memory, so that it can bring the
 * sleeping process in.  Note that this may also result in additional threads
 * being made runnable.  Should be called with the same mutex as was passed to
 * cv_wait held.
 */
void
cv_signal(struct cv *cvp)

With the following patch I can no longer reproduce the panic:

diff --git a/sys/cam/ctl/ctl_frontend_iscsi.c b/sys/cam/ctl/ctl_frontend_iscsi.c
index d5be20c2a215..1b7837aa8355 100644
--- a/sys/cam/ctl/ctl_frontend_iscsi.c
+++ b/sys/cam/ctl/ctl_frontend_iscsi.c
@@ -1582,8 +1582,10 @@ cfiscsi_ioctl_handoff(struct ctl_iscsi *ci)
        mtx_lock(&softc->lock);
        if (ct->ct_online == 0) {
                mtx_unlock(&softc->lock);
+               CFISCSI_SESSION_LOCK(cs);
                cs->cs_handoff_in_progress = false;
                cfiscsi_session_terminate(cs);
+               CFISCSI_SESSION_UNLOCK(cs);
                cfiscsi_target_release(ct);
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),

3. Why do we need to start the maintenance thread at all, only to destroy it
immediately if ct->ct_online == 0?
Can we check ct->ct_online earlier?
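
The idea behind point 3 can be sketched as follows. This is a user-space
illustration with hypothetical names (example_target, example_session,
example_session_new(), example_handoff()), not a proposed kernel change. It
deliberately ignores locking and only shows the ordering: reject an offline
target before any session is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel source. */
struct example_target {
	bool	et_online;
};

struct example_session {
	struct example_target	*es_target;
};

static int example_sessions_created;	/* counts creations, for illustration */

/* Stand-in for session creation; the kernel would also start a thread here. */
static struct example_session *
example_session_new(struct example_session *storage)
{
	example_sessions_created++;
	storage->es_target = NULL;
	return (storage);
}

/*
 * Handoff sketch: check the online state first, so that an offline target
 * never causes a session to be created. Returns 0 on success, -1 on error.
 */
static int
example_handoff(struct example_target *et, struct example_session *storage)
{
	struct example_session *es;

	assert(et != NULL && storage != NULL);
	if (!et->et_online)
		return (-1);		/* "port offline": nothing to tear down */

	es = example_session_new(storage);
	es->es_target = et;
	return (0);
}
```

An early check on its own may not be enough in the real code, because the port
could go offline between the check and the point where the session is attached
to the target; the later check under the lock would probably still be needed.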

-- 
You are receiving this mail because:
You are the assignee for the bug.


