Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Feb 2008 17:47:14 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Alexander Nedotsukov <bland@bbnest.net>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>, Joe Marcus Clarke <marcus@freebsd.org>
Subject:   Re: page fault panic in scioctl and console-kit-daemon
Message-ID:  <20080221154714.GS57756@deviant.kiev.zoral.com.ua>
In-Reply-To: <72E58ECA-D743-4D5E-9222-7129104E4BAC@bbnest.net>
References:  <4792C32C.5010205@gmail.com> <20080122133835.GT57756@deviant.kiev.zoral.com.ua> <4796356D.9080809@gmail.com> <20080123051648.GV57756@deviant.kiev.zoral.com.ua> <479796E1.4000500@gmail.com> <1201118159.38742.17.camel@shumai.marcuscom.com> <20080123211149.GA57756@deviant.kiev.zoral.com.ua> <1201123933.62127.9.camel@shumai.marcuscom.com> <20080124124213.GD57756@deviant.kiev.zoral.com.ua> <72E58ECA-D743-4D5E-9222-7129104E4BAC@bbnest.net>

next in thread | previous in thread | raw e-mail | index | archive | help

--LMbs3ReFgmH17S4B
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Feb 21, 2008 at 09:26:16AM +0900, Alexander Nedotsukov wrote:
> Hi,
>=20
> May I ask to revisit this issue? To me ENXIO is not semantically =20
> correct in this particular case. It also turns out that doing =20
> workaround in userspace may not be that good as we used to think. I =20
> propose is to fix VT_WAITACTIVE so it simply wait for bound device =20
> activation. For my understanding this change should not have any =20
> impact on existing code. I also removed really strange console cleanup =
=20
> bit sticked in a long time ago (see ioctl() part).
> It will be nice to see this patch=20


>  (successfully tested by our affected users) committed to all branches.
>=20
> Thanks,
> Alexander.

I mostly agree with the patch, given it is tested.

The first (not important) note is that change of the wait channel from
the address of the private structure to the address of the cdev could
cause more spurious wakeups then before. I expect you usermode code to
deal with it properly.

Second one is that I do not understand the reason to have sc_clean_up()
there except that it might be to help the screen switching. The commit
message for the rev. 1.307, where it seems to be introduced, is not
useful at all. Any comments ?
>=20
> On 24.01.2008, at 21:42, Kostik Belousov wrote:
>=20
> >On Wed, Jan 23, 2008 at 04:32:13PM -0500, Joe Marcus Clarke wrote:
> >>
> >>On Wed, 2008-01-23 at 23:11 +0200, Kostik Belousov wrote:
> >>>On Wed, Jan 23, 2008 at 02:55:59PM -0500, Joe Marcus Clarke wrote:
> >>>>
> >>>>On Wed, 2008-01-23 at 20:34 +0100, Pawel Worach wrote:
> >>>>>Kostik Belousov wrote:
> >>>>>>On Tue, Jan 22, 2008 at 07:26:53PM +0100, Pawel Worach wrote:
> >>>>>>>Kostik Belousov wrote:
> >>>>>>>>On Sun, Jan 20, 2008 at 04:42:36AM +0100, Pawel Worach wrote:
> >>>>>>>>>Hi,
> >>>>>>>>>
> >>>>>>>>>While starting console-kit-daemon (sysutils/consolekit =20
> >>>>>>>>>0.2.3) during
> >>>>>>>>>boot or in single-user mode the system panics. If I start it =20
> >>>>>>>>>post-boot
> >>>>>>>>>it runs fine. This is on 8.0-CURRENT from about 12 hours =20
> >>>>>>>>>ago, another
> >>>>>>>>>user also reported the same on RELENG_7. Any other =20
> >>>>>>>>>information I can
> >>>>>>>>>provide ?
> >>>>>>>>>
> >>>>>>>>>Fatal trap 12: page fault while in kernel mode
> >>>>>>>>>fault virtual address	=3D 0x4
> >>>>>>>>>fault code		=3D supervisor read, page not present
> >>>>>>>>>instruction pointer	=3D 0x20:0xc04d2ab4
> >>>>>>>>>stack pointer	        =3D 0x28:0xe6499b18
> >>>>>>>>>frame pointer	        =3D 0x28:0xe6499b80
> >>>>>>>>>code segment		=3D base 0x0, limit 0xfffff, type 0x1b
> >>>>>>>>>			=3D DPL 0, pres 1, def32 1, gran 1
> >>>>>>>>>processor eflags	=3D interrupt enabled, resume, IOPL =3D 0
> >>>>>>>>>current process		=3D 134 (console-kit-daemon)
> >>>>>>>>>Physical memory: 1014 MB
> >>>>>>>>>Dumping 43 MB: 28 12
> >>>>>>>>>
> >>>>>>>>>#8  0xc07a34a2 in trap (frame=3D0xe6499ad8) at
> >>>>>>>>>/usr/src/sys/i386/i386/trap.c:489
> >>>>>>>>>#9  0xc079183b in calltrap () at /usr/src/sys/i386/i386/=20
> >>>>>>>>>exception.s:146
> >>>>>>>>>#10 0xc04d2ab4 in scioctl (dev=3D0xc3b20d00, cmd=3D537163270,
> >>>>>>>>>  data=3D0xe6499c70 "\002", flag=3D1, td=3D0xc3d3c880)
> >>>>>>>>>  at /usr/src/sys/dev/syscons/syscons.c:1073
> >>>>>>>>>#11 0xc051ed1a in giant_ioctl (dev=3D0xc3b20d00, cmd=3D537163270,
> >>>>>>>>>  data=3D0xe6499c70 "\002", fflag=3D1, td=3D0xc3d3c880)
> >>>>>>>>>  at /usr/src/sys/kern/kern_conf.c:349
> >>>>>>>>>#12 0xc0598194 in cnioctl (dev=3D0xc3b20d00, cmd=3D537163270,
> >>>>>>>>>  data=3D0xe6499c70 "\002", flag=3D1, td=3D0xc3d3c880)
> >>>>>>>>>---Type <return> to continue, or q <return> to quit---
> >>>>>>>>>  at /usr/src/sys/kern/tty_cons.c:521
> >>>>>>>>Unless the virtual screen is opened, the screen state =20
> >>>>>>>>scr_stat structure
> >>>>>>>>is not allocated. The following patch would fix the panic, =20
> >>>>>>>>but I do not
> >>>>>>>>know how the console-kit would react to the ENXIO from the
> >>>>>>>>VT_WAITACTIVE ioctl. Please, test it.
> >>>>>>>Thanks! The patch works.
> >>>>>>
> >>>>>>To clarify: do you see any problems with the console-kit after =20
> >>>>>>the patch ?
> >>>>>>In particular, can you verify that the program functions =20
> >>>>>>correctly, esp.
> >>>>>>on the virtual terminals 1, 2 ... , whatever this means ?
> >>>>>
> >>>>>The panic is of course gone, while chatting a bit with Marcus =20
> >>>>>(CCd) it
> >>>>>looks like console-kit does not do any error handling at all. =20
> >>>>>I've not
> >>>>>looked at what c-k does so maybe Marcus can answer the question =20
> >>>>>better.
> >>>>
> >>>>It tries to install a wait thread on each available VT.  That =20
> >>>>thread
> >>>>sets the WAITACTIVE ioctl, and waits for its VT to become =20
> >>>>active.  When
> >>>>it does, it sets the CK active VT accordingly, and reattaches the =20
> >>>>wait.
> >>>>
> >>>>When an error occurs in the ioctl, no wait is attached, and CK =20
> >>>>will not
> >>>>know when a particular VT becomes active.  This will essentially =20
> >>>>cripple
> >>>>CK (assuming the VT really does become available at a later point).
> >>>>
> >>>>Now, admittedly, there is no error correction in CK for this =20
> >>>>situation.
> >>>>It would be trivial to add something that periodically attempts to
> >>>>reestablish a failed wait.  However, I am very curious why only a =20
> >>>>few
> >>>>users are seeing this panic (me excluded on two different =20
> >>>>machines).  It
> >>>>seems to me that the scp should be initialized when =20
> >>>>sc_attach_unit() is
> >>>>called during device probing.  I don't see anything special in =20
> >>>>init(8)'s
> >>>>code that would cause these VT devices to become initialized.  I =20
> >>>>would
> >>>>assume that if one can open(2) them, then they should be =20
> >>>>available for
> >>>>ioctls?
> >>>
> >>>From my reading of the code, scp would be non-NULL after the first =20
> >>>open
> >>>of the corresponding /dev/ttyvX. sc_attach_unit() creates the scp =20
> >>>for
> >>>the console and the consolectl devices.
> >>>
> >>>VT_WAITACTIVE ioctl is performed on arbitrary syscons /dev node, and
> >>>can wait for any other screens, in particular, the screens that are
> >>>not opened at the moment (the reason for the reported panic).
> >>
> >>Right, and this is where I was confused.  I had thought from an old
> >>reading of the CK code that CK opened each ttyvX device to perform =20
> >>the
> >>ioctl.  It does not.  Instead, it opens /dev/console, and performs =20
> >>the
> >>ioctl for each ttyvX on that fd.  That does explain the panic, but =20
> >>not
> >>exactly why I did not see it.  I'm guessing a race condition, but I
> >>can't be sure.
> >>
> >>>
> >>>The patch I posted may be improved by making the VT_WAITACTIVE ioctl
> >>>to wait for the scp being allocated, and only then for the screen =20
> >>>to be
> >>>switched too. Please, test.
> >>
> >>I really appreciate your attention to this.  Funny thing is, CK 0.2.4
> >>was just released, and it is no longer started out of rc.d.  I've =20
> >>also
> >>added error correcting in the CK code path.  The problem may =20
> >>disappear
> >>depending on when CK is executed out of D-BUS.  However, it would be
> >>good to prevent this in the kernel.  Pawel said he would try and test
> >>this patch.
> >
> >This must be fixed in the kernel, at least because it allows any =20
> >console
> >user to panic the system. The question I have to you is what =20
> >approach shall
> >we use:
> >- return the ENXIO (1)
> >- do the wait for the open, then wait for the switch (2)
> >
> >I would prefer (1), both due to putting the decision to the =20
> >userspace, and
> >to not have the algorithm that could be implemented in userspace, in =20
> >the
> >kernel. On the other hand, as the maintainer of the one of the major =20
> >API
> >consumer there, you may have the different opinion, that is crusial.
>=20


--LMbs3ReFgmH17S4B
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (FreeBSD)

iEYEARECAAYFAke9nQIACgkQC3+MBN1Mb4jnSwCfbX0xWNffMszpUZM3SmB+dm6m
H8cAoOUsCn1VhOCnOPbisbnWeV1hN6pp
=egYk
-----END PGP SIGNATURE-----

--LMbs3ReFgmH17S4B--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080221154714.GS57756>