From: David Cecil <david.cecil@nokia.com>
Date: Tue, 03 Jul 2007 16:41:46 +1000
To: ext Kostik Belousov
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek
Subject: Re: FFS writes to read-only mount
Message-ID: <4689EFAA.4080009@nokia.com>
In-Reply-To: <20070315090031.GB80993@deviant.kiev.zoral.com.ua>
Hi,

in March I mailed the list about errors I'm seeing that look something like this:

g_vfs_done():mirror/gmroots1f[WRITE(offset=1349091328, length=16384)]error = 1

(Note that gmroots1f is not a complete mirror set, just one disk, and was never part of a complete mirror set. Also, partition f is the root partition.)

The problem comes about because the underlying geom has had its access permissions changed, so g_io_check fails because acw is 0 (as per my original message at the end of this email). The problem is most easily reproduced when, during boot, the root filesystem is remounted from rw to ro. Our code is now 6.1-based.

I have recently returned to this problem, but so far I have not been able to determine why this buffer was not synced along with the others when the filesystem was remounted read-only. I have included some information that Kostik requested back in March, in the hope it might trigger something useful.

I made the system panic in g_io_check when the illegal write attempt is detected:

#9  0x8057a3df in panic (fmt=0x80743c24 "Illegal write attempt!\n")
    at ../../../kern/kern_shutdown.c:572
#10 0x8052f0f0 in g_io_check (bp=0x866dbab0) at ../../../geom/geom_io.c:205
#11 0x8052f5da in g_io_schedule_down (tp=0x852bd7d0)
    at ../../../geom/geom_io.c:387
#12 0x8052fb4a in g_down_procbody () at ../../../geom/geom_kern.c:118
#13 0x80564a08 in fork_exit (callout=0x8052faf0, arg=0x0, frame=0xda68dd38)
    at ../../../kern/kern_fork.c:805
#14 0x806ebd7c in fork_trampoline () at ../../../i386/i386/exception.s:208

I have used bp to get the vnode, dirty buffer header, and buffer data.
vnode:

(kgdb) p *((struct buf *)(bp->bio_caller2))->b_bufobj->__bo_vnode
$5 = {v_type = VCHR, v_tag = 0x807419d2 "devfs", v_op = 0x80fbb5a0,
  v_data = 0x866b5680, v_mount = 0x8664f000, v_nmntvnodes = {
    tqe_next = 0x866b1104, tqe_prev = 0x866b8118}, v_un = {
    vu_mount = 0x8664cb00, vu_socket = 0x8664cb00, vu_cdev = 0x8664cb00,
    vu_fifoinfo = 0x8664cb00}, v_hashlist = {le_next = 0x0, le_prev = 0x0},
  v_hash = 0, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0,
    tqh_last = 0x866b8030}, v_dd = 0x0, v_cstart = 0, v_lasta = 0,
  v_lastw = 0, v_clen = 0, v_lock = {lk_interlock = 0x80fed128,
    lk_flags = 128, lk_sharecount = 0, lk_waitcount = 0,
    lk_exclusivecount = 0, lk_prio = 80, lk_wmesg = 0x807419d2 "devfs",
    lk_timo = 51, lk_lockholder = 0xffffffff, lk_newlock = 0x0},
  v_interlock = {mtx_object = {lo_name = 0x80750688 "vnode interlock",
      lo_type = 0x80750688 "vnode interlock", lo_flags = 16973824,
      lo_witness_data = {lod_list = {stqe_next = 0x80ffd508},
        lod_witness = 0x80ffd508}}, mtx_lock = 4, mtx_recurse = 0},
  v_vnlock = 0x866b8058, v_holdcnt = 46, v_usecount = 1, v_iflag = 0,
  v_vflag = 0, v_writecount = 0, v_freelist = {tqe_next = 0x0,
    tqe_prev = 0x0}, v_bufobj = {bo_mtx = 0x866b807c, bo_clean = {bv_hd = {
        tqh_first = 0xcd364fc8, tqh_last = 0xcd365290},
      bv_root = 0xcd38b060, bv_cnt = 44}, bo_dirty = {bv_hd = {
        tqh_first = 0x0, tqh_last = 0x866b80c8}, bv_root = 0x0, bv_cnt = 0},
    bo_numoutput = 1, bo_flag = 0, bo_ops = 0x80fda81c, bo_bsize = 512,
    bo_object = 0x8186d7c0, bo_synclist = {le_next = 0x867d0ce4,
      le_prev = 0x860c21c4}, bo_private = 0x8667f040,
    __bo_vnode = 0x866b8000}, v_pollinfo = 0x0, v_label = 0x0}

bufobj and buf:

(kgdb) p *((struct buf *)(bp->bio_caller2))->b_bufobj
$6 = {bo_mtx = 0x866b807c, bo_clean = {bv_hd = {tqh_first = 0xcd364fc8,
      tqh_last = 0xcd365290}, bv_root = 0xcd38b060, bv_cnt = 44},
  bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0x866b80c8},
    bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 1, bo_flag = 0,
  bo_ops = 0x80fda81c, bo_bsize = 512, bo_object = 0x8186d7c0,
  bo_synclist = {le_next = 0x867d0ce4, le_prev = 0x860c21c4},
  bo_private = 0x8667f040, __bo_vnode = 0x866b8000}

(kgdb) p *((struct buf *)(bp->bio_caller2))
$7 = {b_bufobj = 0x866b80b4, b_bcount = 16384, b_caller1 = 0x0,
  b_data = 0xce0ab000 "\001", b_error = 0, b_iocmd = 2 '\002',
  b_ioflags = 2 '\002', b_iooffset = 1541685248, b_resid = 0, b_iodone = 0,
  b_blkno = 3011104, b_offset = 1541685248, b_bobufs = {
    tqe_next = 0xcd38ac88, tqe_prev = 0xcd383828}, b_left = 0xcd3837f0,
  b_right = 0xcd369b60, b_vflags = 1, b_freelist = {tqe_next = 0xcd36c318,
    tqe_prev = 0xcd389ae4}, b_qindex = 2, b_flags = 2684485668,
  b_xflags = 2 '\002', b_lock = {lk_interlock = 0x80fed170,
    lk_flags = 262144, lk_sharecount = 0, lk_waitcount = 0,
    lk_exclusivecount = 1, lk_prio = 80, lk_wmesg = 0x8075476d "bufwait",
    lk_timo = 0, lk_lockholder = 0xfffffffe, lk_newlock = 0x0},
  b_bufsize = 16384, b_runningbufspace = 16384,
  b_kvabase = 0xce0ab000 "\001", b_kvasize = 16384, b_lblkno = 3011104,
  b_vp = 0x866b8000, b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0,
  b_wcred = 0x0, b_saveaddr = 0xce0ab000, b_pager = {pg_reqpage = 0},
  b_cluster = {cluster_head = {tqh_first = 0x0, tqh_last = 0x0},
    cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}}, b_pages = {
    0x8253ce28, 0x8251bb70, 0x824ff9b8, 0x8251c500, 0x0 }, b_npages = 4,
  b_dep = {lh_first = 0x0}}

data:

(kgdb) x/32x 0xce0ab000
0xce0ab000: 0x0001a1ed 0x00000000 0x00000000 0x00000000
0xce0ab010: 0x00000010 0x00000000 0x00000000 0x00000000
0xce0ab020: 0x00000000 0x00000000 0x4689bfd2 0x00000000
0xce0ab030: 0x4689d40d 0x00000000 0x4689bfd2 0x00000000
0xce0ab040: 0x00000000 0x00000000 0x00000000 0x00000000
0xce0ab050: 0x76688876 0x00000000 0x00000000 0x00000000
0xce0ab060: 0x00000000 0x00000000 0x00000000 0x00000000
0xce0ab070: 0x5f696c63 0x746d7373 0x6f732e70 0x302e322e

I appreciate any thoughts you have that might help me solve this issue.
Also, in looking at a few geom-related issues, I have a couple of changes that are generally applicable to FreeBSD that I plan to send for your comments soon.

Thanks,
Dave

ext Kostik Belousov wrote:
> On Thu, Mar 15, 2007 at 04:56:39PM +1000, David Cecil wrote:
>
>> Hi Pawel,
>>
>> here is what I've found. Note that this problem doesn't occur
>> frequently, but does happen often enough to be a problem. I am still
>> debugging the live system.
>>
>> I put breakpoints in g_print_bio and g_io_request. I then continued
>> until the breakpoint at g_print_bio was hit, but each time
>> g_io_request was hit, I printed the backtrace. Then, at g_print_bio I
>> could identify the struct bio * and look for the corresponding
>> g_io_request trace with a matching struct bio *.
>>
>> That trace looks like this:
>> db> bt
>> Tracing pid 47 tid 100050 td 0x860d0640
>> g_io_request(864258c4,860ec440,a0020024,98b92188,a5281bf0,...) at g_io_request+0x107
>> g_vfs_strategy(86104ce4,98b92188,a0020024,98b92188,86104c30,...) at g_vfs_strategy+0x63
>> ffs_geom_strategy(86104ce4,98b92188) at ffs_geom_strategy+0x129
>> bufwrite(98b92188,4000,98b92188,a5281c54,805b20c1,...) at bufwrite+0x146
>> ffs_bufwrite(98b92188) at ffs_bufwrite+0x282
>> vfs_bio_awrite(98b92188) at vfs_bio_awrite+0x221
>> vop_stdfsync(a5281cbc,a5281cbc,a5281c98,806eff3e,a5281cbc,...) at vop_stdfsync+0x121
>
> It would be helpful, at this frame, to print the actual vnode, as well as
> the dirty buffer header together with a dump of the buffer data. It seems
> that the easiest way to do that would be to obtain a coredump and then use
> kgdb.
>
>> devfs_fsync(a5281cbc) at devfs_fsync+0x23
>> VOP_FSYNC_APV(80795080,a5281cbc) at VOP_FSYNC_APV+0x7e
>> sync_vnode(86104ce4,860d0640) at sync_vnode+0x100
>> sched_sync(0,a5281d38,0,805c068c,0,...) at sched_sync+0x1ed
>> fork_exit(805c068c,0,a5281d38) at fork_exit+0xa0
>> fork_trampoline() at fork_trampoline+0x8
>>
>> From ps, it identifies pid 47 as "syncer".
>>
>> So it appears that all I've discovered is that the syncer thread is
>> writing all the dirty buffers for gm0s1a to the disk :-( I guess the
>> question is, why are there any dirty buffers for the read-only
>> partition? I think the partition was mounted rw for a short while
>> during the boot process. Maybe something there didn't get flushed?
>> Or is it something internal to FFS?
>>
>> The output of ps is listed below. Note that some processes might differ
>> from a standard FreeBSD system, as we're modifying the code. As this
>> problem seems to be generic to some FreeBSD systems, I thought it valid
>> to pursue the problem here. I hope that's okay, and anticipate the fix
>> would apply to FreeBSD, as I've seen what appears to be the same problem
>> logged elsewhere.
>>
>> db> ps
>> pid ppid pgrp uid state wmesg wchan cmd
>> 856 841 856 0 S+ ttyin 0x858bd810 csh
>> 841 1 841 0 Ss+ wait 0x8612f800 login
>> 791 551 791 0 Ss select 0x8081231c sshd-x
>> 790 551 790 0 Ss nanslp 0x807c6400 cron
>> 789 551 789 0 Ss select 0x8081231c monitord
>> 788 551 788 0 Ss select 0x8081231c snmpd
>> 787 551 787 0 Ss select 0x8081231c clishd
>> 785 551 785 0 Ss select 0x8081231c inetd
>> 731 727 727 80 S accept 0x861cde7e httpd
>> 730 727 727 80 S accept 0x861cde7e httpd
>> 729 551 729 0 Ss select 0x8081231c ifm
>> 728 551 728 0 Ss select 0x8081231c xpand
>> 727 551 727 0 Ss select 0x8081231c httpd
>> 725 551 725 0 Ss select 0x8081231c ipsrd
>> 559 551 559 0 Ss select 0x8081231c syslogd
>> 551 1 551 0 Ss select 0x8081231c pm
>> 49 0 0 0 SL - 0xa5071d04 [schedcpu]
>> 48 0 0 0 SL sdflush 0x808ca6cc [softdepflush]
>> 47 0 0 0 SL syncer 0x807c5da4 [syncer]
>> 46 0 0 0 SL vlruwt 0x860ce600 [vnlru]
>> 45 0 0 0 SL psleep 0x8081276c [bufdaemon]
>> 44 0 0 0 SL pgzero 0x808cb41c [pagezero]
>> 43 0 0 0 SL psleep 0x808caf84 [vmdaemon]
>> 42 0 0 0 SL psleep 0x808caf40 [pagedaemon]
>> 41 0 0 0 SL m:w1 0x860a1800 [g_mirror gm0s1]
>> 40 0 0 0 SL ip_inp 0x80854c00 [ip_in_0]
>> 39 0 0 0 SL ip6_inpu 0x808c9560 [ip6 input [1]]
>> 38 0 0 0 SL ip6_inpu 0x808c9560 [ip6 input [0]]
>> 37 0 0 0 WL [swi0: sio]
>> 36 0 0 0 WL [irq17: ichsmb0]
>> 35 0 0 0 WL [irq18: atapci1]
>> 34 0 0 0 WL [irq15: ata1]
>> 33 0 0 0 WL [irq14: ata0]
>> 32 0 0 0 SL usbevt 0x84b7d210 [usb2]
>> 31 0 0 0 WL [irq23: ehci0]
>> 30 0 0 0 SL usbevt 0x8588c210 [usb1]
>> 29 0 0 0 WL [irq19: uhci1]
>> 28 0 0 0 SL usbtsk 0x807c2c64 [usbtask]
>> 27 0 0 0 SL usbevt 0x8587a210 [usb0]
>> 26 0 0 0 WL [irq16: uhci0]
>> 25 0 0 0 WL [irq144: cavium0]
>> 24 0 0 0 SL cbb cv 0x84b7dbe4 [cbb1]
>> 23 0 0 0 SL cbb cv 0x84b7f3e4 [cbb0]
>> 22 0 0 0 WL [irq96: cbb0 cbb1]
>> 21 0 0 0 WL [irq9: acpi0]
>> 20 0 0 0 WL [swi5: +]
>> 9 0 0 0 SL - 0x84b56bc0 [thread taskq]
>> 19 0 0 0 WL [swi6: +]
>> 18 0 0 0 WL [swi6: task queue]
>> 8 0 0 0 SL - 0x84b56d00 [acpi_task2]
>> 7 0 0 0 SL - 0x84b56d00 [acpi_task1]
>> 6 0 0 0 SL - 0x84b56d00 [acpi_task0]
>> 17 0 0 0 WL [swi2: cambio]
>> 5 0 0 0 SL - 0x84b56e80 [kqueue taskq]
>> 16 0 0 0 SL - 0x807c2440 [yarrow]
>> 4 0 0 0 SL - 0x807c335c [g_down]
>> 3 0 0 0 RL CPU 0 [g_up]
>> 2 0 0 0 SL - 0x807c3350 [g_event]
>> 15 0 0 0 WL [swi1: net]
>> 14 0 0 0 WL [swi3: vm]
>> 13 0 0 0 WL [swi4: clock sio]
>> 12 0 0 0 RL [idle: cpu0]
>> 11 0 0 0 RL CPU 1 [idle: cpu1]
>> 1 0 1 0 SLs wait 0x84ac8e00 [init]
>> 10 0 0 0 SL ktrace 0x807c4610 [ktrace]
>> 0 0 0 0 WLs [swapper]
>>
>> One list member asked if I have mounted the root partition with noatime
>> as well as ro. I have not specified noatime. It's worth noting that
>> files in / are regularly read/accessed, as you'd expect, but this bug
>> only occurs once a week or so on a number of machines.
>>
>> Thanks a lot for your time.
>>
>> Regards,
>> Dave
>>
>> ext Pawel Jakub Dawidek wrote:
>>
>>> On Wed, Mar 14, 2007 at 02:14:38PM +1000, David Cecil wrote:
>>>
>>>> Hi,
>>>>
>>>> I have seen the following message (or equivalent) occasionally on a
>>>> FreeBSD 6.1 system:
>>>> g_vfs_done():mirror/gm0s1a[WRITE(offset=1349091328, length=16384)]error = 1
>>>>
>>>> The partition in question is the root partition, and it is mounted
>>>> read-only. I have verified that the problem occurs due to the write
>>>> request returning EPERM due to the check in g_io_check:
>>>>         case BIO_WRITE:
>>>>         case BIO_DELETE:
>>>>                 if (cp->acw == 0)
>>>>                         return (EPERM);
>>>>
>>>> I have been trying to determine what within FFS would be trying to write
>>>> to the partition. The "bio_from" in the bio structure indicates (in the
>>>> geom) that it's ffs.mirror/gm0s1a that's trying to write. The contents
>>>> of the buffer look somewhat like a directory (lots of files listed, but
>>>> comparison to the actual directory that contains these files reveals it's
>>>> somewhat different), followed by a binary (ELF header). However, I'm at
>>>> a loss to understand who's actually doing the writing. Is it coming from
>>>> within FFS or is there an application that's done the write? (I can't
>>>> understand how an application would be permitted to do it though.)
>>>>
>>>> I have seen this sort of problem (same error number) reported on the
>>>> Internet occasionally, but it doesn't seem it's been satisfactorily
>>>> resolved in all instances.
>>>>
>>>> Any help you can provide would be much appreciated.
>>>
>>> It would be good if you could place kdb_enter() into the g_vfs_done()
>>> error path and, once in DDB, see which processes wait for I/O and collect
>>> their backtraces, or just put the output of 'alltrace' on some web page.
>>
>> --
>> Software Engineer
>> Secure and Mobile Connectivity
>> Nokia Enterprise Solutions
>> +61 7 5553 8307 (office)
>> +61 412 728 222 (cell)