Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 26 Jan 2012 22:38:38 -0500
From:      Gary Palmer <gpalmer@freebsd.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Panic on 7.4-RELEASE-p5
Message-ID:  <20120127033838.GF17973@in-addr.com>
In-Reply-To: <20120127030906.GA67449@icarus.home.lan>
References:  <20120127024815.GD17973@in-addr.com> <20120127030906.GA67449@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jan 26, 2012 at 07:09:06PM -0800, Jeremy Chadwick wrote:
> On Thu, Jan 26, 2012 at 09:48:15PM -0500, Gary Palmer wrote:
> > Hi,
> > 
> > My Soekris firewall just panic'd
> > 
> > Fatal trap 12: page fault while in kernel mode
> > fault virtual address   = 0x14001d
> > fault code              = supervisor write, page not present
> > instruction pointer     = 0x20:0xc06fd2d3
> > stack pointer           = 0x28:0xd63557bc
> > frame pointer           = 0x28:0xd63558a4
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 1869 (tcpdump)
> > trap number             = 12
> > panic: page fault
> > KDB: stack backtrace:
> > db_trace_self_wrapper(c079bbfc,c07fb800,c078d680,d6355660,d6355660,...) at db_tr
> > ace_self_wrapper+0x26
> > panic(c078d680,c07b829c,c36d6d24,1,1,...) at panic+0xed
> > trap_fatal(c2c352d0,140000,2,8,d63556c8,...) at trap_fatal+0x234
> > trap_pfault(c2b13620,0,c368f460,4,c36d6b00,...) at trap_pfault+0x27a
> > trap(d635577c) at trap+0x34e
> > calltrap() at calltrap+0x6
> > --- trap 0xc, eip = 0xc06fd2d3, esp = 0xd63557bc, ebp = 0xd63558a4 ---
> > softdep_disk_io_initiation(c2ade474,1000,c3003738,1,c2aab9b8,...) at softdep_dis
> > k_io_initiation+0xb3
> > ffs_geom_strategy(c3003738,c2aab9b8,1000,edb000,0,...) at ffs_geom_strategy+0x10
> > c
> > ufs_strategy(d6355910,d6355910,c07f3980,c378bbdc,c2aab9b8,...) at ufs_strategy+0
> > x64
> > bufstrategy(c378bc9c,c2aab9b8,c368f460,c2aab9b8,c2bd46b4,...) at bufstrategy+0x2
> > e
> > bufwrite(c2aab9b8,c2aabb04,20,c368f460,0,...) at bufwrite+0xf4
> > cluster_wbuild(c378bbdc,4000,3b7,0,8,...) at cluster_wbuild+0x6c9
> > cluster_write(c378bbdc,c2bd46b4,edc000,0,7f,...) at cluster_write+0x715
> > ffs_write(d6355bc0,c07091e4,c378bc34,0,c378bc64,...) at ffs_write+0x837
> > VOP_WRITE_APV(c07e9500,d6355bc0,c368f460,c07a2aec,252,...) at VOP_WRITE_APV+0xa0
> > vn_write(c2e2adf4,d6355c54,c36d7500,0,c368f460,...) at vn_write+0x26f
> > dofilewrite(d6355c54,ffffffff,ffffffff,0,c2e2adf4,...) at dofilewrite+0x84
> > kern_writev(c368f460,4,d6355c54,d6355c74,1,...) at kern_writev+0x58
> > write(c368f460,d6355cf8,c,c0798e1b,39bfbd8f,...) at write+0x50
> > syscall(d6355d38) at syscall+0x1b9
> > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > --- syscall (4, FreeBSD ELF32, write), eip = 0x28354663, esp = 0xbfbfe85c, ebp =
> >  0xbfbfe878 ---
> > Uptime: 11d5h5m49s
> > Physical memory: 503 MB
> > Dumping 112 MB: 97 81 65 49 33 17 1
> > Dump complete
> > Automatic reboot in 1 seconds - press a key on the console to abort
> > 
> > fsck prevented the box from coming up automatically
> > 
> > /dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
> > /dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
> > THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
> >         ufs: /dev/ufs/varlog (/var/log)
> > 
> > Anyone seen this before?  Any clues?
> 
> Please shed some light as to your filesystem and disk setup.
> Partitioning details, any GEOM layers you use, you get the idea.
> The more verbose the better.  Please be sure to state which filesystems
> use softupdates and which do not (this matters a lot in this situation).

kryten# mount
/dev/ad0s1a on / (ufs, local, read-only)
devfs on /dev (devfs, local)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /etc (ufs, local)
/dev/ufs/home on /home (ufs, local, soft-updates)
/dev/ufs/var on /var (ufs, local, soft-updates)
/dev/ufs/varlog on /var/log (ufs, local, soft-updates)
devfs on /var/db/dhcpd/dev (devfs, local)

(note the second md on /etc is an anomaly - there is normally only one and
 the daily security report already flagged that up as being different. 
 Haven't looked into where / why that happened yet.  This is a nanobsd
 build)

kryten# glabel list
Geom name: ad0s2a
Providers:
1. Name: ufsid/4ef571bfcc16bfa6
   Mediasize: 1015636480 (969M)
   Sectorsize: 512
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 1983665
   length: 1015636480
   index: 0
Consumers:
1. Name: ad0s2a
   Mediasize: 1015636480 (969M)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: ad1s1d
Providers:
1. Name: ufs/varlog
   Mediasize: 107374182912 (100G)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 209715201
   length: 107374182912
   index: 0
Consumers:
1. Name: ad1s1d
   Mediasize: 107374182912 (100G)
   Sectorsize: 512
   Mode: r1w1e2

Geom name: ad1s1e
Providers:
1. Name: ufs/home
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 16777216
   length: 8589934592
   index: 0
Consumers:
1. Name: ad1s1e
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r1w1e2

Geom name: ad1s1f
Providers:
1. Name: ufs/var
   Mediasize: 3438215680 (3.2G)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 6715265
   length: 3438215680
   index: 0
Consumers:
1. Name: ad1s1f
   Mediasize: 3438215680 (3.2G)
   Sectorsize: 512
   Mode: r1w1e2

Geom name: ad0s1a
Providers:
1. Name: ufsid/4f125433ead0d4df
   Mediasize: 1015636480 (969M)
   Sectorsize: 512
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 1983665
   length: 1015636480
   index: 0
Consumers:
1. Name: ad0s1a
   Mediasize: 1015636480 (969M)
   Sectorsize: 512
   Mode: r0w0e0

Geom name: ad0s3
Providers:
1. Name: ufsid/47d97f881feaef9a
   Mediasize: 17031168 (16M)
   Sectorsize: 512
   Mode: r0w0e0
   secoffset: 0
   offset: 0
   seclength: 33264
   length: 17031168
   index: 0
Consumers:
1. Name: ad0s3
   Mediasize: 17031168 (16M)
   Sectorsize: 512
   Mode: r0w0e0


# grep 'ad[01]' /var/run/dmesg.boot 
ad0: 1953MB <SanDisk SDCFH-2048 HDX 4.04> at ata0-master WDMA2
ad1: DMA limited to UDMA33, device found non-ATA66 cable
ad1: 114473MB <Hitachi HTS542512K9SA00 BB2OC31P> at ata0-slave UDMA33
Trying to mount root from ufs:/dev/ad0s1a

> As for the actual crash -- I assume the filesystem in question which
> induced this has softupdates.

I have no idea from the backtrace which filesystem was being written
to at the time.  The filesystem that it complained was unable to
be checked on boot does have softupdates on it (although thinking about
it I probably don't actually need SU)

> Have you booted this box into single-user and issued a manual fsck on
> all the filesystems?  If not, please do so ASAP and provide details of
> the results.

I don't use background fsck and when the box tried rebooting it couldn't
handle an unexpected inconsistency in one filesystem and dropped to
single user mode.  I did 2 fsck passes from single user mode before
exiting out of the shell and letting the box finish booting.

( from a login shell after the boot has finished )

# grep fsck /etc/rc.conf 
background_fsck="NO"

( from my console log )

Starting file system checks:
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/home: 142 files, 1974 used, 4158281 free (81 frags, 519775 blocks, 0.0%
 fragmentation)
/dev/ufs/var: UNREF FILE I=75348  OWNER=someuser MODE=100600
/dev/ufs/var: SIZE=2048 MTIME=Jan 25 14:47 2012  (CLEARED)
/dev/ufs/var: UNREF FILE I=75349  OWNER=someuser MODE=100644
/dev/ufs/var: SIZE=39402 MTIME=Jan 25 14:43 2012  (CLEARED)
/dev/ufs/var: UNREF FILE I=75350  OWNER=someuser MODE=100600
/dev/ufs/var: SIZE=2048 MTIME=Jan 25 14:47 2012  (CLEARED)
/dev/ufs/var: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/ufs/var: SUMMARY INFORMATION BAD (SALVAGED)
/dev/ufs/var: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/ufs/var: 499 files, 2861 used, 1661962 free (234 frags, 207716 blocks, 0.0%
 fragmentation)
/dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
/dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
        ufs: /dev/ufs/varlog (/var/log)
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!

# fsck -p
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: clean, 4158281 free (81 frags, 519775 blocks, 0.0% fragmentation)
/dev/ufs/var: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/var: clean, 1661962 free (234 frags, 207716 blocks, 0.0% fragmentation)
/dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
/dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
        ufs: /dev/ufs/varlog (/var/log)
# fsck -y /dev/ufs/varlog
** /dev/ufs/varlog
** Last Mounted on /var/log
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=1496123
SALVAGE? yes

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

907 files, 15527664 used, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmen
tation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
# fsck -y /dev/ufs/varlog
** /dev/ufs/varlog
** Last Mounted on /var/log
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
907 files, 15527664 used, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmen
tation)
# fsck -p
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/var: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: clean, 4158281 free (81 frags, 519775 blocks, 0.0% fragmentation)
/dev/ufs/varlog: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/varlog: clean, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmenta
tion)
/dev/ufs/var: clean, 1661962 free (234 frags, 207716 blocks, 0.0% fragmentation)

> These are confirmed cases of background_fsck not properly
> catching/dealing with all filesystem errors, thus letting some
> transparently pass through.  You are running 7.4-p2, which makes this
> likelihood even more probable.  This is why I advocate (and have for
> years) background_fsck="no" in rc.conf.  I can dig up the (very long and
> semi-heated) thread discussing this if you want, but I probably won't
> get to it until next week.  I have too much going on with job-related
> things outside of FreeBSD to spend a lot of time with it now.

Way ahead of you Jeremy :)  None of my boxes use background FSCK for
that very reason.  I prefer to wait a bit longer for the boot than
suffer ... oddities.

Thanks,

Gary



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120127033838.GF17973>