From owner-freebsd-geom@FreeBSD.ORG Mon Apr 23 11:07:15 2012 Return-Path: Delivered-To: freebsd-geom@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD9961065678 for ; Mon, 23 Apr 2012 11:07:15 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id AEA768FC1C for ; Mon, 23 Apr 2012 11:07:15 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3NB7F42047571 for ; Mon, 23 Apr 2012 11:07:15 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3NB7FEa047567 for freebsd-geom@FreeBSD.org; Mon, 23 Apr 2012 11:07:15 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 23 Apr 2012 11:07:15 GMT Message-Id: <201204231107.q3NB7FEa047567@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-geom@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Apr 2012 11:07:15 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/165745 geom [geom] geom_multipath page fault on removed drive o kern/165428 geom [glabel][patch] Add xfs support to glabel o kern/164254 geom [geom] gjournal not stopping on GPT partitions o kern/164252 geom [geom] gjournal overflow o kern/164143 geom [geom] Partition table not recognized after upgrade R8 a kern/163020 geom [geli] [patch] enable the Camellia-XTS on GEOM ELI o kern/162010 geom [geli] panic: Provider's error should be set (error=0) o kern/161979 geom [geom] glabel doesn't update after newfs, and glabel s o kern/161752 geom [geom] glabel(8) doesn't get gpt label change o bin/161677 geom gpart(8) Probably bug in gptboot o kern/160562 geom [geom][patch] Allow to insert new component to geom_ra o kern/160409 geom [geli] failed to attach provider f kern/159595 geom [geom] [panic] panic on gmirror unload in vbox [regres p kern/158398 geom [headers] [patch] includes o kern/158197 geom [geom] geom_cache with size>1000 leads to panics o kern/157879 geom [libgeom] [regression] ABI change without version bump o kern/157863 geom [geli] kbdmux prevents geli passwords from being enter o kern/157739 geom [geom] GPT labels with geom_multipath o kern/157724 geom [geom] gpart(8) 'add' command must preserve gap for sc o kern/157723 geom [geom] GEOM should not process 'c' (raw) partitions fo o kern/157108 geom [gjournal] dumpon(8) fails on gjournal providers o kern/155994 geom [geom] Long "Suspend time" when reading large files fr o kern/154226 geom [geom] GEOM label does not change when you modify them o kern/150858 geom [geom] [geom_label] [patch] glabel(8) is not compatibl o kern/150626 geom [geom] [gjournal] gjournal(8) destroys label o kern/150555 geom [geom] gjournal unusable on GPT partitions o kern/150334 geom [geom] [udf] [patch] geom label does not support UDF o kern/149762 geom volume labels with rogue characters o bin/149215 geom [panic] [geom_part] gpart(8): Delete linux's slice via o kern/147667 geom [gmirror] Booting with one component of a gmirror, the o kern/145818 geom [geom] geom_stat_open showing cached information for n o kern/145042 geom [geom] System stops booting after printing message "GE o kern/143455 geom gstripe(8) in RELENG_8 (31st Jan 2010) broken o kern/142563 geom [geom] [hang] ioctl freeze in zpool o kern/141740 geom [geom] gjournal(8): g_journal_destroy concurrent error o kern/140352 geom [geom] gjournal + glabel not working o kern/135898 geom [geom] Severe filesystem corruption - large files or l o kern/134922 geom [gmirror] [panic] kernel panic when use fdisk on disk o kern/134113 geom [geli] Problem setting secondary GELI key o kern/133931 geom [geli] [request] intentionally wrong password to destr o bin/132845 geom [geom] [patch] ggated(8) does not close files opened a o bin/131415 geom [geli] keystrokes are unregulary sent to Geli when typ o kern/131353 geom [geom] gjournal(8) kernel lock o kern/129674 geom [geom] gjournal root did not mount on boot o kern/129645 geom gjournal(8): GEOM_JOURNAL causes system to fail to boo o kern/129245 geom [geom] gcache is more suitable for suffix based provid f kern/128276 geom [gmirror] machine lock up when gmirror module is used o kern/127420 geom [geom] [gjournal] [panic] Journal overflow on gmirrore o kern/124973 geom [gjournal] [patch] boot order affects geom_journal con o kern/124969 geom gvinum(8): gvinum raid5 plex does not detect missing s o kern/123962 geom [panic] [gjournal] gjournal (455Gb data, 8Gb journal), o kern/123122 geom [geom] GEOM / gjournal kernel lock o kern/122738 geom [geom] gmirror list "losts consumers" after gmirror de o kern/122067 geom [geom] [panic] Geom crashed during boot o kern/121364 geom [gmirror] Removing all providers create a "zombie" mir o kern/120091 geom [geom] [geli] [gjournal] geli does not prompt for pass o kern/115856 geom [geli] ZFS thought it was degraded when it should have o kern/115547 geom [geom] [patch] [request] let GEOM Eli get password fro o kern/114532 geom [geom] GEOM_MIRROR shows up in kldstat even if compile f kern/113957 geom [gmirror] gmirror is intermittently reporting a degrad o kern/113837 geom [geom] unable to access 1024 sector size storage o kern/113419 geom [geom] geom fox multipathing not failing back o kern/107707 geom [geom] [patch] [request] add new class geom_xbox360 to o kern/94632 geom [geom] Kernel output resets input while GELI asks for o kern/90582 geom [geom] [panic] Restore cause panic string (ffs_blkfree o bin/90093 geom fdisk(8) incapable of altering in-core geometry o kern/87544 geom [gbde] mmaping large files on a gbde filesystem deadlo o bin/86388 geom [geom] [geom_part] periodic(8) daily should backup gpa o kern/84556 geom [geom] [panic] GBDE-encrypted swap causes panic at shu o kern/79251 geom [2TB] newfs fails on 2.6TB gbde device o kern/79035 geom [vinum] gvinum unable to create a striped set of mirro o bin/78131 geom gbde(8) "destroy" not working. 72 problems total. From owner-freebsd-geom@FreeBSD.ORG Thu Apr 26 15:20:13 2012 Return-Path: Delivered-To: freebsd-geom@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AC134106564A for ; Thu, 26 Apr 2012 15:20:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 959048FC0A for ; Thu, 26 Apr 2012 15:20:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q3QFKDB5003470 for ; Thu, 26 Apr 2012 15:20:13 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q3QFKD3k003469; Thu, 26 Apr 2012 15:20:13 GMT (envelope-from gnats) Date: Thu, 26 Apr 2012 15:20:13 GMT Message-Id: <201204261520.q3QFKD3k003469@freefall.freebsd.org> To: freebsd-geom@FreeBSD.org From: Andreas Longwitz Cc: Subject: Re: kern/164252: [geom] gjournal overflow X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Andreas Longwitz List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Apr 2012 15:20:13 -0000 The following reply was made to PR kern/164252; it has been noted by GNATS. From: Andreas Longwitz To: bug-followup@freebsd.org Cc: grimeton@gmx.net Subject: Re: kern/164252: [geom] gjournal overflow Date: Thu, 26 Apr 2012 17:17:01 +0200 On one of my machines I can reproduce this panic and give some debug informations. if I ran mount -u -o noatime -o async -o snapshot /prod/.snap/fscktest /prod on one gjournaled disk and some seconds later cd /backup; iozone -Ra -g 2G on a second gjournaled disk, the panic is triggered immediately. My disk configuration is: Filesystem Size Used Avail Capacity Mounted on /dev/mirror/gm0p7.journal 989M 207M 702M 23% / devfs 1.0k 1.0k 0B 100% /dev /dev/mirror/gm0p8.journal 3.9G 477M 3.1G 13% /var /dev/mirror/gm0p9.journal 15G 3.1G 11G 21% /usr /dev/mirror/gm0p10.journal 101G 120M 93G 0% /home /dev/mirror/gm1p2.journal 1.8T 94G 1.5T 6% /prod /dev/mirror/gm2p2.journal 1.8T 94G 1.5T 6% /backup From /var/log/messages: FreeBSD 8.2-STABLE #0: Mon Apr 23 00:20:44 CEST 2012 root@dsssrvt2.incore:/usr/obj/usr/src/sys/SERVER64 amd64 ... GEOM: da0: the secondary GPT header is not in the last LBA. GEOM: da1: the secondary GPT header is not in the last LBA. GEOM: da2: the secondary GPT header is not in the last LBA. GEOM: da3: the secondary GPT header is not in the last LBA. GEOM_MIRROR: Device mirror/gm0 launched (2/2). GEOM: da4: the secondary GPT header is not in the last LBA. GEOM_MIRROR: Device mirror/gm1 launched (2/2). GEOM: da5: the secondary GPT header is not in the last LBA. GEOM_MIRROR: Device mirror/gm2 launched (2/2). GEOM_JOURNAL: Journal 2877441109: mirror/gm0p3 contains journal. GEOM_JOURNAL: Journal 4086979879: mirror/gm0p4 contains journal. GEOM_JOURNAL: Journal 1430126911: mirror/gm0p5 contains journal. GEOM_JOURNAL: Journal 2027035949: mirror/gm0p6 contains journal. GEOM_JOURNAL: Journal 2877441109: mirror/gm0p7 contains data. GEOM_JOURNAL: Journal mirror/gm0p7 clean. GEOM_JOURNAL: Journal 4086979879: mirror/gm0p8 contains data. GEOM_JOURNAL: Journal mirror/gm0p8 clean. GEOM_JOURNAL: Journal 1430126911: mirror/gm0p9 contains data. GEOM_JOURNAL: Journal mirror/gm0p9 clean. GEOM_JOURNAL: Journal 2027035949: mirror/gm0p10 contains data. GEOM_JOURNAL: Journal mirror/gm0p10 clean. GEOM_JOURNAL: Journal 3941023214: mirror/gm1p1 contains journal. GEOM_JOURNAL: Journal 3941023214: mirror/gm1p2 contains data. GEOM_JOURNAL: Journal mirror/gm1p2 clean. GEOM_JOURNAL: Journal 817888587: mirror/gm2p1 contains journal. GEOM_JOURNAL: Journal 817888587: mirror/gm2p2 contains data. GEOM_JOURNAL: Journal mirror/gm2p2 clean. Trying to mount root from ufs:/dev/mirror/gm0p7.journal em0: link state changed to UP GEOM_MIRROR[1]: Kernel dump will go to da0. Sometimes I see messages like fsync: giving up on dirty 0xffffff0002ef6000: tag devfs, type VCHR usecount 1, writecount 0, refcount 18 mountedhere 0xffffff0002d8ec00 flags (VV_COPYONWRITE) v_object 0xffffff0002eda360 ref 0 pages 79834 lock type devfs: EXCL by thread 0xffffff0002a00460 (pid 18) dev mirror/gm2p2.journal GEOM_JOURNAL: Cannot suspend file system /backup (error=35). This should be harmless according to http://freebsd.monkey.org/freebsd-stable/200609/msg00020.html. My kernel has a patch for geom/journal/g_journal.c, because I want to know which of the gjournals triggers the panic: < panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)", < (intmax_t)sc->sc_journal_offset, --- > panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)", > sc->sc_id, (intmax_t)sc->sc_journal_offset, Further I have (same 'id' as in panic string): gjournal dump mirror/gm2p1: Metadata on /dev/mirror/gm2p1: magic: GEOM::JOURNAL version: 0 id: 817888587 type: 2 start: 0 end: 2147483136 joffset: 0 jid: 0 flags: 1 hcprovider: provsize: 2147483648 MD5 hash: 38681a7ffb1c2e570a0ce09d6da60a5f gjournal dump mirror/gm2p2: Metadata on mirror/gm2p2: magic: GEOM::JOURNAL version: 0 id: 817888587 type: 1 start: 0 end: 2147483136 joffset: 182301184 jid: 2952916330 flags: 0 hcprovider: provsize: 1998251415552 MD5 hash: 3a3039ec8bf7be3ed1293fc82b52691a After the panic is triggered I give some ddb-output: panic: Journal overflow (id=817888587 joffset=1027771904 active=182301184 inactive=1026233856) cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x187 g_journal_flush() at g_journal_flush+0x8c8 g_journal_worker() at g_journal_worker+0x14bb fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 --- KDB: enter: panic [thread pid 36 tid 100089 ] Stopped at kdb_enter+0x3b: movq $0,0x46ffb2(%rip) db:0:kdb.enter.panic> watchdog No argument provided, disabling watchdog db:0:kdb.enter.panic> run ddbinfo db:1:ddbinfo> capture on db:1:on> run lockinfo db:2:lockinfo> show lock Giant class: sleep mutex name: Giant flags: {DEF, RECURSE} state: {UNOWNED} db:2:Giant> show lockedvnods Locked vnodes 0xffffff01b40f0b10: tag ufs, type VDIR usecount 4, writecount 0, refcount 5 mountedhere 0 flags () v_object 0xffffff01c8b75ca8 ref 0 pages 1 lock type ufs: SHARED (count 1) ino 3, on dev mirror/gm1p2.journal 0xffffff01b43f1588: tag ufs, type VREG usecount 1, writecount 0, refcount 45167 mountedhere 0 flags (VV_SYSTEM) lock type ufs: EXCL by thread 0xffffff00348aa8c0 (pid 80175) with shared waiters pending ino 4, on dev mirror/gm1p2.journal 0xffffff01e6844ce8: tag ufs, type VREG usecount 1, writecount 1, refcount 1027 mountedhere 0 flags () v_object 0xffffff01cee58510 ref 0 pages 4100 lock type ufs: EXCL by thread 0xffffff0042fe1460 (pid 80395) ino 4, on dev mirror/gm2p2.journal db:2:lockedvnods> show lockchain thread 100089 (pid 36, g_journal mirror/gm) running on CPU 0 db:2:lockchain> show sleepchain thread 100089 (pid 36, g_journal mirror/gm) running on CPU 0 db:1:sleepchain> show allpcpu Current CPU: 0 cpuid = 0 dynamic pcpu = 0x236980 curthread = 0xffffff0002ded000: pid 36 "g_journal mirror/gm" curpcb = 0xffffff82452d0d10 fpcurthread = none idlethread = 0xffffff0002359460: tid 100006 "idle: cpu0" curpmap = 0xffffffff808557d0 tssp = 0xffffffff808c3100 commontssp = 0xffffffff808c3100 rsp0 = 0xffffff82452d0d10 gs32p = 0xffffffff808c1f38 ldt = 0xffffffff808c1f78 tss = 0xffffffff808c1f68 cpuid = 1 dynamic pcpu = 0xffffff807f7fc980 curthread = 0xffffff0002d35460: pid 25 "syncer" curpcb = 0xffffff8242f96d10 fpcurthread = none idlethread = 0xffffff00023598c0: tid 100005 "idle: cpu1" curpmap = 0xffffffff808557d0 tssp = 0xffffffff808c3168 commontssp = 0xffffffff808c3168 rsp0 = 0xffffff8242f96d10 gs32p = 0xffffffff808c1fa0 ldt = 0xffffffff808c1fe0 tss = 0xffffffff808c1fd0 cpuid = 2 dynamic pcpu = 0xffffff807f803980 curthread = 0xffffff000234d000: pid 11 "idle: cpu2" curpcb = 0xffffff8000051d10 fpcurthread = none idlethread = 0xffffff000234d000: tid 100004 "idle: cpu2" curpmap = 0xffffffff808557d0 tssp = 0xffffffff808c31d0 commontssp = 0xffffffff808c31d0 rsp0 = 0xffffff8000051d10 gs32p = 0xffffffff808c2008 ldt = 0xffffffff808c2048 tss = 0xffffffff808c2038 cpuid = 3 dynamic pcpu = 0xffffff807f80a980 curthread = 0xffffff0002359000: pid 12 "swi1: netisr 0" curpcb = 0xffffff8000060d10 fpcurthread = none idlethread = 0xffffff000234d460: tid 100003 "idle: cpu3" curpmap = 0xffffffff808557d0 tssp = 0xffffffff808c3238 commontssp = 0xffffffff808c3238 rsp0 = 0xffffff8000060d10 gs32p = 0xffffffff808c2070 ldt = 0xffffffff808c20b0 tss = 0xffffffff808c20a0 db:1:allpcpu> bt Tracing pid 36 tid 100089 td 0xffffff0002ded000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x180 g_journal_flush() at g_journal_flush+0x8c8 g_journal_worker() at g_journal_worker+0x14bb fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 --- db:1:bt> ps pid ppid pgrp uid state wmesg wchan cmd 80395 80387 80395 0 S+ wdrain 0xffffffff8087cc80 initial thread 80387 80386 80387 0 S+ ttyin 0xffffff0002cdf4a8 sh 80386 79541 80386 1003 S+ wait 0xffffff01bc6f9470 su 80199 5911 37 0 S nanslp 0xffffffff80859308 sleep 80192 79516 80192 1003 S+ ufs 0xffffff01b43f1620 ls 80175 80155 80144 0 R+ mount 80155 80148 80144 0 S+ wait 0xffffff01bc6f98e0 sh 80149 80145 80144 0 S+ piperd 0xffffff00345e4888 sed 80148 80145 80144 0 S+ wait 0xffffff01d3c97000 sh 80145 80144 80144 0 S+ wait 0xffffff007a9c1470 sh 80144 80123 80144 0 S+ wait 0xffffff0002e188e0 sh 80123 80122 80123 0 S+ ttyin 0xffffff0002a600a8 sh 80122 78340 80122 1003 S+ wait 0xffffff01b531d000 su 79541 79540 79541 1003 Ss+ wait 0xffffff01b9bd2000 bash 79540 79538 79538 1003 S select 0xffffff0002c9c540 sshd 79538 5898 79538 0 Ss sbwait 0xffffff00345f5694 sshd 79516 79515 79516 1003 Ss+ wait 0xffffff01bc6f9000 bash 79515 79513 79513 1003 S select 0xffffff01f7afe740 sshd 79513 5898 79513 0 Ss sbwait 0xffffff0002f613ec sshd 78340 78339 78340 1003 Ss+ wait 0xffffff01c97d1000 bash 78339 78337 78337 1003 S select 0xffffff0002daf840 sshd 78337 5898 78337 0 Ss sbwait 0xffffff00344703ec sshd 70905 1 69750 0 S piperd 0xffffff0002f602d8 sh 70904 1 69750 0 S pause 0xffffff00345c7980 top 49914 1 49914 0 Ss nanslp 0xffffffff80859308 watchdogd 49465 0 0 0 SL ipmireq 0xffffff00346693b8 [ipmi0: kcs] 6330 1 6330 0 Ss+ ttyin 0xffffff0002a6bca8 getty 6329 1 6329 0 Ss+ ttyin 0xffffff0002a7c0a8 getty 6328 1 6328 0 Ss+ ttyin 0xffffff0002a7bca8 getty 6327 1 6327 0 Ss+ ttyin 0xffffff0002a748a8 getty 6326 1 6326 0 Ss+ ttyin 0xffffff0002a74ca8 getty 6325 1 6325 0 Ss+ ttyin 0xffffff0002a7a0a8 getty 6324 1 6324 0 Ss+ ttyin 0xffffff0002a7a4a8 getty 6323 1 6323 0 Ss+ ttyin 0xffffff0002a7a8a8 getty 6287 1 6287 0 Ss select 0xffffff003443f5c0 inetd 6267 1 6267 0 Ss select 0xffffff0002adac40 moused 6108 1 6108 0 Ss nanslp 0xffffffff80859308 cron 6102 1 6102 25 Ss pause 0xffffff0034329980 sendmail 6098 1 6098 0 Ss select 0xffffff0002daf340 sendmail 5911 1 37 0 S+ wait 0xffffff0034026470 sh 5903 1 5903 0 Ss select 0xffffff0002c9ca40 sshd 5898 1 5898 0 Ss select 0xffffff0002c9cac0 sshd 5882 5833 5833 88 S (threaded) mysqld 100191 S sigwait 0xffffff82454cea38 mysqld 100190 S ucond 0xffffff0002dad200 mysqld 100169 S select 0xffffff0002b39240 initial thread 5833 1 5833 88 Ss wait 0xffffff003432b8e0 sh 5693 1 5693 181 Ss select 0xffffff00343032c0 nrpe2 5641 1 5641 0 Ss select 0xffffff00343035c0 ntpd 5480 0 0 0 SL mdwait 0xffffff0002b63000 [md0] 5435 1 5435 53 Ss (threaded) named 100158 S kqread 0xffffff0002c7f700 named 100157 S ucond 0xffffff0002c28080 named 100156 S ucond 0xffffff0002b3a480 named 100155 S ucond 0xffffff0002b3a380 named 100154 S ucond 0xffffff0002b3a400 named 100153 S ucond 0xffffff0002c28000 named 100145 S sigwait 0xffffff82453e8a38 named 5344 1 5344 0 Ss select 0xffffff0002de96c0 syslogd 5308 0 0 0 SL - 0xffffffff80855960 [accounting] 5103 1 5103 0 Ss select 0xffffff0002bb2e40 devd 1522 1514 1514 64 R pflogd 1514 1 1514 0 Ss sbwait 0xffffff003403b144 pflogd 264 1 264 0 Ss pause 0xffffff0002e17510 adjkerntz 36 0 0 0 RL CPU 0 [g_journal mirror/gm] 35 0 0 0 SL gj:work 0xffffff0002dde800 [g_journal mirror/gm] 34 0 0 0 SL gj:work 0xffffff0002d8dc00 [g_journal mirror/gm] 33 0 0 0 SL gj:work 0xffffff0002d8d600 [g_journal mirror/gm] 32 0 0 0 SL gj:work 0xffffff0002d8c600 [g_journal mirror/gm] 31 0 0 0 SL gj:work 0xffffff0002d8c800 [g_journal mirror/gm] 30 0 0 0 SL m:w1 0xffffff0002d6f200 [g_mirror gm2] 29 0 0 0 SL m:w1 0xffffff0002d6c000 [g_mirror gm1] 28 0 0 0 SL m:w1 0xffffff0002d6e200 [g_mirror gm0] 27 0 0 0 RL [softdepflush] 26 0 0 0 SL vlruwt 0xffffff0002d388e0 [vnlru] 25 0 0 0 RL CPU 1 [syncer] 24 0 0 0 SL psleep 0xffffffff8087cc48 [bufdaemon] 23 0 0 0 SL pgzero 0xffffffff8088af4c [pagezero] 22 0 0 0 SL pollid 0xffffffff80858108 [idlepoll] 21 0 0 0 SL psleep 0xffffffff8088a2e8 [vmdaemon] 20 0 0 0 SL psleep 0xffffffff8088a2ac [pagedaemon] 19 0 0 0 SL ccb_scan 0xffffffff808244e0 [xpt_thrd] 18 0 0 0 SL suspwt 0xffffff0002df5c34 [g_journal switcher] 17 0 0 0 SL pftm 0xffffffff801cf420 [pfpurge] 16 0 0 0 SL (threaded) usb 100063 D - 0xffffff8000429e18 [usbus3] 100062 D - 0xffffff8000429dc0 [usbus3] 100061 D - 0xffffff8000429d68 [usbus3] 100060 D - 0xffffff8000429d10 [usbus3] 100058 D - 0xffffff8000420ef0 [usbus2] 100057 D - 0xffffff8000420e98 [usbus2] 100056 D - 0xffffff8000420e40 [usbus2] 100055 D - 0xffffff8000420de8 [usbus2] 100053 D - 0xffffff8000417ef0 [usbus1] 100052 D - 0xffffff8000417e98 [usbus1] 100051 D - 0xffffff8000417e40 [usbus1] 100050 D - 0xffffff8000417de8 [usbus1] 100048 D - 0xffffff800040eef0 [usbus0] 100047 D - 0xffffff800040ee98 [usbus0] 100046 D - 0xffffff800040ee40 [usbus0] 100045 D - 0xffffff800040ede8 [usbus0] 15 0 0 0 SL idle 0xffffff80003f3000 [mpt_recovery3] 9 0 0 0 SL idle 0xffffff80003de000 [mpt_recovery2] 8 0 0 0 SL idle 0xffffff80003b7300 [mpt_raid1] 7 0 0 0 SL idle 0xffffff80003b7000 [mpt_recovery1] 6 0 0 0 SL idle 0xffffff80003a2300 [mpt_raid0] 5 0 0 0 SL idle 0xffffff80003a2000 [mpt_recovery0] 14 0 0 0 SL - 0xffffffff80859004 [yarrow] 4 0 0 0 RL [g_down] 3 0 0 0 SL - 0xffffffff80854cc0 [g_up] 2 0 0 0 SL - 0xffffffff80854cb0 [g_event] 13 0 0 0 SL (threaded) ng_queue 100016 D sleep 0xffffffff807ecdf0 [ng_queue3] 100015 D sleep 0xffffffff807ecdf0 [ng_queue2] 100014 D sleep 0xffffffff807ecdf0 [ng_queue1] 100013 D sleep 0xffffffff807ecdf0 [ng_queue0] 12 0 0 0 RL (threaded) intr 100069 I [swi0: uart uart] 100068 I [irq12: psm0] 100067 I [irq1: atkbd0] 100066 I [irq17: ichsmb0] 100065 I [irq15: ata1] 100064 I [irq14: ata0] 100059 I [irq23: ehci0] 100054 I [irq18: uhci2] 100049 I [irq19: uhci1] 100044 I [irq16: uhci0] 100042 I [irq257: mpt3] 100040 I [irq256: mpt2] 100035 I [irq25: mpt1] 100032 I [irq26: mpt0] 100031 I [irq9: acpi0] 100029 I [swi5: +] 100028 I [swi2: cambio] 100022 I [swi6: task queue] 100021 I [swi6: Giant taskq] 100012 I [swi3: vm] 100011 I [swi4: clock] 100010 I [swi4: clock] 100009 I [swi4: clock] 100008 I [swi4: clock] 100007 Run CPU 3 [swi1: netisr 0] 11 0 0 0 RL (threaded) idle 100006 CanRun [idle: cpu0] 100005 CanRun [idle: cpu1] 100004 Run CPU 2 [idle: cpu2] 100003 CanRun [idle: cpu3] 1 0 1 0 SLs wait 0xffffff000234a8e0 [init] 10 0 0 0 SL audit_wo 0xffffffff808888d0 [audit] 0 0 0 0 SLs (threaded) kernel 100039 D - 0xffffff000250e480 [em1 taskq] 100038 D - 0xffffff000250c700 [em0 taskq] 100030 D - 0xffffff000242c500 [thread taskq] 100027 D - 0xffffff00023f7c00 [acpi_task_2] 100026 D - 0xffffff00023f7c00 [acpi_task_1] 100025 D - 0xffffff00023f7c00 [acpi_task_0] 100024 D - 0xffffff00023f7c80 [ffs_trim taskq] 100023 D - 0xffffff000242c180 [kqueue taskq] 100000 D sched 0xffffffff80854e00 [swapper] db:1:ps> show thread Thread 100089 at 0xffffff0002ded000: proc (pid 36): 0xffffff0002ddb470 name: g_journal mirror/gm stack: 0xffffff82452cd000-0xffffff82452d0fff flags: 0x10004 pflags: 0x210000 state: RUNNING (CPU 0) priority: 76 container lock: sched lock 0 (0xffffffff8085fcc0) db:1:thread> alltrace (!! NOT ALL GIVEN) Tracing command iozone pid 80395 tid 100387 td 0xffffff0042fe1460 sched_switch() at sched_switch+0xde mi_switch() at mi_switch+0x176 sleepq_wait() at sleepq_wait+0x42 _sleep() at _sleep+0x317 waitrunningbufspace() at waitrunningbufspace+0x4d bufwrite() at bufwrite+0x15b ffs_syncvnode() at ffs_syncvnode+0x201 ffs_fsync() at ffs_fsync+0x3f fsync() at fsync+0x155 amd64_syscall() at amd64_syscall+0x1f4 Xfast_syscall() at Xfast_syscall+0xfc --- syscall (95, FreeBSD ELF64, fsync), rip = 0x180885f9c, rsp = 0x7fffffffe7e8, rbp = 0x180a20000 --- Tracing command ls pid 80192 tid 100390 td 0xffffff01b00e6460 sched_switch() at sched_switch+0xde mi_switch() at mi_switch+0x176 sleepq_wait() at sleepq_wait+0x42 __lockmgr_args() at __lockmgr_args+0x83c ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x46 _vn_lock() at _vn_lock+0x47 vget() at vget+0x70 vfs_hash_get() at vfs_hash_get+0xea ffs_vgetf() at ffs_vgetf+0x48 ufs_lookup_ino() at ufs_lookup_ino+0xccb vfs_cache_lookup() at vfs_cache_lookup+0xf0 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40 lookup() at lookup+0x454 namei() at namei+0x53a kern_statat_vnhook() at kern_statat_vnhook+0x8f kern_statat() at kern_statat+0x15 lstat() at lstat+0x2a amd64_syscall() at amd64_syscall+0x1f4 Xfast_syscall() at Xfast_syscall+0xfc --- syscall (190, FreeBSD ELF64, lstat), rip = 0x18098ff1c, rsp = 0x7fffffffe368, rbp = 0x180c276d0 --- Tracing command mount pid 80175 tid 100254 td 0xffffff00348aa8c0 sched_switch() at sched_switch+0xde mi_switch() at mi_switch+0x176 sleepq_wait() at sleepq_wait+0x42 _sleep() at _sleep+0x317 bwait() at bwait+0x64 bufwait() at bufwait+0x56 bread() at bread+0x1e ffs_balloc_ufs2() at ffs_balloc_ufs2+0xbd2 cgaccount() at cgaccount+0x357 ffs_snapshot() at ffs_snapshot+0xb33 ffs_mount() at ffs_mount+0x652 vfs_donmount() at vfs_donmount+0xdc5 nmount() at nmount+0x63 amd64_syscall() at amd64_syscall+0x1f4 Xfast_syscall() at Xfast_syscall+0xfc --- syscall (378, FreeBSD ELF64, nmount), rip = 0x1807b065c, rsp = 0x7fffffffdaa8, rbp = 0x180a080d8 --- Tracing command g_journal mirror/gm pid 36 tid 100089 td 0xffffff0002ded000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x180 g_journal_flush() at g_journal_flush+0x8c8 g_journal_worker() at g_journal_worker+0x14bb fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 --- Tracing command syncer pid 25 tid 100078 td 0xffffff0002d35460 cpustop_handler() at cpustop_handler+0x3a ipi_nmi_handler() at ipi_nmi_handler+0x30 trap() at trap+0x155 nmi_calltrap() at nmi_calltrap+0x8 --- trap 0x13, rip = 0xffffffff803d6736, rsp = 0xffffff800001efe0, rbp = 0xffffff8242f96930 --- panic() at panic+0x96 ffs_bufwrite() at ffs_bufwrite+0x2bc ffs_sbupdate() at ffs_sbupdate+0x98 ffs_sync() at ffs_sync+0x623 sync_fsync() at sync_fsync+0x16a sync_vnode() at sync_vnode+0x15e sched_sync() at sched_sync+0x1c9 fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff8242f96d00, rbp = 0 --- Tracing command g_journal switcher pid 18 tid 100071 td 0xffffff0002a00460 sched_switch() at sched_switch+0xde mi_switch() at mi_switch+0x176 sleepq_wait() at sleepq_wait+0x42 _sleep() at _sleep+0x317 vfs_write_suspend() at vfs_write_suspend+0x100 g_journal_switcher() at g_journal_switcher+0xd2b fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff8242f73d00, rbp = 0 --- Tracing command intr pid 12 tid 100007 td 0xffffff0002359000 cpustop_handler() at cpustop_handler+0x3a ipi_nmi_handler() at ipi_nmi_handler+0x30 trap() at trap+0x155 nmi_calltrap() at nmi_calltrap+0x8 --- trap 0x13, rip = 0xffffffff803af7d5, rsp = 0xffffff800002cfe0, rbp = 0xffffff8000060bf0 --- ithread_loop() at ithread_loop+0x65 fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff8000060d00, rbp = 0 --- db:1:alltrace> capture off db:0:kdb.enter.panic> call doadump Dumping 2691 out of 8181 MB:..1%..11%..21%..31% ..41%..51%..61%..71%..81%..91% Dump complete = 0 db:0:kdb.enter.panic> reset The snapshot-mount sits on bwait(), iozone sits on waitrunningbufspace, maybe we must adapt some sysctl's ? From the dump I have (kgdb) p hirunningspace $9 = 1048576 (kgdb) p lorunningspace $10 = 524288 (kgdb) p runningbufspace $11 = 5687296 I need help to trigger this problem further down and can give any output from the written vmcore. -- Dr. Andreas Longwitz Data Service GmbH Beethovenstr. 2A 23617 Stockelsdorf Amtsgericht Lübeck, HRB 318 BS Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau From owner-freebsd-geom@FreeBSD.ORG Thu Apr 26 15:55:28 2012 Return-Path: Delivered-To: freebsd-geom@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5DD4C106564A for ; Thu, 26 Apr 2012 15:55:28 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 1663A8FC21 for ; Thu, 26 Apr 2012 15:55:28 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 7D86F28424; Thu, 26 Apr 2012 17:47:21 +0200 (CEST) Received: from [192.168.1.2] (static-84-242-120-26.net.upcbroadband.cz [84.242.120.26]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id C75CB2842E; Thu, 26 Apr 2012 17:47:20 +0200 (CEST) Message-ID: <4F996E08.3060006@quip.cz> Date: Thu, 26 Apr 2012 17:47:20 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Andreas Longwitz References: <201204261520.q3QFKD3k003469@freefall.freebsd.org> In-Reply-To: <201204261520.q3QFKD3k003469@freefall.freebsd.org> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-geom@FreeBSD.org Subject: Re: kern/164252: [geom] gjournal overflow X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Apr 2012 15:55:28 -0000 I think you have too small journal partition. Can you try to reproduce it with bigger journal partition? (double the size) Or try to set shorter switch time (sysctl kern.geom.journal.switch_time) The default is 10 seconds, you can try 5. You can also try to monitor values of some sysctl kern.geom.journal during test. Miroslav Lachman From owner-freebsd-geom@FreeBSD.ORG Fri Apr 27 13:59:26 2012 Return-Path: Delivered-To: freebsd-geom@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E61231065680 for ; Fri, 27 Apr 2012 13:59:26 +0000 (UTC) (envelope-from longwitz@incore.de) Received: from dss.incore.de (dss.incore.de [195.145.1.138]) by mx1.freebsd.org (Postfix) with ESMTP id A084C8FC0A for ; Fri, 27 Apr 2012 13:59:26 +0000 (UTC) Received: from inetmail.dmz (inetmail.dmz [10.3.0.3]) by dss.incore.de (Postfix) with ESMTP id 9F12E5DCB5; Fri, 27 Apr 2012 15:59:25 +0200 (CEST) X-Virus-Scanned: amavisd-new at incore.de Received: from dss.incore.de ([10.3.0.3]) by inetmail.dmz (inetmail.dmz [10.3.0.3]) (amavisd-new, port 10024) with LMTP id tIZ-25W2T4me; Fri, 27 Apr 2012 15:59:24 +0200 (CEST) Received: from mail.incore (fwintern.dmz [10.0.0.253]) by dss.incore.de (Postfix) with ESMTP id B88905DCA3; Fri, 27 Apr 2012 15:59:24 +0200 (CEST) Received: from bsdlo.incore (bsdlo.incore [192.168.0.84]) by mail.incore (Postfix) with ESMTP id AF3F845084; Fri, 27 Apr 2012 15:59:24 +0200 (CEST) Message-ID: <4F9AA63C.8010809@incore.de> Date: Fri, 27 Apr 2012 15:59:24 +0200 From: Andreas Longwitz User-Agent: Thunderbird 2.0.0.19 (X11/20090113) MIME-Version: 1.0 To: Miroslav Lachman <000.fbsd@quip.cz> References: <201204261520.q3QFKD3k003469@freefall.freebsd.org> <4F996E08.3060006@quip.cz> In-Reply-To: <4F996E08.3060006@quip.cz> Content-Type: text/plain; charset=ISO-8859-2 Content-Transfer-Encoding: 7bit Cc: freebsd-geom@FreeBSD.org Subject: Re: kern/164252: [geom] gjournal overflow X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Apr 2012 13:59:27 -0000 Thank you very much for answer. > I think you have too small journal partition. > Can you try to reproduce it with bigger journal partition? (double the > size) There was a statement concerning size of journal in http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/027976.html, but that was 2006 and now we have a 2TB partition. > Or try to set shorter switch time (sysctl kern.geom.journal.switch_time) > The default is 10 seconds, you can try 5. > > You can also try to monitor values of some sysctl kern.geom.journal > during test. I will do some further tests and report the results. Andreas Longwitz