Date: Thu, 4 Jul 2013 20:13:14 +0200 From: Johan Hendriks <joh.hendriks@gmail.com> To: mxb <mxb@alumni.chalmers.se> Cc: freebsd-fs@freebsd.org Subject: Re: Slow resilvering with mirrored ZIL Message-ID: <CAOaKuAWLkE3TdF-rWpcaQoik29mwBqa-HdwHxyMGu8PVTYKUiA@mail.gmail.com> In-Reply-To: <C8C696C0-2963-4868-8BB8-6987B47C3460@alumni.chalmers.se> References: <CABBFC07-68C2-4F43-9AFC-920D8C34282E@unixconn.com> <51D42107.1050107@digsys.bg> <2EF46A8C-6908-4160-BF99-EC610B3EA771@alumni.chalmers.se> <51D437E2.4060101@digsys.bg> <E5CCC8F551CA4627A3C7376AD63A83CC@multiplay.co.uk> <CBCA1716-A3EC-4E3B-AE0A-3C8028F6AACF@alumni.chalmers.se> <20130704000405.GA75529@icarus.home.lan> <C8C696C0-2963-4868-8BB8-6987B47C3460@alumni.chalmers.se>
next in thread | previous in thread | raw e-mail | index | archive | help
Op donderdag 4 juli 2013 schreef mxb (mxb@alumni.chalmers.se) het volgende: > > Well, I'v got a lot of errors in dmesg regarding one of four disks, which > is not currently replaced. > After resilvering dropped to 2MB/s ( :O ), it stuck. I had to reboot > system. > It came up and now continues resilvering process at 9/s (9 what??? bytes? ) > > Here comes requested info: > > dmesg(plan was to re-new disks and then lift this sys up to STABLE): > > Copyright (c) 1992-2012 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 9.1-RELEASE-p1 #2: Fri Mar 8 10:30:38 CET 2013 > root@nas.home.unixconn.com:/usr/obj/usr/src/sys/GENERIC amd64 > module_register: module pci/em already exists! > Module pci/em failed to register: 17 > module_register: module pci/lem already exists! > Module pci/lem failed to register: 17 > CPU: Intel(R) Xeon(R) CPU X5460 @ 3.16GHz (3166.74-MHz K8-class > CPU) > Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping > = 6 > > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> > > Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1> > AMD Features=0x20100800<SYSCALL,NX,LM> > AMD Features2=0x1<LAHF> > TSC: P-state invariant, performance statistics > real memory = 34359738368 (32768 MB) > avail memory = 33095909376 (31562 MB) > Event timer "LAPIC" quality 400 > ACPI APIC Table: <PTLTD APIC > > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs > FreeBSD/SMP: 1 package(s) x 4 core(s) > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 1 > cpu2 (AP): APIC ID: 2 > cpu3 (AP): APIC ID: 3 > ioapic0 <Version 2.0> irqs 0-23 on motherboard > ioapic1 <Version 2.0> irqs 24-47 on motherboard > kbd1 at kbdmux0 > ctl: CAM Target Layer loaded > acpi0: <PTLTD RSDT> on motherboard > acpi0: Power Button (fixed) > cpu0: <ACPI CPU> on acpi0 > cpu1: <ACPI CPU> on acpi0 > cpu2: <ACPI CPU> on acpi0 > cpu3: <ACPI CPU> on acpi0 > atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 > Event timer "RTC" frequency 32768 Hz quality 0 > attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 > Timecounter "i8254" frequency 1193182 Hz quality 0 > Event timer "i8254" frequency 1193182 Hz quality 100 > Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 > pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 > pci0: <ACPI PCI bus> on pcib0 > pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0 > pci1: <ACPI PCI bus> on pcib1 > pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1 > pci2: <ACPI PCI bus> on pcib2 > pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2 > pci3: <ACPI PCI bus> on pcib3 > pcib4: <ACPI PCI-PCI bridge> at device 0.0 on pci3 > pci4: <ACPI PCI bus> on pcib4 > pcib5: <ACPI PCI-PCI bridge> at device 0.2 on pci3 > pci5: <ACPI PCI bus> on pcib5 > pcib6: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci2 > pci6: <ACPI PCI bus> on pcib6 > em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x2000-0x201f mem > 0xdc000000-0xdc01ffff irq 18 at device 0.0 on pci6 > em0: Using an MSI interrupt > em0: Ethernet address: 00:30:48:34:46:36 > em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x2020-0x203f mem > 0xdc020000-0xdc03ffff irq 19 at device 0.1 on pci6 > em1: Using an MSI interrupt > em1: Ethernet address: 00:30:48:34:46:37 > pcib7: <ACPI PCI-PCI bridge> at device 0.3 on pci1 > pci7: <ACPI PCI bus> on pcib7 > pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0 > pci8: <ACPI PCI bus> on pcib8 > pcib9: <PCI-PCI bridge> at device 0.0 on pci8 > pci9: <PCI bus> on pcib9 > bce0: <HP NC380T PCIe DP Multifunc Gig Server Adapter (A2)> mem > 0xd8000000-0xd9ffffff irq 16 at device 4.0 on pci9 > miibus0: <MII bus> on bce0 > brgphy0: <BCM5706 1000BASE-T/SX media interface> PHY 1 on miibus0 > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bce0: Ethernet address: 00:1b:78:38:2f:28 > bce0: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C > (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI) > Coal (RX:6,6,18,18; TX:20,20,80,80) > pcib10: <PCI-PCI bridge> at device 0.2 on pci8 > pci10: <PCI bus> on pcib10 > bce1: <HP NC380T PCIe DP Multifunc Gig Server Adapter (A2)> mem > 0xda000000-0xdbffffff irq 17 at device 5.0 on pci10 > miibus1: <MII bus> on bce1 > brgphy1: <BCM5706 1000BASE-T/SX media interface> PHY 1 on miibus1 > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow > bce1: Ethernet address: 00:1b:78:38:2f:2a > bce1: ASIC (0x57060020); Rev (A2); Bus (PCI-X, 64-bit, 100MHz); B/C > (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI) > Coal (RX:6,6,18,18; TX:20,20,80,80) > pcib11: <ACPI PCI-PCI bridge> at device 6.0 on pci0 > pci11: <ACPI PCI bus> on pcib11 > pci0: <base peripheral> at device 8.0 (no driver attached) > pcib12: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0 > pci12: <ACPI PCI bus> on pcib12 > uhci0: <Intel 631XESB/632XESB/3100 USB controller USB-1> port > 0x1800-0x181f irq 17 at device 29.0 on pci0 > usbus0 on uhci0 > uhci1: <Intel 631XESB/632XESB/3100 USB controller USB-2> port > 0x1820-0x183f irq 19 at device 29.1 on pci0 > usbus1 on uhci1 > uhci2: <Intel 631XESB/632XESB/3100 USB controller USB-3> port > 0x1840-0x185f irq 18 at device 29.2 on pci0 > usbus2 on uhci2 > ehci0: <Intel 63XXESB USB 2.0 controller> mem 0xdc500000-0xdc5003ff irq 17 > at device 29.7 on pci0 > usbus3: EHCI version 1.0 > usbus3 on ehci0 > pcib13: <ACPI PCI-PCI bridge> at device 30.0 on pci0 > pci13: <ACPI PCI bus> on pcib13 > vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem > 0xd0000000-0xd7ffffff,0xdc200000-0xdc20ffff irq 18 at device 1.0 on pci13 > isab0: <PCI-ISA bridge> at device 31.0 on pci0 > isa0: <ISA bus> on isab0 > ahci0: <Intel ESB2 AHCI SATA controller> port > 0x1890-0x1897,0x1884-0x1887,0x1888-0x188f,0x1880-0x1883,0x1860-0x187f mem > 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0 > ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported > ahcich0: <AHCI channel> at channel 0 on ahci0 > ahcich1: <AHCI channel> at channel 1 on ahci0 > ahcich2: <AHCI channel> at channel 2 on ahci0 > ahcich3: <AHCI channel> at channel 3 on ahci0 > ahcich4: <AHCI channel> at channel 4 on ahci0 > ahcich5: <AHCI channel> at channel 5 on ahci0 > pci0: <serial bus, SMBus> at device 31.3 (no driver attached) > acpi_button0: <Power Button> on acpi0 > atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 > atkbd0: <AT Keyboard> irq 1 on atkbdc0 > kbd0 at atkbd0 > atkbd0: [GIANT-LOCKED] > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 > fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 > fdc0: does not respond > device_attach: fdc0 attach returned 6 > ppc0: <Parallel port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0 > ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode > ppc0: FIFO with 16/16/9 bytes threshold > ppbus0: <Parallel port bus> on ppc0 > plip0: <PLIP network interface> on ppbus0 > lpt0: <Printer> on ppbus0 > lpt0: Interrupt-driven port > ppi0: <Parallel I/O> on ppbus0 > orm0: <ISA Option ROMs> at iomem > 0xc0000-0xcafff,0xcb000-0xccfff,0xcd000-0xce7ff,0xce800-0xcffff on isa0 > sc0: <System console> at flags 0x100 on isa0 > sc0: VGA <16 virtual consoles, flags=0x300> > vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 > fdc0: No FDOUT register! > est0: <Enhanced SpeedStep Frequency Control> on cpu0 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est0 attach returned 6 > p4tcc0: <CPU Frequency Thermal Control> on cpu0 > est1: <Enhanced SpeedStep Frequency Control> on cpu1 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est1 attach returned 6 > p4tcc1: <CPU Frequency Thermal Control> on cpu1 > est2: <Enhanced SpeedStep Frequency Control> on cpu2 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est2 attach returned 6 > p4tcc2: <CPU Frequency Thermal Control> on cpu2 > est3: <Enhanced SpeedStep Frequency Control> on cpu3 > est: CPU supports Enhanced Speedstep, but is not recognized. > est: cpu_vendor GenuineIntel, msr 4921492106004921 > device_attach: est3 attach returned 6 > p4tcc3: <CPU Frequency Thermal Control> on cpu3 > ZFS filesystem version 5 > ZFS storage pool version 28 > Timecounters tick every 1.000 msec > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 12Mbps Full Speed USB v1.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: <Intel> at usbus0 > uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 > ugen1.1: <Intel> at usbus1 > uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 > ugen2.1: <Intel> at usbus2 > uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2 > ugen3.1: <Intel> at usbus3 > uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3 > uhub0: 2 ports with 2 removable, self powered > uhub1: 2 ports with 2 removable, self powered > uhub2: 2 ports with 2 removable, self powered > bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = > 0x00000004) > bce1: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = > 0x00000004) > uhub3: 6 ports with 6 removable, self powered > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: <ST32000645NS 0004> ATA-8 SATA 3.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: <WDC WD10EARS-00Z5B1 80.00A80> ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ada2 at ahcich2 bus 0 scbus2 target 0 lun 0 > ada2: <WDC WD10EARS-00Z5B1 80.00A80> ATA-8 SATA 2.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada2: Command Queueing enabled > ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada2: Previously was known as ad8 > ada3 at ahcich3 bus 0 scbus3 target 0 lun 0 > ada3: <ST32000645NS 0004> ATA-8 SATA 3.x device > ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada3: Command Queueing enabled > ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada3: Previously was known as ad10 > ada4 at ahcich4 bus 0 scbus4 target 0 lun 0 > ada4: <INTEL SSDSA2VP020G2 2CV102M5> ATA-7 SATA 2.x device > ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada4: Command Queueing enabled > ada4: 19087MB (39091248 512 byte sectors: 16H 63S/T 16383C) > ada4: Previously was known as ad12 > ada5 at ahcich5 bus 0 scbus5 target 0 lun 0 > ada5: <OCZ-AGILITY3 2.15> ATA-8 SATA 3.x device > ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada5: Command Queueing enabled > ada5: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C) > ada5: Previously was known as ad14 > SMP: AP CPU #3 Launched! > SMP: AP CPU #1 Launched! > SMP: AP CPU #2 Launched! > Timecounter "TSC-low" frequency 12370070 Hz quality 1000 > Root mount waiting for: usbus3 > ugen0.2: <Hewlett-Packard Company> at usbus0 > ukbd0: <Hewlett-Packard Company HP USB Smart Card Keyboard, class 0/0, rev > 2.00/5.35, addr 2> on usbus0 > kbd2 at ukbd0 > Root mount waiting for: usbus3 > ugen3.2: <Generic> at usbus3 > umass0: <Generic USB Storage, class 0/0, rev 2.00/2.50, addr 2> on usbus3 > umass0: SCSI over Bulk-Only; quirks = 0x4100 > umass0:7:0:-1: Attached to scbus7 > Trying to mount root from zfs:NAS/system/root [rw]... > da0 at umass-sim0 bus 0 scbus7 target 0 lun 0 > da0: <Generic STORAGE DEVICE 0250> Removable Direct Access SCSI-0 device > da0: 40.000MB/s transfers > da0: 7580MB (15523840 512 byte sectors: 255H 63S/T 966C) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > GEOM_PART: integrity check failed (label/zfs_boot, BSD) > bce0: Gigabit link up! > bce1: Gigabit link up! > > zpool status: > pool: NAS > state: DEGRADED > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Tue Jul 2 07:31:29 2013 > 483G scanned out of 3.49T at 10/s, (scan is slow, no estimated > time) > 121G resilvered, 13.52% done > config: > > NAME STATE READ WRITE CKSUM > NAS DEGRADED 0 0 0 > raidz1-0 DEGRADED 0 0 0 > replacing-0 DEGRADED 0 0 0 > 1160414651350745057 UNAVAIL 0 0 0 was > /dev/ada0/old > ada0 ONLINE 0 0 0 > (resilvering) > ada3 ONLINE 0 0 0 > ada1 ONLINE 0 0 0 > ada2 ONLINE 0 0 0 > logs > mirror-1 ONLINE 0 0 0 > ada4s1 ONLINE 0 0 0 > ada5s1 ONLINE 0 0 0 > cache > ada5s2 ONLINE 0 0 0 > > errors: No known data errors > > zpool get all: > NAME PROPERTY VALUE SOURCE > NAS size 3.64T - > NAS capacity 95% - > NAS altroot - default > NAS health DEGRADED - > NAS guid 3808946822857359331 default > NAS version 28 default > NAS bootfs - default > NAS delegation on default > NAS autoreplace off default > NAS cachefile - default > NAS failmode wait default > NAS listsnapshots off default > NAS autoexpand off default > NAS dedupditto 0 default > NAS dedupratio 1.00x - > NAS free 154G - > NAS allocated 3.49T - > NAS readonly off - > NAS comment - default > NAS expandsize 0 - > > zfs get all: > NAME PROPERTY VALUE SOURCE > NAS type filesystem - > NAS creation Tue Aug 3 20:10 2010 - > NAS used 2.61T - > NAS available 72.0G - > NAS referenced > > > > > On 4 jul 2013, at 02:04, Jeremy Chadwick <jdc@koitsu.org> wrote: > > > On Wed, Jul 03, 2013 at 05:12:17PM +0200, mxb wrote: > >> > >> Not sure if new are 4k. Done nothing about that. > >> But the SECOND drive, resilvering is SLOW. Not the first one. > >> > >> As stated below. Those changes are introduced to the system. > >> ALL new driver ARE identical, except S/N of cause :) > >> > >> On 3 jul 2013, at 16:55, Steven Hartland <killing@multiplay.co.uk> > wrote: > >> > >>> > >>> ----- Original Message ----- From: "Daniel Kalchev" <daniel@digsys.bg> > >>> To: "mxb" <mxb@alumni.chalmers.se> > >>> Cc: <freebsd-fs@freebsd.org> > >>> Sent: Wednesday, July 03, 2013 3:40 PM > >>> Subject: Re: Slow resilvering with mirrored ZIL > >>> > >>> > >>>> On 03.07.13 16:36, mxb wrote: > >>>>> Well, then my question persists - why I get so significant drop of > speed while resilvering second drive. > >>>>> The only changes to the system are: > >>>>> > >>>>> 1. Second partition for ZIL to create a mirror > >>>>> 2. New disks are 7200rpm. old ones are 5400rpm. > >>>>> > >>> > >>> Its not something like the old disks are 512byte sectors > >>> where as the new ones are 4k? > >>> > >>> It this is the case having already replaced one disk you've > >>> killed performance as its having to do lots more work reading > >>> none 4k aligned data? > > > > Not enough hard information to diagnose. What's needed is below; you > > can XXX out system names if you want, but please do not XXX out anything > > else. > > > > - Output from: dmesg > > > > - Output from: zpool status > > > > - Output from: zpool get all > > > > - Output from: zfs get all > > > > - Output from: "gpart show -p" for every disk on the system > > > > - Output from: cat /etc/sysctl.conf > > > > - Output from: cat /boot/loader.conf > > > > - Output from: "smartctl -a" for every disk that's used by ZFS > > (please use smartmontools 6.1 or newer in this case; install > > ports/sysutils/smartmontools) > > > > I can think o Could it be that your pool is at a capacity of 95%.. In a lot of cases this slows down a pool. Maybe this is what you see. Regards Johan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAOaKuAWLkE3TdF-rWpcaQoik29mwBqa-HdwHxyMGu8PVTYKUiA>