From owner-freebsd-geom@FreeBSD.ORG Sun Jan 24 01:02:28 2010
Message-ID: <20100123.CZKDHFYVWARALKVT@bsnl.in>
From: "Kalpesh Sharma"
Reply-To: Kalpesh Sharma
Date: Sun, 24 Jan 2010 05:01:50 +0530
Subject: Kalpesh Sharma CV - 12 years Experienced Specialist with World Records.

Respected Sir/Madam,

It's my pleasure to get in touch with you. I am an expert with world records and exceptional achievements in my field. I am attaching my resume with this email. I kindly request you to have a look in case my skills, expertise, experience, etc. can play a primary role in your company's growth. I assure you with full confidence that, once given an opportunity, I will work hard to do my best for the company, because I believe that the company for which I work is not just a place to come to in the morning and leave in the evening as a formality; the company for which I work is like my home. A person who thinks of his workplace (company) as his home will take care of it in the same way as he takes care of the safety, growth and security of his own home.

Thanking you,

Sincerely,

Kalpesh Sharma

Note for Entertainment Industry Only: I am ready to work 100% free of cost for short films, ad films, corporate films and full-length feature films, and I have more than sufficient knowledge and medium acting skills to give the best delivery of my services. I will work free of cost just to gain professional experience in the entertainment industry; all I will need is a certificate of good performance and experience in the entertainment industry. So, if you do not have enough budget, feel free to get in touch with me. However, the costs of food, accommodation and travel that are an essential part of shooting will not be paid from my pocket. I will give my service 100% free of cost without charging anything, but the cost of accommodation, food and travel will have to be borne by the production company. As far as the delivery of service and performance is concerned, I will try to do my very best and assure you of satisfaction at your level.

My Quote: Marketing is an art of selling. Companies have employed marketing executives until now, but there should be a change. Start employing marketing actors.

My Primary Challenging Skill for Executive Position: I have a lot of ideas for every industry sector. Once given an opportunity, I take up the challenge of proving my work with practical results rather than floating in a dream world and describing it theoretically. I have worked at sophisticated levels and know the ideas that will generate excellent revenue for companies. I research and then generate ideas, taking care of all the pros and cons involved.

Detailed Information of My Skills (Top 15):

- Online Marketing Expert, Internet Advertising Expert, SEO, SEM, etc.
- Article Writing Expert, Article Top Ranking Techniques, Bulk Article Submissions.
- Blog Marketing Expert, Forum Marketing Expert, Social Networking Expert.
- Media (Unique & Exclusive News), Arts & Entertainment Industry Artist, Reporter, Writer, Assistant Director/Assistant Producer, etc. (Portfolio Images Attached for Entertainment Industry Related Jobs on http://www.desitara.com/shriganesh)
- Networking (LAN of Wired/Wireless up to 25 PCs).
- Hardware Troubleshooting, Operating System Installations (Server/Desktop/Laptop).
- Information Security Expert, Ethical Hacking, Penetration Testing Expert, Vulnerability Assessment and Solutions.
- Market Research, Web Research, Industry Research, Subject-Specific & Geo-Specific Research.
- Data Extraction, Email Extraction, Contact & Address Extraction.
- Competitor Research, Analysis & Technical Intelligence.
- Business Development Director, Business Management, Business Planning, Creative New Business Ideas Innovation, Planning and Implementing Business Strategies.
- Linksys Wired/Wireless Router Configuration WAG325N & WRVS 4400N.
- Trainee/Assistant to Network Engineer: Wide LAN/WAN Network, Corporate Network, Web Designing, Web Development, Web Programming, Software Programming, Software Development, Software Engineering, Research & Development, Executive/Director/Corporate/Management Levels, Software Testing Client/Server Level, Cisco Router CCNA, CCNP, CCIE Specialist, SAP Specialist, CISSP/CISA Specialist.
- Almost All Types of Administrative & Management Skills.
- Moderate Legal Working Knowledge.

From owner-freebsd-geom@FreeBSD.ORG Sun Jan 24 22:58:57 2010
Message-ID: <4B5CD0AC.7090302@quip.cz>
Date: Sun, 24 Jan 2010 23:58:52 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: freebsd-geom@freebsd.org
Cc: freebsd-rc@FreeBSD.org
Subject: ordering problem with gjournal and iSCSI initiator

Hi,

I don't know which mailing list is better for this problem, so I am posting to both.

I have a server with a local partition using gjournal (mfid0s2f.journal). It is mounted from fstab by this line:

/dev/mfid0s2f.journal   /vol0   ufs   rw,async,nosuid,noexec,noatime   2   2

Then I have storage connected by the iSCSI initiator using the attached rc script. It should be mounted by this script according to this fstab.iscsi line:

/dev/da0p1.journal   /vol1   ufs   rw,async,nosuid,noexec,noatime   2   2

But there is a problem: both partitions are journaled and both journals are on the local disk. This means that only the first partition is checked and mounted at boot time.

GEOM_JOURNAL: Journal 2395012627: mfid0s2d contains journal.
GEOM_JOURNAL: Journal 1544711416: mfid0s2e contains journal.
GEOM_JOURNAL: Journal 2395012627: mfid0s2f contains data.
GEOM_JOURNAL: Journal mfid0s2f clean.
GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s2d.
GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s2f.
Root mount waiting for: GJOURNAL
Root mount waiting for: GJOURNAL
Root mount waiting for: GJOURNAL
Root mount waiting for: GJOURNAL
Root mount waiting for: GJOURNAL
GEOM_JOURNAL: Timeout. Journal gjournal 1544711416 cannot be completed.

This is because the second data provider is not available at this time; rc.d/iscsi is executed later
(after NETWORKING, SERVERS, before DAEMON).

But da0p1.journal is not created later either (after iscsi has created /dev/da0); instead I get this message:

GEOM_JOURNAL: Journal 1544711416: da0p1 contains data.
GEOM_JOURNAL: Timeout. Journal gjournal 1544711416 cannot be completed.

And that's why /vol1 can't be mounted by rc.d/iscsi.

My question is: Is there any "Right Way" to handle this case? How can I "create" da0p1.journal later and complete the gjournal setup and mount?

The device da0p1.journal is only created if I manually do:

umount /vol0
gjournal stop mfid0s2f.journal
kldunload geom_journal
kldload geom_journal

Then both entries are created:

/dev/mfid0s2f.journal
/dev/da0p1.journal

And both can be mounted manually.

Miroslav Lachman

Attachment: iscsi.txt

#!/bin/sh

# PROVIDE: iscsi
# REQUIRE: NETWORKING
# BEFORE: DAEMON
# KEYWORD: nojail shutdown

#
# Add the following lines to /etc/rc.conf to enable iscsi:
#
# iscsi_enable="YES"
# iscsi_fstab="/etc/fstab.iscsi"

. /etc/rc.subr

name=iscsi
rcvar=`set_rcvar`

command=/sbin/iscontrol

iscsi_enable=${iscsi_enable:-"NO"}
iscsi_fstab=${iscsi_fstab:-"/etc/fstab.iscsi"}
iscsi_exports=${iscsi_exports:-"/etc/exports.iscsi"}
iscsi_debug=${iscsi_debug:-0}
start_cmd="iscsi_start"
faststop_cmp="iscsi_stop"
stop_cmd="iscsi_stop"

# Wait for the character device $1 to appear, polling every 5 seconds.
# Returns 0 if it did NOT appear (gives up after about 25 seconds) and
# 1 once it exists; this is the reverse of the usual shell convention.
iscsi_wait()
{
	dev=$1
	trap "echo 'wait loop cancelled'; exit 1" 2
	count=0
	while true; do
		if [ -c $dev ]; then
			break;
		fi
		if [ $count -eq 0 ]; then
			echo -n Waiting for ${dev}': '
		fi
		count=$((${count} + 1))
		if [ $count -eq 6 ]; then
			echo " Failed for dev=$dev"
			return 0
			break
		fi
		echo -n '.'
		sleep 5;
	done
	echo "$dev ok."
	return 1
}

iscsi_start()
{
	#
	# load needed modules
	for m in iscsi_initiator geom_label; do
		kldstat -qm $m || kldload $m
	done
	sysctl debug.iscsi_initiator=$iscsi_debug
	#
	# start iscontrol for each target
	if [ -n "${iscsi_targets}" ]; then
		for target in ${iscsi_targets}; do
			${command} ${rc_flags} -n ${target}
		done
	fi

	if [ -f "${iscsi_fstab}" ]; then
		while read spec file type opt t1 t2
		do
			case ${spec} in
			\#*|'')
				;;
			*)
				# iscsi_wait returns 0 (true) only on timeout,
				# so stop processing fstab.iscsi in that case.
				if iscsi_wait ${spec}; then
					break;
				fi
				echo type=$type spec=$spec file=$file
				fsck -p ${spec} && mkdir -p ${file} && mount ${spec} ${file}
				chmod 755 ${file}
				;;
			esac
		done < ${iscsi_fstab}
	fi

	if [ -f "${iscsi_exports}" ]; then
		cat ${iscsi_exports} >> /etc/exports
		#/etc/rc.d/mountd reload
		kill -1 `cat /var/run/mountd.pid`
	fi
}

iscsi_stop()
{
	echo 'iscsi stopping'
	while read spec file type opt t1 t2
	do
		case ${spec} in
		\#*|'')
			;;
		*)
			echo iscsi: umount $spec
			umount -fv $spec
			;;
		esac
	done < ${iscsi_fstab}
}

load_rc_config $name
run_rc_command "$1"

From owner-freebsd-geom@FreeBSD.ORG Mon Jan 25 11:07:01 2010
Date: Mon, 25 Jan 2010 11:07:00 GMT
Message-Id:
	<201001251107.o0PB705x038769@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-geom@FreeBSD.org
Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/142563  geom  [geom] [hang] ioctl freeze in zpool
f kern/142365  geom  [geom] FreeBSD RAID1 (gmirror) is much slower than Lin
o kern/141740  geom  [geom] gjournal(8): g_journal_destroy concurrent error
o kern/140352  geom  [geom] gjournal + glabel not working
o kern/139847  geom  [geom_mbr] load/unload causes system to hang
o kern/135898  geom  [geom] Severe filesystem corruption - large files or l
o kern/134922  geom  [gmirror] [panic] kernel panic when use fdisk on disk
o kern/134113  geom  [geli] Problem setting secondary GELI key
o kern/134044  geom  [geom] gmirror(8) overwrites fs with stale data from r
o kern/133931  geom  [geli] [request] intentionally wrong password to destr
o bin/132845   geom  [geom] [patch] ggated(8) does not close files opened a
o kern/132273  geom  glabel(8): [patch] failing on journaled partition
f kern/132242  geom  [gmirror] gmirror.ko fails to fully initialize
o kern/131353  geom  [geom] gjournal(8) kernel lock
p docs/130548  geom  [patch] gjournal(8) man page is missing sysctls
o kern/129674  geom  [geom] gjournal root did not mount on boot
o kern/129645  geom  gjournal(8): GEOM_JOURNAL causes system to fail to boo
o kern/129245  geom  [geom] gcache is more suitable for suffix based provid
f kern/128276  geom  [gmirror] machine lock up when gmirror module is used
f kern/126902  geom  [geom] geom_label: kernel panic during install boot
o kern/124973  geom  [gjournal] [patch] boot order affects geom_journal con
o kern/124969  geom  gvinum(8): gvinum raid5 plex does not detect missing s
f kern/124294  geom  [geom] gmirror(8) have inappropriate logic when workin
o kern/123962  geom  [panic] [gjournal] gjournal (455Gb data, 8Gb journal),
o kern/123122  geom  [geom] GEOM / gjournal kernel lock
o kern/122738  geom  [geom] gmirror list "losts consumers" after gmirror de
f kern/122415  geom  [geom] UFS labels are being constantly created and rem
o kern/122067  geom  [geom] [panic] Geom crashed during boot
o kern/121559  geom  [patch] [geom] geom label class allows to create inacc
o kern/121364  geom  [gmirror] Removing all providers create a "zombie" mir
o kern/120091  geom  [geom] [geli] [gjournal] geli does not prompt for pass
o kern/119743  geom  [geom] geom label for cds is keeped after dismount and
o kern/115856  geom  [geli] ZFS thought it was degraded when it should have
o kern/115547  geom  [geom] [patch] [request] let GEOM Eli get password fro
o kern/114532  geom  [geom] GEOM_MIRROR shows up in kldstat even if compile
o kern/113957  geom  [gmirror] gmirror is intermittently reporting a degrad
o kern/113837  geom  [geom] unable to access 1024 sector size storage
o kern/113419  geom  [geom] geom fox multipathing not failing back
p bin/110705   geom  gmirror(8) control utility does not exit with correct
o kern/107707  geom  [geom] [patch] [request] add new class geom_xbox360 to
o kern/104389  geom  [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/98034   geom  [geom] dereference of NULL pointer in acd_geom_detach
o kern/94632   geom  [geom] Kernel output resets input while GELI asks for
o kern/90582   geom  [geom] [panic] Restore cause panic string (ffs_blkfree
o bin/90093    geom  fdisk(8) incapable of altering in-core geometry
a kern/89660   geom  [vinum] [patch] [panic] due to g_malloc returning null
o kern/89546   geom  [geom] GEOM error
o kern/88601   geom  [geli] geli cause kernel panic under heavy disk usage
o kern/87544   geom  [gbde] mmaping large files on a gbde filesystem deadlo
o kern/84556   geom  [geom] [panic] GBDE-encrypted swap causes panic at shu
o kern/79251   geom  [2TB] newfs fails on 2.6TB gbde device
o kern/79035   geom  [vinum] gvinum unable to create a striped set of mirro
o bin/78131    geom  gbde(8) "destroy" not working.
s kern/73177   geom  kldload geom_* causes panic due to memory exhaustion

54 problems total.

From owner-freebsd-geom@FreeBSD.ORG Tue Jan 26 01:40:05 2010
Message-ID: <4B5E47EE.4020001@quip.cz>
Date: Tue, 26 Jan 2010 02:39:58 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: freebsd-geom@freebsd.org
Subject: how can I recovery GPT table?

I got this message logged on one machine:

GEOM: da0: the secondary GPT table is corrupt or invalid.
GEOM: da0: using the primary only -- recovery suggested.

It happened after some playing with glabel and gjournal, so it was my fault. Is there any way I can fix this problem? (Everything is working fine... for now.) Or at least, how can I create a backup of the GPT tables to restore later?

Miroslav Lachman

From owner-freebsd-geom@FreeBSD.ORG Fri Jan 29 23:23:09 2010
Date: Sat, 30 Jan 2010 01:11:10 +0200
From:
	Kostik Belousov
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, FreeBSD-Current, freebsd-geom@freebsd.org
Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.
Message-ID: <20100129231110.GS3877@deviant.kiev.zoral.com.ua>
In-Reply-To: <4B636812.8060403@FreeBSD.org>
References: <4B636812.8060403@FreeBSD.org>

On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
> Hi.
>
> Experimenting with SATA hot-plug, I've found a quite repeatable deadlock
> case. The problem is observed when several SATA devices, opened via
> devfs, disappear at exactly the same time. In my case, it happens when
> unplugging a SATA Port Multiplier with several disks behind it. All I
> have to do is run several `dd if=/dev/adaX of=/dev/null bs=1m &`
> commands and unplug the multiplier. That causes predictable I/O errors
> and device destruction, but with high probability several dd processes
> get stuck in the kernel.
>
> I've discovered these pieces of the problem:
> - CAM receives a disconnect event and starts device destruction. But as
>   the device is still open, it can't do it immediately.
> - dd receives an I/O error and exits.
> - The exit1() call closes all descriptors, including the adaX device.
>   This triggers final device destruction by sending an event to
>   geom_dev.
>
> adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521
> g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4
> g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb
> g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425
> devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762
> VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55
> vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750
> vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...) at 0x40912854
> devfs_close_f(4566da48,4569fd80,3,0,4566da48,...) at 0x407f235b
> _fdrop(4566da48,4569fd80,7b604b8c,408b5cec,0,4569fe24,40eb23a8,40d10460,40c1a8bb,4560672c,721,40c1a8b2,7b604bb4,40878220,4560672c,8,40c1a8b2,721)
> at 0x40836da3
> closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0
> fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da
> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423
> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
> syscall(7b604d38) at 0x40b565c0
>
> - The GEOM event thread tries to destroy the /dev/adaX device (which
>   should already be free at this moment), but for some reason freezes,
>   waiting for the device to be freed:
>
> 0 2 0 0 -8 0 0 8 devdrn DL ?? 0:02.89 [g_event]
>
> - As the GEOM event is still not handled, exit1() waits for it:
>
> kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909
> g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f
> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431
> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
> syscall(7b604d38) at 0x40b565c0
>
> - The system is stationary and GEOM is frozen. There is no way out of
>   this except pushing reset.
>
> 0 1065 1055 0 44 0 5344 3040 g_wait DE 0 0:00.43 dd if=/dev/ada1 of=/dev/null bs=1m
> 0 1066 1055 0 44 0 5344 3040 GEOM t DE 0 0:00.07 dd if=/dev/ada2 of=/dev/null bs=1m
>
>
> So, does anybody have a good idea why destroy_dev() can't complete?

The devdrn state means that the thread performing the device destruction,
i.e. the thread that called destroy_dev(), is waiting for threads to leave
the cdevsw d_* methods. The thread that notified the destruction thread
did that from the d_close() method. This resulted in the deadlock.

I introduced the destroy_dev_sched(9) KPI to handle this and similar issues.
Note that race-free use of destroy_dev_sched(9) is quite hard.

From owner-freebsd-geom@FreeBSD.ORG Fri Jan 29 23:23:32 2010
Message-ID: <4B636DEA.2060901@FreeBSD.org>
Date: Sat, 30 Jan 2010 01:23:22 +0200
From: Alexander Motin
To: Kostik Belousov
Cc: freebsd-hackers@freebsd.org, FreeBSD-Current, freebsd-geom@freebsd.org
Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.
In-Reply-To: <20100129231110.GS3877@deviant.kiev.zoral.com.ua>
References: <4B636812.8060403@FreeBSD.org> <20100129231110.GS3877@deviant.kiev.zoral.com.ua>

Kostik Belousov wrote:
> On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
>> Hi.
>>
>> Experimenting with SATA hot-plug, I've found a quite repeatable deadlock
>> case. The problem is observed when several SATA devices, opened via
>> devfs, disappear at exactly the same time.
>> In my case, it happens when unplugging a SATA Port Multiplier with
>> several disks behind it. All I have to do is run several
>> `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug the
>> multiplier. That causes predictable I/O errors and device destruction,
>> but with high probability several dd processes get stuck in the kernel.
>>
>> I've discovered these pieces of the problem:
>> - CAM receives a disconnect event and starts device destruction. But as
>>   the device is still open, it can't do it immediately.
>> - dd receives an I/O error and exits.
>> - The exit1() call closes all descriptors, including the adaX device.
>>   This triggers final device destruction by sending an event to
>>   geom_dev.
>>
>> adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521
>> g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4
>> g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb
>> g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425
>> devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762
>> VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55
>> vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750
>> vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...) at 0x40912854
>> devfs_close_f(4566da48,4569fd80,3,0,4566da48,...) at 0x407f235b
>> _fdrop(4566da48,4569fd80,7b604b8c,408b5cec,0,4569fe24,40eb23a8,40d10460,40c1a8bb,4560672c,721,40c1a8b2,7b604bb4,40878220,4560672c,8,40c1a8b2,721)
>> at 0x40836da3
>> closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0
>> fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da
>> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423
>> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
>> syscall(7b604d38) at 0x40b565c0
>>
>> - The GEOM event thread tries to destroy the /dev/adaX device (which
>>   should already be free at this moment), but for some reason freezes,
>>   waiting for the device to be freed:
>>
>> 0 2 0 0 -8 0 0 8 devdrn DL ?? 0:02.89 [g_event]
>>
>> - As the GEOM event is still not handled, exit1() waits for it:
>>
>> kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909
>> g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f
>> exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431
>> sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd
>> syscall(7b604d38) at 0x40b565c0
>>
>> - The system is stationary and GEOM is frozen. There is no way out of
>>   this except pushing reset.
>>
>> 0 1065 1055 0 44 0 5344 3040 g_wait DE 0 0:00.43 dd if=/dev/ada1 of=/dev/null bs=1m
>> 0 1066 1055 0 44 0 5344 3040 GEOM t DE 0 0:00.07 dd if=/dev/ada2 of=/dev/null bs=1m
>>
>>
>> So, does anybody have a good idea why destroy_dev() can't complete?
>
> The devdrn state means that the thread performing the device destruction,
> i.e. the thread that called destroy_dev(), is waiting for threads to leave
> the cdevsw d_* methods. The thread that notified the destruction thread
> did that from the d_close() method. This resulted in the deadlock.

d_close() doesn't call destroy_dev() directly. It schedules a different
thread to do it, so destroy_dev() should run after d_close() has already
completed. Though I haven't checked how it is locked.

> I introduced the destroy_dev_sched(9) KPI to handle this and similar issues.
> Note that race-free use of destroy_dev_sched(9) is quite hard.

I think it should work without it here, shouldn't it?
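
For anyone trying to reproduce the hang, the steps described in this thread (several background dd readers, an unplug, then checking ps for processes stuck on the g_wait, devdrn, and "GEOM t" wait channels) can be scripted roughly as below. This is a minimal sketch, not something posted to the list: the helper names start_readers and find_stuck are hypothetical, the script assumes FreeBSD dd(1) (for the lowercase bs=1m suffix) and ps(1) -l output with an MWCHAN column, and the wait-channel strings are simply the ones visible in the ps output quoted above.

```sh
#!/bin/sh
# Hypothetical helpers (not part of FreeBSD) for reproducing and spotting
# the hang discussed in this thread. Assumes FreeBSD dd(1), which accepts
# the lowercase "bs=1m" size suffix used in the report.

# start_readers DEV...: start one background dd reader per device and
# print each reader's PID on its own line. dd's stdout/stderr go to
# /dev/null so that $(start_readers ...) does not block on the pipe.
start_readers() {
	for dev in "$@"; do
		dd if="$dev" of=/dev/null bs=1m >/dev/null 2>&1 &
		echo "$!"
	done
}

# find_stuck: filter "ps -axl"-style lines from stdin, keeping only those
# whose wait channel is one of the values seen in the hung processes:
# "g_wait", "devdrn", or the truncated "GEOM t".
find_stuck() {
	grep -E '[[:space:]](g_wait|devdrn|GEOM t)[[:space:]]'
}
```

Typical use would be `start_readers /dev/ada1 /dev/ada2 > /tmp/dd.pids`, then unplugging the port multiplier, then `ps -axl | find_stuck`. Matching the wait-channel text with grep is deliberately crude: it will also report any unrelated process that happens to sleep on the same channel.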
-- Alexander Motin From owner-freebsd-geom@FreeBSD.ORG Fri Jan 29 23:27:25 2010 Date: Sat, 30 Jan 2010 00:58:26 +0200 From: Alexander Motin To: freebsd-geom@freebsd.org,
freebsd-hackers@freebsd.org, FreeBSD-Current Subject: Deadlock between GEOM and devfs device destroy and process exit. Hi. Experimenting with SATA hot-plug I've found quite repeatable deadlock case. Problem observed when several SATA devices, opened via devfs, disappear at exactly same time. In my case, at time of unplugging SATA Port Multiplier with several disks beyond it. All I have to do is to run several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug multiplier. That causes predictable I/O errors and devices destruction. But with high probability several dd processes getting stuck in kernel. I've discovered such pieces of problem: - CAM receives disconnect event and starts device destruction. But as device is still opened, it can't do it immediately. - dd receives I/O error and exits. - exit1() call closes all descriptors, including adaX device. It triggers final device destruction, by sending event to geom_dev. adaclose(4571fa00,4,40c16576,76,0,...) at 0x4049c521 g_disk_access(457e2200,ffffffff,0,0,0,...) at 0x4080b9a4 g_access(45643d80,ffffffff,0,0,2000,...) at 0x40810ccb g_dev_close(45766500,1,2000,4569fd80,4569fd80,...) at 0x4080a425 devfs_close(7b604aa8,80000,457f8000,80000,7b604acc,...) at 0x407f2762 VOP_CLOSE_APV(40d03180,7b604aa8,40c2e681,128,0,...) at 0x40b6da55 vn_close(457f8000,1,45624300,4569fd80,451271e0,...) at 0x40912750 vn_closefile(4566da48,4569fd80,4566da48,0,7b604b58,...) at 0x40912854 devfs_close_f(4566da48,4569fd80,3,0,4566da48,...)
at 0x407f235b _fdrop(4566da48,4569fd80,7b604b8c,408b5cec,0,4569fe24,40eb23a8,40d10460,40c1a8bb,4560672c,721,40c1a8b2,7b604bb4,40878220,4560672c,8,40c1a8b2,721) at 0x40836da3 closef(4566da48,4569fd80,721,71e,4569fe24,...) at 0x40838ad0 fdfree(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x408394da exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844423 sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd syscall(7b604d38) at 0x40b565c0 - GEOM event thread tries to destroy /dev/adaX device (which should be already free at this moment), but for some reason freezes, waiting for device to be freed: 0 2 0 0 -8 0 0 8 devdrn DL ?? 0:02.89 [g_event] - as GEOM event is still not handled, exit1() waits for it: kdb_backtrace(40c16bc4,0,40c16ab1,56,4540e640,...) at 0x408a2909 g_waitidle(4569fd80,0,40c1b1a9,107,7b604c80,...) at 0x4080cd1f exit1(4569fd80,100,7b604d2c,40b565c0,4569fd80,...) at 0x40844431 sys_exit(4569fd80,7b604cf8,40c59d34,40c26be4,4569d2a8,...) at 0x408450fd syscall(7b604d38) at 0x40b565c0 - system stationary. GEOM frozen. No way to get out of this, except pushing reset. 0 1065 1055 0 44 0 5344 3040 g_wait DE 0 0:00.43 dd if=/dev/ada1 of=/dev/null bs=1m 0 1066 1055 0 44 0 5344 3040 GEOM t DE 0 0:00.07 dd if=/dev/ada2 of=/dev/null bs=1m So, does anybody have good idea why destroy_dev() can't complete? 
-- Alexander Motin From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 11:08:03 2010 Date: Sat, 30 Jan 2010 13:07:46 +0200 From: Nikos Vassiliadis To: Miroslav Lachman <000.fbsd@quip.cz> Cc: freebsd-rc@FreeBSD.org, freebsd-geom@freebsd.org Subject: Re: ordering problem with gjournal and iSCSI initiator On 1/25/2010 12:58 AM, Miroslav Lachman wrote: > Hi, > > I don't know what mailing list is better for this problem, so I post to > both. > > I have server with local partition using gjournal (mfid0s2f.journal).
> It is mounted from fstab by this line: > > /dev/mfid0s2f.journal /vol0 ufs rw,async,nosuid,noexec,noatime 2 2 > > Then I have storage connected by iSCSI initiator by attached rc script. > It should be mounted by this script according to fstab.iscsi line: > > /dev/da0p1.journal /vol1 ufs rw,async,nosuid,noexec,noatime 2 2 > > But there is a problem, both partitions are journaled and journal is on > local disk. > > It means that only the first partition is checked and mounted at boot time. > > GEOM_JOURNAL: Journal 2395012627: mfid0s2d contains journal. > GEOM_JOURNAL: Journal 1544711416: mfid0s2e contains journal. > GEOM_JOURNAL: Journal 2395012627: mfid0s2f contains data. > GEOM_JOURNAL: Journal mfid0s2f clean. > GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s2d. > GEOM_JOURNAL: BIO_FLUSH not supported by mfid0s2f. > Root mount waiting for: GJOURNAL > Root mount waiting for: GJOURNAL > Root mount waiting for: GJOURNAL > Root mount waiting for: GJOURNAL > Root mount waiting for: GJOURNAL > GEOM_JOURNAL: Timeout. Journal gjournal 1544711416 cannot be completed. > > This is because second data provider is not available at this time. > rc.d/iscsi is executed later. (after NETWORKING, SERVERS, before DAEMON) > > But da0p1.journal is not created later (after iscsi created /dev/da0) > with this message: > GEOM_JOURNAL: Journal 1544711416: da0p1 contains data. > GEOM_JOURNAL: Timeout. Journal gjournal 1544711416 cannot be completed. > > And that's why /vol1 can't be mounted by rc.d/iscsi. > > My question is: Is there any "Right Way" to handle this case? > > How can I "create" da0p1.journal later and complete the gjournal + mount? gjournal needs both the data and the journal providers to be available before it can create the .journal provider. You can force GEOM to taste the devices again, so that gjournal rechecks them, by opening the device for writing.
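A minimal C sketch of that retaste trick, assuming the provider names from the log above (`/dev/da0p1` as the data provider and `/dev/mfid0s2e` as its journal). It opens each path write-only and closes it again without writing anything, which is all a shell redirection such as `true > /dev/da0p1` does. The helper name `retaste_providers` is hypothetical, and the sketch relies on the behaviour described in this message: opening a provider for writing (and closing it) makes GEOM taste it again.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: open each path for writing and close it again,
 * the C equivalent of `true > /dev/<provider>`.  Nothing is written.
 * Returns the number of paths that could not be opened.
 */
static int retaste_providers(const char *const paths[], size_t n)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_WRONLY);

        if (fd == -1) {
            failures++;     /* e.g. the iSCSI provider is not there yet */
            continue;
        }
        close(fd);          /* write open ends here; GEOM may retaste */
    }
    return failures;
}
```

For example, `const char *devs[] = { "/dev/da0p1", "/dev/mfid0s2e" }; retaste_providers(devs, 2);` run after the iSCSI disk appears. From an rc script the shell form is shorter; the C form only makes it explicit that nothing is written to the device.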
A trick suggested by pjd@ is to simply run: true > /dev/$data_device true > /dev/$journal_device This will create the .journal provider, and you'll be able to mount the filesystem living on it. So, modify the rc script accordingly to do this automatically. HTH, Nikos From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 11:28:02 2010 Date: Sat, 30 Jan 2010 12:27:49 +0100 From: Pawel Jakub Dawidek To: Alexander Motin Cc: freebsd-hackers@freebsd.org, FreeBSD-Current , kib@FreeBSD.org, freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy
and process exit. On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: > Hi. > > Experimenting with SATA hot-plug I've found quite repeatable deadlock > case. Problem observed when several SATA devices, opened via devfs, > disappear at exactly same time. In my case, at time of unplugging SATA > Port Multiplier with several disks beyond it. All I have to do is to run > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug > multiplier. That causes predictable I/O errors and devices destruction. > But with high probability several dd processes getting stuck in kernel. [...] I observed the same thing yesterday while stress-testing HAST: 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, which is already held by the g_event thread. Interesting backtraces: db> bt 2 [...] _sleep(85b1bc68,8079aab8,4c,80711ab3,64,...) at _sleep+0x339 destroy_devl(5,0,80711c53,85b1bcb0,804945cd,...) at destroy_devl+0x20f destroy_dev(86a10a00,8070ea93,86a09800,860888e0,0,...) at destroy_dev+0x2f g_dev_orphan(86a09800,8070f424,871038d8,90,6,...) at g_dev_orphan+0x6d g_run_events(8079a378,0,4c,8070c221,64,...) at g_run_events+0x1c0 g_event_procbody(0,85b1bd38,80713228,343,85d0b7f8,...) at g_event_procbody+0x8a [...] db> bt 3658 [...] sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63 _sx_xlock_hard(8079a348,86974240,0,8070ea66,c8,...) at _sx_xlock_hard+0x496 _sx_xlock(8079a348,0,8070ea66,c8,2000,...)
at _sx_xlock+0xc0 g_dev_close(85f8ee00,4003,2000,86974240,86974240,...) at g_dev_close+0xbd devfs_close(dc49eaac,80745707,80000,80000,868be984,...) at devfs_close+0x2b2 VOP_CLOSE_APV(80753ac0,dc49eaac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5 vn_close(868be984,4003,85fd5500,86974240,0,...) at vn_close+0x190 vn_closefile(86a20968,86974240,86a20968,0,dc49eb5c,...) at vn_closefile+0xe4 devfs_close_f(86a20968,86974240,0,0,86a20968,...) at devfs_close_f+0x2b _fdrop(86a20968,86974240,14,80719d1a,0,dc49eb98,1,86975000,8635c22c,8635c22c,721,8071264b,dc49ebb8,804f87d0,8635c22c,8,8071264b,721) at _fdrop+0x43 closef(86a20968,86974240,721,71e,869742e4,...) at closef+0x290 fdfree(86974240,0,80712fdd,107,864c4330,...) at fdfree+0x3ea exit1(86974240,0,dc49ed2c,806d830a,86974240,...) at exit1+0x513 sys_exit(86974240,dc49ecf8,86974240,dc49ed2c,202,...) at sys_exit+0x1d [...] db> bt 3659 [...] sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63 _sx_xlock_hard(8079a348,863e06c0,0,8070ea66,c8,...) at _sx_xlock_hard+0x496 _sx_xlock(8079a348,0,8070ea66,c8,2000,...) at _sx_xlock+0xc0 g_dev_close(86a10a00,3,2000,863e06c0,863e06c0,...) at g_dev_close+0xbd devfs_close(dc4f6aac,80745707,80000,80000,86aa6c3c,...) at devfs_close+0x2b2 VOP_CLOSE_APV(80753ac0,dc4f6aac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5 vn_close(86aa6c3c,3,870d4080,863e06c0,80cbac08,...) at vn_close+0x190 vn_closefile(871028f8,863e06c0,871028f8,0,dc4f6b5c,...) at vn_closefile+0xe4 devfs_close_f(871028f8,863e06c0,0,0,871028f8,...) at devfs_close_f+0x2b _fdrop(871028f8,863e06c0,8071809c,40e,0,805354ab,8071809c,8071df19,8635d42c,8635d42c,721,8071264b,dc4f6bb8,804f87d0,8635d42c,8,8071264b,721) at _fdrop+0x43 closef(871028f8,863e06c0,721,71e,863e0764,...) at closef+0x290 fdfree(863e06c0,0,80712fdd,107,86153088,...) at fdfree+0x3ea exit1(863e06c0,100,dc4f6d2c,806d830a,863e06c0,...) at exit1+0x513 sys_exit(863e06c0,dc4f6cf8,863e06c0,dc4f6d2c,202,...) at sys_exit+0x1d [...]
db> show lock 0x8079a348 class: sx name: GEOM topology state: XLOCK: 0x85d0d000 (tid 100008, pid 2, "g_event") waiters: exclusive -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 11:45:01 2010 Date: Sat, 30 Jan 2010 12:44:51 +0100 From: Pawel Jakub Dawidek To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, FreeBSD-Current , kib@FreeBSD.org, freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit. On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote: > On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: > > Hi. > > > > Experimenting with SATA hot-plug I've found quite repeatable deadlock > > case. Problem observed when several SATA devices, opened via devfs, > > disappear at exactly same time. In my case, at time of unplugging SATA > > Port Multiplier with several disks beyond it. All I have to do is to run > > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug > > multiplier. That causes predictable I/O errors and devices destruction. > > But with high probability several dd processes getting stuck in kernel. > [...] > > I observed the same thing yesterday while stress-testing HAST: > > 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd > 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd > 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] > > Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, > which is already held by the g_event thread. Maybe I'll add how I understand what's going on: GEOM calls destroy_dev() while holding the topology lock. Destroy_dev() wants to destroy device, but can't because there are threads that still have it open.
The threads can't close it, because to close it they need the topology lock. The deadlock is quite obvious, IMHO. I believe the problem could be solved by dropping the topology lock in g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if it is safe to drop the topology lock there. Maybe Poul-Henning could take a look. -- Pawel Jakub Dawidek From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 13:51:40 2010 Date: Sat, 30 Jan 2010 15:51:26 +0200 From: Kostik Belousov To: Pawel Jakub Dawidek Cc: freebsd-hackers@freebsd.org, Alexander Motin , FreeBSD-Current , freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit. On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote: > On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote: > > On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: > > > Hi. > > > > > > Experimenting with SATA hot-plug I've found quite repeatable deadlock > > > case. Problem observed when several SATA devices, opened via devfs, > > > disappear at exactly same time. In my case, at time of unplugging SATA > > > Port Multiplier with several disks beyond it.
All I have to do is to run > > > several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug > > > multiplier. That causes predictable I/O errors and devices destruction. > > > But with high probability several dd processes getting stuck in kernel. > > [...] > > > > I observed the same thing yesterday while stress-testing HAST: > > > > 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd > > 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd > > 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] > > > > Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, > > which is already held by the g_event thread. > > Maybe I'll add how I understand what's going on: > > GEOM calls destroy_dev() while holding the topology lock. > > Destroy_dev() wants to destroy device, but can't because there are > threads that still have it open. > > The threads can't close it, because to close it they need the topology > lock. > > The deadlock is quite obvious, IMHO. > > I believe the problem could be solved by dropping the topology lock in > g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if > it is safe to drop the topology lock there. Maybe Poul-Henning could > take a look. As I already said, if you cannot drop a lock, destroy_dev_sched() is designed to handle this. You should be careful not to allow any further activity on the device scheduled for destruction.
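To make that advice concrete, here is a userland model in C. It is not kernel code, and every name in it is hypothetical. The mutex `topology` stands in for the GEOM topology lock, `struct mdev` stands in for a cdev with an open count, and a worker thread plays the role of the deferred destruction that destroy_dev_sched(9) provides. The orphan handler runs with `topology` held. It only marks the device gone and schedules the drain, without waiting for it, so the exiting process can still take `topology` in its close path (as g_dev_close() does in the backtraces above). Once the device is marked gone, new opens are refused with ENXIO.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the GEOM topology lock (hypothetical model). */
static pthread_mutex_t topology = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device: open count plus a "scheduled for destruction" flag. */
struct mdev {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int opens;
    bool gone;        /* set when destruction is scheduled */
    bool destroyed;   /* set by the worker once the drain has finished */
};

#define MDEV_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false, false }

/* Worker: waits for the last close without holding the topology lock. */
static void *mdev_destroy_worker(void *arg)
{
    struct mdev *d = arg;

    pthread_mutex_lock(&d->lock);
    while (d->opens > 0)
        pthread_cond_wait(&d->drained, &d->lock);
    d->destroyed = true;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Open path in this model: takes the topology lock, refuses dying devices. */
static int mdev_open(struct mdev *d)
{
    int error = 0;

    pthread_mutex_lock(&topology);
    pthread_mutex_lock(&d->lock);
    if (d->gone)
        error = ENXIO;          /* no further activity on a dying device */
    else
        d->opens++;
    pthread_mutex_unlock(&d->lock);
    pthread_mutex_unlock(&topology);
    return error;
}

/* Close path: also needs the topology lock, like g_dev_close(). */
static void mdev_close(struct mdev *d)
{
    pthread_mutex_lock(&topology);
    pthread_mutex_lock(&d->lock);
    if (--d->opens == 0)
        pthread_cond_broadcast(&d->drained);
    pthread_mutex_unlock(&d->lock);
    pthread_mutex_unlock(&topology);
}

/* Orphan handler, called with `topology` held: schedule, never wait. */
static int mdev_orphan_locked(struct mdev *d, pthread_t *worker)
{
    pthread_mutex_lock(&d->lock);
    d->gone = true;
    pthread_mutex_unlock(&d->lock);
    return pthread_create(worker, NULL, mdev_destroy_worker, d);
}
```

The design follows the constraint discussed in the thread. Whoever holds the topology lock must not sleep waiting for a close, because that close itself needs the same lock.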
From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 18:51:33 2010 Date: Sat, 30 Jan 2010 20:51:27 +0200 From: Alexander Motin To: Pawel Jakub Dawidek Cc: freebsd-hackers@freebsd.org, FreeBSD-Current , kib@FreeBSD.org, freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit. Pawel Jakub Dawidek wrote: > On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote: >> On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: >>> Experimenting with SATA hot-plug I've found quite repeatable deadlock >>> case. Problem observed when several SATA devices, opened via devfs, >>> disappear at exactly same time. In my case, at time of unplugging SATA >>> Port Multiplier with several disks beyond it. All I have to do is to run >>> several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug >>> multiplier. That causes predictable I/O errors and devices destruction. >>> But with high probability several dd processes getting stuck in kernel. >> [...]
>> >> I observed the same thing yesterday while stress-testing HAST: >> >> 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd >> 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd >> 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] >> >> Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, >> which is already held by the g_event thread. > > Maybe I'll add how I understand what's going on: > > GEOM calls destroy_dev() while holding the topology lock. > > Destroy_dev() wants to destroy device, but can't because there are > threads that still have it open. > > The threads can't close it, because to close it they need the topology > lock. > > The deadlock is quite obvious, IMHO. You are right, but as it happens not every time I was interested why. After closer look I found two different scenarios. In first case application receives I/O error and closes device. On device close CAM calls disk_destroy(), which schedules device destruction. When destroy_dev() called, device already free and there is no problem, as these events are always asynchronous. In second case, application also receives I/O error, but before it is able to react, GEOM starts handling of disk_gone(), called by CAM. As result, destroy_dev() called with device still opened, and it can't ever be closed due to topology lock held. I've played a bit with destroy_dev_sched(), but locking indeed looks not to be easy. Is there some known good practice? destroy_dev_sched_cb() looks a bit more promising. 
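The second scenario can be reduced to a small userland model, offered only as an illustration. All names are hypothetical. The mutex `topology` stands in for the GEOM topology lock, and `open_count` stands in for the reference that dd still holds. `closer()` plays the exiting process, whose close path needs the topology lock, as in the g_dev_close() backtraces. `orphan_with_topology_held()` plays g_dev_orphan() calling destroy_dev() while holding that lock. Real kernel threads would sleep forever, so the model uses pthread_mutex_timedlock() and pthread_cond_timedwait() to report the cycle instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins: the GEOM topology lock and one device's drain. */
static pthread_mutex_t topology = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int open_count = 1;          /* dd still has the device open */

static struct timespec deadline_in(time_t secs)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += secs;
    return ts;
}

/* Exiting dd: closing needs the topology lock first (cf. g_dev_close). */
static void *closer(void *arg)
{
    bool *gave_up = arg;
    struct timespec ts = deadline_in(1);

    if (pthread_mutex_timedlock(&topology, &ts) != 0) {
        *gave_up = true;            /* a kernel thread would sleep forever */
        return NULL;
    }
    pthread_mutex_lock(&dev_lock);
    if (--open_count == 0)
        pthread_cond_broadcast(&drained);
    pthread_mutex_unlock(&dev_lock);
    pthread_mutex_unlock(&topology);
    return NULL;
}

/*
 * g_event: take the topology lock, then wait for the device to drain, as
 * destroy_dev() does.  Returns true if the drain never happened.
 */
static bool orphan_with_topology_held(bool *closer_gave_up)
{
    pthread_t t;
    bool stuck = false;

    pthread_mutex_lock(&topology);
    if (pthread_create(&t, NULL, closer, closer_gave_up) != 0) {
        pthread_mutex_unlock(&topology);
        return false;
    }

    struct timespec ts = deadline_in(2);
    pthread_mutex_lock(&dev_lock);
    while (open_count > 0) {
        if (pthread_cond_timedwait(&drained, &dev_lock, &ts) == ETIMEDOUT) {
            stuck = true;           /* closer is blocked on our lock */
            break;
        }
    }
    pthread_mutex_unlock(&dev_lock);
    pthread_mutex_unlock(&topology);
    pthread_join(t, NULL);
    return stuck;
}
```

Calling `orphan_with_topology_held()` should return true after about two seconds. The closer cannot get the lock that the waiting side holds, so the open count never reaches zero.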
-- Alexander Motin From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 19:34:08 2010 Date: Sat, 30 Jan 2010 21:34:02 +0200 From: Kostik Belousov To: Alexander Motin Cc: freebsd-hackers@freebsd.org, FreeBSD-Current , Pawel Jakub Dawidek , freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit. On Sat, Jan 30, 2010 at 08:51:27PM +0200, Alexander Motin wrote: > Pawel Jakub Dawidek wrote: > > On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote: > >> On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: > >>> Experimenting with SATA hot-plug I've found quite repeatable deadlock > >>> case. Problem observed when several SATA devices, opened via devfs, > >>> disappear at exactly same time. In my case, at time of unplugging SATA > >>> Port Multiplier with several disks beyond it. All I have to do is to run > >>> several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug > >>> multiplier. That causes predictable I/O errors and devices destruction. > >>> But with high probability several dd processes getting stuck in kernel. > >> [...] > >> > >> I observed the same thing yesterday while stress-testing HAST: > >> > >> 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd > >> 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd > >> 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] > >> > >> Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, > >> which is already held by the g_event thread. > > > > Maybe I'll add how I understand what's going on: > > > > GEOM calls destroy_dev() while holding the topology lock.
> >
> > Destroy_dev() wants to destroy device, but can't because there are
> > threads that still have it open.
> >
> > The threads can't close it, because to close it they need the topology
> > lock.
> >
> > The deadlock is quite obvious, IMHO.
>
> You are right, but as it happens not every time I was interested why.
> After closer look I found two different scenarios.
>
> In first case application receives I/O error and closes device. On
> device close CAM calls disk_destroy(), which schedules device
> destruction. When destroy_dev() called, device already free and there is
> no problem, as these events are always asynchronous.
>
> In second case, application also receives I/O error, but before it is
> able to react, GEOM starts handling of disk_gone(), called by CAM. As
> result, destroy_dev() called with device still opened, and it can't ever
> be closed due to topology lock held.
>
> I've played a bit with destroy_dev_sched(), but locking indeed looks not
> to be easy. Is there some known good practice? destroy_dev_sched_cb()
> looks a bit more promising.

What do you mean by not easy locking? destroy_dev_sched(dev) == destroy_dev_sched_cb(dev, NULL, NULL). There is even a man page describing the interface.

Main issue with destroy_dev_sched is the window between a moment when device is scheduled for destruction and thus kept in half-demolished state, and actual removal of devfs node.

My exemplary case has been snp(4) before tty got rewritten, see r. 1.107 of sys/dev/snp/snp.c. No calls to destroy_dev_sched() that I placed in the src/ are kept around, that is good because corresponding subsystems got serious rewrite.
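The deadlock Pawel and Alexander describe is a lock-order cycle: the g_event thread holds the topology lock while destroy_dev() waits for the last close, and the closing process needs that same lock in order to close. A minimal sketch, assuming a userspace model with POSIX threads rather than real kernel code, shows why scheduling the destruction breaks the cycle. topo_lock stands in for the GEOM topology lock; struct model_dev, mdev, run_deferred_destroy_model() and the thread functions are hypothetical names, and the worker thread is only a rough stand-in for what destroy_dev_sched() arranges in the kernel.

```c
/*
 * Hypothetical userspace model (POSIX threads), not FreeBSD kernel code.
 * topo_lock plays the GEOM topology lock; every other name is invented.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t topo_lock = PTHREAD_MUTEX_INITIALIZER;

struct model_dev {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int  opens;       /* threads that still hold the node open */
    bool destroyed;   /* set when the deferred destruction completes */
};

struct model_dev mdev = {
    .mtx = PTHREAD_MUTEX_INITIALIZER,
    .cv = PTHREAD_COND_INITIALIZER,
    .opens = 1,       /* dd(1) still has the disk open */
    .destroyed = false,
};

static pthread_t destroy_thread;
static bool destroy_started;

/* Deferred destruction: waits for the last close WITHOUT holding
 * topo_lock, so the closing thread is never blocked by it. */
static void *destroy_worker(void *arg)
{
    struct model_dev *d = arg;

    pthread_mutex_lock(&d->mtx);
    while (d->opens > 0)
        pthread_cond_wait(&d->cv, &d->mtx);
    d->destroyed = true;
    pthread_mutex_unlock(&d->mtx);
    return NULL;
}

/* Plays the g_event thread handling disk_gone() under topo_lock. It only
 * schedules destruction and then drops the lock without waiting. */
static void *event_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&topo_lock);
    destroy_started =
        pthread_create(&destroy_thread, NULL, destroy_worker, &mdev) == 0;
    pthread_mutex_unlock(&topo_lock);
    return NULL;
}

/* Plays dd(1) exiting after an I/O error: closing needs topo_lock. */
static void *closer_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&topo_lock);
    pthread_mutex_lock(&mdev.mtx);
    mdev.opens--;
    pthread_cond_broadcast(&mdev.cv);
    pthread_mutex_unlock(&mdev.mtx);
    pthread_mutex_unlock(&topo_lock);
    return NULL;
}

/* Runs both threads once; true if the device ended up destroyed. */
bool run_deferred_destroy_model(void)
{
    pthread_t ev, cl;

    if (pthread_create(&ev, NULL, event_thread, NULL) != 0)
        return false;
    if (pthread_create(&cl, NULL, closer_thread, NULL) != 0) {
        pthread_join(ev, NULL);
        return false;
    }
    pthread_join(ev, NULL);   /* destroy_thread is valid after this */
    pthread_join(cl, NULL);
    if (!destroy_started)
        return false;
    pthread_join(destroy_thread, NULL);
    return mdev.destroyed && mdev.opens == 0;
}
```

Called once from a main(), run_deferred_destroy_model() should return true. If event_thread() instead called destroy_worker() directly while holding topo_lock, the model would hang whenever the event thread takes the lock before the closer (Alexander's second scenario) and would finish normally when the closer wins the race (his first scenario). The interval between event_thread() starting the worker and the worker setting destroyed is the model's counterpart of the half-demolished window Kostik warns about.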
--Vu7hzOi38yxTgbOc-- From owner-freebsd-geom@FreeBSD.ORG Sat Jan 30 20:07:42 2010 Return-Path: Delivered-To: freebsd-geom@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C36131065694; Sat, 30 Jan 2010 20:07:42 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from palm.hoeg.nl (mx0.hoeg.nl [IPv6:2001:7b8:613:100::211]) by mx1.freebsd.org (Postfix) with ESMTP id 64BE08FC14; Sat, 30 Jan 2010 20:07:42 +0000 (UTC) Received: by palm.hoeg.nl (Postfix, from userid 1000) id 81C851CEF1; Sat, 30 Jan 2010 21:07:41 +0100 (CET) Date: Sat, 30 Jan 2010 21:07:41 +0100 From: Ed Schouten To: Kostik Belousov Message-ID: <20100130200741.GG77705@hoeg.nl> References: <4B636812.8060403@FreeBSD.org> <20100130112749.GA1660@garage.freebsd.pl> <20100130114451.GB1660@garage.freebsd.pl> <4B647FAF.4090409@FreeBSD.org> <20100130193402.GB3877@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="213B/23bCmw+8GSd" Content-Disposition: inline In-Reply-To: <20100130193402.GB3877@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-hackers@freebsd.org, Alexander Motin , FreeBSD-Current , Pawel Jakub Dawidek , freebsd-geom@freebsd.org Subject: Re: Deadlock between GEOM and devfs device destroy and process exit.
X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 30 Jan 2010 20:07:42 -0000 --213B/23bCmw+8GSd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
Hi all,

* Kostik Belousov wrote:
> My exemplary case has been snp(4) before tty got rewritten, see r. 1.107
> of sys/dev/snp/snp.c. No calls to destroy_dev_sched() that I placed in
> the src/ are kept around, that is good because corresponding subsystems
> got serious rewrite.

The current TTY code still uses destroy_dev_sched_cb(). In a very old version of the new TTY code, close() on a pseudo-terminal master device would also end up calling destroy_dev(), which meant it blocked until the TTY was closed as well, which is obviously not what it should do.

I changed the TTY code to destroy_dev_sched_cb(), which means tty_gone() doesn't block. The TTY layer later calls a callback function, so the pts driver can deallocate the softc and reclaim the unit number (pts/%d).

-- 
Ed Schouten
WWW: http://80386.nl/
--213B/23bCmw+8GSd--
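Ed's description, where tty_gone() returns immediately and a later callback lets the pts driver free its softc and reclaim the pts/%d unit number, can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not the kernel interface: struct model_cdev, model_destroy_sched_cb(), model_try_finish(), model_close(), alloc_unit() and pts_free_cb() are hypothetical, and the model assumes destruction may complete as soon as the last open reference goes away. The real interface is destroy_dev_sched_cb(), described in the destroy_dev(9) manual page.

```c
/*
 * Hypothetical single-threaded model of "schedule destruction, then run a
 * completion callback". None of these names are FreeBSD APIs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_UNITS 8

/* Stands in for the allocator that hands out pts/%d unit numbers. */
bool unit_in_use[MAX_UNITS];

struct pts_softc {
    int unit;               /* unit number owned by this pseudo-terminal */
};

struct model_cdev {
    int    opens;           /* open references still outstanding */
    bool   destroy_pending; /* destruction requested, not yet finished */
    void (*cb)(void *);     /* driver callback run after destruction */
    void  *cb_arg;
};

/* Returns the lowest free unit number, or -1 if all are taken. */
int alloc_unit(void)
{
    for (int i = 0; i < MAX_UNITS; i++) {
        if (!unit_in_use[i]) {
            unit_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Driver callback: only now is it safe to free the softc and the unit. */
void pts_free_cb(void *arg)
{
    struct pts_softc *sc = arg;

    unit_in_use[sc->unit] = false;
    free(sc);
}

/* Never blocks: records the request and returns to the caller at once. */
void model_destroy_sched_cb(struct model_cdev *dev, void (*cb)(void *),
    void *arg)
{
    dev->destroy_pending = true;
    dev->cb = cb;
    dev->cb_arg = arg;
}

/* Completes a pending destruction once no opens remain. Called from
 * model_close(); a fuller model would also call it from a worker. */
void model_try_finish(struct model_cdev *dev)
{
    if (dev->destroy_pending && dev->opens == 0) {
        dev->destroy_pending = false;
        if (dev->cb != NULL)
            dev->cb(dev->cb_arg);
    }
}

void model_close(struct model_cdev *dev)
{
    dev->opens--;
    model_try_finish(dev);
}
```

The design choice being illustrated is that model_destroy_sched_cb() never waits, so a caller that holds a lock cannot deadlock inside it. The price is that the softc and the unit number stay allocated until the callback runs, so a new pseudo-terminal cannot reuse that pts unit before then.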