Date: Tue, 29 Jul 2014 14:43:54 -0600
From: "Kenneth D. Merry"
To: Joerg Wunsch, freebsd-scsi@freebsd.org, Martin Simmons
Subject: Re: Bacula fails on FreeBSD 10.x / "mt fsf" infinitely proceeds
Message-ID: <20140729204354.GA78616@nargothrond.kdm.org>
References: <20140729090724.GA26577@uriah.heep.sax.de> <201407291823.s6TINAad032318@higson.cam.lispworks.com> <20140729191829.GK3121@uriah.heep.sax.de>
In-Reply-To: <20140729191829.GK3121@uriah.heep.sax.de>

On Tue, Jul 29, 2014 at 21:18:29 +0200, Joerg Wunsch wrote:
> As Martin Simmons wrote:
>
> > Maybe you are now connecting the tape drive via a different SCSI
> > driver?
>
> No, I forgot to say: the tape drive/library is a Sun L9 which has
> HV-Diff-SCSI, so I have to use the exact same Symbios Logic SCSI
> controller (and driver) as before.
>
> > It sounds like you are running Bacula with "Fast Forward Space File
> > = yes" in the configuration.
>
> Yes, that's the case.  However, even without that, I'm afraid the
> Bacula logic would run into an infinite loop, as a single FSF
> operation now always succeeds, and pretends it encountered a new tape
> file.  (Besides, the "Fast Forward Space File" thing did work for many
> years.)
>
> Looking into saspace() in sys/cam/scsi/scsi_sa.c, I see:
>
> ====================================================================
>         } else if (code == SS_FILEMARKS && softc->fileno != (daddr_t) -1) {
>                 softc->fileno += (count - softc->last_ctl_resid);
>                 if (softc->fileno < 0)  /* we must of hit BOT */
>                         softc->fileno = 0;
>                 softc->blkno = 0;
> ====================================================================
>
> That piece of code ought to be responsible when the SPACE command hit
> a filemark.  It hasn't been changed for more than a decade though.
>
> Now the following SVN log message rang a bell to me:
>
> ====================================================================
> r225950 | ken | 2011-10-03 22:32:55 +0200 (Mo, 03. Okt 2011) | 146 Zeilen
>
> Add descriptor sense support to CAM, and honor sense residuals properly in
> CAM.
> ====================================================================
>
> It went in after my older (working) 8.2 system, it talks about
> residual handling, and the code above uses "softc->last_ctl_resid".
>
> It wouldn't surprise me if that's somehow related to the issue.

Yes, it could be related.

The descriptor sense changes abstracted out sense data handling so that
fixed and descriptor sense would be handled in the same way.  The residual
got bumped up from 32 to 64 bits to accommodate the increased size of the
descriptor sense fields.
In theory the values should be equivalent, but it is possible that there
is breakage.

Can you put a printf in the above code snippet, and print out the count,
fileno, and last_ctl_resid before fileno is set?  That might tell us
something.

The original code in saerror did this with the residual:

        info = (int32_t) scsi_4btoul(sense->info);
        resid = info;

resid was then assigned to last_ctl_resid.  Everything was a 32 bit value;
info was int32_t and resid was uint32_t.

The new code (in scsi_get_sense_info() in scsi_all.c) effectively does:

        uint32_t info_val;

        info_val = scsi_4btoul(sense->info);

        *info = info_val;
        if (signed_info != NULL)
                *signed_info = (int32_t)info_val;

info and signed_info are uint64_t and int64_t, respectively.  The info
value is what makes it into last_ctl_resid.

Another possibility here is that the driver is setting the sense residual
incorrectly.  If that happens, then we would think that the info field
isn't present in the sense, and would report the entire transfer length as
the residual.  (For a space command, I don't think there would be a
transfer.)

The sym(4) driver does set the sense residual, but I'd have to dig into it
a little more to figure out whether it is doing the right thing.

Hopefully a few printfs will give us a better idea of what is going on.
If the printf in saspace() doesn't show anything suspicious, the next
place to look would be at the sense_len in saerror().

> I'm Cc'ing Ken (as the committer of 225950) for an opinion, just in
> case he doesn't follow the list so closely.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
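
The 32-to-64-bit widening discussed in the message can be sketched in user
space.  This is only an illustration under stated assumptions, not the
driver code: be32_decode(), widen_info(), fileno_with_resid32() and
fileno_with_resid64() are hypothetical names, be32_decode() stands in for
scsi_4btoul(), and it is assumed (not verified against the tree) that the
only difference between the two fileno helpers is the width of the residual
fed into the saspace() arithmetic.  It shows why a residual that encodes a
negative 32-bit value behaves differently once it is carried zero-extended
in a 64-bit variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for scsi_4btoul(): decode 4 big-endian bytes. */
static uint32_t be32_decode(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Mirrors the widening quoted in the message: *info is zero-extended,
 * *signed_info is sign-extended from the same 32-bit field.
 */
static void widen_info(const uint8_t raw[4], uint64_t *info,
                       int64_t *signed_info)
{
    uint32_t info_val = be32_decode(raw);

    *info = info_val;
    if (signed_info != NULL)
        *signed_info = (int32_t)info_val;
}

/* The saspace() fileno update with a 32-bit residual (assumed old width). */
static int64_t fileno_with_resid32(int64_t fileno, int count, uint32_t resid)
{
    fileno += (count - resid);      /* subtraction wraps modulo 2^32 */
    if (fileno < 0)
        fileno = 0;
    return fileno;
}

/* The same update with a 64-bit residual (assumed new width). */
static int64_t fileno_with_resid64(int64_t fileno, int count, uint64_t resid)
{
    fileno += (count - resid);      /* subtraction wraps modulo 2^64 */
    if (fileno < 0)
        fileno = 0;
    return fileno;
}
```

With an info field of ff ff ff ff, widen_info() yields 4294967295 in info
and -1 in signed_info.  Starting from fileno 5 with count 1, the 32-bit
helper returns 7 while the 64-bit helper clamps to 0; with a residual of 0
both return 6.  Whether the real driver hits this case depends on the actual
type of last_ctl_resid and on what the drive reports, which is what the
requested printfs would show.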