Date: Fri, 13 Oct 2000 16:22:43 +0100 From: dbhague@allstor-sw.co.uk To: freebsd-scsi@FreeBSD.org, freebsd-fs@FreeBSD.org Cc: gibbs@scsiguy.com, mjacob@feral.com, Andre Albsmeier <andre.albsmeier@mchp.siemens.de>, smcintyre@allstor-sw.co.uk Subject: Re: Stressed SCSI subsystem locks up the system Message-ID: <80256977.00547AA1.00@mail.plasmon.co.uk>
next in thread | raw e-mail | index | archive | help
[-- Attachment #1 --]
This is increasingly looking like a filesystem issue.
We have done some more testing on the 4.1 build. We believe the issues with 4.1
are different to the ones
with 3.0. Due to the age of 3.0 we have decide to forget about it and
concentrate on the 4.1 issues.
We have two 4.1 test systems; one failed with the following on the console
> dev #da/3 block=0 fs=/RAID blocks
>panic ffs_blkfree freeing free frags
>syncing disk
The system was locked in this state and was ping-able but we could not telnet
in.
The other system is still running after two days. This is the longest a test
has run.
The only difference we can see between the systems is the way the filesystem was
built.
This was built by
> dd if=/dev/zero of=/dev/rda0 count=2048
> disklabel -Brw da0 auto
> disklabel -e da0
> newfs /dev/rda0d
normally this is done in a script which effectively does
> disklabel -rw da0 auto
> dd if=/dev/zero of=/dev/rda0 count=2048
> disklabel -rw da0 auto
> "disklabel" | disklabel -rR da0
> newfs /dev/rda0d
The only real difference is the -B option in disklabel. We do not attempt to
boot from this partition so this should not matter.
Let me know what you think ?
Regards Dave
For the sake of Freebsd-fs subscribers:
For your information we are a small software company, just south of Cambridge
UK, and have a thin server that is running FreeBSD. We are developing a storage
system which controls a RAID system and automatically archives the data to
optical jukeboxes or tape libraries. Basically HSM in a box.
The system is as follows:
Single board computer, AMD K6-II,
128MB RAM,
Adaptec AHA 3940AU
Intel Ethernet Pro 100
This is attached to a RAID system although we have had the same failure with a
single SCSI drive.
I enclose the test script and the source for the main test program. tmcp.c.
The test is started with "golongmulti 7"
If you require any further information please ask, or give me a call
+44-1763-264-474.
Thanks Dave
(See attached file: tmcp.c)(See attached file: golong)(See attached file:
golongmulti)(See attached file: go)
[-- Attachment #2 --]
/* $Revision: 1.21 $ */
#include <sys/types.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#include <errno.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>
#if defined(OS2) || defined(WINNT)
#include <io.h>
#include <windows.h>
#include <direct.h>
#define strcasecmp stricmp
#define stat _stat
#define unlink _unlink
#define read _read
#define lseek _lseek
#define TIME __int64
#else
#define HAS_GETTIMEOFDAY /* Some systems may not */
#include <strings.h>
#include <unistd.h>
#include <sys/time.h>
#define TIME long long
#endif /* OS2 || WINNT */
/* define a variable that will cause main and usage to be included or otherwise
*/
#define _STAND_ALONE 0
#ifndef O_BINARY
#define O_BINARY 0
#endif
#define ULTRA_BLOCK (1024*1024)
#define N_FILES 0x4000
#define N_INTERVAL 100
#define BLK_SZ 32768
#define MAX_BLK_SZ (512*1024)
/* TEJ - changed TmcpWriteFile to use these macros - need setfilesize operation for /t (tape)
option, and _chsize didn't seem to work, so NT version now uses Win32 file I/O functions
and SetEndOfFile. Non-NT should be unchanged.
*/
#ifndef WINNT
#define F_HANDLE int
#define F_CREATE(_p, q) open((_p), O_CREAT | O_BINARY | O_WRONLY , (q))
#define F_OPEN(_p, q) open((_p), O_BINARY | O_RDWR, (q));
#define F_CREATE_FAIL_VALUE -1
#define F_CLOSE( _f) close(_f)
#define F_WRITE( _f, _b, _s) write( (_f), (_b), (_s))
#define F_READ( _f, _b, _s) read( (_f), (_b), (_s))
#define F_COMMIT( _f) fsync( _f)
#define F_ERROR() errno
#define F_SETSIZE( _f, _s) exit(1)
#define F_FAIL( _x) ( -1 == (_x))
#define F_ERROR_STR( _x) strerror(_x)
#define F_DELETE( _p) unlink(_p)
#define F_SEEK_END(_f) lseek((_f), 0, SEEK_END)
#define F_MKDIR(_f) mkdir( (_f), 0777)
#else
/* NT */
#include <windows.h>
#include <winbase.h>
unsigned long F_DUMMY;
#define F_HANDLE HANDLE
#define F_CREATE(_p, q) CreateFile( (_p), GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL)
#define F_OPEN(_p, q) CreateFile( (_p), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)
#define F_CREATE_FAIL_VALUE INVALID_HANDLE_VALUE
#define F_CLOSE(_f) CloseHandle(_f)
#define F_WRITE( _f, _b, _s) WriteFile( (_f), (_b), (_s), &F_DUMMY, NULL)
#define F_READ( _f, _b, _s) ReadFile( (_f), (_b), (_s), &F_DUMMY, NULL)
#define F_COMMIT( _f) FlushFileBuffers(_f)
#define F_FAIL( _x) (0 == (_x))
#define F_ERROR() GetLastError()
#define F_DELETE( _p) DeleteFile(_p)
#define F_MKDIR(_f) _mkdir((_f))
int F_SETSIZE( HANDLE _f, unsigned long _s)
{
unsigned long nul = 0;
if (0xFFFFFFFF == SetFilePointer( _f, _s, &nul, FILE_BEGIN))
return 0;
if (!SetEndOfFile( _f))
return 0;
return 1;
}
#define F_ERROR_STR( _x) "<NOT AVAILABLE>"
#define F_SEEK_END(_f) SetFilePointer(_f, 0, &F_DUMMY, FILE_END)
#endif
char Junk[MAX_BLK_SZ];
char JunkRead[MAX_BLK_SZ];
typedef unsigned long ULONG;
void usage(char * progname)
{
printf( "Usage:\n"
"\t%s [/1] [/A] [/B:x] [/C] [/D] [/E:x] [/F:files] [/L:y] [/M:z] \n"
"\t[/N] [/O] [/P] [/Q] [/R] [/S:w] [/T] [/U] [/W] [/V] [<dir>]\n\n"
"/1 Disables file size cycling (see /M) \n"
"/A Uses names Again and Again - eg lots of 00000001.tst etc\n"
"/B:w Determines the block size of writes. The default is 32768 bytes.\n"
"/C Closes after each write.\n"
"/D Deletes files after read/write.\n"
"/E:x Specifies the permissions to use on created files.\n"
"/F:# Specifies the number of files to work with.\n"
"/L:y Determines the number of files in a directory.\n"
" 0 = 16, 1 = 256, 2 = 4096, 3 = 65536. The default is Level 0\n"
"/M:z Determines the number of blocks per file. File size cycles\n"
" between z, 2*z, and 3*z blocks. The default value for z is 1.\n"
"/N Makes strange filenames\n"
"/O Leaves files open\n"
"/P Deletes partially written files\n"
"/Q Turns on Quiet mode - don't print file names, only summaries\n"
"/R Read-after-write AND COMPARE (OK in this version)\n"
"/R /R Read-without-writing AND COMPARE (OK in this version)\n"
"/S Writes files to multiple sides of media?\n"
"/T Does SetFileSize before writing (tape) NT ONLY\n"
"/U Turns on ultra-verbose mode - print performance every 1MB of data.\n"
"/V Turns on verbose mode - print performance every file, not every %d.\n\n"
"Optional <dir> parameter is the path to generate data to\n\n"
"This program will create a tree of data at the given point\n"
"This will consist of %ld files in total.\n", progname, (int) N_INTERVAL, (ULONG) N_FILES);
exit(1);
}
int CheckDir(char *dirpath, int create)
{
struct stat sbuf;
int error = 0;
char *op = NULL;
if (stat(dirpath, &sbuf))
{
if (ENOENT == errno)
{
if (!create)
return errno;
else
if (F_MKDIR(dirpath))
{
op = "mkdir";
error = errno;
}
}
else
{
op = "stat";
error = errno;
}
}
else
{
if (!(sbuf.st_mode & S_IFDIR))
{
printf("<<Error: 'mkdir %s' fails because this file exists>>\n", dirpath);
error = EEXIST;
}
}
if (error && op)
printf("<<CheckDir(%s)...Error %d during '%s': %s>>\n",
dirpath, error, op, strerror(error));
return error;
}
#ifdef WINNT
static void get_time(TIME *newtime)
{
SYSTEMTIME sys_time;
FILETIME file_time;
GetSystemTime(&sys_time);
SystemTimeToFileTime(&sys_time, &file_time);
*newtime = ((TIME)(file_time.dwHighDateTime << 32) + file_time.dwLowDateTime);
*newtime /= (TIME)(10);
}
#else
# ifdef HAS_GETTIMEOFDAY
static void get_time(TIME *newtime)
{
struct timeval tv;
struct timezone tz;
tz.tz_minuteswest = 0;
tz.tz_dsttime = 0;
gettimeofday(&tv, &tz);
*newtime = ((TIME)tv.tv_sec * 1000000) + tv.tv_usec;
}
# else
static void get_time(TIME *newtime)
{
time_t foo;
time(&foo);
*newtime=(TIME)foo * 1000000;
}
# endif /* HAS_GETTIMEOFDAY */
#endif /* WINNT */
int GeneratePath(int side, int level, char *path_out, char *root_path_in, ULONG idx, int create, int strange_names, int common_names)
{
int i = 0, error = 0;
char *FullPath = path_out;
if (level > 7)
level = 7;
if (level < 0)
level = 0;
sprintf(path_out, "%s", root_path_in);
path_out += strlen(path_out);
error = CheckDir(FullPath, create);
if (!error && side)
{
sprintf(path_out, "/side%d", side);
error = CheckDir(FullPath, create);
path_out += strlen(path_out);
}
*path_out++ = '/';
for (i = 7; !error && i > level; i--)
{
*path_out++ = "0123456789ABCDEF"[(int) ((idx >> (4*i))) & 15];
*path_out = 0;
error = CheckDir(FullPath, create);
*path_out++ = '/';
}
if (strange_names)
{
int length;
char *p = path_out;
srand((level+1) * idx);
length = rand() % 30;
for (i = 0; i < length; i++)
*(p++) = "01234567890abcdfeghijklmnopqrstuvwxyz_."[rand() % 37];
p += sprintf(p, "_%08lx_", idx);
length = rand() % 40;
for (i = 0; i < length; i++)
*(p++) = "01234567890abcdfeghijklmnopqrstuvwxyz_."[rand() % 37];
*(p++) = 0;
}
else if (common_names)
{
sprintf(path_out, "%08lx.tst", idx % (16 ^ (1+level)));
}
else
sprintf(path_out, "%08lx.tst", idx);
if (error)
printf("GeneratePath(%s, %ld) returns error %d", root_path_in, idx, error);
return error;
}
int TmcpCreateFile(char *path, int mode, char *buf, long blklen, int n_blks, int leave_open,
int ultra_verbose, int close_after_each_write, int commit_writes,
int delete_partially_written, int set_file_size_before_write)
{
F_HANDLE fnum;
int error = 0, lines = 0;
TIME basetime, dt=0, dt2, lastbase;
long written = 0;
fnum = F_CREATE( path, mode);
get_time(&basetime);
lastbase = (TIME)basetime;
if (F_CREATE_FAIL_VALUE == fnum)
{
printf("\rCould not create '%s'. Error(%d)\n", path, F_ERROR());
error = F_ERROR();
}
else
{
if (set_file_size_before_write)
{
#ifdef WINNT
if (F_FAIL( F_SETSIZE( fnum, n_blks*blklen)))
{
printf("SetSize( %d) failed for %s (e %d)\n", blklen*n_blks, path, F_ERROR());
error = F_ERROR();
if (F_FAIL(F_CLOSE(fnum)))
{
printf("\rCould not close file '%s'. Error(%d)\n", path, F_ERROR());
error = F_ERROR();
}
}
#else
printf("WARNING: /T not supported non-NT\n");
#endif
}
while (n_blks-- && !error)
{
if (F_FAIL(F_WRITE(fnum, buf, blklen)))
{
printf("\rCould not write '%s'. Error(%d): %s\n", path, errno, strerror(errno));
error = F_ERROR();
if (delete_partially_written)
{
if (F_FAIL(F_COMMIT(fnum)))
{
printf("\rCould not commit close partially written file '%s'. Error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(errno));
error = F_ERROR();
}
if (F_FAIL(F_CLOSE(fnum)))
{
printf("\rCould not close partially written file '%s'. Error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(errno));
error = F_ERROR();
}
fnum = F_CREATE_FAIL_VALUE; /* set to -1 to prevent trying to commit and close again, etc... */
if (F_FAIL(F_DELETE(path)))
{
error = F_ERROR();
printf("\rCould not delete partially written file '%s'. error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(errno));
}
else
printf("Deleted partially written file '%s'.\n",path);
}
}
else
{
if (commit_writes>1)
{
if (F_FAIL( F_COMMIT(fnum)))
{
printf("\rCould not commit write '%s'. Error(%d): %s\n", path, F_ERROR(), F_ERROR_STR(F_ERROR()));
error = F_ERROR();
}
else
{
if (ultra_verbose)
printf(" CommitOK:");
}
}
written += blklen;
if (ultra_verbose)
{
if (written >= ULTRA_BLOCK*ultra_verbose)
{
get_time(&dt);
dt2 = (TIME)dt;
dt-=(TIME)basetime;
dt2-=(TIME)lastbase;
if (dt2)
{
if (!lines)
printf("\n");
printf("[%04lld Kb/s] ", (written * 1000000 / 1024) / (TIME)dt2);
get_time(&lastbase);
written = 0;
lines = (lines+1) %4;
}
}
}
if (close_after_each_write)
{
if (F_FAIL(F_CLOSE( fnum)))
{
printf("\rCould not close '%s'. Error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(F_ERROR()));
error = F_ERROR();
}
else
{
fnum = F_OPEN(path, mode);
if (F_CREATE_FAIL_VALUE == fnum)
{
printf("\rCould not create '%s'. Error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(F_ERROR()));
error = F_ERROR();
}
else
{
if (F_FAIL(F_SEEK_END(fnum)))
{
printf("\rCould not seek to end on %s. Error(%d): %s\n",
path, F_ERROR(), F_ERROR_STR(F_ERROR()));
error = F_ERROR();
}
}
}
}
}
}
}
if ((fnum != F_CREATE_FAIL_VALUE) && commit_writes && (F_FAIL(F_COMMIT(fnum))))
{
printf("\rCould not commit close '%s'. Error(%d): %s\n", path, F_ERROR(), F_ERROR_STR(F_ERROR()));
error = F_ERROR();
}
if ((fnum != F_CREATE_FAIL_VALUE) && !leave_open)
{
if (F_FAIL(F_CLOSE(fnum)))
{
error = F_ERROR();
printf("\rCould not close '%s'. Error(%d): %s\n", path, error, F_ERROR_STR(error));
}
}
if (ultra_verbose && dt)
printf("\n");
return error;
}
int TmcpReadFile(char *path, char *buf, long blklen, int n_blks, int leave_open, int ultra_verbose, int close_after_each_read)
{
F_HANDLE fnum = 0;
int error = 0;
TIME basetime, dt=0, dt2, lastbase;
long read1 = 0, lines = 0;
get_time(&basetime);
lastbase = (TIME)basetime;
fnum = F_OPEN(path, S_IREAD | S_IWRITE);
if ((F_HANDLE)-1 == fnum)
{
printf("\rCould not open '%s' for reading. Error(%d): %s\n", path, errno, strerror(errno));
error = errno;
}
else
{
while (n_blks-- && !error)
{
if (-1 == F_READ(fnum, buf, blklen))
{
printf("\rCould not read '%s'. Error(%d): %s\n", path, errno, strerror(errno));
error = errno;
}
else
{
read1 += blklen;
if (ultra_verbose)
{
if (read1 >= ULTRA_BLOCK*ultra_verbose)
{
get_time(&dt);
dt2 = (TIME)dt;
dt-=(TIME)basetime;
dt2-=(TIME)lastbase;
if (dt2)
{
if (!lines)
printf("\n");
printf("[%04lld Kb/s] ", (read1 * 1000000 / 1024) / (TIME)dt2);
get_time(&lastbase);
read1 = 0;
lines = (lines+1) %4;
}
}
}
}
}
}
if (fnum != (F_HANDLE)-1 && !leave_open)
{
if (F_CLOSE(fnum))
{
printf("\rCould not close '%s' after reading. Error(%d): %s\n", path, errno, strerror(errno));
error = errno;
}
}
if (ultra_verbose && dt)
printf("\n");
return error;
}
int TmcpDeleteFile(char *path)
{
int error = 0;
if (-1 == unlink(path))
{
error = errno;
printf("\rCould not delete file '%s'. Error(%d): %s\n", path, errno, strerror(errno));
}
return error;
}
int main(int argc, char *argv[])
{
char *RootPath;
char szFPath[512];
ULONG count, files = N_FILES, KB, thisKB;
TIME basetime = 0, lastbase, dt, dt2;
int i, error = 0;
long blksz = BLK_SZ;
int level = 0, mult = 1;
long thisfile;
#ifndef WINNT
int mode = (S_IREAD | S_IWRITE);
#else
int mode = 0;
#endif
int quiet = 0;
int sides = 0, read1 = 0, delete = 0, no_cycling_sizes = 0, commit_writes = 0, common_names = 0;
int verbose = 0, leave_open = 0, ultra_verbose = 0, close_after_each_write = 0;
int delete_partially_written = 0, strange_names = 0;
int set_file_size_before_write = 0;
memset(Junk, 0x55, sizeof(Junk));
setvbuf(stdout, NULL, _IONBF, 0);
#ifndef WINNT
umask(0);
#endif
if (argc > 1 && (!strcasecmp(argv[1], "/h") || !strcasecmp(argv[1], "/help")))
usage("tmcp");
while (argc > 1 && (argv[1][0] == '/' || argv[1][0] == '-'))
{
switch (argv[1][1])
{
case '1':
no_cycling_sizes++;
break;
case 'a':
case 'A':
common_names++;
strange_names = 0;
break;
case 'b':
case 'B':
blksz = atol(argv[1]+3);
break;
case 'c':
case 'C':
close_after_each_write = 1;
break;
case 'd':
case 'D':
delete++;
break;
case 'e':
case 'E':
mode = strtol(argv[1]+3, (char **)NULL, 8);
break;
case 'f':
case 'F':
files = (ULONG)atol(argv[1]+3);
break;
case 'l':
case 'L':
level = atoi(argv[1]+3);
printf("Level = %d\n", level);
break;
case 'm':
case 'M':
mult = atoi(argv[1]+3);
break;
case 'n':
case 'N':
strange_names++;
common_names = 0;
break;
case 'o':
case 'O':
leave_open++;
break;
case 'P':
case 'p':
delete_partially_written++;
break;
case 'q':
case 'Q':
quiet++;
break;
case 'r':
case 'R':
read1++; /* 0 = no read, 1 = read & write, 2 = read only */
break;
case 's':
case 'S':
sides = atoi(argv[1]+3);
break;
case 'T':
case 't':
set_file_size_before_write = 1;
break;
case 'u':
case 'U':
ultra_verbose++;
break;
case 'v':
case 'V':
verbose++;
break;
case 'W':
case 'w':
commit_writes++;
break;
default:
usage("tmcp");
}
argc--; argv++;
}
if (argc > 2)
usage("tmcp");
if (blksz > MAX_BLK_SZ)
blksz = MAX_BLK_SZ;
for (i = 0; i < blksz; i++)
Junk[i] = i & 0xFF;
RootPath = (argc == 1) ? "." : argv[1];
printf("TMCP %s to %s.. level = %d, blocksize = %ld #Blocks/file = %d .. %d\n",
read1 ? (read1 == 1 ? "Write&Read" : "ReadOnly") : "WriteOnly", RootPath, level, blksz, mult,
no_cycling_sizes ? mult : 3*mult);
get_time(&basetime);
lastbase = (TIME)basetime;
KB = thisKB = 0;
for (count = 0; !error && count < files; count++)
{
error = GeneratePath( sides ? 1+(count % sides) : 0,
level, szFPath, RootPath, count, read1 < 2, strange_names, common_names);
if(!quiet)
printf("File #%08ld -> %s \r", count, szFPath);
if (no_cycling_sizes)
thisfile = mult;
else
thisfile = mult * ((count %3) +1);
if (read1 < 2)
{
if (!error)
error = TmcpCreateFile(szFPath, mode, Junk, blksz, thisfile, leave_open,
ultra_verbose, close_after_each_write, commit_writes,
delete_partially_written, set_file_size_before_write);
}
if (read1 > 0)
{
if (!error)
{
if (ultra_verbose)
printf("Read...\n");
error = TmcpReadFile(szFPath, JunkRead, blksz, thisfile, leave_open, ultra_verbose,
close_after_each_write);
}
if (!error && memcmp(Junk, JunkRead, blksz))
printf("\n - MISCOMPARE in %s\n", szFPath);
}
if (delete > 0)
{
if (!error)
error = TmcpDeleteFile(szFPath);
}
if (!error)
{
thisKB += (blksz * thisfile) / 1024 ;
KB += (blksz * thisfile) / 1024 ;
}
if (!error && (count == (files-1) || verbose || 0 == (count % N_INTERVAL)))
{
int min,sec;
get_time(&dt);
dt2 = dt;
dt-=(TIME)basetime;
dt2-=(TIME)lastbase;
min = (int)(dt / 60000000l);
sec = (int)((dt / 1000000) %60) ;
printf("\r%04dm%2.2ds:", min, sec);
printf("Files: %-7ld ", count);
printf("Kbytes: %-7ld ", KB);
if(dt)
printf("KB/s: %-6lld ", (TIME)KB*1000000/dt);
else
printf("KB/s: XXXXXX ");
if(dt2)
printf("This KB/s: %7lld\n", (TIME)thisKB*1000000/dt2);
else
printf("This KB/s: XXXXXXX\n");
thisKB = 0;
get_time(&lastbase);
}
}
if (error)
printf("Test failed with error code %d\n", error);
return error;
}
[-- Attachment #3 --]
#!/bin/sh
#
# golong - call the go script
#
NUM=$1
LOG=/go$1.log
date >> $LOG
echo About to start golong >> $LOG
while
true
do
/go $NUM
done
[-- Attachment #4 --]
#!/bin/sh
#
# golongmulti - call the golong scripts in parallel
#
NUM=$1
LOG=/go$1.log
while [ $NUM -gt 0 ]
do
echo Starting golong $NUM
/golong $NUM &
NUM=$(( $NUM - 1))
done
[-- Attachment #5 --]
#!/bin/sh
#
# go
#
NUM=$1
LOG=/go$1.log
if [ ! -d /RAID ] ; then
echo No /RAID
fi
mkdir /RAID/$NUM
cd /RAID/$NUM
date >> $LOG
echo About to tmcp >> $LOG
tmcp /f:90000 /m:1 /b:1024 /1 >/dev/null
date >> $LOG
echo About to delete >> $LOG
cd /
rm -rf /RAID/$NUM
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?80256977.00547AA1.00>
