Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 9 Feb 2003 02:47:27 -0800
From:      Sean Chittenden <seanc@FreeBSD.org>
To:        audit@FreeBSD.org
Subject:   Making random(6) actually useful...
Message-ID:  <20030209104727.GO15936@perrin.int.nxad.com>

next in thread | raw e-mail | index | archive | help

--9wgEzdE/cOoYWTWe
Content-Type: multipart/mixed; boundary="FIfSo9pyi3Jhph90"
Content-Disposition: inline


--FIfSo9pyi3Jhph90
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

There isn't much useful functionality that I'm aware of that comes out
of random(6) as its implemented currently.  What I was originally out
to do was to randomize the search order of dist sites for
ports/packages and didn't find anything that did what I wanted and was
BSD licensed (or small).  The new random(6) functionality allows for:

1) randomizing the order of downloads from a mirrors list.  For
   example, in a Makefile:

DOWNLOAD_SITES=3D `echo -n "${MASTER_SITE_RUBY} "| random -wf -`
DOWNLOAD_SITES+=3D MASTER_SITE_LOCAL

2) randomizing playlists for MP3s:

   find ~/mp3s -name '*.mp3' | random -f - > play_list.m3u

3) being WARNS 5 compliant

The biggest use though is for the ports downloads.  If there aren't
any objections to this, I'd like approval to commit it that way it can
possibly MFC'ed into 4.8 before the freeze and used to help distribute
the load of download sites more evenly.  This hasn't been optimized
for large files, but it does work quite well for anything under 50K
lines/words.  I've tested this locally with the following script:

#BEGIN ruby
for file in `locate '.'|egrep -i '\.(c|txt|sgml|h|conf|[1-9]|rb)$'`
  file.chomp!
  orig =3D `cat #{file}|sort|md5`.chomp
  new  =3D `random -f #{file}|sort|md5`.chomp
  raise "#{file} different!" if orig !=3D new
end
#END

The biggest bug with this right now is performance in that I haven't
done anything special for handling large files.  If it's a problem, I
can do it and have a few ways of making it near O(N), but I don't have
the energy or desire to do so unless it's needed.

Comments/suggestions?

-sc

--=20
Sean Chittenden

--FIfSo9pyi3Jhph90
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch
Content-Transfer-Encoding: quoted-printable

Index: Makefile
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/games/random/Makefile,v
retrieving revision 1.3
diff -u -r1.3 Makefile
--- Makefile	26 Mar 2001 14:20:59 -0000	1.3
+++ Makefile	9 Feb 2003 10:41:42 -0000
@@ -3,5 +3,7 @@
=20
 PROG=3D	random
 MAN=3D	random.6
+SRCS=3D	random.c randomize_fd.c
+WARNS=3D	5
=20
 .include <bsd.prog.mk>
Index: random.6
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/games/random/random.6,v
retrieving revision 1.5
diff -u -r1.5 random.6
--- random.6	10 Jul 2001 10:32:00 -0000	1.5
+++ random.6	9 Feb 2003 10:41:42 -0000
@@ -10,7 +10,7 @@
 .\"    notice, this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\" 3. All advertising materials mentioning features or use of this softwa=
re
-.\"    must display the following acknowledgement:
+.\"    must display the following acknowledgment:
 .\"	This product includes software developed by the University of
 .\"	California, Berkeley and its contributors.
 .\" 4. Neither the name of the University nor the names of its contributors
@@ -32,7 +32,7 @@
 .\"     @(#)random.6	8.2 (Berkeley) 3/31/94
 .\" $FreeBSD: src/games/random/random.6,v 1.5 2001/07/10 10:32:00 ru Exp $
 .\"
-.Dd March 31, 1994
+.Dd February 8, 2003
 .Dt RANDOM 6
 .Os
 .Sh NAME
@@ -41,14 +41,31 @@
 .Sh SYNOPSIS
 .Nm
 .Op Fl er
+.Op Fl f Ar filename
 .Op Ar denominator
 .Sh DESCRIPTION
 .Nm Random
-reads lines from the standard input and copies them to the standard
-output with a probability of 1/denominator.
-The default value for
+has two distinct modes of operations.  The default is to read in lines
+from stdin and randomly write them out to stdout with a probability of
+1 /
+.Ar denominator .
+The default
 .Ar denominator
-is 2.
+for this mode of operation is 2, giving each line a 50/50 chance of
+being displayed.
+.Pp
+The second mode of operation is to read in a file from
+.Ar filename
+and randomize the contents of the file and send it back out to stdout.
+The contents can be randomized based off of newlines or based off of
+space characters as determined by
+.Xr isspace 3 .
+The default
+.Ar denominator
+for this mode of operation is 1, which gives each line a chance to be
+displayed, but in a
+.Xr random 3
+order.
 .Pp
 The options are as follows:
 .Bl -tag -width Ds
@@ -61,10 +78,40 @@
 exit value of 0 to
 .Ar denominator
 \&- 1, inclusive.
+.It Fl f Ar filename
+The
+.Fl f
+option is used to specify the
+.Ar filename
+to read from.  stdin is used if the filename is set to "-".
+.It Fl l
+Randomize the input via newlines (the default).
 .It Fl r
 The
 .Fl r
 option guarantees that the output is unbuffered.
+.It Fl u
+Tells
+.Xr random 6
+not to select the same line or word from a file more than once (the
+default).  This does not guarantee uniqueness if there are two of the
+same tokens from the input, but it does prevent selecting the same
+token more than once.
+.It Fl U
+Tells
+.Xr random 6
+that it is okay for it to reuse any given line or word when creating a
+randomized output.
+.It Fl w
+Randomize words separated by
+.Xr isspace 3
+instead of newlines.
 .El
 .Sh SEE ALSO
-.Xr fortune 6
+.Xr fortune 6 ,
+.Xr random 3
+.Sh BUGS
+There is no index used when printing out tokens from the list which
+makes rather slow for large files (10MB+).  If this were used in
+performance sensitive areas, I'd do something about it.  For smaller
+files, however, it should still be quite fast and efficient.
Index: random.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
RCS file: /home/ncvs/src/games/random/random.c,v
retrieving revision 1.12
diff -u -r1.12 random.c
--- random.c	18 Feb 2002 05:15:18 -0000	1.12
+++ random.c	9 Feb 2003 10:41:42 -0000
@@ -52,13 +52,16 @@
=20
 #include <err.h>
 #include <errno.h>
+#include <fcntl.h>
 #include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>
 #include <unistd.h>
+#include "randomize_fd.h"
=20
 void usage(void);
+int main(int argc, char**argv);
=20
 int
 main(argc, argv)
@@ -66,19 +69,46 @@
 	char *argv[];
 {
 	double denom;
-	int ch, random_exit, selected, unbuffer_output;
-	char *ep;
+	int ch, fd, random_exit, randomize_lines, random_type, ret,
+		selected, unique_output, unbuffer_output;
+	char *ep, *filename;
=20
-	random_exit =3D unbuffer_output =3D 0;
 	denom =3D 0;
-	while ((ch =3D getopt(argc, argv, "er")) !=3D -1)
+	filename =3D NULL;
+	random_type =3D RANDOM_TYPE_UNSET;
+	random_exit =3D randomize_lines =3D random_type =3D unbuffer_output =3D 0;
+	unique_output =3D 1;
+	while ((ch =3D getopt(argc, argv, "ef:lruUw")) !=3D -1)
 		switch (ch) {
 		case 'e':
 			random_exit =3D 1;
 			break;
+		case 'f':
+			randomize_lines =3D 1;
+			if (!strcmp(optarg, "-"))
+				filename =3D strdup("/dev/fd/0");
+			else
+				filename =3D optarg;
+			break;
+		case 'l':
+			randomize_lines =3D 1;
+			random_type =3D RANDOM_TYPE_LINES;
+			break;
 		case 'r':
 			unbuffer_output =3D 1;
 			break;
+		case 'u':
+			randomize_lines =3D 1;
+			unique_output =3D 1;
+			break;
+		case 'U':
+			randomize_lines =3D 1;
+			unique_output =3D 0;
+			break;
+		case 'w':
+			randomize_lines =3D 1;
+			random_type =3D RANDOM_TYPE_WORDS;
+			break;
 		default:
 		case '?':
 			usage();
@@ -90,7 +120,7 @@
=20
 	switch (argc) {
 	case 0:
-		denom =3D 2;
+		denom =3D (randomize_lines ? 1 : 2);
 		break;
 	case 1:
 		errno =3D 0;
@@ -109,16 +139,28 @@
=20
 	srandomdev();
=20
-	/* Compute a random exit status between 0 and denom - 1. */
-	if (random_exit)
-		return ((denom * random()) / LONG_MAX);
-
 	/*
 	 * Act as a filter, randomly choosing lines of the standard input
 	 * to write to the standard output.
 	 */
 	if (unbuffer_output)
 		setbuf(stdout, NULL);
+
+	/*
+	 * Act as a filter, randomizing lines read in from a given file
+	 * descriptor and write the output to standard output.
+	 */
+	if (randomize_lines) {
+		if ((fd =3D open(filename, O_RDONLY, 0)) < 0)
+			err(1, "%s", optarg);
+		ret =3D randomize_fd(fd, random_type, unique_output, denom);
+		if (!random_exit)
+			return(ret);
+	}
+
+	/* Compute a random exit status between 0 and denom - 1. */
+	if (random_exit)
+		return ((denom * random()) / LONG_MAX);
=20
 	/*
 	 * Select whether to print the first line.  (Prime the pump.)
--- /dev/null	Sun Feb  9 02:39:46 2003
+++ randomize_fd.h	Sun Feb  9 00:57:27 2003
@@ -0,0 +1,30 @@
+/* $FreeBSD$ */
+
+#ifndef __RANDOMIZE_FD__
+#define __RANDOMIZE_FD__
+
+#include <ctype.h>
+#include <err.h>
+#include <errno.h>
+#include <sys/param.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#define RANDOM_TYPE_UNSET 0
+#define RANDOM_TYPE_LINES 1
+#define RANDOM_TYPE_WORDS 2
+
+/* The multiple instance single integer key */
+struct rand_node {
+	u_char *cp;
+	u_int len;
+	struct rand_node *next;
+};
+
+int randomize_fd(int fd, int type, int unique, double denom);
+
+#endif
--- /dev/null	Sun Feb  9 02:39:46 2003
+++ randomize_fd.c	Sun Feb  9 02:20:54 2003
@@ -0,0 +1,193 @@
+/* $FreeBSD$ */
+
+#include "randomize_fd.h"
+
+struct rand_node *rand_root;
+struct rand_node *rand_tail;
+
+
+static
+struct rand_node *rand_node_allocate(void)
+{
+	struct rand_node *n;
+
+	n =3D (struct rand_node *)malloc(sizeof(struct rand_node));
+	if (n =3D=3D NULL)
+		err(1, "malloc");
+
+	n->len =3D 0;
+	n->cp =3D NULL;
+	n->next =3D NULL;
+	return(n);
+}
+
+
+static
+void rand_node_free(struct rand_node *n)
+{
+	if (n !=3D NULL) {
+		if (n->cp !=3D NULL)
+			free(n->cp);
+
+		free(n);
+	}
+}
+
+
+static
+void rand_node_free_rec(struct rand_node *n)
+{
+	if (n !=3D NULL) {
+		if (n->next !=3D NULL)
+			rand_node_free_rec(n->next);
+
+		rand_node_free(n);
+	}
+}
+
+
+static
+struct rand_node *rand_node_append(struct rand_node *n)
+{
+	if (rand_root =3D=3D NULL) {
+		rand_root =3D rand_tail =3D n;
+		return(n);
+	} else {
+		rand_tail->next =3D n;
+		rand_tail =3D n;
+
+		return(n);
+	}
+}
+
+
+int randomize_fd(int fd, int type, int unique, double denom)
+{
+	u_char *buf, *p;
+	u_int numnode, j, selected, slen;
+	struct rand_node *n, *prev;
+	int bufc, bufleft, buflen, eof, fndstr, i, len, ret;
+
+	rand_root =3D rand_tail =3D NULL;
+	bufc =3D bufleft =3D eof =3D fndstr =3D numnode =3D 0;
+
+	if (type =3D=3D RANDOM_TYPE_UNSET)
+		type =3D RANDOM_TYPE_LINES;
+
+	buflen =3D sizeof(u_char) * MAXBSIZE;
+	buf =3D (u_char *)malloc(buflen);
+	if (buf =3D=3D NULL)
+		err(1, "malloc");
+
+	while (!eof) {
+		/* Check to see if we have bits in the buffer */
+		if (bufleft =3D=3D 0) {
+			len =3D read(fd, buf, buflen);
+			if (len =3D=3D -1)
+				err(1, "read");
+			else if (len =3D=3D 0)
+				break;
+			else if (len < buflen) {
+				buflen =3D len;
+				eof++;
+			}
+
+			bufleft =3D len;
+		}
+
+		/* Look for a newline */
+		for (i =3D bufc; i <=3D buflen; i++, bufleft--) {
+			if (i =3D=3D buflen) {
+				if (fndstr) {
+					if (!eof) {
+						memmove(buf, &buf[bufc], i - bufc);
+						i =3D i - bufc;
+						bufc =3D 0;
+						len =3D read(fd, &buf[i], buflen - i);
+						if (len =3D=3D -1)
+							err(1, "read");
+						else if (len =3D=3D 0) {
+							eof++;
+							break;
+						} else if (len < buflen -i )
+							buflen =3D i + len;
+
+						bufleft =3D len;
+						fndstr =3D 0;
+					}
+				} else {
+					p =3D (u_char *)realloc(buf, buflen * 2);
+					if (p =3D=3D NULL)
+						err(1, "realloc");
+
+					buf =3D p;
+					if (!eof) {
+						len =3D read(fd, &buf[i], buflen);
+						if (len =3D=3D -1)
+							err(1, "read");
+						else if (len =3D=3D 0) {
+							eof++;
+							break;
+						} else if (len < buflen -i )
+							buflen =3D len;
+
+						bufleft =3D len;
+					}
+
+					buflen *=3D 2;
+				}
+			}
+
+			if ((type =3D=3D RANDOM_TYPE_LINES && buf[i] =3D=3D '\n') ||
+			    (type =3D=3D RANDOM_TYPE_WORDS && isspace((int)buf[i])) ||
+			    (eof && i =3D=3D buflen - 1)) {
+				n =3D rand_node_allocate();
+				slen =3D i - bufc;
+				n->len =3D slen + 2;
+				n->cp =3D (u_char *)malloc(slen + 2);
+				if (n->cp =3D=3D NULL)
+					err(1, "malloc");
+
+				memmove(n->cp, &buf[bufc], slen);
+				n->cp[slen] =3D buf[i];
+				n->cp[slen + 1] =3D '\0';
+				bufc =3D i + 1;
+				fndstr =3D 1;
+				rand_node_append(n);
+				numnode++;
+			}
+		}
+	}
+
+	(void)close(fd);
+
+	for (i =3D numnode; i > 0; i--) {
+		selected =3D ((int)denom * random())/(((double)RAND_MAX + 1) / numnode);
+
+		for (j =3D 0, prev =3D n =3D rand_root; n !=3D NULL; j++, prev =3D n, n =
=3D n->next) {
+			if (j =3D=3D selected) {
+				ret =3D printf("%.*s", n->len - 1, n->cp);
+				if (ret < 0)
+					err(1, "printf");
+				if (unique) {
+					if (n =3D=3D rand_root)
+						rand_root =3D n->next;
+					if (n =3D=3D rand_tail)
+						rand_tail =3D prev;
+
+					prev->next =3D n->next;
+					rand_node_free(n);
+					numnode--;
+					break;
+				}
+			}
+		}
+	}
+
+	fflush(stdout);
+
+	if (!unique)
+		rand_node_free_rec(rand_root);
+
+	return(0);
+}

--FIfSo9pyi3Jhph90--

--9wgEzdE/cOoYWTWe
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Comment: Sean Chittenden <sean@chittenden.org>

iD8DBQE+RjG/joUuCl9bPssRAsmxAJ910Yg6Ros75O7qtk0ND6uizZAnbgCgoOdZ
SkBTsM8+OT8Q7zNL5lFztpw=
=jbsi
-----END PGP SIGNATURE-----

--9wgEzdE/cOoYWTWe--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-ports" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030209104727.GO15936>