Date: Sun, 9 Feb 2003 02:47:27 -0800 From: Sean Chittenden <seanc@FreeBSD.org> To: audit@FreeBSD.org Subject: Making random(6) actually useful... Message-ID: <20030209104727.GO15936@perrin.int.nxad.com>
next in thread | raw e-mail | index | archive | help
--9wgEzdE/cOoYWTWe Content-Type: multipart/mixed; boundary="FIfSo9pyi3Jhph90" Content-Disposition: inline --FIfSo9pyi3Jhph90 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable There isn't much useful functionality that I'm aware of that comes out of random(6) as its implemented currently. What I was originally out to do was to randomize the search order of dist sites for ports/packages and didn't find anything that did what I wanted and was BSD licensed (or small). The new random(6) functionality allows for: 1) randomizing the order of downloads from a mirrors list. For example, in a Makefile: DOWNLOAD_SITES=3D `echo -n "${MASTER_SITE_RUBY} "| random -wf -` DOWNLOAD_SITES+=3D MASTER_SITE_LOCAL 2) randomizing playlists for MP3s: find ~/mp3s -name '*.mp3' | random -f - > play_list.m3u 3) being WARNS 5 compliant The biggest use though is for the ports downloads. If there aren't any objections to this, I'd like approval to commit it that way it can possibly MFC'ed into 4.8 before the freeze and used to help distribute the load of download sites more evenly. This hasn't been optimized for large files, but it does work quite well for anything under 50K lines/words. I've tested this locally with the following script: #BEGIN ruby for file in `locate '.'|egrep -i '\.(c|txt|sgml|h|conf|[1-9]|rb)$'` file.chomp! orig =3D `cat #{file}|sort|md5`.chomp new =3D `random -f #{file}|sort|md5`.chomp raise "#{file} different!" if orig !=3D new end #END The biggest bug with this right now is performance in that I haven't done anything special for handling large files. If it's a problem, I can do it and have a few ways of making it near O(N), but I don't have the energy or desire to do so unless it's needed. Comments/suggestions? -sc --=20 Sean Chittenden --FIfSo9pyi3Jhph90 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=patch Content-Transfer-Encoding: quoted-printable Index: Makefile =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/games/random/Makefile,v retrieving revision 1.3 diff -u -r1.3 Makefile --- Makefile 26 Mar 2001 14:20:59 -0000 1.3 +++ Makefile 9 Feb 2003 10:41:42 -0000 @@ -3,5 +3,7 @@ =20 PROG=3D random MAN=3D random.6 +SRCS=3D random.c randomize_fd.c +WARNS=3D 5 =20 .include <bsd.prog.mk> Index: random.6 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/games/random/random.6,v retrieving revision 1.5 diff -u -r1.5 random.6 --- random.6 10 Jul 2001 10:32:00 -0000 1.5 +++ random.6 9 Feb 2003 10:41:42 -0000 @@ -10,7 +10,7 @@ .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. All advertising materials mentioning features or use of this softwa= re -.\" must display the following acknowledgement: +.\" must display the following acknowledgment: .\" This product includes software developed by the University of .\" California, Berkeley and its contributors. .\" 4. Neither the name of the University nor the names of its contributors @@ -32,7 +32,7 @@ .\" @(#)random.6 8.2 (Berkeley) 3/31/94 .\" $FreeBSD: src/games/random/random.6,v 1.5 2001/07/10 10:32:00 ru Exp $ .\" -.Dd March 31, 1994 +.Dd February 8, 2003 .Dt RANDOM 6 .Os .Sh NAME @@ -41,14 +41,31 @@ .Sh SYNOPSIS .Nm .Op Fl er +.Op Fl f Ar filename .Op Ar denominator .Sh DESCRIPTION .Nm Random -reads lines from the standard input and copies them to the standard -output with a probability of 1/denominator. -The default value for +has two distinct modes of operations. The default is to read in lines +from stdin and randomly write them out to stdout with a probability of +1 / +.Ar denominator . +The default .Ar denominator -is 2. +for this mode of operation is 2, giving each line a 50/50 chance of +being displayed. +.Pp +The second mode of operation is to read in a file from +.Ar filename +and randomize the contents of the file and send it back out to stdout. +The contents can be randomized based off of newlines or based off of +space characters as determined by +.Xr isspace 3 . +The default +.Ar denominator +for this mode of operation is 1, which gives each line a chance to be +displayed, but in a +.Xr random 3 +order. .Pp The options are as follows: .Bl -tag -width Ds @@ -61,10 +78,40 @@ exit value of 0 to .Ar denominator \&- 1, inclusive. +.It Fl f Ar filename +The +.Fl f +option is used to specify the +.Ar filename +to read from. stdin is used if the filename is set to "-". +.It Fl l +Randomize the input via newlines (the default). .It Fl r The .Fl r option guarantees that the output is unbuffered. +.It Fl u +Tells +.Xr random 6 +not to select the same line or word from a file more than once (the +default). This does not guarantee uniqueness if there are two of the +same tokens from the input, but it does prevent selecting the same +token more than once. +.It Fl U +Tells +.Xr random 6 +that it is okay for it to reuse any given line or word when creating a +randomized output. +.It Fl w +Randomize words separated by +.Xr isspace 3 +instead of newlines. .El .Sh SEE ALSO -.Xr fortune 6 +.Xr fortune 6 , +.Xr random 3 +.Sh BUGS +There is no index used when printing out tokens from the list which +makes rather slow for large files (10MB+). If this were used in +performance sensitive areas, I'd do something about it. For smaller +files, however, it should still be quite fast and efficient. Index: random.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/games/random/random.c,v retrieving revision 1.12 diff -u -r1.12 random.c --- random.c 18 Feb 2002 05:15:18 -0000 1.12 +++ random.c 9 Feb 2003 10:41:42 -0000 @@ -52,13 +52,16 @@ =20 #include <err.h> #include <errno.h> +#include <fcntl.h> #include <limits.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <unistd.h> +#include "randomize_fd.h" =20 void usage(void); +int main(int argc, char**argv); =20 int main(argc, argv) @@ -66,19 +69,46 @@ char *argv[]; { double denom; - int ch, random_exit, selected, unbuffer_output; - char *ep; + int ch, fd, random_exit, randomize_lines, random_type, ret, + selected, unique_output, unbuffer_output; + char *ep, *filename; =20 - random_exit =3D unbuffer_output =3D 0; denom =3D 0; - while ((ch =3D getopt(argc, argv, "er")) !=3D -1) + filename =3D NULL; + random_type =3D RANDOM_TYPE_UNSET; + random_exit =3D randomize_lines =3D random_type =3D unbuffer_output =3D 0; + unique_output =3D 1; + while ((ch =3D getopt(argc, argv, "ef:lruUw")) !=3D -1) switch (ch) { case 'e': random_exit =3D 1; break; + case 'f': + randomize_lines =3D 1; + if (!strcmp(optarg, "-")) + filename =3D strdup("/dev/fd/0"); + else + filename =3D optarg; + break; + case 'l': + randomize_lines =3D 1; + random_type =3D RANDOM_TYPE_LINES; + break; case 'r': unbuffer_output =3D 1; break; + case 'u': + randomize_lines =3D 1; + unique_output =3D 1; + break; + case 'U': + randomize_lines =3D 1; + unique_output =3D 0; + break; + case 'w': + randomize_lines =3D 1; + random_type =3D RANDOM_TYPE_WORDS; + break; default: case '?': usage(); @@ -90,7 +120,7 @@ =20 switch (argc) { case 0: - denom =3D 2; + denom =3D (randomize_lines ? 1 : 2); break; case 1: errno =3D 0; @@ -109,16 +139,28 @@ =20 srandomdev(); =20 - /* Compute a random exit status between 0 and denom - 1. */ - if (random_exit) - return ((denom * random()) / LONG_MAX); - /* * Act as a filter, randomly choosing lines of the standard input * to write to the standard output. */ if (unbuffer_output) setbuf(stdout, NULL); + + /* + * Act as a filter, randomizing lines read in from a given file + * descriptor and write the output to standard output. + */ + if (randomize_lines) { + if ((fd =3D open(filename, O_RDONLY, 0)) < 0) + err(1, "%s", optarg); + ret =3D randomize_fd(fd, random_type, unique_output, denom); + if (!random_exit) + return(ret); + } + + /* Compute a random exit status between 0 and denom - 1. */ + if (random_exit) + return ((denom * random()) / LONG_MAX); =20 /* * Select whether to print the first line. (Prime the pump.) --- /dev/null Sun Feb 9 02:39:46 2003 +++ randomize_fd.h Sun Feb 9 00:57:27 2003 @@ -0,0 +1,30 @@ +/* $FreeBSD$ */ + +#ifndef __RANDOMIZE_FD__ +#define __RANDOMIZE_FD__ + +#include <ctype.h> +#include <err.h> +#include <errno.h> +#include <sys/param.h> +#include <sys/types.h> +#include <sys/uio.h> +#include <stdlib.h> +#include <stdio.h> +#include <string.h> +#include <unistd.h> + +#define RANDOM_TYPE_UNSET 0 +#define RANDOM_TYPE_LINES 1 +#define RANDOM_TYPE_WORDS 2 + +/* The multiple instance single integer key */ +struct rand_node { + u_char *cp; + u_int len; + struct rand_node *next; +}; + +int randomize_fd(int fd, int type, int unique, double denom); + +#endif --- /dev/null Sun Feb 9 02:39:46 2003 +++ randomize_fd.c Sun Feb 9 02:20:54 2003 @@ -0,0 +1,193 @@ +/* $FreeBSD$ */ + +#include "randomize_fd.h" + +struct rand_node *rand_root; +struct rand_node *rand_tail; + + +static +struct rand_node *rand_node_allocate(void) +{ + struct rand_node *n; + + n =3D (struct rand_node *)malloc(sizeof(struct rand_node)); + if (n =3D=3D NULL) + err(1, "malloc"); + + n->len =3D 0; + n->cp =3D NULL; + n->next =3D NULL; + return(n); +} + + +static +void rand_node_free(struct rand_node *n) +{ + if (n !=3D NULL) { + if (n->cp !=3D NULL) + free(n->cp); + + free(n); + } +} + + +static +void rand_node_free_rec(struct rand_node *n) +{ + if (n !=3D NULL) { + if (n->next !=3D NULL) + rand_node_free_rec(n->next); + + rand_node_free(n); + } +} + + +static +struct rand_node *rand_node_append(struct rand_node *n) +{ + if (rand_root =3D=3D NULL) { + rand_root =3D rand_tail =3D n; + return(n); + } else { + rand_tail->next =3D n; + rand_tail =3D n; + + return(n); + } +} + + +int randomize_fd(int fd, int type, int unique, double denom) +{ + u_char *buf, *p; + u_int numnode, j, selected, slen; + struct rand_node *n, *prev; + int bufc, bufleft, buflen, eof, fndstr, i, len, ret; + + rand_root =3D rand_tail =3D NULL; + bufc =3D bufleft =3D eof =3D fndstr =3D numnode =3D 0; + + if (type =3D=3D RANDOM_TYPE_UNSET) + type =3D RANDOM_TYPE_LINES; + + buflen =3D sizeof(u_char) * MAXBSIZE; + buf =3D (u_char *)malloc(buflen); + if (buf =3D=3D NULL) + err(1, "malloc"); + + while (!eof) { + /* Check to see if we have bits in the buffer */ + if (bufleft =3D=3D 0) { + len =3D read(fd, buf, buflen); + if (len =3D=3D -1) + err(1, "read"); + else if (len =3D=3D 0) + break; + else if (len < buflen) { + buflen =3D len; + eof++; + } + + bufleft =3D len; + } + + /* Look for a newline */ + for (i =3D bufc; i <=3D buflen; i++, bufleft--) { + if (i =3D=3D buflen) { + if (fndstr) { + if (!eof) { + memmove(buf, &buf[bufc], i - bufc); + i =3D i - bufc; + bufc =3D 0; + len =3D read(fd, &buf[i], buflen - i); + if (len =3D=3D -1) + err(1, "read"); + else if (len =3D=3D 0) { + eof++; + break; + } else if (len < buflen -i ) + buflen =3D i + len; + + bufleft =3D len; + fndstr =3D 0; + } + } else { + p =3D (u_char *)realloc(buf, buflen * 2); + if (p =3D=3D NULL) + err(1, "realloc"); + + buf =3D p; + if (!eof) { + len =3D read(fd, &buf[i], buflen); + if (len =3D=3D -1) + err(1, "read"); + else if (len =3D=3D 0) { + eof++; + break; + } else if (len < buflen -i ) + buflen =3D len; + + bufleft =3D len; + } + + buflen *=3D 2; + } + } + + if ((type =3D=3D RANDOM_TYPE_LINES && buf[i] =3D=3D '\n') || + (type =3D=3D RANDOM_TYPE_WORDS && isspace((int)buf[i])) || + (eof && i =3D=3D buflen - 1)) { + n =3D rand_node_allocate(); + slen =3D i - bufc; + n->len =3D slen + 2; + n->cp =3D (u_char *)malloc(slen + 2); + if (n->cp =3D=3D NULL) + err(1, "malloc"); + + memmove(n->cp, &buf[bufc], slen); + n->cp[slen] =3D buf[i]; + n->cp[slen + 1] =3D '\0'; + bufc =3D i + 1; + fndstr =3D 1; + rand_node_append(n); + numnode++; + } + } + } + + (void)close(fd); + + for (i =3D numnode; i > 0; i--) { + selected =3D ((int)denom * random())/(((double)RAND_MAX + 1) / numnode); + + for (j =3D 0, prev =3D n =3D rand_root; n !=3D NULL; j++, prev =3D n, n = =3D n->next) { + if (j =3D=3D selected) { + ret =3D printf("%.*s", n->len - 1, n->cp); + if (ret < 0) + err(1, "printf"); + if (unique) { + if (n =3D=3D rand_root) + rand_root =3D n->next; + if (n =3D=3D rand_tail) + rand_tail =3D prev; + + prev->next =3D n->next; + rand_node_free(n); + numnode--; + break; + } + } + } + } + + fflush(stdout); + + if (!unique) + rand_node_free_rec(rand_root); + + return(0); +} --FIfSo9pyi3Jhph90-- --9wgEzdE/cOoYWTWe Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Comment: Sean Chittenden <sean@chittenden.org> iD8DBQE+RjG/joUuCl9bPssRAsmxAJ910Yg6Ros75O7qtk0ND6uizZAnbgCgoOdZ SkBTsM8+OT8Q7zNL5lFztpw= =jbsi -----END PGP SIGNATURE----- --9wgEzdE/cOoYWTWe-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-ports" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030209104727.GO15936>