Date:      Mon, 29 May 2000 02:01:31 -0400
From:      Garance A Drosihn <drosih@rpi.edu>
To:        arch@FreeBSD.ORG
Subject:   some lpr changes: examples  (LONG)
Message-ID:  <v04210112b5577e53e9dd@[128.113.24.47]>

While I am now a lot closer to freebsd's current source than I
was two months ago, things still aren't sorted out to the point
that I could just point to a list of diffs.  So, I want to list
out some of our changes, and let people tell me which ones they
would like to see first.

We've added two new printcap values, 'ss=' and 'sr='.  These
specify files to use for "transfer statistics".  'sr' tracks
print-data files as they are received from other hosts, 'ss'
tracks print-data files as they are sent (usually via the
standard lpd protocols) to some other host (usually the real
printer).  They serve different purposes (for me, at least).
Sometimes our print servers slow down due to network problems,
but usually I don't notice until the performance is really
abysmal.  And even when I *do* finally notice, I have no way of
saying when the problem started, or whether it was a gradual
problem or if it suddenly appeared.  Tracking how fast jobs
PRINT is not useful for this, because there are so many things
which can affect that (have you ever printed 3D PS files from
matlab?  ouch).  But if the print servers track transfer times
of files coming INTO the print server, that should be a good
measure of network performance.  Not only that, but it can show
if a problem is a general problem (from everywhere on campus),
or if it's only happening from some hosts.  So, 'sr' is really
a network-diagnosing tool.  And all these log files are on the
print server, which is the machine we beef up with plenty of
extra disk space.

'ss' is a poor-man's accounting record.  We don't get information
on pages-printed, but at least we know WHO printed which job at
what time, in case someone is dumping large jobs to one of these
printers.  Since MOST of our queues are driven by CAP, only a few
queues use this feature.  'sr' and 'ss' produce records in the
same format, for ease in processing.  Each record includes a timestamp
(in the format '%Y.%m%d.%H%M%S%Z.a', which makes it easy to sort
or skip over, but still contains all the info one might want in
an easily "eyeball-able" form).  Some other fields are queue-name,
originating host-name, job-id, file-count (for jobs which include
multiple files, there's one record per file), userid, bytecount,
transfer-time (seconds), transfer-rate (KBytes/sec), and a few
other things.  I'll skip over some details on this.
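
To make the record layout a bit more concrete, here is a rough
sketch in Python (the real lpd code is C, and this is not it).  The
field order, the space separator, and the sample queue and host names
are assumptions made up for illustration.  The timestamp uses the
pattern quoted above, minus the trailing '.a', whose meaning the
message does not spell out.

```python
from datetime import datetime, timezone

def format_transfer_record(when, queue, host, job_id, file_no,
                           user, nbytes, seconds):
    """Build one statistics line for a single transferred file."""
    # Timestamp pattern from the message, without the trailing '.a'.
    stamp = when.strftime("%Y.%m%d.%H%M%S%Z")
    # Transfer rate in KBytes/sec; guard against a zero-length timer.
    rate = (nbytes / 1024.0) / seconds if seconds > 0 else 0.0
    fields = [stamp, queue, host, str(job_id), str(file_no), user,
              str(nbytes), "%.2f" % seconds, "%.1f" % rate]
    return " ".join(fields)

# Hypothetical example values.
when = datetime(2000, 5, 29, 2, 1, 31, tzinfo=timezone.utc)
record = format_transfer_record(when, "lablw", "lab-pc", 123, 1,
                                "gad", 204800, 4.0)
print(record)
```

Because the timestamp comes first and sorts as plain text, a file of
such lines can be ordered or filtered by time with ordinary text
tools.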

Note that you might think you could do these 'ss' records via
some filter specified as 'if='.  That doesn't work.  The filter
gets executed to copy the original file to a new file, and *then*
lpd transfers the new file to the other host.  When the 'if='
filter is running, it can have NO idea of transfer times, or
whether the destination printer is even up.
  - - -
The current freebsd has the notion of a canonical printername.
If a printcap entry has multiple names, it is assumed that the
first name is the canonical one.  At RPI, we basically have two
canonical names.  The "pretty-print" (long, descriptive, and in
mixed-case) name, and the "easy-to-type" short name (always all
lowercase, always less than 9 characters).  I've changed some
of lpr's references to 'pp->printer' (which is our long name,
up to 32 characters), to this new 'pp->short_name'.  For some
messages, we do want the long name, and for others (like the
above statistics records...), we'd really much rather have the
short name.  I'm still sorting out which name should be in which
message, but it's basically "long name to user, short name to
log files"...  I'm also thinking that maybe I should just rename
'pp->printer' variable to 'pp->long_name', so it'd be obvious
you're making an explicit decision as to which name to use.  If
a printcap entry has only one name, both fields are set to the
same string.

At RPI, the "pretty" name is always the first in the list, and
the "short lc" name is always the last one.  If this update was
picked up, I'd probably have to do something which didn't make
that assumption.  Not sure what would be best...
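
As a sketch of the RPI convention only (first name is the long one,
last name is the short one), the split could look like this in
Python.  The function name and the sample printer names are
hypothetical, and a general version would need some other rule for
picking the short name.

```python
def split_printer_names(name_field):
    """Return (long_name, short_name) from a printcap name field."""
    # Names in a printcap entry are separated by '|'.
    names = [n.strip() for n in name_field.split("|") if n.strip()]
    if not names:
        raise ValueError("printcap entry has no names")
    long_name = names[0]     # "pretty" name, shown to users
    short_name = names[-1]   # short lowercase name, used in log files
    return long_name, short_name

# With a single name, both fields end up as the same string.
print(split_printer_names("Lobby Laser Printer|lobbylw"))
print(split_printer_names("lp"))
```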
  - - -
How do people handle all the input filters one can set in a
printcap entry?  Historically I have set up the printcap entry
which actually drives the printer so that 'if=', 'cf=', et al
are set to separate scripts.  Now, those scripts are really
just eight copies of the exact same script.  That script, in
turn, looks at $0 to tell which filter it is supposed to act
like, and does a 'case' statement based on that.  I think I
picked this up from some CAP examples.  Recently it occurred
to me that this is pretty stupid.  Not only that, but it does
not work when people use an 'if=' filter to send jobs to a
remote printer (because for remote printers, lpd only checks
for 'if=', and ignores all the rest).  I don't know how good
the lpd interface is on your Laser printers, but mine doesn't
know what to do about 'dvi' files or some of the other special
cases we use...

So, I added a 'jf=' entry, which is meant to say "default
per-job filter".  If it is set, then that value is used for
all per-job filters which are not explicitly set in that
printcap entry.  Furthermore, I export an environment variable
which indicates which filter the script is being called for.
LPD_SCRIPT is set to values like "standard", "fortran", "dvi",
etc.  You'd have to change your filter scripts to take advantage
of this, but the idea is that this change can go into lpr without
changing anything in anyone's current setup, and people can move
to it as they feel like it.  (if they feel like it).

Note that the value given for 'jf=' is (obviously) not used to
set a value for 'of='...
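
The lookup rule can be sketched as follows (Python, for illustration
only).  The printcap entry is modelled as a plain dict, the paths are
made up, and only the names "standard", "fortran" and "dvi" come from
the description above; which capability each one belongs to is a
guess.  Note that this message calls the variable LPD_SCRIPT here and
LPD_FILTER in the list further down; the sketch uses LPD_SCRIPT.

```python
import os

# The eight per-job filter capabilities; 'of' is deliberately absent.
PER_JOB_FILTERS = ("if", "cf", "df", "gf", "nf", "rf", "tf", "vf")

# Assumed mapping from capability to the exported value.
SCRIPT_NAMES = {"if": "standard", "rf": "fortran", "df": "dvi"}

def resolve_filter(entry, cap):
    """Return the filter program for capability `cap`, or None."""
    if cap in entry:
        return entry[cap]          # explicitly set: always wins
    if cap in PER_JOB_FILTERS:
        return entry.get("jf")     # fall back to the default filter
    return None                    # e.g. 'of' never inherits 'jf'

def filter_environment(cap, base_env=None):
    """Copy the environment and say which filter is being run."""
    env = dict(os.environ if base_env is None else base_env)
    env["LPD_SCRIPT"] = SCRIPT_NAMES.get(cap, cap)
    return env

# Hypothetical entry: one shared script, plus an explicit troff filter.
entry = {"jf": "/usr/local/libexec/rpi-filter",
         "tf": "/usr/local/libexec/troff-filter"}
print(resolve_filter(entry, "df"), filter_environment("df", {}))
```

An entry that never sets 'jf=' behaves exactly as before, which is
the point of making it opt-in.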
  - - -
There are a lot of other things I put into environment variables,
for the benefit of the scripts one might run.  I use them for my
CAP scripts, for instance.  (while I have a lot of interesting
changes to CAP, I should warn that my CAP sources are a hybrid
mess of changes, and thus would probably take a long time to come
up with diffs for CAP that others could use...).

I add things like:
   LPD_DF_FILE - the current datafile name (I forget what I do
                 with this in the script, it might be that I
                 don't actually use this anymore...).
   LPD_ERRFILE - the temporary file lpd uses for error messages.
                 It may be this is only useful due to other
                 changes I have in lpd...
   LPD_EMAILFILE - similar to the error file, our lpd now creates
                 a temp file for holding a message to send back
                 to the user.  Our CAP is set up to put error
                 messages or other useful info in this, and after
                 CAP is done then lpd emails the contents of this
                 (if it isn't empty) back to the user.
   LPD_FILTER - already described...
   LPD_WANTHDR - indicates whether LPD thinks a header page should
                 be printed for this file.  (note that a single
                 "print job" may consist of multiple files, and
                 the input filter is started for each file, not
                 once per job).  This reflects things like '-h'
                 and 'sh' too.  (for our plotters, the "header box"
                 is printed by the if= filter, not the of= filter).
   LPD_PSONLY - hmm, I'll skip the details on this for now.
   LPD_PTRTYPE, LPD_PTRSTYPE - indicate printer type and subtype,
                 as picked up from the printcap entry (if they
                 were set).  Type is set by 'ty=' (inspired from
                 what NeXTSTEP used...), subtype is set by 'sy='
                 (which is my own invention...)
   LPD_USERID - userid of original user.  Yes, I know this is already
                 passed as an explicit parameter.
   LPD_ORIGHOST - host which originated the print job.
   LPD_USERACCT - Hmm, I think this is the name of the accounting
                  file to write records to.
   LPD_JOBISPS - Since almost all of our printers are postscript,
                 it is nice to know if the current script is already
                 postscript.  lpd figures this out and sets the
                 environment variable, instead of having every script
                 do handsprings to figure it out.
   LPD_JOBNUM - the job number for this job, ie, what the user sees
                in the 'Job' column in 'lpq'.  This makes sure the
                accounting records (from CAP, in my case) include
                the same number that the user remembers from lpq.
   LPD_FILENO - file number within the current job.  Our cap also
                stuffs this in the accounting record (we have some
                printers where we have a "per-job" charge, and we
                only want to charge that once PER JOB, not once per
                file in a job...).
   LPD_JOBNAME - what the user set for jobname (-J).
   LPD_CLASS   - what the user set for class (-C)
   LPD_CLAIMEDUSER - I'll skip the details on this for now
   LPD_EMAILTO - who to send email to.  I'm pretty sure I don't
                 use this anymore, since lpd sends the email.
   LPD_XHEADER, LPD_XINITIAL, LPD_XFORCED, LPD_X_OPT - I'll skip this
                 for now, but I think this is basically the same idea
                 as lprNG now has with "Z-options".  RPI has had the
                 "X-options" ("extended options") since about 1995,
                 but most of the implementation is based on how I have
                 our CAP set up to work.
I also intend to add that timestamp value (the one for ss=/sr=) as
an environment variable, so the filter (CAP in my case) can report
the exact same timestamp for the file that lpd is using.
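
As an illustration of how a filter might use a few of these
variables, here is a small Python sketch of the "charge once per job"
idea.  The record layout is invented, and it assumes the first file
in a job is numbered 1, which the list above does not actually say.

```python
def accounting_record(env):
    """Build an accounting record for one file from LPD_* variables."""
    file_no = int(env["LPD_FILENO"])
    return {
        "job": env["LPD_JOBNUM"],     # same number the user sees in lpq
        "file": file_no,
        "user": env["LPD_USERID"],
        "host": env["LPD_ORIGHOST"],
        # Assumption: files are numbered from 1, so the per-job fee
        # is due only on the first file of the job.
        "job_fee_due": file_no == 1,
    }

# Hypothetical environment for the second file of job 42.
env = {"LPD_JOBNUM": "42", "LPD_FILENO": "2",
       "LPD_USERID": "gad", "LPD_ORIGHOST": "lab-pc"}
print(accounting_record(env))
```

A real filter would pass os.environ instead of a hand-built dict.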
  - - -
We have something called "printset" files, which a user or administrator
can use to set the default printer for lpr, lpq, lprm.  If no printer
is specified by the user, then 'lpr' (etc) will first check for a
"session printset file" (which probably won't make sense outside of
RPI), and then a "user printset file", and lastly a "host printset
file".  This way we have different default printers in different
public labs, but the user just types 'lpr' if they want the "right"
one (because we've set up /etc/printset files on those machines).

Besides defining the "default" printer, these printset files define
pseudo-printers called 'PS' and 'COLORPS' (or something like that).
So, we can install applications and tell them to print to 'PS' or
'COLORPS', and the user can control where that output will go.  We
needed this because we had a few X applications early on which were
a real headache to change the printer inside the application...
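
The lookup order can be sketched like this (Python, illustrative
only).  Each printset file is modelled as an already-parsed dict, the
"default" key is a made-up stand-in for however the file marks the
default printer, and it is an assumption that 'PS' and 'COLORPS' are
resolved in the same session/user/host order as the default.

```python
def resolve_printer(requested, printsets):
    """Map a requested printer (or None) to a real queue name.

    `printsets` is a list of dicts, highest priority first:
    session, then user, then host.
    """
    key = "default" if requested is None else requested
    for printset in printsets:
        if key in printset:
            return printset[key]
    # Not a printset name, so treat it as an ordinary queue name.
    return requested

# Hypothetical contents of the three printset files.
session = {}
user = {"PS": "mylaser"}
host = {"default": "lablaser", "PS": "lablaser", "COLORPS": "labcolor"}
order = [session, user, host]

print(resolve_printer(None, order))       # plain 'lpr' in a public lab
print(resolve_printer("PS", order))       # user's own choice wins
print(resolve_printer("COLORPS", order))  # falls through to the host
```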
  - - -
For the past few years, I've had something called linked queues
"almost" implemented.  They were pretty much implemented everywhere
except for lpd.  They are also pretty much worthless without the
support in lpd... :-)  The idea was to create a single "fake" queue
for users to send to, and then the print server could link that queue
to one or more real queues.  In particular, we'd handle high-volume
queues by buying multiple identical printers, and have jobs from
users spread over the different real printers.  If one printer died,
or ran out of paper, or whatever, operators could stop that printer
and jobs would automatically go to whatever printers were still up.
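
A minimal sketch of the selection step, in Python.  The policy shown
(send the job to the running queue with the fewest waiting jobs) is
only one possible way to spread jobs and is an assumption, as are the
queue names.

```python
def pick_real_queue(real_queues):
    """Choose a real queue for a job sent to the linked (fake) queue.

    `real_queues` maps queue name -> (is_up, jobs_waiting).
    Returns None when every real queue is stopped.
    """
    candidates = [(waiting, name)
                  for name, (is_up, waiting) in real_queues.items()
                  if is_up]
    if not candidates:
        return None
    # Fewest waiting jobs wins; ties are broken by queue name.
    return min(candidates)[1]

# Hypothetical: three identical printers, one stopped by an operator.
queues = {"lw1": (True, 5), "lw2": (True, 2), "lw3": (False, 0)}
print(pick_real_queue(queues))
```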

I ran into a brick wall on this because all kinds of printer-specific
variables inside lpd were global variables at the time, and trying to
correctly manipulate them all was a mess.  Thanks to all the recent
changes to freebsd's lpr, these are now all in a struct, which will
make this a whole lot easier to implement!  Thanks!

I know lprNG has something like this, but I can't think of the name
used for it right now...
  - - -
I also have "exclusive queues".  This is where you have one physical
device (in our case, a plotter), and multiple exclusive queues which
use it (in our case, "standard paper" or "glossy paper" -- but the
plotter can only be set up for one or the other at any given moment).
All "exclusive queues" mean is that you can not start one of the
queues unless all of the related queues are already stopped.  This
is another thing that has been half-implemented for years, and I hope
to get back to now.
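
The rule itself is small enough to sketch in a few lines of Python
(the function and queue names are hypothetical):

```python
def can_start(queue, exclusive_group, running):
    """True if `queue` may start: no related queue is still running."""
    return not any(other in running
                   for other in exclusive_group
                   if other != queue)

# Hypothetical plotter with two paper types sharing one device.
group = {"plot-std", "plot-glossy"}
print(can_start("plot-glossy", group, running={"plot-std"}))
print(can_start("plot-glossy", group, running=set()))
```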
  - - -
I'm now also thinking of "merged queues", but since I haven't even
started on that there's probably not much point in describing it...
The idea is to handle printers which have several different types
of paper trays.  All of them CAN be active at once, but I'm sure
we'll have times when we'll be out of one (say, "transparencies"),
so we'd want to 'stop' those jobs while letting all the other jobs
through.  I must admit I've just started thinking about how I
want this to work, so maybe this will wait until I move to lprNG.
  - - -
I added two more signal handlers to lpd, -USR1 and -USR2.  They can
be used to raise or lower the logging-level ("lflag") without having
to restart the daemon.

There is also now a table for logging levels associated with the
"cmdname" table in lpd.  Recvjob, printjob, and rmjob are all logged
once 'lflag' is 1.  The display-job commands are only logged when
lflag is increased to two.  Our print servers always run with lflag
set to one, so we can track things (if problems come up).
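
Here is a rough Python sketch of the two pieces together (lpd does
this in C; this is only an illustration).  Which signal raises the
level and which lowers it is an assumption, and "displayq" is an
assumed name standing in for the display-job commands.

```python
import signal

lflag = 1  # current logging level; the servers above run at 1

# Minimum lflag at which each command is logged.
LOG_LEVEL = {"recvjob": 1, "printjob": 1, "rmjob": 1, "displayq": 2}

def raise_level(signum, frame):
    global lflag
    lflag += 1

def lower_level(signum, frame):
    global lflag
    lflag = max(0, lflag - 1)   # never go below zero

def should_log(cmdname):
    """True if `cmdname` is logged at the current level."""
    return lflag >= LOG_LEVEL.get(cmdname, 1)

def install_handlers():
    # Assumed direction: USR1 raises the level, USR2 lowers it.
    signal.signal(signal.SIGUSR1, raise_level)
    signal.signal(signal.SIGUSR2, lower_level)
```

With handlers installed, 'kill -USR1 <pid>' would turn on logging of
the display-job commands and 'kill -USR2 <pid>' would turn it back
off, all without restarting the daemon.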
  - - -
We use an automatic procedure (called "package") to update our
system disks.  If you have a busy print server, then killing and
restarting lpd (due to a new lpd) can be tricky.  I added an option
to 'lpd' so that it will use 'SO_REUSEADDR' when trying to bind to
the print socket.  This happens if '-R' is included when starting
up lpd.  I must admit that I'm not sure this was the best solution,
but it DID solve the problem we were having when automatically
installing and restarting lpd.  Still, it's an option, so at least
it doesn't happen unless you ask it to  :-)
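
For anyone unfamiliar with the option, this is roughly what it
amounts to, sketched with Python's socket module rather than the C
in lpd.  The function name and its parameters are made up; only
SO_REUSEADDR and the printer port (515) are real.

```python
import socket

LPD_PORT = 515  # standard printer service port

def make_listener(host="", port=LPD_PORT, reuse_addr=False):
    """Create a listening TCP socket, optionally with SO_REUSEADDR."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Must be set before bind() to have any effect.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock
```

With reuse_addr=True, a freshly started daemon can bind even while
connections from the previous daemon are still lingering on the port,
which is the situation an automated install-and-restart runs into.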
  - - -
In 'lpc', the 'clean' command currently makes decisions based on
some assumptions it makes about filenames in the spool directory.
In particular, it assumes that a file called 'df[A-Z]<x><hostname>'
should have a matching control file named 'cfA<x><hostname>'.  This
is, in fact, a bad assumption, for two reasons:

1. when creating a data-file (df...), lpd uses the hostname the
    machine currently uses.  However, when RECEIVING a control
    file, lpd uses the hostname of the *connection* the file is
    coming in on.  Thus, on a host with multiple interfaces, you
    can end up with datafiles based on one interface, and the
    control files based on another one.  As luck would have it,
    we used to have one such machine, and I did an 'lpc clean'
    on a busy print server, and that wiped out over a hundred
    waiting print jobs (last week of classes).  I was not popular...
2. due to printers dying, we sometimes accept jobs on one print
    server only to bounce them to a second print server.  When
    this happens, the datafile is named for the original host, and
    the control file is named for that middle print server.

Since I was *very* unpopular after that 'lpc clean', I made a few
changes to address this.  For one, 'lpc clean' will only remove
files which are more than an hour old.  Thus, even if the names
are out-of-sync, it is much less likely you'll kill a job.

For two, I keep track of hosts which are "under our control"
(ie, all of which are using the same set of userids).  For those
hosts, lpd does not change the hostname on an incoming control
file to match the connection it is coming in on.  Thus, the cf
file does (generally) still match the df filename.

For three, I added a 'tclean' command to lpc.  It tells you what
'clean' would do without actually deleting any files.  I made a
few other changes to 'clean' while I was at it.
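
To show how the age check and the dry run fit together, here is a
simplified Python sketch.  It only looks at data files with no
matching control file, using the naming rule described above; the
real 'clean' examines other files too, and the function name is
hypothetical.

```python
import os
import time

MIN_AGE = 3600  # seconds; only files more than an hour old are removed

def clean_spool(spool_dir, dry_run=False, now=None):
    """Remove stale data files that have no control file.

    With dry_run=True nothing is deleted (the 'tclean' behaviour).
    Returns the list of file names that were, or would be, removed.
    """
    now = time.time() if now is None else now
    names = set(os.listdir(spool_dir))
    victims = []
    for name in sorted(names):
        if not name.startswith("df") or len(name) < 4:
            continue
        # df[A-Z]<x><hostname> should pair with cfA<x><hostname>.
        if "cfA" + name[3:] in names:
            continue
        path = os.path.join(spool_dir, name)
        if now - os.stat(path).st_mtime <= MIN_AGE:
            continue  # too recent; may belong to a job still arriving
        victims.append(name)
        if not dry_run:
            os.remove(path)
    return victims
```

Running it first with dry_run=True gives the same list that a real
run would delete, so an operator can sanity-check it on a busy
server before committing.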

Caveat: While I am nearly caught up with freebsd's lpr for most
things, I'm still way-out-of-date on 'lpc', so these changes
are probably the farthest away from being usable...
  - - -
lpr has a status file it uses to hold information about the
status of a given print queue.  I have it tack on 'extended'
to that filename, and have CAP (or whatever the print filter
is) put "extended status" information into that file.  'lpq'
will include this "extended status" line if the user specifies
'lpq -l' (long).
  - - -
I have lpd setting the value in that status file in a number
of extra places in its processing too.  I've gone on a few
wild goose chases on some problem because the status message
implied lpd was stuck in one section while it was really stuck
somewhere else.

I also changed 'lpc' so operators could SET a status message
(when a printer is down, for instance).
  - - -
I'm also in the process of providing a web interface to 'lpq'
and 'lprm'.  While most of that will probably be RPI-specific,
I am thinking of making an alternate display format which would
be easier for a script to parse, and would include more info.
Note that this will NOT spit out HTML, it would just print out
info in a format that will be easier for a CGI script to turn
into HTML.

In my case, the script notices if the web-user (authenticated
via https) owns a particular job in the queue listing, and if
so it adds a link the user can use to delete that job.  Still
in the early stages, but it seems to work pretty well.
  - - -
Related to the web-page project, I've changed lpq and lprm
to allow '-Pqueue@hostname'.  The queue has to match a queue
in the current host's printcap file, but you can ask for the
status on that queue on a particular host.

The reason for this is that we would have one server which is
our heavy-duty CGI web server.  For any given queue, a PC user
might want to check the queue's status from the samba server,
or a Mac user might want to check it from one of the hosts
which is accepting print jobs from Macs.  We do have times
where a print server is so backed up that jobs are still on
the originating host (eg, samba), and if people don't see their
job in the queue listing then they just send it again.

Lpr also recognizes '-Pqueue@hostname', and just ignores the
@hostname part (IIRC).  I probably should abort instead.  In
any case, my thinking is that I don't want people to be able
to DIRECT their print jobs to some specific host, but I want
to be able to check and remove jobs.

That's how it works right now.  I hope to soon add a config file,
probably called '/etc/hosts.lpd-dest', which would have a line
in it for "destination-specific" host information.  Then I'd
limit the @hostname to hosts specifically named in that file.
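
A sketch of the argument handling for lpq and lprm, including the
planned host restriction, might look like this in Python.  The
function name is hypothetical, and the set of allowed hosts stands in
for whatever the proposed config file would end up containing.

```python
def parse_printer_arg(arg, local_queues, allowed_hosts):
    """Split a -P argument into (queue, host); host may be None."""
    queue, sep, host = arg.partition("@")
    # The queue must exist in this host's printcap either way.
    if queue not in local_queues:
        raise ValueError("unknown printer: " + queue)
    if not sep:
        return queue, None
    # Only hosts named in the (planned) config file are accepted.
    if host not in allowed_hosts:
        raise ValueError("host not allowed: " + host)
    return queue, host

# Hypothetical queue and host names.
print(parse_printer_arg("lablw@samba1", {"lablw"}, {"samba1", "mac1"}))
print(parse_printer_arg("lablw", {"lablw"}, {"samba1", "mac1"}))
```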
  - - -
We had some linux users on campus who were sending print jobs
with no "filterline" specified.  The control file would have
unlink lines for all the data files which were sent, but it
wouldn't include any instructions of what to do with those
files except to unlink them.

Since I have no administrative control over those hosts, and since
I didn't know enough about linux to tell the users what to do, I changed
our lpd to recognize this situation.  If a control file would
print out zero files, but it would also unlink one or more files,
then I have lpd first print out all the files it would unlink
using the standard ('if=') filter. This seems to have solved the
problem for some linux users.
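
The fallback can be sketched as follows (Python, illustrative only).
In an lpd control file each line starts with a one-letter command:
'U' means "unlink this data file", and lowercase letters such as 'f'
mean "print this file" in some format.  Treating exactly the letters
listed below as the print commands is a simplification, and the file
and host names are made up.

```python
PRINT_COMMANDS = set("cdfglnoprtv")  # lowercase "print this file" lines

def files_to_print(control_lines):
    """Return (command, datafile) pairs lpd should print for a job."""
    prints = [(ln[0], ln[1:]) for ln in control_lines
              if ln and ln[0] in PRINT_COMMANDS]
    if prints:
        return prints
    # No print lines at all: fall back to printing every file that
    # would be unlinked, as if it had been queued with 'f' (which is
    # the command that goes through the 'if=' filter).
    return [("f", ln[1:]) for ln in control_lines if ln.startswith("U")]

# A control file with host, user, and an unlink line, but no print line.
broken = ["Hlinuxbox", "Pstudent", "UdfA042linuxbox"]
print(files_to_print(broken))
```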
  - - -
Freebsd's lpr currently recognizes a 'rg=' attribute to indicate
the printer is restricted to a certain group of users (ie, a
group name from /etc/group).  I expanded upon this to add 'ru=',
'xg=', and 'xu=' attributes.  I also changed 'rg=' to accept
more than one group.  So:
    rg= one or more groups of users.  if specified, the printer
        is limited to the users in these groups.
    ru= one or more userids.
    xg= one or more groups of users.  If specified, the users
        in these groups are NOT allowed to use the printer.
    xu= one or more userids.  If specified, these userids are
        NOT allowed to use the printer.
There is a hierarchy to how these are processed, but I forget
the exact order right now.  But you could use a group for one
set of people (say "all people in dept X"), and then exclude
some specific bad-actors out of that group using xu or xg.
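
One plausible way to combine the four attributes is sketched below in
Python.  Since the exact order is not given above, the order used
here (exclusions checked first and always winning) is purely an
assumption, and the user and group names are invented.

```python
def may_print(user, user_groups, entry):
    """Decide whether `user` may use the printer described by `entry`.

    `entry` maps 'rg', 'ru', 'xg', 'xu' to sets of names (all optional).
    """
    groups = set(user_groups)
    # Assumed order: exclusions are checked first and always win.
    if user in entry.get("xu", set()):
        return False
    if groups & entry.get("xg", set()):
        return False
    allowed_users = entry.get("ru", set())
    allowed_groups = entry.get("rg", set())
    if not allowed_users and not allowed_groups:
        return True  # no restriction configured
    return user in allowed_users or bool(groups & allowed_groups)

# Hypothetical: everyone in deptx plus alice, except mallory.
entry_acl = {"rg": {"deptx"}, "ru": {"alice"}, "xu": {"mallory"}}
print(may_print("bob", ["deptx"], entry_acl))
print(may_print("mallory", ["deptx"], entry_acl))
```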

These were done this way because we had a department with several
different printers, and they wanted all their faculty to print to
all those printers, but a different list of allowed students for
each one.  I didn't want to have ten different groups in /etc/group,
where most of the people in any one group were in all the other
groups too.  Also, we (the computer center) could automatically
generate the appropriate list of faculty, while the list of
students would be something the department kept up "by hand".

The irony is that after using this for a year or two, the dept
decided it was easier to just let everyone use the printer,
because they couldn't remember to update the various lists of
which students were allowed to print to which printer...
  - - -
One of the OS's we run this on (Solaris?) didn't have bsd-style
signal processing, or it didn't work quite the way I wanted it,
so I have an update to use posix-based signals (sigaction, etc)
based on a compile-time setting.  I imagine I should just switch
to using posix signals for all platforms, but I don't know how
well those are supported on other OS's.
  - - -
Well, I'm sure there are other changes in RPI's lpr that I'd like
to sort out, but I'm ready to call it a day at this point.  I also
have a number of changes in the back of my head that I have not
written yet, but I'd rather start with things I already have
working in some form.  I'm also interested in seeing what other
changes might interest me from other places (netbsd, openbsd, etc),
just so freebsd's lpr "plays well" with the others.  Similarly, I
have a linux (redhat) system I can check against.

While this may all sound interesting, I'm not making any promises
on how rapidly I could produce the usable updates for all of them!

Apologies if this is too long or too specialized for arch...

---
Garance Alistair Drosehn           =   gad@eclipse.acs.rpi.edu
Senior Systems Programmer          or  drosih@rpi.edu
Rensselaer Polytechnic Institute





