Date: Mon, 04 Sep 2023 22:07:38 +0000
From: bugzilla-noreply@freebsd.org
To: doc@FreeBSD.org
Subject: [Bug 273245] textproc/groff: groff_mdoc(7): output from 'man 7 groff_mdoc' is badly broken
Message-ID: <bug-273245-9-xha1EPcTCy@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-273245-9@https.bugs.freebsd.org/bugzilla/>
References: <bug-273245-9@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273245

--- Comment #6 from G. Branden Robinson <g.branden.robinson@gmail.com> ---
Hi Wolfgang,

I cloned the freebsd-src repository to have a look at the 35 cases within it that concerned you.

> for i in freebsd-{src,ports,doc};do (cd $i && printf "$i "; git grep 'roff.* -man[^d]' |wc -l );done
> freebsd-src 35

$ git grep -n 'roff.* -man[^d]'
contrib/byacc/aclocal.m4:1047:${NROFF_NOTE} [\$](SHELL) -c "tbl [\$]*.$2 | nroff -man | col -bx" >[\$]@
contrib/byacc/aclocal.m4:1053:${GROFF_NOTE} [\$](SHELL) -c "tbl [\$]*.$2 | groff -man" >[\$]@
contrib/byacc/aclocal.m4:1056:${GROFF_NOTE} GROFF_NO_SGR=stupid [\$](SHELL) -c "tbl [\$]*.$2 | nroff -rHY=0 -Tascii -man | col -bx" >[\$]@
contrib/byacc/configure:8368:${NROFF_NOTE} \$(SHELL) -c "tbl \$*.1 | nroff -man | col -bx" >\$@
contrib/byacc/configure:8374:${GROFF_NOTE} \$(SHELL) -c "tbl \$*.1 | groff -man" >\$@
contrib/byacc/configure:8377:${GROFF_NOTE} GROFF_NO_SGR=stupid \$(SHELL) -c "tbl \$*.1 | nroff -rHY=0 -Tascii -man | col -bx" >\$@

byacc is maintained by Thomas Dickey. He's using Autoconf macros to produce output from sources that are known to be in man(7) format.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/yacc.1#L30

`CF_MAKE_DOCS` appears to be an Autoconf macro private to the byacc distribution; I see no other occurrences in `freebsd-src`.

He also has an Autoconf test to determine how to generate HTML from man(7) pages.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1580

Note in particular the here document at line 1694.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1694

contrib/dialog/makefile.in:145:@NROFF_NOTE@ GROFF_NO_SGR=stupid $(SHELL) -c "tbl $< | nroff -rHY=0 -Tascii -man | col -bx" >$@
contrib/dialog/makefile.in:151:@GROFF_NOTE@ $(SHELL) -c "tbl $< | groff -man" >$@

Dialog is another Thomas Dickey project.
It also builds inputs it knows to be in man(7) format.

$ grep -nrFw .TH contrib/dialog
contrib/dialog/dialog.3:50:.TH \*D 3 "" "$Date: 2021/01/17 18:02:44 $"
contrib/dialog/dialog.1:51:.TH \*D 1 "" "$Date: 2021/01/17 17:25:01 $"
contrib/dialog/configure:8028:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5
contrib/dialog/aclocal.m4:5919:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5
$ grep -nrFw .Dd contrib/dialog || echo NONE # look for mdoc(7) documents
NONE

Next...

contrib/ee/ee.1:5:.\" nroff -man ee.1

$ head contrib/ee/ee.1
.\"
.\"
.\" To format this reference page, use the command:
.\"
.\" nroff -man ee.1
.\"
.\" $Header: /home/hugh/sources/old_ae/RCS/ee.1,v 1.22 2001/12/16 04:49:27 hugh Exp $
.\"
.\"
.TH ee 1 "" "" ""

The man page is telling us explicitly (in a comment) what macro package to use to format it, and unsurprisingly getting it right.

Next...

contrib/ldns/makewin.sh:243:for x in man1/*.1; do groff -man -Tascii -Z "$x" | grotty -cbu > cat1/"$(basename "$x" .1).txt"; done
contrib/ldns/makewin.sh:246:for x in man3/*.3; do groff -man -Tascii -Z "$x" | grotty -cbu > cat3/"$(basename "$x" .3).txt"; done

Again we have renderings of known documents. Let's see what package they use.

$ find contrib/ldns -name "*.[13]" | xargs grep -nEw '\.(Dd|TH)'
contrib/ldns/drill/drill.1:2:.TH drill 1 "28 May 2006"
contrib/ldns/packaging/ldns-config.1:1:.TH ldns-config 1 "22 Sep 2011"

So that's two more correct uses of '-man'.

Next...

contrib/ncurses/aclocal.m4:5607: nroff -man \$TMP >\$TMP.out
contrib/ncurses/configure:14517: nroff -man \$TMP >\$TMP.out

Another Thomas Dickey project. These come from his Autoconf macro `CF_MAN_PAGES`. I'll skip ahead here and note that I'm familiar with the ncurses man pages, having recently proposed patches to them.

https://lists.gnu.org/archive/html/bug-ncurses/2023-09/

They are exclusively in man(7) format, not mdoc(7). Here again we have a case of a maintainer knowing what format is required, and using it.

Next...
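The .TH/.Dd checks run by hand on dialog and ldns above could be folded into the scan itself, so that a '-man' hit is only flagged when the pages it formats are actually mdoc(7). What follows is a minimal sketch assuming a POSIX shell and grep; the name `classify_page` is hypothetical and not part of FreeBSD, groff, or mandoc, and the heuristic (an mdoc(7) page calls .Dd, a man(7) page calls .TH) is the same one used in the greps in this comment, not a full parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD, groff, or mandoc): report which
# macro package each roff source file appears to use, by the same heuristic
# as the greps above.  An mdoc(7) page calls .Dd; a man(7) page calls .TH.
# Roff permits spaces or tabs between the control character and the macro
# name, so those are accepted too.  Comment lines (.\") never match.

classify_page() {
    if grep -qE '^\.[[:space:]]*Dd([[:space:]]|$)' "$1"; then
        printf 'mdoc %s\n' "$1"
    elif grep -qE '^\.[[:space:]]*TH([[:space:]]|$)' "$1"; then
        printf 'man %s\n' "$1"
    else
        printf 'unknown %s\n' "$1"
    fi
}

# Classify every file named on the command line.
for page in "$@"; do
    classify_page "$page"
done
```

Saved under a hypothetical name such as classify.sh, running `sh classify.sh contrib/dialog/dialog.1 contrib/ee/ee.1` should report both pages as `man`, consistent with the .TH lines shown above. Only hits whose pages come back as `mdoc` would then be candidates for a `-mandoc` fix.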
contrib/tcp_wrappers/Banners.Makefile:12:# sequences as described in the hosts_access.5 manual page (`nroff -man'
contrib/tcp_wrappers/CHANGES:2:configuration checker. See the `tcpdchk.8' manual page (`nroff -man'
contrib/tcp_wrappers/CHANGES:349:have all rules within a single file. See "nroff -man hosts_options.5"
contrib/tcp_wrappers/Makefile:575:# and hosts_options.5 manual pages (`nroff -man' format).
contrib/tcp_wrappers/README:240:hosts_access.5 manual page, which is in `nroff -man' format. A later
contrib/tcp_wrappers/README:257:The hosts_options.5 manual page (`nroff -man' format) documents an
contrib/tcp_wrappers/README:395:documented in the hosts_options.5 document, which is in `nroff -man'
contrib/tcp_wrappers/README:432:`nroff -man' format) can guide the requests to the right server. These
contrib/tcp_wrappers/README:453:given in the hosts_options.5 manual page (`nroff -man' format). An
contrib/tcp_wrappers/README:897:hosts_access.5, which is in `nroff -man' format. This is a lengthy
contrib/tcp_wrappers/README:904:The examples in the hosts_access.5 document (`nroff -man' format) show
contrib/tcp_wrappers/README:912:hosts_options.5 document (`nroff -man' format).
contrib/tcp_wrappers/README:918:program is described in the tcpdchk.8 document (`nroff -man' format).
contrib/tcp_wrappers/README:929:described in the tcpdmatch.8 document (`nroff -man' format).
contrib/tcp_wrappers/README:967:programs. The hosts_access.3 manual page (`nroff -man' format)
contrib/tcp_wrappers/options.c:4: * manual page (source file: hosts_options.5, "nroff -man" format).

These are all source comments or text file contents, and do not drive construction of anything; they therefore cannot cause failures. Nevertheless, let us see what macro package is employed by "tcp_wrappers".
$ find contrib/tcp_wrappers -name "*.[1-9]" | xargs grep -nEw '\.(Dd|TH)'
contrib/tcp_wrappers/hosts_options.5:1:.TH HOSTS_OPTIONS 5
contrib/tcp_wrappers/tcpdmatch.8:1:.TH TCPDMATCH 8
contrib/tcp_wrappers/tcpd.8:1:.TH TCPD 8
contrib/tcp_wrappers/tcpdchk.8:1:.TH TCPDCHK 8
contrib/tcp_wrappers/hosts_access.5:1:.TH HOSTS_ACCESS 5
contrib/tcp_wrappers/hosts_access.3:1:.TH HOSTS_ACCESS 3

It would appear once again that the upstream maintainer is familiar with their own man pages.

Next...

contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any other manpage.

This is a comment. Some context might be helpful.

$ git grep -C2 -n 'roff.* -man[^d]' contrib/tcsh/
contrib/tcsh/tcsh.man2html-11-#
contrib/tcsh/tcsh.man2html-12-# Designed for tcsh manpage. Guaranteed not to work on manpages not written
contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any other manpage.
contrib/tcsh/tcsh.man2html-14-#
contrib/tcsh/tcsh.man2html-15-# Makes links FROM items which are both a) in particular sections (see

Given that context ("guaranteed *not* to work on [other man pages]"), it does not seem fair to hold up this source remark as evidence militating against groff's change.

Next...

contrib/tzcode/workman.sh:18:.rm }F" | nroff -man - ${1+"$@"} | perl -ne '

More context is warranted here, too.

$ head -n 18 contrib/tzcode/workman.sh
#! /bin/sh
# Convert manual page troff stdin to formatted .txt stdout.
# This file is in the public domain, so clarified as of
# 2009-05-17 by Arthur David Olson.
if (type nroff && type perl) >/dev/null 2>&1; then
# Tell groff not to emit SGR escape sequences (ANSI color escapes).
GROFF_NO_SGR=1
export GROFF_NO_SGR
echo ".am TH
.hy 0
.na
..
.rm }H
.rm }F" | nroff -man - ${1+"$@"} | perl -ne '

Your concern might, at first glance, seem warranted here; the tool does purport to be of general use.
However, closer inspection reveals that it was written in ignorance or deliberate neglect of the mdoc(7) package altogether; observe how it appends to the 'TH' macro, which is unused in mdoc(7).

It would be straightforward to make this script handle mdoc(7) as well; simply append the same two requests (.hy 0 and .na) to the `Dd` macro. The removals of the '}H' and '}F' strings/macros/diversions suggest a familiarity with the AT&T Unix man(7) implementation or its descendants in USG/System III/System V proprietary Unix or BSD prior to Networking Release/2 (when the Berkeley CSRG replaced Unix troff with groff).

Further, if nroff or perl programs (or shell functions) are unavailable, this shell script proceeds to use mandoc anyway.

$ tail -n 6 contrib/tzcode/workman.sh
elif (type mandoc && type col) >/dev/null 2>&1; then
mandoc -man -T ascii "$@" | col -bx
else
echo >&2 "$0: please install nroff and perl, or mandoc and col"
exit 1
fi

Next...

usr.bin/man/man.conf.5:116:NROFF_JA /usr/local/bin/groff -man -dlang=ja_JP.eucJP
usr.bin/man/man.conf.5:117:TROFF_JA /usr/local/bin/groff -man -dlang=ja_JP.euc.jp

This is from an example in FreeBSD's man.conf(5) page. The assignment to the "lang" string has no particular effect in groff, and to the best of my knowledge it never has. This may be evidence of jgroff, a fork of groff that was made unnecessary in the Debian Project about 20 years ago (https://www.debian.org/security/2002/dsa-107) and was superseded by groff support in the 1.21 release about 13 years ago.

https://lists.gnu.org/archive/html/groff/2010-12/msg00051.html

(Also, is the extra period in line 117 a typo?)

I suggest that this example may have bitrotted.

Last...
usr.bin/man/man.sh:1074:NROFF='groff -S -P-h -Wall -mtty-char -man'
usr.bin/man/man.sh:1078:TROFF='groff -S -man'

This is a case that should certainly be addressed, if this script still fulfills the purpose claimed for it in its initial commit in 2010: "Implementaiton [sic] of man, manpath, whatis, and apropos written entirely in sh."

I strongly recommend s/-man/&doc/ here (that is, `-mandoc`, which loads the man(7) or mdoc(7) macros as each document requires) so that FreeBSD users will continue to have a positive experience.

Also, for what it's worth, the `-S` option has long been unnecessary; "safer" mode has been the default in groff since version 1.12, released on 14 December 1999.

However, 32-33 false positives in a set of 35 suggests to me that your scanning criteria could benefit from sensitivity tuning. An error rate of over 90% is generally considered unusable in serious measurement applications.

Regards,
Branden

-- 
You are receiving this mail because:
You are on the CC list for the bug.