Date: Tue, 11 May 2010 10:24:34 -0700
From: Tim Prouty <tim.prouty@isilon.com>
To: freebsd-arch@freebsd.org
Cc: Matthew Fleming <matthew.fleming@isilon.com>, Zachary Loafman <zachary.loafman@isilon.com>
Subject: [PATCH]/[RFC] Increase scalability of per-process file descriptor data structures
Message-ID: <F2459D9D-4102-4D1D-BDCB-4F5AA8DE336D@isilon.com>
--Apple-Mail-16--1061268945
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

Hi,

This is my first time sending a patch to the list, so let me know if there are any conventions I missed. Attached is a patch that attempts to remove the data structure limitations on the number of open file descriptors in the system. The patch is against our modified version of FreeBSD 7, so it probably won't apply cleanly against upstream, but I wanted to get it out for discussion early; if there is feedback, we can address it first and then worry about porting a specific patch for upstream. We (Isilon) have been running this internally for a few months without any issues, although there is at least one known issue that I need to resolve, which is mentioned below.

Motivation:

With the increasing amount of memory and processing power in modern machines, there are certain userspace processes that are able to handle much higher concurrent load than previously possible. A specific example is a single-process/multi-threaded SMB stack which can handle thousands of connected clients, each with hundreds of files open. Once kernel sysctl limits are increased for max files, the next limitation is in the actual file descriptor data structures.

Problem - Data Structure Limits:

The existing per-process data structures for file descriptors are flat tables, which are reallocated each time they need to grow. This is inefficient, since the amount of data to allocate and copy increases each time, but the bigger issue is the potentially limited amount of contiguous KVA memory as the table grows very large. Over time, as the KVA memory becomes fragmented, malloc may be unable to provide large enough blocks of contiguous memory.

In the current code, the per-process struct filedesc contains both an array of struct file pointers and a bit field indicating which file descriptors are in use.
The primary issue is how to handle these structures growing beyond the kernel page size of 4K.

The array of file pointers will grow much faster than the bit field, especially on a 64-bit kernel. The 4K block size will be hit at 512 files (64-bit kernel) for the file pointer array and at 32,768 files for the bit field.

Solution:

File Pointer Array

Focusing first on the file pointer array limitation, an indirect block approach is used. An indirect block size of 4K (equal to the page size) is used, allowing for 512 files per block. To optimize for the common case of low/normal fd usage, a flat array is initialized to 20 entries and then doubles each time it grows until the block reaches its maximum size. Once more than 512 files are opened, the array will transition to a single-level indirect block table.

Fd Bitfield:

The fd bit field as it stands can represent 32K files when it grows to the page size limit. Using the same indirect system as the file pointer array, it is able to grow beyond its existing limits.

Close Exec Field:

One complication of the old file pointer table is that for each file pointer there was a 1-byte flags field. The memory was laid out such that the file pointers are all in one contiguous array, followed by a second array of chars where each char entry is a flags field that corresponds to the file pointer at the same index. Interestingly, only one flag is actually used, UF_EXCLOSE, so it's fairly wasteful to have an array of chars. What Linux does, and what I have done, is to just use a bitfield for all fds that should be closed on exec. This could be further optimized by doing some pointer trickery to store the close exec bit in the struct file pointer rather than keeping a separate bitfield.

Indirect Block Table:

Since there are three consumers of the indirect block table, I generalized it so all of the consumers rely on the same code. This could eventually be refactored into a kernel library since it could be generally useful in other areas.
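For readers who want to see the shape of the idea without reading the whole diff, here is a minimal userspace sketch of a single-level indirect table and of close-on-exec stored as one bit per fd. It is not the code from the patch: the names `itable`, `itable_entry`, `itable_ensure`, `cloexec_set` and `cloexec_get` are made up for illustration, the block size is assumed to be 4K, there is no locking, and the flat-array fast path for small tables is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE     4096u                            /* assumed 4K page */
#define ENTS_PER_BLOCK (BLOCK_SIZE / sizeof(uintptr_t)) /* 512 on 64-bit */
#define BITS_PER_WORD  (sizeof(uintptr_t) * 8)

/* Hypothetical single-level indirect table: a root array of leaf blocks. */
struct itable {
	uintptr_t **root;   /* root[i] is a leaf block, or NULL if sparse */
	size_t nblocks;     /* number of slots in root */
};

/* Return a pointer to entry `ent`, or NULL if its leaf is not allocated. */
static uintptr_t *
itable_entry(struct itable *t, size_t ent)
{
	size_t blk = ent / ENTS_PER_BLOCK;

	if (blk >= t->nblocks || t->root[blk] == NULL)
		return (NULL);
	return (&t->root[blk][ent % ENTS_PER_BLOCK]);
}

/* Make sure entry `ent` is backed by memory; returns 0 on success. */
static int
itable_ensure(struct itable *t, size_t ent)
{
	size_t blk = ent / ENTS_PER_BLOCK;

	if (blk >= t->nblocks) {
		/* Only the root grows; existing leaf blocks are never copied. */
		size_t n = blk + 1;
		uintptr_t **nroot = realloc(t->root, n * sizeof(*nroot));

		if (nroot == NULL)
			return (-1);
		for (size_t i = t->nblocks; i < n; i++)
			nroot[i] = NULL;
		t->root = nroot;
		t->nblocks = n;
	}
	if (t->root[blk] == NULL) {
		/* Each leaf is one fixed-size, zero-filled block. */
		t->root[blk] = calloc(1, BLOCK_SIZE);
		if (t->root[blk] == NULL)
			return (-1);
	}
	return (0);
}

/* Close-on-exec as one bit per fd, stored in the same kind of table. */
static int
cloexec_set(struct itable *bits, size_t fd, int on)
{
	size_t word = fd / BITS_PER_WORD;
	uintptr_t mask = (uintptr_t)1 << (fd % BITS_PER_WORD);
	uintptr_t *w;

	if (itable_ensure(bits, word) != 0)
		return (-1);
	w = itable_entry(bits, word);
	assert(w != NULL);
	if (on)
		*w |= mask;
	else
		*w &= ~mask;
	return (0);
}

/* Return 1 if the close-on-exec bit is set for `fd`, otherwise 0. */
static int
cloexec_get(struct itable *bits, size_t fd)
{
	uintptr_t *w = itable_entry(bits, fd / BITS_PER_WORD);

	return (w != NULL && ((*w >> (fd % BITS_PER_WORD)) & 1) != 0);
}
```

With 8-byte pointers a 4K leaf holds 4096 / 8 = 512 entries, and a 4K block of bits covers 4096 * 8 = 32,768 fds, which is where the two limits quoted in this message come from. Every allocation after the root is a single page, so growth stops depending on finding a large contiguous region.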
The table uses a single level of indirection, so the base table can still grow beyond the 4K. As a process uses more fds, the need to continue growing the base table should be fairly limited, and a single realloc will significantly increase the number of fds the process can allocate. Accessing the new data structures: All consumers of the file pointer array and bitfield will now have to use accessors rather than using direct access. Known Issues: The new fdp locking in fdcopy needs to be reworked. Thank you for reviewing! -Tim --Apple-Mail-16--1061268945 Content-Disposition: attachment; filename=0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch Content-Type: application/octet-stream; x-unix-mode=0700; name="0001-Increase-scalabilty-of-per-process-file-descriptor-d.patch" Content-Transfer-Encoding: quoted-printable =46rom=20a9dc5a7bc921f992a8775b4e1b9abcda2a0547c6=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20tprouty=20= <tprouty@b72e2a10-2d34-0410-9a71-d3beadf02b57>=0ADate:=20Mon,=2022=20Mar=20= 2010=2018:09:31=20+0000=0ASubject:=20[PATCH]=20Increase=20scalability=20= of=20per-process=20file=20descriptor=20data=20structures=0A=0A= Motivation:=0A=0A=20=20With=20the=20increasing=20amount=20of=20memory=20= and=20processing=20power=20in=20modern=0A=20=20machines,=20there=20are=20= certain=20userspace=20processes=20that=20are=20able=20to=0A=20=20handle=20= much=20higher=20concurrent=20load=20than=20previously=20possible.=20=20A=0A= =20=20specific=20example=20is=20a=20single-process/multi-threaded=20SMB=20= stack=20which=0A=20=20can=20handle=20thousands=20of=20connected=20= clients,=20each=20with=20hundreds=20of=0A=20=20files=20open.=20=20Once=20= kernel=20sysctl=20limits=20are=20increased=20for=20max=20files,=0A=20=20= the=20next=20limitation=20is=20in=20the=20actual=20actual=20file=20= descriptor=20data=0A=20=20structures.=0A=0AProblem=20-=20Data=20= Structure=20Limits:=0A=0A=20=20The=20existing=20per-process=20data=20= 
structures=20for=20the=20file=20descriptor=20are=0A=20=20flat=20tables,=20= which=20are=20reallocated=20each=20time=20they=20need=20need=20to=20= grow.=0A=20=20This=20is=20innefficient=20as=20the=20amount=20of=20data=20= to=20allocate=20and=20copy=20each=0A=20=20time=20increases,=20but=20the=20= bigger=20issue=20is=20the=20potentially=20limited=0A=20=20amount=20of=20= contiguous=20KVA=20memory=20as=20the=20table=20grows=20very=20large.=20=20= Over=0A=20=20time=20as=20the=20KVA=20memory=20becomes=20fragmanted,=20= malloc=20may=20be=20unable=20to=0A=20=20provide=20large=20enough=20= blocks=20of=20contiguous=20memory.=0A=0A=20=20In=20the=20current=20code=20= the=20struct=20proc=20contains=20both=20an=20array=20of=20struct=0A=20=20= file=20pointers=20and=20a=20bit=20field=20indicating=20which=20file=20= descriptors=20are=0A=20=20in=20use.=20=20The=20primary=20issue=20is=20= how=20to=20handle=20these=20structures=20growing=0A=20=20beyond=20the=20= kernel=20page=20size=20of=204K.=0A=0A=20=20The=20array=20of=20file=20= pointers=20will=20grow=20much=20faster=20than=20the=20bit=20field,=0A=20=20= especially=20on=20a=2064=20bit=20kernel.=20The=204K=20block=20size=20= will=20be=20hit=20at=20512=0A=20=20files=20(64=20bit=20kernel)=20for=20= the=20file=20pointer=20array=20and=2032,768=20files=0A=20=20for=20the=20= bit=20field.=0A=0ASolution:=0A=0AFile=20Pointer=20Array=0A=0A=20=20= Focusing=20first=20on=20the=20file=20pointer=20array=20limitation,=20an=20= indirect=0A=20=20block=20approach=20is=20used.=20=20An=20indirect=20= block=20size=20of=204K=20(equal=20to=20page=0A=20=20size)=20is=20used,=20= allowing=20for=20512=20files=20per=20block.=20=20To=20optimize=20for=0A=20= =20the=20common=20case=20of=20low/normal=20fd=20usage,=20a=20flat=20= array=20is=20initialized=0A=20=20to=2020=20entries=20and=20then=20grows=20= at=202x=20each=20time=20until=20the=20block=20reaches=0A=20=20it's=20= maximum=20size.=20Once=20more=20than=20512=20files=20are=20opened,=20the=20= 
array=0A=20=20will=20transition=20to=20a=20single=20level=20indirect=20= block=20table.=0A=0AFd=20Bitfield:=0A=0A=20=20The=20fd=20bit=20field=20= as=20it=20stands=20can=20represent=2032K=20files=20when=20it=20grows=0A=20= =20to=20the=20page=20size=20limit.=20=20Using=20the=20same=20indirect=20= system=20as=20the=20file=0A=20=20pointer=20array,=20it=20is=20able=20to=20= grow=20beyond=20it's=20existing=20limits.=0A=0AClose=20Exec=20Field:=0A=0A= =20=20One=20complication=20of=20the=20old=20file=20pointer=20table=20is=20= that=20for=20each=20file=0A=20=20pointer=20there=20was=201=20byte=20= flags.=20=20The=20memory=20was=20laid=20out=20such=20that=0A=20=20the=20= file=20pointers=20are=20all=20in=20one=20contiguous=20array,=20followed=20= by=20a=0A=20=20second=20array=20of=20chars=20where=20each=20char=20entry=20= is=20a=20flags=20field=20that=0A=20=20corresponds=20to=20the=20file=20= pointer=20at=20the=20same=20index.=20=20Interestingly=0A=20=20there=20is=20= actually=20only=20one=20flag=20that=20is=20used:=20UF_EXCLOSE,=20so=20= it's=0A=20=20fairly=20wasteful=20to=20have=20an=20array=20of=20chars.=20=20= What=20linux=20does,=20and=0A=20=20what=20I=20have=20done=20is=20to=20= just=20use=20a=20bitfield=20for=20all=20fds=20that=20should=0A=20=20be=20= closed=20on=20exec.=20=20This=20could=20be=20further=20optimized=20by=20= doing=20some=0A=20=20pointer=20trickery=20to=20store=20the=20close=20= exec=20bit=20in=20the=20struct=20file=0A=20=20pointer=20rather=20than=20= keep=20a=20separate=20bitfield.=0A=0AIndirect=20Block=20Table:=0A=0A=20=20= Since=20there=20are=20three=20consumers=20of=20the=20indirect=20block=20= table,=20I=0A=20=20generalized=20it=20so=20all=20of=20the=20consumers=20= rely=20on=20the=20same=20code.=20=20This=0A=20=20could=20eventually=20be=20= refactored=20into=20a=20kernel=20library=20since=20it=20could=0A=20=20be=20= generally=20useful=20in=20other=20areas.=20=20The=20table=20uses=20a=20= 
single=20level=0A=20=20of=20indirection,=20so=20the=20base=20table=20can=20= still=20grow=20beyond=20the=204K.=20=20As=0A=20=20a=20process=20uses=20= more=20fds,=20the=20need=20to=20continue=20growing=20the=20base=20table=0A= =20=20should=20be=20fairly=20limited,=20and=20a=20single=20realloc=20= will=20significantly=0A=20=20increase=20the=20number=20of=20fds=20the=20= process=20can=20allocate.=0A=0AAccessing=20the=20new=20data=20= structures:=0A=0A=20=20All=20consumers=20of=20the=20file=20pointer=20= array=20and=20bitfield=20will=20now=20have=0A=20=20to=20use=20accessors=20= rather=20than=20using=20direct=20access.=0A=0AKnown=20Issues:=0A=0A=20=20= The=20new=20fdp=20locking=20in=20fdcopy=20needs=20to=20be=20reworked.=0A=0A= git-svn-id:=20https://svn/repo/onefs/branches/BR_HAB_PROTO@144587=20= b72e2a10-2d34-0410-9a71-d3beadf02b57=0A---=0A=20= src/sys/compat/linux/linux_stats.c=20=20|=20=20=20=202=20+-=0A=20= src/sys/compat/svr4/svr4_filio.c=20=20=20=20|=20=20=20=204=20+-=0A=20= src/sys/fs/fdescfs/fdesc_vfsops.c=20=20=20|=20=20=20=202=20+-=0A=20= src/sys/fs/fdescfs/fdesc_vnops.c=20=20=20=20|=20=20=20=202=20+-=0A=20= src/sys/fs/nfsserver/nfs_nfsdport.c=20|=20=20=20=202=20+-=0A=20= src/sys/ifs/bam/bam_pctl.c=20=20=20=20=20=20=20=20=20=20|=20=20=20=202=20= +-=0A=20src/sys/ifs/bam/bam_vfsops.c=20=20=20=20=20=20=20=20|=20=20=20=20= 4=20+-=0A=20src/sys/ifs/lock/lock_advisory.c=20=20=20=20|=20=20=20=202=20= +-=0A=20src/sys/ifs/rbm/rbm_user_ipc.c=20=20=20=20=20=20|=20=20=20=206=20= +-=0A=20src/sys/kern/kern_descrip.c=20=20=20=20=20=20=20=20=20|=20=20805=20= +++++++++++++++++++++++++++--------=0A=20src/sys/kern/kern_lsof.c=20=20=20= =20=20=20=20=20=20=20=20=20|=20=20=20=202=20+-=0A=20= src/sys/kern/sys_generic.c=20=20=20=20=20=20=20=20=20=20|=20=20=20=206=20= +-=0A=20src/sys/kern/uipc_mqueue.c=20=20=20=20=20=20=20=20=20=20|=20=20=20= =204=20+-=0A=20src/sys/kern/uipc_sem.c=20=20=20=20=20=20=20=20=20=20=20=20= 
=20|=20=20=20=204=20+-=0A=20src/sys/kern/uipc_usrreq.c=20=20=20=20=20=20=20= =20=20=20|=20=20=20=209=20+-=0A=20src/sys/kern/vfs_syscalls.c=20=20=20=20= =20=20=20=20=20|=20=20=20=202=20+-=0A=20src/sys/netsmb/smb_dev.c=20=20=20= =20=20=20=20=20=20=20=20=20|=20=20=20=202=20+-=0A=20= src/sys/sys/filedesc.h=20=20=20=20=20=20=20=20=20=20=20=20=20=20|=20=20=20= 26=20+-=0A=2018=20files=20changed,=20670=20insertions(+),=20216=20= deletions(-)=0A=0Adiff=20--git=20a/src/sys/compat/linux/linux_stats.c=20= b/src/sys/compat/linux/linux_stats.c=0Aindex=20374ce39..905db20=20100644=0A= ---=20a/src/sys/compat/linux/linux_stats.c=0A+++=20= b/src/sys/compat/linux/linux_stats.c=0A@@=20-129,7=20+129,7=20@@=20= translate_path_major_minor(struct=20thread=20*td,=20char=20*path,=20= struct=20stat=20*buf)=0A=20=09fd=20=3D=20td->td_retval[0];=0A=20=09= td->td_retval[0]=20=3D=20temp;=0A=20=09translate_fd_major_minor(td,=20= fd,=20buf);=0A-=09fdclose(fdp,=20fdp->fd_ofiles[fd],=20fd,=20td);=0A+=09= fdclose(fdp,=20ftable_get(fdp,=20fd),=20fd,=20td);=0A=20}=0A=20=0A=20= static=20int=0Adiff=20--git=20a/src/sys/compat/svr4/svr4_filio.c=20= b/src/sys/compat/svr4/svr4_filio.c=0Aindex=20701bf15..82364ca=20100644=0A= ---=20a/src/sys/compat/svr4/svr4_filio.c=0A+++=20= b/src/sys/compat/svr4/svr4_filio.c=0A@@=20-212,13=20+212,13=20@@=20= svr4_fil_ioctl(fp,=20td,=20retval,=20fd,=20cmd,=20data)=0A=20=09switch=20= (cmd)=20{=0A=20=09case=20SVR4_FIOCLEX:=0A=20=09=09FILEDESC_XLOCK(fdp);=0A= -=09=09fdp->fd_ofileflags[fd]=20|=3D=20UF_EXCLOSE;=0A+=09=09= ftable_set_cloexec(fdp,=20fd,=201);=0A=20=09=09FILEDESC_XUNLOCK(fdp);=0A=20= =09=09return=200;=0A=20=0A=20=09case=20SVR4_FIONCLEX:=0A=20=09=09= FILEDESC_XLOCK(fdp);=0A-=09=09fdp->fd_ofileflags[fd]=20&=3D=20= ~UF_EXCLOSE;=0A+=09=09ftable_set_cloexec(fdp,=20fd,=200);=0A=20=09=09= FILEDESC_XUNLOCK(fdp);=0A=20=09=09return=200;=0A=20=0Adiff=20--git=20= a/src/sys/fs/fdescfs/fdesc_vfsops.c=20= b/src/sys/fs/fdescfs/fdesc_vfsops.c=0Aindex=2016fa4cf..fb2e45e=20100644=0A= 
---=20a/src/sys/fs/fdescfs/fdesc_vfsops.c=0A+++=20= b/src/sys/fs/fdescfs/fdesc_vfsops.c=0A@@=20-203,7=20+203,7=20@@=20= fdesc_statfs(mp,=20sbp,=20td)=0A=20=09last=20=3D=20min(fdp->fd_nfiles,=20= lim);=0A=20=09freefd=20=3D=200;=0A=20=09for=20(i=20=3D=20= fdp->fd_freefile;=20i=20<=20last;=20i++)=0A-=09=09if=20= (fdp->fd_ofiles[i]=20=3D=3D=20NULL)=0A+=09=09if=20(ftable_get(fdp,=20i)=20= =3D=3D=20NULL)=0A=20=09=09=09freefd++;=0A=20=0A=20=09/*=0Adiff=20--git=20= a/src/sys/fs/fdescfs/fdesc_vnops.c=20b/src/sys/fs/fdescfs/fdesc_vnops.c=0A= index=20f39c3a7..0ea6607=20100644=0A---=20= a/src/sys/fs/fdescfs/fdesc_vnops.c=0A+++=20= b/src/sys/fs/fdescfs/fdesc_vnops.c=0A@@=20-581,7=20+581,7=20@@=20= fdesc_readdir(ap)=0A=20=09=09=09dp->d_type=20=3D=20DT_DIR;=0A=20=09=09=09= break;=0A=20=09=09default:=0A-=09=09=09if=20(fdp->fd_ofiles[fcnt]=20=3D=3D= =20NULL)=20{=0A+=09=09=09if=20(ftable_get(fdp,=20fcnt)=20=3D=3D=20NULL)=20= {=0A=20=09=09=09=09FILEDESC_SUNLOCK(fdp);=0A=20=09=09=09=09goto=20done;=0A= =20=09=09=09}=0Adiff=20--git=20a/src/sys/fs/nfsserver/nfs_nfsdport.c=20= b/src/sys/fs/nfsserver/nfs_nfsdport.c=0Aindex=20232e465..94fd81c=20= 100644=0A---=20a/src/sys/fs/nfsserver/nfs_nfsdport.c=0A+++=20= b/src/sys/fs/nfsserver/nfs_nfsdport.c=0A@@=20-3103,7=20+3103,7=20@@=20= fp_getfvp(struct=20thread=20*p,=20int=20fd,=20struct=20file=20**fpp,=20= struct=20vnode=20**vpp)=0A=20=0A=20=09fdp=20=3D=20p->td_proc->p_fd;=0A=20= =09if=20(fd=20>=3D=20fdp->fd_nfiles=20||=0A-=09=20=20=20=20(fp=20=3D=20= fdp->fd_ofiles[fd])=20=3D=3D=20NULL)=0A+=09=20=20=20=20(fp=20=3D=20= ftable_get(fdp,=20fd))=20=3D=3D=20NULL)=0A=20=09=09return=20(EBADF);=0A=20= =09*fpp=20=3D=20fp;=0A=20=09return=20(0);=0Adiff=20--git=20= a/src/sys/ifs/bam/bam_pctl.c=20b/src/sys/ifs/bam/bam_pctl.c=0Aindex=20= 6ce998e..b14472f=20100644=0A---=20a/src/sys/ifs/bam/bam_pctl.c=0A+++=20= b/src/sys/ifs/bam/bam_pctl.c=0A@@=20-2619,7=20+2619,7=20@@=20= pctl2_lin_open(struct=20thread=20*td,=20const=20struct=20gmp_info=20*gi,=0A= 
=20=09FILEDESC_SLOCK(fdp);=0A=20=09FILE_LOCK(fp);=0A=20=09if=20= (fp->f_count=20=3D=3D=201)=20{=0A-=09=09ASSERT(fdp->fd_ofiles[indx]=20!=3D= =20fp,=0A+=09=09ASSERT(ftable_get(fdp,=20indx)=20!=3D=20fp,=0A=20=09=09=20= =20=20=20"Open=20file=20descriptor=20lost=20all=20refs");=0A=20=09=09= FILE_UNLOCK(fp);=0A=20=09=09FILEDESC_SUNLOCK(fdp);=0Adiff=20--git=20= a/src/sys/ifs/bam/bam_vfsops.c=20b/src/sys/ifs/bam/bam_vfsops.c=0Aindex=20= b5efb2f..b72e184=20100644=0A---=20a/src/sys/ifs/bam/bam_vfsops.c=0A+++=20= b/src/sys/ifs/bam/bam_vfsops.c=0A@@=20-2019,7=20+2019,7=20@@=20= bam_busy_vnodes_sysctl(SYSCTL_HANDLER_ARGS)=0A=20=09=09=09for=20(i=20=3D=20= 0;=0A=20=09=09=09=20=20=20=20fdp->fd_refcnt=20>=200=20&&=20i=20<=20= fdp->fd_nfiles;=0A=20=09=09=09=20=20=20=20i++)=20{=0A-=09=09=09=09fp=20=3D= =20fdp->fd_ofiles[i];=0A+=09=09=09=09fp=20=3D=20ftable_get(fdp,=20i);=0A=20= =09=09=09=09if=20(fp=20=3D=3D=20NULL=20||=0A=20=09=09=09=09=20=20=20=20= fp->f_type=20!=3D=20DTYPE_VNODE=20||=0A=20=09=09=09=09=20=20=20=20= fp->f_vnode=20!=3D=20vp)=0A@@=20-2056,7=20+2056,7=20@@=20= bam_busy_vnodes_sysctl(SYSCTL_HANDLER_ARGS)=0A=20=09=09for=20(i=20=3D=20= 0;=0A=20=09=09=20=20=20=20fdp->fd_refcnt=20>=200=20&&=20i=20<=20= fdp->fd_nfiles;=0A=20=09=09=20=20=20=20i++)=20{=0A-=09=09=09fp=20=3D=20= fdp->fd_ofiles[i];=0A+=09=09=09fp=20=3D=20ftable_get(fdp,=20i);=0A=20=09=09= =09if=20(fp=20=3D=3D=20NULL=20||=0A=20=09=09=09=20=20=20=20fp->f_type=20= !=3D=20DTYPE_ISIEVENT)=0A=20=09=09=09=09continue;=0Adiff=20--git=20= a/src/sys/ifs/lock/lock_advisory.c=20b/src/sys/ifs/lock/lock_advisory.c=0A= index=20c40db3c..f34e2d7=20100644=0A---=20= a/src/sys/ifs/lock/lock_advisory.c=0A+++=20= b/src/sys/ifs/lock/lock_advisory.c=0A@@=20-257,7=20+257,7=20@@=20= adv_lock_owner_fmt_conv(struct=20fmt=20*fmt,=20const=20struct=20= fmt_conv_args=20*args,=0A=20=09=09=09continue;=0A=20=09=09if=20= (sx_try_slock(FILEDESC_LOCK(fdp)))=20{=0A=20=09=09=09for=20(n=20=3D=200;=20= 
n=20<=20fdp->fd_nfiles;=20n++)=20{=0A-=09=09=09=09if=20(fp=20=3D=3D=20= fdp->fd_ofiles[n])=20{=0A+=09=09=09=09if=20(fp=20=3D=3D=20= ftable_get(fdp,=20n))=20{=0A=20=09=09=09=09=09/*=20Match!=20Print=20it!=20= */=0A=20=09=09=09=09=09fmt_print(fmt,=20"\nflock=20%d:=20pid=20%d,=20= %s",=0A=20=09=09=09=09=09=20=20=20=20matches,=20p->p_pid,=20p->p_comm);=0A= diff=20--git=20a/src/sys/ifs/rbm/rbm_user_ipc.c=20= b/src/sys/ifs/rbm/rbm_user_ipc.c=0Aindex=2022b0630..06ce2da=20100644=0A= ---=20a/src/sys/ifs/rbm/rbm_user_ipc.c=0A+++=20= b/src/sys/ifs/rbm/rbm_user_ipc.c=0A@@=20-399,7=20+399,7=20@@=20= ifs_rbmuipc_open(struct=20thread=20*td,=20struct=20= ifs_rbmuipc_open_sysargs=20*uap)=0A=20=09fd=20=3D=20td->td_retval[0];=0A=20= =09FILEDESC_SLOCK(fdp);=0A=20=09if=20((unsigned=20int)fd=20>=20= fdp->fd_nfiles=20||=0A-=09=20=20=20=20(fp=20=3D=20fdp->fd_ofiles[fd])=20= =3D=3D=20NULL)=20{=0A+=09=20=20=20=20(fp=20=3D=20ftable_get(fdp,=20fd))=20= =3D=3D=20NULL)=20{=0A=20=09=09FILEDESC_SUNLOCK(fdp);=0A=20=09=09error=20= =3D=20EINVAL;=0A=20=09=09fp=20=3D=20NULL;=0A@@=20-424,8=20+424,8=20@@=20= out:=0A=20=09=09/*=20Close=20user's=20fp.=20*/=0A=20=09=09if=20(fp)=20{=0A= =20=09=09=09FILEDESC_XLOCK(fdp);=0A-=09=09=09if=20(fdp->fd_ofiles[fd]=20= =3D=3D=20fp)=20{=0A-=09=09=09=09fdp->fd_ofiles[fd]=20=3D=20NULL;=0A+=09=09= =09if=20(ftable_get(fdp,=20fd)=20=3D=3D=20fp)=20{=0A+=09=09=09=09= ftable_set(fdp,=20fd,=20NULL);=0A=20=09=09=09=09FILEDESC_XUNLOCK(fdp);=0A= =20=09=09=09=09fdrop(fp,=20td);=0A=20=09=09=09}=20else=0Adiff=20--git=20= a/src/sys/kern/kern_descrip.c=20b/src/sys/kern/kern_descrip.c=0Aindex=20= 6ce0356..1a34987=20100644=0A---=20a/src/sys/kern/kern_descrip.c=0A+++=20= b/src/sys/kern/kern_descrip.c=0A@@=20-112,9=20+112,8=20@@=20enum=20= dup_type=20{=20DUP_VARIABLE,=20DUP_FIXED=20};=0A=20=0A=20static=20int=20= do_dup(struct=20thread=20*td,=20enum=20dup_type=20type,=20int=20old,=20= int=20new,=0A=20=20=20=20=20register_t=20*retval);=0A-static=20int=09= 
fd_first_free(struct=20filedesc=20*,=20int,=20int);=0A+static=20int=09= fd_first_free(struct=20filedesc=20*,=20int);=0A=20static=20int=09= fd_last_used(struct=20filedesc=20*,=20int,=20int);=0A-static=20void=09= fdgrowtable(struct=20filedesc=20*,=20int);=0A=20static=20int=09= fdrop_locked(struct=20file=20*fp,=20struct=20thread=20*td);=0A=20static=20= void=09fdunused(struct=20filedesc=20*fdp,=20int=20fd);=0A=20static=20= void=09fdused(struct=20filedesc=20*fdp,=20int=20fd);=0A@@=20-134,10=20= +133,406=20@@=20static=20void=09fdused(struct=20filedesc=20*fdp,=20int=20= fd);=0A=20#define=20NDBIT(x)=09((NDSLOTTYPE)1=20<<=20((x)=20%=20= NDENTRIES))=0A=20#define=09NDSLOTS(x)=09(((x)=20+=20NDENTRIES=20-=201)=20= /=20NDENTRIES)=0A=20=0A-/*=0A-=20*=20Storage=20required=20per=20open=20= file=20descriptor.=0A+#define=20IDB_BLOCK_SIZE=20PAGE_SIZE=0A+#define=20= IDB_ENT_SIZE=20sizeof(uintptr_t)=0A+#define=20IDB_ENTS_PER_BLOCK=20= (IDB_BLOCK_SIZE/IDB_ENT_SIZE)=0A+=0A+/*=20May=20be=20a=20perf=20impact=20= on=2032-bit=20kernels.=20*/=0A+CTASSERT(NDSLOTSIZE=20=3D=3D=20= IDB_ENT_SIZE);=0A+=0A+/**=0A+=20*=20Return=20the=20index=20into=20the=20= indirect=20table=20given=20an=20entry.=0A+=20*/=0A+static=20inline=20int=0A= +idb_block_index(int=20ent)=0A+{=0A+=0A+=09return=20(ent=20/=20= IDB_ENTS_PER_BLOCK);=0A+}=0A+=0A+/**=0A+=20*=20Return=20offset=20into=20= an=20indirect=20block=20given=20an=20entry.=0A+=20*/=0A+static=20inline=20= int=0A+idb_block_off(int=20ent)=0A+{=0A+=0A+=09return=20(ent=20%=20= IDB_ENTS_PER_BLOCK);=0A+}=0A+=0A+/**=0A+=20*=20Return=201=20if=20the=20= indirect=20block=20table=20is=20flat,=20else=200.=0A+=20*/=0A+static=20= inline=20int=0A+idb_is_flat(struct=20idb_table=20*idb)=0A+{=0A+=0A+=09= return=20(idb->idb_nents=20<=3D=20IDB_ENTS_PER_BLOCK);=0A+}=0A+=0A+/**=0A= +=20*=20Return=20a=20pointer=20to=20the=20block.=20=20If=20the=20block=20= is=20sparse=20or=20ent=20is=20outside=0A+=20*=20the=20current=20size=20= 
of=20the=20table,=20return=20NULL.=0A+=20*/=0A+static=20inline=20void=20= *=0A+idb_block(struct=20idb_table=20*idb,=20int=20ent)=0A+{=0A+=0A+=09= return=20(ent=20>=3D=20idb->idb_nents=20?=20NULL=20:=0A+=09=20=20=20=20= idb->idb_tbl.indirect[idb_block_index(ent)]);=0A+}=0A+=0A+/**=0A+=20*=20= Initialize=20a=20new=20indirect=20table.=20=20The=20caller=20is=20= responsible=20for=20allocating=0A+=20*=20the=20idb=20struct,=20and=20= must=20provide=20an=20initial=20non-null=20flat=20table.=0A+=20*=0A+=20*=20= @param=20idb=09=09=20Indirect=20table=20to=20initialize.=0A+=20*=20= @param=20idb_flat=09=20Initial=20non-null=20table.=0A+=20*=20@param=20= idb_nents=09=20Number=20of=20entries=20in=20the=20initial=20flat=20= table.=0A+=20*/=0A+static=20void=0A+idb_init(struct=20idb_table=20*idb,=20= void=20*idb_flat,=20int=20idb_nents)=0A+{=0A+=0A+=09KASSERT(idb=20!=3D=20= NULL,=20("idb=20table=20must=20be=20allocated=20by=20caller"));=0A+=09= KASSERT(idb_flat=20!=3D=20NULL,=0A+=09=20=20=20=20("idb=20flat=20table=20= must=20be=20allocated=20by=20caller"));=0A+=0A+=09idb->idb_tbl.flat=20=3D=20= idb_flat;=0A+=09idb->idb_nents=20=3D=20idb_nents;=0A+=09= idb->idb_orig_nents=20=3D=20idb_nents;=0A+}=0A+=0A+/**=0A+=20*=20Free=20= all=20blocks=20associated=20with=20the=20indirect=20table.=0A+=20*/=0A= +static=20void=0A+idb_free(struct=20idb_table=20*idb)=0A+{=0A+=09int=20= indx;=0A+=09void=20*block;=0A+=0A+=09if=20(idb_is_flat(idb))=20{=0A+=09=09= if=20(idb->idb_nents=20>=20idb->idb_orig_nents)=0A+=09=09=09= free(idb->idb_tbl.flat,=20M_FILEDESC);=0A+=09=09return;=0A+=09}=0A+=0A+=09= /*=20Free=20indirect=20leaves.=20*/=0A+=09for=20(indx=20=3D=20= idb_block_index(0);=0A+=09=20=20=20=20=20indx=20<=20= idb_block_index(idb->idb_nents);=0A+=09=20=20=20=20=20indx++)=20{=0A+=09=09= block=20=3D=20idb->idb_tbl.indirect[indx];=0A+=09=09if=20(block=20!=3D=20= NULL)=0A+=09=09=09free(block,=20M_FILEDESC);=0A+=09}=0A+=0A+=09/*=20Free=20= 
indirect=20root.=20*/=0A+=09free(idb->idb_tbl.indirect,=20M_FILEDESC);=0A= +}=0A+=0A+/**=0A+=20*=20Return=20a=20pointer=20into=20the=20table/block=20= given=20an=20index.=0A+=20*/=0A+static=20void=20*=0A= +idb_get_entry(struct=20idb_table=20*idb,=20int=20ent)=0A+{=0A+=09void=20= *block;=0A+=0A+=09if=20(ent=20>=20idb->idb_nents)=0A+=09=09return=20= (NULL);=0A+=0A+=09if=20(idb_is_flat(idb))=0A+=09=09return=20= (((caddr_t)idb->idb_tbl.flat)=20+=20(ent=20*=20IDB_ENT_SIZE));=0A+=0A+=09= /*=20Indirect=20block.=20=20Return=20NULL=20for=20sparse=20blocks.=20*/=0A= +=09block=20=3D=20idb_block(idb,=20ent);=0A+=09if=20(block=20=3D=3D=20= NULL)=0A+=09=09return=20(NULL);=0A+=0A+=09return=20(((caddr_t)block)=20+=20= (idb_block_off(ent)=20*=20IDB_ENT_SIZE));=0A+}=0A+=0A+/**=0A+=20*=20If=20= the=20current=20table=20size=20doesn't=20accomodate=20the=20new=20number=20= of=20entries,=0A+=20*=20grow=20it=20to=20fit=20new_nents.=20=20Mult=20is=20= a=20multiplying=20factor=20used=20to=20check=20the=0A+=20*=20number=20of=20= entries=20in=20the=20table=20against=20new_nents=20which=20allows=20= growing=20the=0A+=20*=20flat=20table=20or=20the=20indirect=20table.=20=20= The=20current=20number=20of=20entries=20in=20the=0A+=20*=20table=20must=20= be=20a=20multiple=20of=20mult.=0A+=20*=0A+=20*=20@param=20idb=09=09Table=20= to=20grow.=0A+=20*=20@param=20new_nents=09Number=20of=20entries=20to=20= grow=20the=20table=20to.=0A+=20*=20@param=20mult=09=09Multiplier=20for=20= new_nents.=0A+=20*=20@param=20sx=09=09Exclusive=20lock=20that=20may=20be=20= dropped/reqacquired.=0A=20=20*/=0A-#define=20OFILESIZE=20(sizeof(struct=20= file=20*)=20+=20sizeof(char))=0A+static=20void=0A+idb_grow_table(struct=20= idb_table=20*idb,=20int=20new_nents,=20int=20mult,=20struct=20sx=20*sx)=0A= +{=0A+=09int=20old_nents;=0A+=09void=20*ntable;=0A+=0A+=09= KASSERT(idb->idb_nents=20%=20mult=20=3D=3D=200,=0A+=09=20=20=20=20("%d=20= is=20not=20a=20multiple=20of=20%d",=20idb->idb_nents,=20mult));=0A+=0A+=09= 
old_nents=20=3D=20idb->idb_nents=20/=20mult;=0A+=0A+=09/*=20Do=20nothing=20= if=20the=20table=20is=20already=20big=20enough.=20*/=0A+=09if=20= (old_nents=20>=20new_nents)=0A+=09=09return;=0A+=0A+=09sx_xunlock(sx);=0A= +=09ntable=20=3D=20malloc(new_nents=20*=20IDB_ENT_SIZE,=20M_FILEDESC,=0A= +=09=20=20=20=20M_ZERO=20|=20M_WAITOK);=0A+=09sx_xlock(sx);=0A+=0A+=09/*=20= Done=20if=20table=20grew=20when=20the=20lock=20was=20dropped.=20*/=0A+=09= if=20(idb->idb_nents=20/=20mult=20>=20new_nents)=20{=0A+=09=09= free(ntable,=20M_FILEDESC);=0A+=09=09return;=0A+=09}=0A+=0A+=09/*=20Copy=20= the=20data=20to=20the=20new=20table=20and=20fix=20up=20the=20pointers.=20= */=0A+=09bcopy(idb->idb_tbl.flat,=20ntable,=20old_nents=20*=20= IDB_ENT_SIZE);=0A+=09if=20(idb->idb_nents=20>=20idb->idb_orig_nents)=0A+=09= =09free(idb->idb_tbl.flat,=20M_FILEDESC);=0A+=09idb->idb_tbl.flat=20=3D=20= ntable;=0A+=09idb->idb_nents=20=3D=20new_nents=20*=20mult;=0A+}=0A+=0A= +/**=0A+=20*=20Transition=20a=20flat=20table=20to=20an=20indirect=20= block=20table.=0A+=20*=0A+=20*=20@param=20idb=09Table=20to=20transition.=0A= +=20*=20@param=20sx=09Exclusive=20lock=20that=20may=20be=20= dropped/reqacquired.=0A+=20*/=0A+static=20void=0A= +idb_transition_to_indirect(struct=20idb_table=20*idb,=20struct=20sx=20= *sx)=0A+{=0A+=09void=20**ntable=20=3D=20NULL;=0A+=0A+=09= KASSERT(idb->idb_nents=20>=3D=20IDB_ENTS_PER_BLOCK,=0A+=09=20=20=20=20= ("Insufficient=20size=20for=20indirect=20transition:=20%d",=20= idb->idb_nents));=0A+=0A+=09/*=20Done=20if=20the=20table=20has=20already=20= transitioned.=20*/=0A+=09if=20(idb->idb_nents=20>=20IDB_ENTS_PER_BLOCK)=20= {=0A+=09=09return;=0A+=09}=0A+=0A+=09sx_xunlock(sx);=0A+=09ntable=20=3D=20= malloc(IDB_BLOCK_SIZE,=20M_FILEDESC,=0A+=09=09=20=20=20=20M_ZERO=20|=20= M_WAITOK);=0A+=09sx_xlock(sx);=0A+=0A+=09/*=20Done=20if=20indirect=20= transition=20done=20when=20the=20lock=20was=20dropped.=20*/=0A+=09if=20= (idb->idb_nents=20>=20IDB_ENTS_PER_BLOCK)=20{=0A+=09=09free(ntable,=20= 
M_FILEDESC);=0A+=09=09return;=0A+=09}=0A+=0A+=09/*=20Make=20indirect=20= transition.=20*/=0A+=09ntable[0]=20=3D=20idb->idb_tbl.flat;=0A+=09= idb->idb_tbl.indirect=20=3D=20ntable;=0A+=09idb->idb_nents=20=3D=20= IDB_ENTS_PER_BLOCK=20*=20IDB_ENTS_PER_BLOCK;=0A+}=0A+=0A+/**=0A+=20*=20= Allocates=20an=20indirect=20block=20in=20the=20table=20if=20one=20= doesn't=20already=20exist=20for=0A+=20*=20new_ent.=0A+=20*=0A+=20*=20= @param=20idb=09=09Table=20to=20ensure=20new_ent=20has=20an=20indirect=20= block=20in.=0A+=20*=20@param=20new_ent=09New=20entry=20index=20to=20= create=20indirect=20block=20for.=0A+=20*=20@param=20sx=09=09Exclusive=20= lock=20that=20may=20be=20dropped/reqacquired.=0A+=20*/=0A+static=20void=0A= +idb_ensure_indirect_block(struct=20idb_table=20*idb,=20int=20new_ent,=20= struct=20sx=20*sx)=0A+{=0A+=09void=20*nblock=20=3D=20NULL;=0A+=0A+=09= KASSERT(new_ent=20<=20idb->idb_nents,=0A+=09=20=20=20=20("Table=20too=20= small=20(%d)=20for=20indirect=20block=20at=20index=20%d",=0A+=09=09= idb->idb_nents,=20new_ent));=0A+=0A+=09/*=20Done=20if=20the=20block=20is=20= already=20allocated.=20*/=0A+=09if=20(idb_block(idb,=20new_ent)=20!=3D=20= NULL)=0A+=09=09return;=0A+=0A+=09sx_xunlock(sx);=0A+=09nblock=20=3D=20= malloc(IDB_BLOCK_SIZE,=20M_FILEDESC,=20M_ZERO=20|=20M_WAITOK);=0A+=09= sx_xlock(sx);=0A+=0A+=09/*=20Done=20if=20block=20was=20allocated=20when=20= the=20lock=20was=20dropped.=20*/=0A+=09if=20(idb_block(idb,=20new_ent)=20= !=3D=20NULL)=20{=0A+=09=09free(nblock,=20M_FILEDESC);=0A+=09=09return;=0A= +=09}=0A+=0A+=09idb->idb_tbl.indirect[idb_block_index(new_ent)]=20=3D=20= nblock;=0A+}=0A+=0A+/**=0A+=20*=20idb_ensure_size()=20guarantees=20that:=0A= +=20*=20=201.=20If=20the=20table=20is=20flat,=20the=20table=20will=20be=20= made=20large=20enough=20for=20new_ent,=0A+=20*=20=20=20=20=20possibly=20= being=20transitioned=20to=20an=20indirect=20table.=0A+=20*=0A+=20*=20=20= 2.=20If=20the=20table=20is=20indirect,=20the=20indirect=20table=20is=20= 
large=20enough=20to=20have=20an=0A+=20*=20=20=20=20=20entry=20to=20point=20= to=20the=20indirect=20block,=20and=20the=20indirect=20block=20itself=20= is=0A+=20*=20=20=20=20=20allocated.=0A+=20*=0A+=20*=20The=20sx=20lock=20= will=20be=20released=20if=20new=20memory=20needs=20to=20be=20allocated,=20= but=20will=0A+=20*=20be=20reacquired=20before=20returning.=0A+=20*=0A+=20= *=20@param=20idb=09=09Table=20to=20ensure=20new_ent=20fits=20in.=0A+=20*=20= @param=20new_ent=09New=20entry=20index.=0A+=20*=20@param=20maxsize=09Max=20= size=20of=20the=20table=20so=20excess=20memory=20isn't=20used.=0A+=20*=20= @param=20sx=09=09Exclusive=20lock=20that=20may=20be=20= dropped/reqacquired.=0A+=20*/=0A+static=20void=0A+idb_ensure_size(struct=20= idb_table=20*idb,=20int=20new_ent,=20int=20maxsize,=20struct=20sx=20*sx)=0A= +{=0A+=09KASSERT(idb->idb_nents=20>=200,=20("zero-length=20idb=20= table"));=0A+=09KASSERT(new_ent=20<=20maxsize,=0A+=09=20=20=20=20= ("new_ent(%d)=20>=3D=20maxsize(%d)",=20new_ent,=20maxsize));=0A+=0A+=09= sx_assert(sx,=20SX_XLOCKED=20|=20SX_NOTRECURSED);=0A+=0A+=09/*=20Grow=20= table=202x=20while=20it=20is=20flat.=20*/=0A+=09if=20(idb_is_flat(idb)=20= &&=20new_ent=20<=20IDB_ENTS_PER_BLOCK)=20{=0A+=09=09if=20(new_ent=20>=3D=20= idb->idb_nents)=20{=0A+=09=09=09KASSERT(new_ent=20>=200,=20("Negative=20= new_ent=20%d",=20new_ent));=0A+=09=09=09/*=20Round=20up=20to=20power=20= of=202=20to=20appease=20the=20allocator.=20*/=0A+=09=09=09= idb_grow_table(idb,=20min(min(1=20<<=20(fls(new_ent)),=0A+=09=09=09=09=20= =20=20=20IDB_ENTS_PER_BLOCK),=20maxsize),=201,=20sx);=0A+=09=09}=0A+=09=09= return;=0A+=09}=0A+=0A+=09/*=20Transition=20flat=20table=20to=20= indirect.=20*/=0A+=09if=20(idb_is_flat(idb)=20&&=20new_ent=20>=3D=20= IDB_ENTS_PER_BLOCK)=20{=0A+=09=09idb_grow_table(idb,=20= IDB_ENTS_PER_BLOCK,=201,=20sx);=0A+=09=09idb_transition_to_indirect(idb,=20= sx);=0A+=09}=0A+=0A+=09/*=20Grow=20size=20of=20indirect=20table.=20*/=0A= 
+=09if=20(new_ent=20>=3D=20idb->idb_nents)=20{=0A+=09=09int=20= grow_factor,=20new_nents;=0A+=09=09/*=20Need=20to=20grow=20the=20= indirect=20table.=20*/=0A+=09=09for=20(grow_factor=20=3D=202;;=20= grow_factor=20<<=3D=201)=20{=0A+=09=09=09if=20= (idb_block_index(idb->idb_nents)=20*=20grow_factor=20>=0A+=09=09=09=20=20= =20=20idb_block_index(new_ent))=0A+=09=09=09=09break;=0A+=09=09}=0A+=09=09= new_nents=20=3D=20min(idb_block_index(idb->idb_nents)=20*=20grow_factor,=0A= +=09=09=20=20=20=20idb_block_index(maxsize));=0A+=09=09= idb_grow_table(idb,=20new_nents,=20IDB_ENTS_PER_BLOCK,=20sx);=0A+=09}=0A= +=0A+=09/*=20Ensure=20block=20is=20allocated=20in=20sparse=20table.=20*/=0A= +=09idb_ensure_indirect_block(idb,=20new_ent,=20sx);=0A+}=0A+=0A+/**=0A+=20= *=20Get=20the=20file=20struct=20for=20an=20fd=20from=20the=20ftable.=0A+=20= *=0A+=20*=20@return=20The=20file=20struct=20for=20a=20particular=20or=20= NULL.=0A+=20*/=0A+struct=20file=20*=0A+ftable_get(struct=20filedesc=20= *fdp,=20int=20fd)=0A+{=0A+=09struct=20file=20**fpp;=0A+=0A+=09= FILEDESC_LOCK_ASSERT(fdp);=0A+=0A+=09fpp=20=3D=20= idb_get_entry(&fdp->fd_files,=20fd);=0A+=09return=20(fpp=20!=3D=20NULL=20= ?=20*fpp=20:=20NULL);=0A+}=0A+=0A+/**=0A+=20*=20Set=20an=20entry=20in=20= the=20table=20to=20point=20to=20a=20struct=20file.=20=20= ftable_ensure_fd()=0A+=20*=20must=20be=20first=20called=20to=20ensure=20= the=20underlying=20data=20structure=20can=20support=0A+=20*=20this=20= entry.=0A+=20*/=0A+void=0A+ftable_set(struct=20filedesc=20*fdp,=20int=20= fd,=20struct=20file=20*fp)=0A+{=0A+=09struct=20file=20**fpp;=0A+=0A+=09= FILEDESC_XLOCK_ASSERT(fdp);=0A+=0A+=09fpp=20=3D=20= idb_get_entry(&fdp->fd_files,=20fd);=0A+=09KASSERT(fpp=20!=3D=20NULL,=20= ("Trying=20to=20set=20unallocated=20entry"));=0A+=09*fpp=20=3D=20fp;=0A= +}=0A+=0A+/**=0A+=20*=20Get=20the=20close=20exec=20state=20of=20a=20file=20= descriptor.=0A+=20*=0A+=20*=20@return=201=20if=20close=20exec=20is=20= 
set,=20otherwise=200.=0A+=20*/=0A+int=0A+ftable_get_cloexec(struct=20= filedesc=20*fdp,=20int=20fd)=0A+{=0A+=09NDSLOTTYPE=20*map;=0A+=0A+=09= FILEDESC_LOCK_ASSERT(fdp);=0A+=0A+=09map=20=3D=20= idb_get_entry(&fdp->fd_cloexec,=20NDSLOT(fd));=0A+=09if=20(map=20=3D=3D=20= NULL)=0A+=09=09return=20(0);=0A+=0A+=09return=20((*map=20&=20NDBIT(fd))=20= !=3D=200);=0A+}=0A+=0A+/**=0A+=20*=20Set=20the=20close=20exec=20state=20= of=20a=20file=20descriptor.=0A+=20*=0A+=20*=20@param=20on=091:=20close=20= exec=20state=20will=20be=20turned=20on.=0A+=20*=09=090:=20close=20exec=20= state=20will=20be=20turned=20off.=0A+=20*/=0A+void=0A= +ftable_set_cloexec(struct=20filedesc=20*fdp,=20int=20fd,=20int=20on)=0A= +{=0A+=09NDSLOTTYPE=20*map;=0A+=0A+=09FILEDESC_XLOCK_ASSERT(fdp);=0A+=0A= +=09map=20=3D=20idb_get_entry(&fdp->fd_cloexec,=20NDSLOT(fd));=0A+=09= KASSERT(map=20!=3D=20NULL,=20("trying=20to=20set=20cloexec=20on=20an=20= unallocated=20file"));=0A+=0A+=09if=20(on)=0A+=09=09*map=20|=3D=20= NDBIT(fd);=0A+=09else=0A+=09=09*map=20&=3D=20~NDBIT(fd);=0A+}=0A+=0A+/**=0A= +=20*=20If=20the=20ftable=20is=20already=20large=20enough=20to=20store=20= the=20fd,=20then=20simply=20return.=0A+=20*=20Otherwise,=20allocate=20= the=20necessary=20blocks=20to=20accommodate=20the=20new=20fd.=20=20This=0A= +=20*=20allows=20for=20a=20sparse=20table.=20=20May=20malloc=20new=20= blocks=20requiring=20the=20fdp=20lock=20to=0A+=20*=20be=20dropped=20and=20= reacquired.=0A+=20*=0A+=20*=20@param=20nfd=09File=20descriptor=20to=20= possibly=20grow=20the=20table=20to=20fit.=0A+=20*=20@param=20maxfd=09= Maximum=20fd=20so=20excess=20memory=20isn't=20used.=0A+=20*/=0A+static=20= void=0A+ftable_ensure_fd(struct=20filedesc=20*fdp,=20int=20nfd,=20int=20= maxfd)=0A+{=0A+=09FILEDESC_XLOCK_ASSERT(fdp);=0A+=0A+=09KASSERT(nfd=20<=3D= =20maxfd,=20("nfd(%d)=20>=20maxfd(%d)",=20nfd,=20maxfd));=0A+=0A+=09= idb_ensure_size(&fdp->fd_files,=20nfd,=20maxfd=20+=201,=20&fdp->fd_sx);=0A=
+=09idb_ensure_size(&fdp->fd_map,=20NDSLOT(nfd),=20NDSLOT(maxfd)=20+=20= 1,=0A+=09=20=20=20=20&fdp->fd_sx);=0A+=09= idb_ensure_size(&fdp->fd_cloexec,=20NDSLOT(nfd),=20NDSLOT(maxfd)=20+=20= 1,=0A+=09=20=20=20=20&fdp->fd_sx);=0A+=0A+=09/*=0A+=09=20*=20fd_map=20= and=20fd_cloexec=20grow=20at=20the=20same=20rate,=20but=20fd_files=20= grows=20at=0A+=09=20*=20a=20different=20rate,=20so=20advertise=20table=20= size=20as=20the=20min.=0A+=09=20*/=0A+=09fdp->fd_nfiles=20=3D=20= min(fdp->fd_files.idb_nents,=0A+=09=20=20=20=20fdp->fd_map.idb_nents=20*=20= NDENTRIES);=0A+}=0A=20=0A=20/*=0A=20=20*=20Basic=20allocation=20of=20= descriptors:=0A@@=20-150,8=20+545,8=20@@=20struct=20filedesc0=20{=0A=20=09= =20*=20<=3D=20NDFILE,=20and=20are=20then=20pointed=20to=20by=20the=20= pointers=20above.=0A=20=09=20*/=0A=20=09struct=09file=20= *fd_dfiles[NDFILE];=0A-=09char=09fd_dfileflags[NDFILE];=0A=20=09= NDSLOTTYPE=20fd_dmap[NDSLOTS(NDFILE)];=0A+=09NDSLOTTYPE=20= fd_dcloexec[NDSLOTS(NDFILE)];=0A=20};=0A=20=0A=20/*=0A@@=20-166,14=20= +561,13=20@@=20void=09(*mq_fdclose)(struct=20thread=20*td,=20int=20fd,=20= struct=20file=20*fp);=0A=20/*=20A=20mutex=20to=20protect=20the=20= association=20between=20a=20proc=20and=20filedesc.=20*/=0A=20static=20= struct=20mtx=09fdesc_mtx;=0A=20=0A-/*=0A-=20*=20Find=20the=20first=20= zero=20bit=20in=20the=20given=20bitmap,=20starting=20at=20low=20and=20= not=0A-=20*=20exceeding=20size=20-=201.=0A+/**=0A+=20*=20Iterate=20a=20= flat=20array=20searching=20for=20the=20first=20zero=20bit=20in=20the=20= given=20bitmap,=0A+=20*=20starting=20at=20low=20and=20not=20exceeding=20= size=20-=201.=0A=20=20*/=0A=20static=20int=0A-fd_first_free(struct=20= filedesc=20*fdp,=20int=20low,=20int=20size)=0A= +fd_first_free_block(NDSLOTTYPE=20*map,=20int=20low,=20int=20size)=0A=20= {=0A-=09NDSLOTTYPE=20*map=20=3D=20fdp->fd_map;=0A=20=09NDSLOTTYPE=20= mask;=0A=20=09int=20off,=20maxoff;=0A=20=0A@@=20-193,14=20+587,61=20@@=20=
fd_first_free(struct=20filedesc=20*fdp,=20int=20low,=20int=20size)=0A=20=09= return=20(size);=0A=20}=0A=20=0A+/**=0A+=20*=20Iterate=20the=20indirect=20= block=20table=20fd=20map=20searching=20for=20the=20first=20free=20fd,=0A= +=20*=20starting=20at=20low.=20=20Return=20the=20current=20number=20of=20= entries=20in=20the=20table=20if=20none=0A+=20*=20are=20free.=0A+=20*/=0A= +static=20int=0A+fd_first_free(struct=20filedesc=20*fdp,=20int=20low)=0A= +{=0A+=09struct=20idb_table=20*idb=20=3D=20&fdp->fd_map;=0A+=09= NDSLOTTYPE=20*block;=0A+=09int=20indx;=0A+=0A+=09= FILEDESC_LOCK_ASSERT(fdp);=0A+=0A+=09/*=20Flat=20table.=20*/=0A+=09if=20= (idb_is_flat(idb))=0A+=09=09return=20= (fd_first_free_block(idb->idb_tbl.flat,=20low,=0A+=09=09=09= idb->idb_nents=20*=20NDENTRIES));=0A+=0A+=09/*=20Loop=20through=20the=20= indirect=20blocks.=20*/=0A+=09for=20(indx=20=3D=20= idb_block_index(NDSLOT(low));=0A+=09=20=20=20=20=20indx=20<=20= idb_block_index(idb->idb_nents);=0A+=09=20=20=20=20=20indx++)=20{=0A+=09=09= int=20block_low,=20free_ent;=0A+=0A+=09=09block=20=3D=20= idb->idb_tbl.indirect[indx];=0A+=09=09if=20(block=20=3D=3D=20NULL)=20{=0A= +=09=09=09/*=20Unallocated=20block,=20so=20the=20first=20index=20is=20= fine.=20*/=0A+=09=09=09free_ent=20=3D=20indx=20*=20IDB_ENTS_PER_BLOCK=20= *=20NDENTRIES;=0A+=09=09=09return=20(max(free_ent,=20low));=0A+=09=09}=0A= +=0A+=09=09/*=20Scan=20block,=20starting=20mid-block=20if=20necessary.=20= */=0A+=09=09block_low=20=3D=20(indx=20=3D=3D=20= idb_block_index(NDSLOT(low)))=20?=0A+=09=09=20=20=20=20= idb_block_off(NDSLOT(low))=20*=20NDENTRIES=20:=200;=0A+=09=09free_ent=20= =3D=20fd_first_free_block(block,=20block_low,=0A+=09=09=20=20=20=20= IDB_ENTS_PER_BLOCK=20*=20NDENTRIES);=0A+=0A+=09=09/*=20If=20there=20was=20= a=20free=20fd,=20return=20it.=20*/=0A+=09=09if=20(free_ent=20<=20= IDB_ENTS_PER_BLOCK=20*=20NDENTRIES)=0A+=09=09=09return=20(indx=20*=20= IDB_ENTS_PER_BLOCK=20*=20NDENTRIES=20+=0A+=09=09=09=20=20=20=20= 
free_ent);=0A+=09}=0A+=0A+=09/*=20No=20free=20fds=20found.=20*/=0A+=09= return=20(idb->idb_nents);=0A+}=0A+=0A=20/*=0A=20=20*=20Find=20the=20= highest=20non-zero=20bit=20in=20the=20given=20bitmap,=20starting=20at=20= low=20and=0A=20=20*=20not=20exceeding=20size=20-=201.=0A=20=20*/=0A=20= static=20int=0A-fd_last_used(struct=20filedesc=20*fdp,=20int=20low,=20= int=20size)=0A+fd_last_used_block(NDSLOTTYPE=20*map,=20int=20low,=20int=20= size)=0A=20{=0A-=09NDSLOTTYPE=20*map=20=3D=20fdp->fd_map;=0A=20=09= NDSLOTTYPE=20mask;=0A=20=09int=20off,=20minoff;=0A=20=0A@@=20-220,12=20= +661,65=20@@=20fd_last_used(struct=20filedesc=20*fdp,=20int=20low,=20int=20= size)=0A=20=09return=20(low=20-=201);=0A=20}=0A=20=0A+/**=0A+=20*=20= Iterate=20the=20indirect=20block=20table=20fd=20map=20searching=20for=20= the=20highest=20non-zero=0A+=20*=20bit,=20starting=20at=20low=20and=20= not=20exceeding=20size=20-=201.=20=20Return=20low=20-1=20if=20no=20fds=0A= +=20*=20>=3D=20low=20are=20used.=0A+=20*/=0A+static=20int=0A= +fd_last_used(struct=20filedesc=20*fdp,=20int=20low,=20int=20size)=0A+{=0A= +=09struct=20idb_table=20*idb=20=3D=20&fdp->fd_map;=0A+=09NDSLOTTYPE=20= *block;=0A+=09int=20indx;=0A+=0A+=09FILEDESC_LOCK_ASSERT(fdp);=0A+=0A+=09= /*=20Flat=20table.=20*/=0A+=09if=20(idb_is_flat(idb))=0A+=09=09return=20= (fd_last_used_block(idb->idb_tbl.flat,=20low,=20size));=0A+=0A+=09/*=20= Loop=20through=20the=20indirect=20blocks=20backwards.=20*/=0A+=09for=20= (indx=20=3D=20idb_block_index(NDSLOT(size));=0A+=09=20=20=20=20=20indx=20= >=3D=20idb_block_index(NDSLOT(low));=0A+=09=20=20=20=20=20indx--)=20{=0A= +=09=09int=20block_low,=20block_high,=20used_ent;=0A+=0A+=09=09block=20=3D= =20idb->idb_tbl.indirect[indx];=0A+=09=09/*=20If=20the=20block=20is=20= sparse,=20move=20onto=20the=20next=20one.=20*/=0A+=09=09if=20(block=20=3D=3D= =20NULL)=0A+=09=09=09continue;=0A+=0A+=09=09/*=20Scan=20block,=20= starting/ending=20mid-block=20if=20necessary.=20*/=0A+=09=09block_low=20= 
=3D=20(indx=20=3D=3D=20idb_block_index(NDSLOT(low)))=20?=0A+=09=09=20=20=20= =20idb_block_off(NDSLOT(low))=20*=20NDENTRIES=20:=200;=0A+=09=09= block_high=20=3D=20(indx=20=3D=3D=20idb_block_index(NDSLOT(size)))=20?=0A= +=09=09=20=20=20=20idb_block_off(NDSLOT(size))=20*=20NDENTRIES=20:=0A+=09= =09=20=20=20=20IDB_ENTS_PER_BLOCK;=0A+=09=09used_ent=20=3D=20= fd_last_used_block(block,=20block_low,=20block_high);=0A+=0A+=09=09/*=20= If=20there=20was=20a=20used=20fd,=20return=20it.=20*/=0A+=09=09if=20= (used_ent=20>=3D=20block_low)=0A+=09=09=09return=20(indx=20*=20= IDB_ENTS_PER_BLOCK=20*=20NDENTRIES=20+=0A+=09=09=09=20=20=20=20= used_ent);=0A+=09}=0A+=0A+=09/*=20No=20used=20fds=20found.=20*/=0A+=09= return=20(low=20-=201);=0A+}=0A+=0A=20static=20int=0A=20fdisused(struct=20= filedesc=20*fdp,=20int=20fd)=0A=20{=0A+=09NDSLOTTYPE=20*map;=0A+=0A+=09= FILEDESC_LOCK_ASSERT(fdp);=0A=20=20=20=20=20=20=20=20=20KASSERT(fd=20>=3D=20= 0=20&&=20fd=20<=20fdp->fd_nfiles,=0A=20=20=20=20=20=20=20=20=20=20=20=20=20= ("file=20descriptor=20%d=20out=20of=20range=20(0,=20%d)",=20fd,=20= fdp->fd_nfiles));=0A-=09return=20((fdp->fd_map[NDSLOT(fd)]=20&=20= NDBIT(fd))=20!=3D=200);=0A+=0A+=09map=20=3D=20= idb_get_entry(&fdp->fd_map,=20NDSLOT(fd));=0A+=0A+=09return=20(map=20&&=20= (*map=20&=20NDBIT(fd))=20!=3D=200);=0A=20}=0A=20=0A=20/*=0A@@=20-234,16=20= +728,19=20@@=20fdisused(struct=20filedesc=20*fdp,=20int=20fd)=0A=20= static=20void=0A=20fdused(struct=20filedesc=20*fdp,=20int=20fd)=0A=20{=0A= +=09NDSLOTTYPE=20*map;=0A=20=0A=20=09FILEDESC_XLOCK_ASSERT(fdp);=0A-=09= KASSERT(!fdisused(fdp,=20fd),=0A-=09=20=20=20=20("fd=20already=20= used"));=0A+=09KASSERT(!fdisused(fdp,=20fd),=20("fd=20already=20used"));=0A= =20=0A-=09fdp->fd_map[NDSLOT(fd)]=20|=3D=20NDBIT(fd);=0A+=09map=20=3D=20= idb_get_entry(&fdp->fd_map,=20NDSLOT(fd));=0A+=09KASSERT(map=20!=3D=20= NULL,=20("Map=20block=20is=20NULL"));=0A+=0A+=09*map=20|=3D=20NDBIT(fd);=0A= 
=20=09if=20(fd=20>=20fdp->fd_lastfile)=0A=20=09=09fdp->fd_lastfile=20=3D=20= fd;=0A=20=09if=20(fd=20=3D=3D=20fdp->fd_freefile)=0A-=09=09= fdp->fd_freefile=20=3D=20fd_first_free(fdp,=20fd,=20fdp->fd_nfiles);=0A+=09= =09fdp->fd_freefile=20=3D=20fd_first_free(fdp,=20fd);=0A=20}=0A=20=0A=20= /*=0A@@=20-253,13=20+750,19=20@@=20static=20void=0A=20fdunused(struct=20= filedesc=20*fdp,=20int=20fd)=0A=20{=0A=20=0A+=09NDSLOTTYPE=20*map;=0A+=0A= =20=09FILEDESC_XLOCK_ASSERT(fdp);=0A=20=09KASSERT(fdisused(fdp,=20fd),=0A= =20=09=20=20=20=20("fd=20is=20already=20unused"));=0A-=09= KASSERT(fdp->fd_ofiles[fd]=20=3D=3D=20NULL,=0A+=09= KASSERT(ftable_get(fdp,=20fd)=20=3D=3D=20NULL,=0A=20=09=20=20=20=20("fd=20= is=20still=20in=20use"));=0A=20=0A-=09fdp->fd_map[NDSLOT(fd)]=20&=3D=20= ~NDBIT(fd);=0A+=09map=20=3D=20idb_get_entry(&fdp->fd_map,=20NDSLOT(fd));=0A= +=09KASSERT(map=20!=3D=20NULL,=20("Map=20block=20is=20NULL"));=0A+=0A+=09= *map=20&=3D=20~NDBIT(fd);=0A+=0A=20=09if=20(fd=20<=20fdp->fd_freefile)=0A= =20=09=09fdp->fd_freefile=20=3D=20fd;=0A=20=09if=20(fd=20=3D=3D=20= fdp->fd_lastfile)=0A@@=20-410,7=20+913,7=20@@=20fdtofp(int=20fd,=20= struct=20filedesc=20*fdp)=0A=20=0A=20=09FILEDESC_LOCK_ASSERT(fdp);=0A=20=09= if=20((unsigned)fd=20>=3D=20fdp->fd_nfiles=20||=0A-=09=20=20=20=20(fp=20= =3D=20fdp->fd_ofiles[fd])=20=3D=3D=20NULL)=0A+=09=20=20=20=20(fp=20=3D=20= ftable_get(fdp,=20fd))=20=3D=3D=20NULL)=0A=20=09=09return=20(NULL);=0A=20= =09return=20(fp);=0A=20}=0A@@=20-422,7=20+925,6=20@@=20kern_fcntl(struct=20= thread=20*td,=20int=20fd,=20int=20cmd,=20intptr_t=20arg)=0A=20=09struct=20= flock=20*flp;=0A=20=09struct=20file=20*fp;=0A=20=09struct=20proc=20*p;=0A= -=09char=20*pop;=0A=20=09struct=20vnode=20*vp;=0A=20=09u_int=20newmin;=0A= =20=09int=20error,=20flg,=20tmp;=0A@@=20-467,8=20+969,8=20@@=20= kern_fcntl(struct=20thread=20*td,=20int=20fd,=20int=20cmd,=20intptr_t=20= arg)=0A=20=09=09=09error=20=3D=20EBADF;=0A=20=09=09=09break;=0A=20=09=09= 
}=0A-=09=09pop=20=3D=20&fdp->fd_ofileflags[fd];=0A-=09=09= td->td_retval[0]=20=3D=20(*pop=20&=20UF_EXCLOSE)=20?=20FD_CLOEXEC=20:=20= 0;=0A+=09=09td->td_retval[0]=20=3D=20ftable_get_cloexec(fdp,=20fd)=20?=0A= +=09=09=20=20=20=20FD_CLOEXEC=20:0;=0A=20=09=09FILEDESC_SUNLOCK(fdp);=0A=20= =09=09break;=0A=20=0A@@=20-479,9=20+981,7=20@@=20kern_fcntl(struct=20= thread=20*td,=20int=20fd,=20int=20cmd,=20intptr_t=20arg)=0A=20=09=09=09= error=20=3D=20EBADF;=0A=20=09=09=09break;=0A=20=09=09}=0A-=09=09pop=20=3D=20= &fdp->fd_ofileflags[fd];=0A-=09=09*pop=20=3D=20(*pop=20&~=20UF_EXCLOSE)=20= |=0A-=09=09=20=20=20=20(arg=20&=20FD_CLOEXEC=20?=20UF_EXCLOSE=20:=200);=0A= +=09=09ftable_set_cloexec(fdp,=20fd,=20arg=20&=20FD_CLOEXEC);=0A=20=09=09= FILEDESC_XUNLOCK(fdp);=0A=20=09=09break;=0A=20=0A@@=20-651,7=20+1151,7=20= @@=20kern_fcntl(struct=20thread=20*td,=20int=20fd,=20int=20cmd,=20= intptr_t=20arg)=0A=20=09=09/*=20Check=20for=20race=20with=20close=20*/=0A= =20=09=09FILEDESC_SLOCK(fdp);=0A=20=09=09if=20((unsigned)=20fd=20>=3D=20= fdp->fd_nfiles=20||=0A-=09=09=20=20=20=20fp=20!=3D=20fdp->fd_ofiles[fd])=20= {=0A+=09=09=20=20=20=20fp=20!=3D=20ftable_get(fdp,=20fd))=20{=0A=20=09=09= =09FILEDESC_SUNLOCK(fdp);=0A=20=09=09=09flp->l_whence=20=3D=20SEEK_SET;=0A= =20=09=09=09flp->l_start=20=3D=200;=0A@@=20-750,7=20+1250,7=20@@=20= do_dup(struct=20thread=20*td,=20enum=20dup_type=20type,=20int=20old,=20= int=20new,=0A=20=09=09return=20(EMFILE);=0A=20=0A=20=09= FILEDESC_XLOCK(fdp);=0A-=09if=20(old=20>=3D=20fdp->fd_nfiles=20||=20= fdp->fd_ofiles[old]=20=3D=3D=20NULL)=20{=0A+=09if=20(old=20>=3D=20= fdp->fd_nfiles=20||=20ftable_get(fdp,=20old)=20=3D=3D=20NULL)=20{=0A=20=09= =09FILEDESC_XUNLOCK(fdp);=0A=20=09=09return=20(EBADF);=0A=20=09}=0A@@=20= -759,7=20+1259,7=20@@=20do_dup(struct=20thread=20*td,=20enum=20dup_type=20= type,=20int=20old,=20int=20new,=0A=20=09=09FILEDESC_XUNLOCK(fdp);=0A=20=09= =09return=20(0);=0A=20=09}=0A-=09fp=20=3D=20fdp->fd_ofiles[old];=0A+=09= 
fp=20=3D=20ftable_get(fdp,=20old);=0A=20=09fhold(fp);=0A=20=0A=20=09/*=0A= @@=20-770,9=20+1270,8=20@@=20do_dup(struct=20thread=20*td,=20enum=20= dup_type=20type,=20int=20old,=20int=20new,=0A=20=09=20*=20out=20for=20a=20= race.=0A=20=09=20*/=0A=20=09if=20(type=20=3D=3D=20DUP_FIXED)=20{=0A-=09=09= if=20(new=20>=3D=20fdp->fd_nfiles)=0A-=09=09=09fdgrowtable(fdp,=20new=20= +=201);=0A-=09=09if=20(fdp->fd_ofiles[new]=20=3D=3D=20NULL)=0A+=09=09= ftable_ensure_fd(fdp,=20new,=20maxfd);=0A+=09=09if=20(ftable_get(fdp,=20= new)=20=3D=3D=20NULL)=0A=20=09=09=09fdused(fdp,=20new);=0A=20=09}=20else=20= {=0A=20=09=09if=20((error=20=3D=20fdalloc(td,=20new,=20&new))=20!=3D=20= 0)=20{=0A@@=20-787,9=20+1286,9=20@@=20do_dup(struct=20thread=20*td,=20= enum=20dup_type=20type,=20int=20old,=20int=20new,=0A=20=09=20*=20bad=20= file=20descriptor.=20=20Userland=20should=20do=20its=20own=20locking=20= to=0A=20=09=20*=20avoid=20this=20case.=0A=20=09=20*/=0A-=09if=20= (fdp->fd_ofiles[old]=20!=3D=20fp)=20{=0A+=09if=20(ftable_get(fdp,=20old)=20= !=3D=20fp)=20{=0A=20=09=09/*=20we've=20allocated=20a=20descriptor=20= which=20we=20won't=20use=20*/=0A-=09=09if=20(fdp->fd_ofiles[new]=20=3D=3D=20= NULL)=0A+=09=09if=20(ftable_get(fdp,=20new)=20=3D=3D=20NULL)=0A=20=09=09=09= fdunused(fdp,=20new);=0A=20=09=09FILEDESC_XUNLOCK(fdp);=0A=20=09=09= fdrop(fp,=20td);=0A@@=20-805,7=20+1304,7=20@@=20do_dup(struct=20thread=20= *td,=20enum=20dup_type=20type,=20int=20old,=20int=20new,=0A=20=09=20*=0A=20= =09=20*=20XXX=20this=20duplicates=20parts=20of=20close().=0A=20=09=20*/=0A= -=09delfp=20=3D=20fdp->fd_ofiles[new];=0A+=09delfp=20=3D=20= ftable_get(fdp,=20new);=0A=20=09holdleaders=20=3D=200;=0A=20=09if=20= (delfp=20!=3D=20NULL)=20{=0A=20=09=09if=20(td->td_proc->p_fdtol=20!=3D=20= NULL)=20{=0A@@=20-821,8=20+1320,8=20@@=20do_dup(struct=20thread=20*td,=20= enum=20dup_type=20type,=20int=20old,=20int=20new,=0A=20=09/*=0A=20=09=20= *=20Duplicate=20the=20source=20descriptor=0A=20=09=20*/=0A-=09= 
fdp->fd_ofiles[new]=20=3D=20fp;=0A-=09fdp->fd_ofileflags[new]=20=3D=20= fdp->fd_ofileflags[old]=20&~=20UF_EXCLOSE;=0A+=09ftable_set(fdp,=20new,=20= fp);=0A+=09ftable_set_cloexec(fdp,=20new,=200);=0A=20=09if=20(new=20>=20= fdp->fd_lastfile)=0A=20=09=09fdp->fd_lastfile=20=3D=20new;=0A=20=09= *retval=20=3D=20new;=0A@@=20-1111,12=20+1610,12=20@@=20kern_close(td,=20= fd)=0A=20=0A=20=09FILEDESC_XLOCK(fdp);=0A=20=09if=20((unsigned)fd=20>=3D=20= fdp->fd_nfiles=20||=0A-=09=20=20=20=20(fp=20=3D=20fdp->fd_ofiles[fd])=20= =3D=3D=20NULL)=20{=0A+=09=20=20=20=20(fp=20=3D=20ftable_get(fdp,=20fd))=20= =3D=3D=20NULL)=20{=0A=20=09=09FILEDESC_XUNLOCK(fdp);=0A=20=09=09return=20= (EBADF);=0A=20=09}=0A-=09fdp->fd_ofiles[fd]=20=3D=20NULL;=0A-=09= fdp->fd_ofileflags[fd]=20=3D=200;=0A+=09ftable_set(fdp,=20fd,=20NULL);=0A= +=09ftable_set_cloexec(fdp,=20fd,=200);=0A=20=09fdunused(fdp,=20fd);=0A=20= =09if=20(td->td_proc->p_fdtol=20!=3D=20NULL)=20{=0A=20=09=09/*=0A@@=20= -1178,7=20+1677,7=20@@=20closefrom(struct=20thread=20*td,=20struct=20= closefrom_args=20*uap)=0A=20=09=09uap->lowfd=20=3D=200;=0A=20=09= FILEDESC_SLOCK(fdp);=0A=20=09for=20(fd=20=3D=20uap->lowfd;=20fd=20<=20= fdp->fd_nfiles;=20fd++)=20{=0A-=09=09if=20(fdp->fd_ofiles[fd]=20!=3D=20= NULL)=20{=0A+=09=09if=20(ftable_get(fdp,=20fd)=20!=3D=20NULL)=20{=0A=20=09= =09=09FILEDESC_SUNLOCK(fdp);=0A=20=09=09=09(void)kern_close(td,=20fd);=0A= =20=09=09=09FILEDESC_SLOCK(fdp);=0A@@=20-1806,70=20+2305,6=20@@=20out:=0A= =20}=0A=20=0A=20/*=0A-=20*=20Grow=20the=20file=20table=20to=20accomodate=20= (at=20least)=20nfd=20descriptors.=20=20This=20may=0A-=20*=20block=20and=20= drop=20the=20filedesc=20lock,=20but=20it=20will=20reacquire=20it=20= before=0A-=20*=20returning.=0A-=20*/=0A-static=20void=0A= -fdgrowtable(struct=20filedesc=20*fdp,=20int=20nfd)=0A-{=0A-=09struct=20= file=20**ntable;=0A-=09char=20*nfileflags;=0A-=09int=20nnfiles,=20= onfiles;=0A-=09NDSLOTTYPE=20*nmap;=0A-=0A-=09FILEDESC_XLOCK_ASSERT(fdp);=0A= 
-=0A-=09KASSERT(fdp->fd_nfiles=20>=200,=0A-=09=20=20=20=20("zero-length=20= file=20table"));=0A-=0A-=09/*=20compute=20the=20size=20of=20the=20new=20= table=20*/=0A-=09onfiles=20=3D=20fdp->fd_nfiles;=0A-=09nnfiles=20=3D=20= NDSLOTS(nfd)=20*=20NDENTRIES;=20/*=20round=20up=20*/=0A-=09if=20(nnfiles=20= <=3D=20onfiles)=0A-=09=09/*=20the=20table=20is=20already=20large=20= enough=20*/=0A-=09=09return;=0A-=0A-=09/*=20allocate=20a=20new=20table=20= and=20(if=20required)=20new=20bitmaps=20*/=0A-=09FILEDESC_XUNLOCK(fdp);=0A= -=09MALLOC(ntable,=20struct=20file=20**,=20nnfiles=20*=20OFILESIZE,=0A-=09= =20=20=20=20M_FILEDESC,=20M_ZERO=20|=20M_WAITOK);=0A-=09nfileflags=20=3D=20= (char=20*)&ntable[nnfiles];=0A-=09if=20(NDSLOTS(nnfiles)=20>=20= NDSLOTS(onfiles))=0A-=09=09MALLOC(nmap,=20NDSLOTTYPE=20*,=20= NDSLOTS(nnfiles)=20*=20NDSLOTSIZE,=0A-=09=09=20=20=20=20M_FILEDESC,=20= M_ZERO=20|=20M_WAITOK);=0A-=09else=0A-=09=09nmap=20=3D=20NULL;=0A-=09= FILEDESC_XLOCK(fdp);=0A-=0A-=09/*=0A-=09=20*=20We=20now=20have=20new=20= tables=20ready=20to=20go.=20=20Since=20we=20dropped=20the=0A-=09=20*=20= filedesc=20lock=20to=20call=20malloc(),=20watch=20out=20for=20a=20race.=0A= -=09=20*/=0A-=09onfiles=20=3D=20fdp->fd_nfiles;=0A-=09if=20(onfiles=20>=3D= =20nnfiles)=20{=0A-=09=09/*=20we=20lost=20the=20race,=20but=20that's=20= OK=20*/=0A-=09=09free(ntable,=20M_FILEDESC);=0A-=09=09if=20(nmap=20!=3D=20= NULL)=0A-=09=09=09free(nmap,=20M_FILEDESC);=0A-=09=09return;=0A-=09}=0A-=09= bcopy(fdp->fd_ofiles,=20ntable,=20onfiles=20*=20sizeof(*ntable));=0A-=09= bcopy(fdp->fd_ofileflags,=20nfileflags,=20onfiles);=0A-=09if=20(onfiles=20= >=20NDFILE)=0A-=09=09free(fdp->fd_ofiles,=20M_FILEDESC);=0A-=09= fdp->fd_ofiles=20=3D=20ntable;=0A-=09fdp->fd_ofileflags=20=3D=20= nfileflags;=0A-=09if=20(NDSLOTS(nnfiles)=20>=20NDSLOTS(onfiles))=20{=0A-=09= =09bcopy(fdp->fd_map,=20nmap,=20NDSLOTS(onfiles)=20*=20sizeof(*nmap));=0A= -=09=09if=20(NDSLOTS(onfiles)=20>=20NDSLOTS(NDFILE))=0A-=09=09=09= 
free(fdp->fd_map,=20M_FILEDESC);=0A-=09=09fdp->fd_map=20=3D=20nmap;=0A-=09= }=0A-=09fdp->fd_nfiles=20=3D=20nnfiles;=0A-}=0A-=0A-/*=0A=20=20*=20= Allocate=20a=20file=20descriptor=20for=20the=20process.=0A=20=20*/=0A=20= int=0A@@=20-1891,16=20+2326,18=20@@=20fdalloc(struct=20thread=20*td,=20= int=20minfd,=20int=20*result)=0A=20=09/*=0A=20=09=20*=20Search=20the=20= bitmap=20for=20a=20free=20descriptor.=20=20If=20none=20is=20found,=20try=0A= =20=09=20*=20to=20grow=20the=20file=20table.=20=20Keep=20at=20it=20until=20= we=20either=20get=20a=20file=0A-=09=20*=20descriptor=20or=20run=20into=20= process=20or=20system=20limits;=20fdgrowtable()=0A+=09=20*=20descriptor=20= or=20run=20into=20process=20or=20system=20limits;=20ftable_ensure_fd()=0A= =20=09=20*=20may=20drop=20the=20filedesc=20lock,=20so=20we're=20in=20a=20= race.=0A=20=09=20*/=0A=20=09for=20(;;)=20{=0A-=09=09fd=20=3D=20= fd_first_free(fdp,=20minfd,=20fdp->fd_nfiles);=0A+=09=09fd=20=3D=20= fd_first_free(fdp,=20minfd);=0A=20=09=09if=20(fd=20>=3D=20maxfd)=0A=20=09= =09=09return=20(EMFILE);=0A-=09=09if=20(fd=20<=20fdp->fd_nfiles)=0A+=09=09= /*=20Grow=20if=20necessary.=20*/=0A+=09=09ftable_ensure_fd(fdp,=20fd,=20= maxfd);=0A+=09=09/*=20Required=20check=20since=20ftable_ensure_fd()=20= can=20drop=20xlock.=20*/=0A+=09=09if=20(ftable_get(fdp,=20fd)=20=3D=3D=20= NULL)=0A=20=09=09=09break;=0A-=09=09fdgrowtable(fdp,=20= min(fdp->fd_nfiles=20*=202,=20maxfd));=0A=20=09}=0A=20=0A=20=09/*=0A@@=20= -1909,9=20+2346,9=20@@=20fdalloc(struct=20thread=20*td,=20int=20minfd,=20= int=20*result)=0A=20=09=20*/=0A=20=09KASSERT(!fdisused(fdp,=20fd),=0A=20=09= =20=20=20=20("fd_first_free()=20returned=20non-free=20descriptor"));=0A-=09= KASSERT(fdp->fd_ofiles[fd]=20=3D=3D=20NULL,=0A+=09= KASSERT(ftable_get(fdp,=20fd)=20=3D=3D=20NULL,=0A=20=09=20=20=20=20= ("free=20descriptor=20isn't"));=0A-=09fdp->fd_ofileflags[fd]=20=3D=200;=20= /*=20XXX=20needed?=20*/=0A+=09ftable_set_cloexec(fdp,=20fd,=200);=20/*=20= 
XXX=20needed?=20*/=0A=20=09fdused(fdp,=20fd);=0A=20=09*result=20=3D=20= fd;=0A=20=09return=20(0);=0A@@=20-1926,7=20+2363,7=20@@=20fdavail(struct=20= thread=20*td,=20int=20n)=0A=20{=0A=20=09struct=20proc=20*p=20=3D=20= td->td_proc;=0A=20=09struct=20filedesc=20*fdp=20=3D=20td->td_proc->p_fd;=0A= -=09struct=20file=20**fpp;=0A+=09struct=20file=20*fp;=0A=20=09int=20i,=20= lim,=20last;=0A=20=0A=20=09FILEDESC_LOCK_ASSERT(fdp);=0A@@=20-1937,9=20= +2374,10=20@@=20fdavail(struct=20thread=20*td,=20int=20n)=0A=20=09if=20= ((i=20=3D=20lim=20-=20fdp->fd_nfiles)=20>=200=20&&=20(n=20-=3D=20i)=20<=3D= =200)=0A=20=09=09return=20(1);=0A=20=09last=20=3D=20min(fdp->fd_nfiles,=20= lim);=0A-=09fpp=20=3D=20&fdp->fd_ofiles[fdp->fd_freefile];=0A-=09for=20= (i=20=3D=20last=20-=20fdp->fd_freefile;=20--i=20>=3D=200;=20fpp++)=20{=0A= -=09=09if=20(*fpp=20=3D=3D=20NULL=20&&=20--n=20<=3D=200)=0A+=09fp=20=3D=20= ftable_get(fdp,=20fdp->fd_freefile);=0A+=09for=20(i=20=3D=20last=20-=20= fdp->fd_freefile;=20--i=20>=3D=200;=0A+=09=20=20=20=20=20fp=20=3D=20= ftable_get(fdp,=20last=20-=20i))=20{=0A+=09=09if=20(fp=20=3D=3D=20NULL=20= &&=20--n=20<=3D=200)=0A=20=09=09=09return=20(1);=0A=20=09}=0A=20=09= return=20(0);=0A@@=20-2017,7=20+2455,7=20@@=20falloc(struct=20thread=20= *td,=20struct=20file=20**resultfp,=20int=20*resultfd)=0A=20=09= ifs_init_lockdata(fp);=0A=20=0A=20=09FILEDESC_XLOCK(p->p_fd);=0A-=09if=20= ((fq=20=3D=20p->p_fd->fd_ofiles[0]))=20{=0A+=09if=20((fq=20=3D=20= ftable_get(p->p_fd,=200)))=20{=0A=20=09=09LIST_INSERT_AFTER(fq,=20fp,=20= f_list);=0A=20=09}=20else=20{=0A=20=09=09LIST_INSERT_HEAD(&filehead,=20= fp,=20f_list);=0A@@=20-2030,7=20+2468,7=20@@=20falloc(struct=20thread=20= *td,=20struct=20file=20**resultfp,=20int=20*resultfd)=0A=20=09=09=09= fdrop(fp,=20td);=0A=20=09=09return=20(error);=0A=20=09}=0A-=09= p->p_fd->fd_ofiles[i]=20=3D=20fp;=0A+=09ftable_set(p->p_fd,=20i,=20fp);=0A= =20=09FILEDESC_XUNLOCK(p->p_fd);=0A=20=09if=20(resultfp)=0A=20=09=09= 
*resultfp=20=3D=20fp;=0A@@=20-2068,10=20+2506,13=20@@=20fdinit(struct=20= filedesc=20*fdp)=0A=20=09newfdp->fd_fd.fd_refcnt=20=3D=201;=0A=20=09= newfdp->fd_fd.fd_holdcnt=20=3D=201;=0A=20=09newfdp->fd_fd.fd_cmask=20=3D=20= CMASK;=0A-=09newfdp->fd_fd.fd_ofiles=20=3D=20newfdp->fd_dfiles;=0A-=09= newfdp->fd_fd.fd_ofileflags=20=3D=20newfdp->fd_dfileflags;=0A=20=09= newfdp->fd_fd.fd_nfiles=20=3D=20NDFILE;=0A-=09newfdp->fd_fd.fd_map=20=3D=20= newfdp->fd_dmap;=0A+=0A+=09idb_init(&newfdp->fd_fd.fd_files,=20= &newfdp->fd_dfiles,=20NDFILE);=0A+=09idb_init(&newfdp->fd_fd.fd_map,=20= &newfdp->fd_dmap,=20NDSLOTS(NDFILE));=0A+=09= idb_init(&newfdp->fd_fd.fd_cloexec,=20&newfdp->fd_dcloexec,=0A+=09=20=20=20= =20NDSLOTS(NDFILE));=0A+=0A=20=09newfdp->fd_fd.fd_lastfile=20=3D=20-1;=0A= =20=09return=20(&newfdp->fd_fd);=0A=20}=0A@@=20-2144,6=20+2585,7=20@@=20= struct=20filedesc=20*=0A=20fdcopy(struct=20filedesc=20*fdp)=0A=20{=0A=20=09= struct=20filedesc=20*newfdp;=0A+=09struct=20file=20*fp;=0A=20=09int=20i;=0A= =20=0A=20=09/*=20Certain=20daemons=20might=20not=20have=20file=20= descriptors.=20*/=0A@@=20-2152,23=20+2594,23=20@@=20fdcopy(struct=20= filedesc=20*fdp)=0A=20=0A=20=09newfdp=20=3D=20fdinit(fdp);=0A=20=09= FILEDESC_SLOCK(fdp);=0A-=09while=20(fdp->fd_lastfile=20>=3D=20= newfdp->fd_nfiles)=20{=0A-=09=09FILEDESC_SUNLOCK(fdp);=0A-=09=09= FILEDESC_XLOCK(newfdp);=0A-=09=09fdgrowtable(newfdp,=20fdp->fd_lastfile=20= +=201);=0A-=09=09FILEDESC_XUNLOCK(newfdp);=0A-=09=09FILEDESC_SLOCK(fdp);=0A= -=09}=0A=20=09/*=20copy=20everything=20except=20kqueue=20descriptors=20= */=0A=20=09newfdp->fd_freefile=20=3D=20-1;=0A=20=09for=20(i=20=3D=200;=20= i=20<=3D=20fdp->fd_lastfile;=20++i)=20{=0A-=09=09if=20(fdisused(fdp,=20= i)=20&&=0A-=09=09=20=20=20=20fdp->fd_ofiles[i]->f_type=20!=3D=20= DTYPE_KQUEUE=20&&=0A-=09=09=20=20=20=20fdp->fd_ofiles[i]->f_ops=20!=3D=20= &badfileops)=20{=0A-=09=09=09newfdp->fd_ofiles[i]=20=3D=20= fdp->fd_ofiles[i];=0A-=09=09=09newfdp->fd_ofileflags[i]=20=3D=20= 
fdp->fd_ofileflags[i];=0A-=09=09=09fhold(newfdp->fd_ofiles[i]);=0A+=09=09= if=20(fdisused(fdp,=20i)=20&&=20(fp=20=3D=20ftable_get(fdp,=20i))=20&&=0A= +=09=09=20=20=20=20fp->f_type=20!=3D=20DTYPE_KQUEUE=20&&=20fp->f_ops=20= !=3D=20&badfileops)=20{=0A+=09=09=09int=20cloexec=20=3D=20= ftable_get_cloexec(fdp,=20i);=0A+=09=09=09int=20maxfd=20=3D=20= fdp->fd_lastfile;=0A+=0A+=09=09=09FILEDESC_SUNLOCK(fdp);=0A+=09=09=09= FILEDESC_XLOCK(newfdp);=0A+=09=09=09ftable_ensure_fd(newfdp,=20i,=20= maxfd);=0A+=09=09=09ftable_set(newfdp,=20i,=20fp);=0A+=09=09=09= ftable_set_cloexec(newfdp,=20i,=20cloexec);=0A=20=09=09=09= newfdp->fd_lastfile=20=3D=20i;=0A+=09=09=09FILEDESC_XUNLOCK(newfdp);=0A+=09= =09=09FILEDESC_SLOCK(fdp);=0A+=09=09=09fhold(fp);=0A=20=09=09}=20else=20= {=0A=20=09=09=09if=20(newfdp->fd_freefile=20=3D=3D=20-1)=0A=20=09=09=09=09= newfdp->fd_freefile=20=3D=20i;=0A@@=20-2177,7=20+2619,7=20@@=20= fdcopy(struct=20filedesc=20*fdp)=0A=20=09FILEDESC_SUNLOCK(fdp);=0A=20=09= FILEDESC_XLOCK(newfdp);=0A=20=09for=20(i=20=3D=200;=20i=20<=3D=20= newfdp->fd_lastfile;=20++i)=0A-=09=09if=20(newfdp->fd_ofiles[i]=20!=3D=20= NULL)=0A+=09=09if=20(ftable_get(newfdp,=20i)=20!=3D=20NULL)=0A=20=09=09=09= fdused(newfdp,=20i);=0A=20=09FILEDESC_XUNLOCK(newfdp);=0A=20=09= FILEDESC_SLOCK(fdp);=0A@@=20-2195,7=20+2637,6=20@@=20void=0A=20= fdfree(struct=20thread=20*td)=0A=20{=0A=20=09struct=20filedesc=20*fdp;=0A= -=09struct=20file=20**fpp;=0A=20=09int=20i,=20locked;=0A=20=09struct=20= filedesc_to_leader=20*fdtol;=0A=20=09struct=20file=20*fp;=0A@@=20= -2216,13=20+2657,10=20@@=20fdfree(struct=20thread=20*td)=0A=20=09=09=09=20= fdtol->fdl_refcount));=0A=20=09=09if=20(fdtol->fdl_refcount=20=3D=3D=201=20= &&=0A=20=09=09=20=20=20=20(td->td_proc->p_leader->p_flag=20&=20= P_ADVLOCK)=20!=3D=200)=20{=0A-=09=09=09for=20(i=20=3D=200,=20fpp=20=3D=20= fdp->fd_ofiles;=0A-=09=09=09=20=20=20=20=20i=20<=3D=20fdp->fd_lastfile;=0A= -=09=09=09=20=20=20=20=20i++,=20fpp++)=20{=0A-=09=09=09=09if=20(*fpp=20= 
=3D=3D=20NULL=20||=0A-=09=09=09=09=20=20=20=20(*fpp)->f_type=20!=3D=20= DTYPE_VNODE)=0A+=09=09=09for=20(i=20=3D=200;=20i=20<=3D=20= fdp->fd_lastfile;=20i++)=20{=0A+=09=09=09=09fp=20=3D=20ftable_get(fdp,=20= i);=0A+=09=09=09=09if=20(fp=20=3D=3D=20NULL=20||=20fp->f_type=20!=3D=20= DTYPE_VNODE)=0A=20=09=09=09=09=09continue;=0A-=09=09=09=09fp=20=3D=20= *fpp;=0A=20=09=09=09=09fhold(fp);=0A=20=09=09=09=09= FILEDESC_XUNLOCK(fdp);=0A=20=09=09=09=09lf.l_whence=20=3D=20SEEK_SET;=0A= @@=20-2240,7=20+2678,6=20@@=20fdfree(struct=20thread=20*td)=0A=20=09=09=09= =09VFS_UNLOCK_GIANT(locked);=0A=20=09=09=09=09FILEDESC_XLOCK(fdp);=0A=20=09= =09=09=09fdrop(fp,=20td);=0A-=09=09=09=09fpp=20=3D=20fdp->fd_ofiles=20+=20= i;=0A=20=09=09=09}=0A=20=09=09}=0A=20=09retry:=0A@@=20-2281,31=20= +2718,29=20@@=20fdfree(struct=20thread=20*td)=0A=20=09}=0A=20=09= FILEDESC_XLOCK(fdp);=0A=20=09i=20=3D=20--fdp->fd_refcnt;=0A-=09= FILEDESC_XUNLOCK(fdp);=0A-=09if=20(i=20>=200)=0A+=09if=20(i=20>=200)=20{=0A= +=09=09FILEDESC_XUNLOCK(fdp);=0A=20=09=09return;=0A+=09}=0A=20=0A-=09fpp=20= =3D=20fdp->fd_ofiles;=0A-=09for=20(i=20=3D=20fdp->fd_lastfile;=20i--=20= >=3D=200;=20fpp++)=20{=0A-=09=09if=20(*fpp)=20{=0A-=09=09=09= FILEDESC_XLOCK(fdp);=0A-=09=09=09fp=20=3D=20*fpp;=0A-=09=09=09*fpp=20=3D=20= NULL;=0A+=09for=20(i=20=3D=20fdp->fd_lastfile;=20i=20>=3D=200=20;=20i--)=20= {=0A+=09=09fp=20=3D=20ftable_get(fdp,=20i);=0A+=09=09if=20(fp)=20{=0A+=09= =09=09ftable_set(fdp,=20i,=20NULL);=0A=20=09=09=09FILEDESC_XUNLOCK(fdp);=0A= =20=09=09=09(void)=20closef(fp,=20td);=0A+=09=09=09FILEDESC_XLOCK(fdp);=0A= =20=09=09}=0A=20=09}=0A-=09FILEDESC_XLOCK(fdp);=0A=20=0A=20=09/*=20XXX=20= This=20should=20happen=20earlier.=20*/=0A=20=09mtx_lock(&fdesc_mtx);=0A=20= =09td->td_proc->p_fd=20=3D=20NULL;=0A=20=09mtx_unlock(&fdesc_mtx);=0A=20=0A= -=09if=20(fdp->fd_nfiles=20>=20NDFILE)=0A-=09=09FREE(fdp->fd_ofiles,=20= M_FILEDESC);=0A-=09if=20(NDSLOTS(fdp->fd_nfiles)=20>=20NDSLOTS(NDFILE))=0A= 
-		FREE(fdp->fd_map, M_FILEDESC);
+	idb_free(&fdp->fd_files);
+	idb_free(&fdp->fd_map);
+	idb_free(&fdp->fd_cloexec);
 
 	fdp->fd_nfiles = 0;
 
@@ -2377,19 +2812,20 @@ setugidsafety(struct thread *td)
 	 */
 	FILEDESC_XLOCK(fdp);
 	for (i = 0; i <= fdp->fd_lastfile; i++) {
+		struct file *fp;
+
 		if (i > 2)
 			break;
-		if (fdp->fd_ofiles[i] && is_unsafe(fdp->fd_ofiles[i])) {
-			struct file *fp;
+		fp = ftable_get(fdp, i);
+		if (fp && is_unsafe(fp)) {
 
 			knote_fdclose(td, i);
 			/*
 			 * NULL-out descriptor prior to close to avoid
 			 * a race while close blocks.
 			 */
-			fp = fdp->fd_ofiles[i];
-			fdp->fd_ofiles[i] = NULL;
-			fdp->fd_ofileflags[i] = 0;
+			ftable_set(fdp, i, NULL);
+			ftable_set_cloexec(fdp, i, 0);
 			fdunused(fdp, i);
 			FILEDESC_XUNLOCK(fdp);
 			(void) closef(fp, td);
@@ -2411,8 +2847,8 @@ fdclose(struct filedesc *fdp, struct file *fp, int idx, struct thread *td)
 {
 
 	FILEDESC_XLOCK(fdp);
-	if (fdp->fd_ofiles[idx] == fp) {
-		fdp->fd_ofiles[idx] = NULL;
+	if (ftable_get(fdp, idx) == fp) {
+		ftable_set(fdp, idx, NULL);
 		fdunused(fdp, idx);
 		FILEDESC_XUNLOCK(fdp);
 		fdrop(fp, td);
@@ -2441,19 +2877,20 @@ fdcloseexec(struct thread *td)
 	 * may block and rip them out from under us.
 	 */
 	for (i = 0; i <= fdp->fd_lastfile; i++) {
-		if (fdp->fd_ofiles[i] != NULL &&
-		    (fdp->fd_ofiles[i]->f_type == DTYPE_MQUEUE ||
-		    (fdp->fd_ofileflags[i] & UF_EXCLOSE))) {
-			struct file *fp;
+		struct file *fp;
+
+		fp = ftable_get(fdp, i);
+		if (fp != NULL &&
+		    (fp->f_type == DTYPE_MQUEUE ||
+			ftable_get_cloexec(fdp, i))) {
 
 			knote_fdclose(td, i);
 			/*
 			 * NULL-out descriptor prior to close to avoid
 			 * a race while close blocks.
 			 */
-			fp = fdp->fd_ofiles[i];
-			fdp->fd_ofiles[i] = NULL;
-			fdp->fd_ofileflags[i] = 0;
+			ftable_set(fdp, i, NULL);
+			ftable_set_cloexec(fdp, i, 0);
 			fdunused(fdp, i);
 			if (fp->f_type == DTYPE_MQUEUE)
 				mq_fdclose(td, i, fp);
@@ -2486,7 +2923,7 @@ fdcheckstd(struct thread *td)
 	devnull = -1;
 	error = 0;
 	for (i = 0; i < 3; i++) {
-		if (fdp->fd_ofiles[i] != NULL)
+		if (ftable_get(fdp, i) != NULL)
 			continue;
 		if (devnull < 0) {
 			save = td->td_retval[0];
@@ -2904,7 +3341,7 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int indx, int dfd, int mode,
 	 */
 	FILEDESC_XLOCK(fdp);
 	if (dfd < 0 || dfd >= fdp->fd_nfiles ||
-	    (wfp = fdp->fd_ofiles[dfd]) == NULL) {
+	    (wfp = ftable_get(fdp, dfd)) == NULL) {
 		FILEDESC_XUNLOCK(fdp);
 		return (EBADF);
 	}
@@ -2931,9 +3368,9 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int indx, int dfd, int mode,
 			FILEDESC_XUNLOCK(fdp);
 			return (EACCES);
 		}
-		fp = fdp->fd_ofiles[indx];
-		fdp->fd_ofiles[indx] = wfp;
-		fdp->fd_ofileflags[indx] = fdp->fd_ofileflags[dfd];
+		fp = ftable_get(fdp, indx);
+		ftable_set(fdp, indx, wfp);
+		ftable_set_cloexec(fdp, indx, ftable_get_cloexec(fdp, dfd));
 		if (fp == NULL)
 			fdused(fdp, indx);
 		fhold_locked(wfp);
@@ -2951,11 +3388,11 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int indx, int dfd, int mode,
 		/*
 		 * Steal away the file pointer from dfd and stuff it into indx.
 		 */
-		fp = fdp->fd_ofiles[indx];
-		fdp->fd_ofiles[indx] = fdp->fd_ofiles[dfd];
-		fdp->fd_ofiles[dfd] = NULL;
-		fdp->fd_ofileflags[indx] = fdp->fd_ofileflags[dfd];
-		fdp->fd_ofileflags[dfd] = 0;
+		fp = ftable_get(fdp, indx);
+		ftable_set(fdp, indx, ftable_get(fdp, dfd));
+		ftable_set(fdp, dfd, NULL);
+		ftable_set_cloexec(fdp, indx, ftable_get_cloexec(fdp, dfd));
+		ftable_set_cloexec(fdp, dfd, 0);
 		fdunused(fdp, dfd);
 		if (fp == NULL)
 			fdused(fdp, indx);
@@ -3103,7 +3540,7 @@ sysctl_kern_file(SYSCTL_HANDLER_ARGS)
 			continue;
 		FILEDESC_SLOCK(fdp);
 		for (n = 0; fdp->fd_refcnt > 0 && n < fdp->fd_nfiles; ++n) {
-			if ((fp = fdp->fd_ofiles[n]) == NULL)
+			if ((fp = ftable_get(fdp, n)) == NULL)
 				continue;
 			xf.xf_fd = n;
 			xf.xf_file = fp;
@@ -3215,7 +3652,7 @@ sysctl_kern_proc_ofiledesc(SYSCTL_HANDLER_ARGS)
 		export_vnode_for_osysctl(fdp->fd_jdir, KF_FD_TYPE_JAIL, kif,
 				fdp, req);
 	for (i = 0; i < fdp->fd_nfiles; i++) {
-		if ((fp = fdp->fd_ofiles[i]) == NULL)
+		if ((fp = ftable_get(fdp, i)) == NULL)
 			continue;
 		bzero(kif, sizeof(*kif));
 		kif->kf_structsize = sizeof(*kif);
@@ -3450,7 +3887,7 @@ sysctl_kern_proc_filedesc(SYSCTL_HANDLER_ARGS)
 		export_vnode_for_sysctl(fdp->fd_jdir, KF_FD_TYPE_JAIL, kif,
 				fdp, req);
 	for (i = 0; i < fdp->fd_nfiles; i++) {
-		if ((fp = fdp->fd_ofiles[i]) == NULL)
+		if ((fp = ftable_get(fdp, i)) == NULL)
 			continue;
 		bzero(kif, sizeof(*kif));
 		FILE_LOCK(fp);
@@ -3669,7 +4106,7 @@ file_to_first_proc(struct file *fp)
 		if (fdp == NULL)
 			continue;
 		for (n = 0; n < fdp->fd_nfiles; n++) {
-			if (fp == fdp->fd_ofiles[n])
+			if (fp == ftable_get(fdp, n))
 				return (p);
 		}
 	}
diff --git a/src/sys/kern/kern_lsof.c b/src/sys/kern/kern_lsof.c
index 04e5dd7..70a8286 100644
--- a/src/sys/kern/kern_lsof.c
+++ b/src/sys/kern/kern_lsof.c
@@ -260,7 +260,7 @@ lsof(struct thread *td, struct lsof_args *uap)
 
 		/* Ordinary descriptors for files, pipes, sockets: */
 		} else if (msg.fd < fdp->fd_nfiles) {
-			fp = fdp->fd_ofiles[msg.fd];
+			fp = ftable_get(fdp, msg.fd);
 			if (fp) {
 				switch (fp->f_type) {
 				case DTYPE_VNODE:
diff --git a/src/sys/kern/sys_generic.c b/src/sys/kern/sys_generic.c
index 1a9b061..8aba73d 100644
--- a/src/sys/kern/sys_generic.c
+++ b/src/sys/kern/sys_generic.c
@@ -596,12 +596,12 @@ kern_ioctl(struct thread *td, int fd, u_long com, caddr_t data)
 	switch (com) {
 	case FIONCLEX:
 		FILEDESC_XLOCK(fdp);
-		fdp->fd_ofileflags[fd] &= ~UF_EXCLOSE;
+		ftable_set_cloexec(fdp, fd, 0);
 		FILEDESC_XUNLOCK(fdp);
 		goto out;
 	case FIOCLEX:
 		FILEDESC_XLOCK(fdp);
-		fdp->fd_ofileflags[fd] |= UF_EXCLOSE;
+		ftable_set_cloexec(fdp, fd, 1);
 		FILEDESC_XUNLOCK(fdp);
 		goto out;
 	case FIONBIO:
@@ -1043,7 +1043,7 @@ pollscan(td, fds, nfd)
 		} else if (fds->fd < 0) {
 			fds->revents = 0;
 		} else {
-			fp = fdp->fd_ofiles[fds->fd];
+			fp = ftable_get(fdp, fds->fd);
 			if (fp == NULL) {
 				fds->revents = POLLNVAL;
 				n++;
diff --git a/src/sys/kern/uipc_mqueue.c b/src/sys/kern/uipc_mqueue.c
index fb2ef6a..59c339a 100644
--- a/src/sys/kern/uipc_mqueue.c
+++ b/src/sys/kern/uipc_mqueue.c
@@ -2005,8 +2005,8 @@ kmq_open(struct thread *td, struct kmq_open_args *uap)
 	FILE_UNLOCK(fp);
 
 	FILEDESC_XLOCK(fdp);
-	if (fdp->fd_ofiles[fd] == fp)
-		fdp->fd_ofileflags[fd] |= UF_EXCLOSE;
+	if (ftable_get(fdp, fd) == fp)
+		ftable_set_cloexec(fdp, fd, 1);
 	FILEDESC_XUNLOCK(fdp);
 	td->td_retval[0] = fd;
 	fdrop(fp, td);
diff --git a/src/sys/kern/uipc_sem.c b/src/sys/kern/uipc_sem.c
index d5525a4..b1e6b62 100644
--- a/src/sys/kern/uipc_sem.c
+++ b/src/sys/kern/uipc_sem.c
@@ -488,8 +488,8 @@ ksem_create(struct thread *td, const char *name, semid_t *semidp, mode_t mode,
 	fp->f_ops = &ksem_ops;
 
 	FILEDESC_XLOCK(fdp);
-	if (fdp->fd_ofiles[fd] == fp)
-		fdp->fd_ofileflags[fd] |= UF_EXCLOSE;
+	if (ftable_get(fdp, fd) == fp)
+		ftable_set_cloexec(fdp, fd, 1);
 	FILEDESC_XUNLOCK(fdp);
 	fdrop(fp, td);
 
diff --git a/src/sys/kern/uipc_usrreq.c b/src/sys/kern/uipc_usrreq.c
index b8255f0..1bfc037 100644
--- a/src/sys/kern/uipc_usrreq.c
+++ b/src/sys/kern/uipc_usrreq.c
@@ -1605,7 +1605,8 @@ unp_externalize(struct mbuf *control, struct mbuf **controlp)
 				if (fdalloc(td, 0, &f))
 					panic("unp_externalize fdalloc failed");
 				fp = *rp++;
-				td->td_proc->p_fd->fd_ofiles[f] = fp;
+				ftable_set(td->td_proc->p_fd, f,
+				    fp);
 				FILE_LOCK(fp);
 				fp->f_msgcount--;
 				FILE_UNLOCK(fp);
@@ -1735,12 +1736,12 @@ unp_internalize(struct mbuf **controlp, struct thread *td)
 			for (i = 0; i < oldfds; i++) {
 				fd = *fdp++;
 				if ((unsigned)fd >= fdescp->fd_nfiles ||
-				    fdescp->fd_ofiles[fd] == NULL) {
+				    ftable_get(fdescp, fd) == NULL) {
 					FILEDESC_SUNLOCK(fdescp);
 					error = EBADF;
 					goto out;
 				}
-				fp = fdescp->fd_ofiles[fd];
+				fp = ftable_get(fdescp, fd);
 				if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
 					FILEDESC_SUNLOCK(fdescp);
 					error = EOPNOTSUPP;
@@ -1765,7 +1766,7 @@ unp_internalize(struct mbuf **controlp, struct thread *td)
 			rp = (struct file **)
 			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
 			for (i = 0; i < oldfds; i++) {
-				fp = fdescp->fd_ofiles[*fdp++];
+				fp = ftable_get(fdescp, *fdp++);
 				*rp++ = fp;
 				FILE_LOCK(fp);
 				fp->f_count++;
diff --git a/src/sys/kern/vfs_syscalls.c b/src/sys/kern/vfs_syscalls.c
index 2f28263..fa0a5e6 100644
--- a/src/sys/kern/vfs_syscalls.c
+++ b/src/sys/kern/vfs_syscalls.c
@@ -4884,7 +4884,7 @@ getvnode(fdp, fd, fpp)
 	else {
 		FILEDESC_SLOCK(fdp);
 		if ((u_int)fd >= fdp->fd_nfiles ||
-		    (fp = fdp->fd_ofiles[fd]) == NULL)
+		    (fp = ftable_get(fdp, fd)) == NULL)
 			error = EBADF;
 		else if (fp->f_vnode == NULL) {
 			fp = NULL;
diff --git a/src/sys/netsmb/smb_dev.c b/src/sys/netsmb/smb_dev.c
index fd0dcbe..a0dd80c 100644
--- a/src/sys/netsmb/smb_dev.c
+++ b/src/sys/netsmb/smb_dev.c
@@ -370,7 +370,7 @@ nsmb_getfp(struct filedesc* fdp, int fd, int flag)
 
 	FILEDESC_SLOCK(fdp);
 	if (((u_int)fd) >= fdp->fd_nfiles ||
-	    (fp = fdp->fd_ofiles[fd]) == NULL ||
+	    (fp = ftable_get(fdp, fd)) == NULL ||
 	    (fp->f_flag & flag) == 0) {
 		FILEDESC_SUNLOCK(fdp);
 		return (NULL);
diff --git a/src/sys/sys/filedesc.h b/src/sys/sys/filedesc.h
index 1831e5c..e9ea56a 100644
--- a/src/sys/sys/filedesc.h
+++ b/src/sys/sys/filedesc.h
@@ -45,16 +45,26 @@
  * This structure is used for the management of descriptors.  It may be
  * shared by multiple processes.
  */
-#define NDSLOTTYPE	u_long
+#define NDSLOTTYPE	uintptr_t
+
+/* Generic indirect block table */
+struct idb_table {
+	union {
+		void *flat;
+		void **indirect;
+	} idb_tbl;
+	int idb_nents; /* Current max # of entries. */
+	int idb_orig_nents; /* Orig # of entries for the flat table. */
+};
 
 struct filedesc {
-	struct	file **fd_ofiles;	/* file structures for open files */
-	char	*fd_ofileflags;		/* per-process open file flags */
+	struct idb_table fd_files;     /* table of open file structs */
+	struct idb_table fd_map;	/* bitmap of free fds */
+	struct idb_table fd_cloexec;	/* bitmap of fd close exec state */
 	struct	vnode *fd_cdir;		/* current directory */
 	struct	vnode *fd_rdir;		/* root directory */
 	struct	vnode *fd_jdir;		/* jail root directory */
 	int	fd_nfiles;		/* number of open files allocated */
-	NDSLOTTYPE *fd_map;		/* bitmap of free fds */
 	int	fd_lastfile;		/* high-water mark of fd_ofiles */
 	int	fd_freefile;		/* approx. next free file */
 	u_short	fd_cmask;		/* mask for file creation */
@@ -130,12 +140,18 @@ struct filedesc_to_leader *
 int	getvnode(struct filedesc *fdp, int fd, struct file **fpp);
 void	mountcheckdirs(struct vnode *olddp, struct vnode *newdp);
 void	setugidsafety(struct thread *td);
+struct	file *ftable_get(struct filedesc *fdp, int fd);
+void	ftable_set(struct filedesc *fdp, int fd, struct file *fp);
+int	ftable_get_cloexec(struct filedesc *fdp, int fd);
+void	ftable_set_cloexec(struct filedesc *fdp, int fd, int on);
+
 
 static __inline struct file *
 fget_locked(struct filedesc *fdp, int fd)
 {
 
-	return (fd < 0 || fd >= fdp->fd_nfiles ? NULL : fdp->fd_ofiles[fd]);
+	return (fd < 
--Apple-Mail-16--1061268945
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit

--Apple-Mail-16--1061268945--
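For readers following the ftable_get()/ftable_set() conversions above, here is a rough userspace sketch of how a table split into fixed-size blocks can replace a single flat array. It is only an illustration under assumptions: the names sk_table, sk_get, sk_set, sk_free and the block size SK_BLOCK_ENTS are hypothetical, and the patch's actual idb_table implementation is not shown in this part of the message and may well differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block size; the real value used by the patch is not shown. */
#define SK_BLOCK_ENTS 4

/* Hypothetical two-level table: a small top-level array of block pointers. */
struct sk_table {
	void ***blocks;	/* blocks[i] points at SK_BLOCK_ENTS slots */
	int nblocks;	/* number of blocks currently allocated */
};

/* Look up one slot; out-of-range indices read as NULL. */
static void *
sk_get(const struct sk_table *t, int idx)
{
	assert(t != NULL);
	if (idx < 0 || idx >= t->nblocks * SK_BLOCK_ENTS)
		return (NULL);
	return (t->blocks[idx / SK_BLOCK_ENTS][idx % SK_BLOCK_ENTS]);
}

/*
 * Store into one slot, growing by whole blocks as needed.  Only the
 * top-level pointer array is reallocated; existing blocks never move,
 * so growth does not copy the slots already in use.
 */
static int
sk_set(struct sk_table *t, int idx, void *val)
{
	int need;

	if (idx < 0)
		return (-1);
	need = idx / SK_BLOCK_ENTS + 1;
	if (need > t->nblocks) {
		void ***nb = realloc(t->blocks, (size_t)need * sizeof(*nb));
		if (nb == NULL)
			return (-1);
		t->blocks = nb;
		while (t->nblocks < need) {
			void **blk = calloc(SK_BLOCK_ENTS, sizeof(*blk));
			if (blk == NULL)
				return (-1);
			t->blocks[t->nblocks++] = blk;
		}
	}
	t->blocks[idx / SK_BLOCK_ENTS][idx % SK_BLOCK_ENTS] = val;
	return (0);
}

/* Release every block and the top-level array. */
static void
sk_free(struct sk_table *t)
{
	int i;

	for (i = 0; i < t->nblocks; i++)
		free(t->blocks[i]);
	free(t->blocks);
	t->blocks = NULL;
	t->nblocks = 0;
}
```

In this sketch the largest single allocation is either one block or the top-level pointer array, which is SK_BLOCK_ENTS times smaller than an equivalent flat table. That is the property the problem statement is after: no need for one large contiguous region as the descriptor count grows.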