Date:      Sat, 3 Sep 2005 19:39:45 +0300 (EEST)
From:      Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: kern/85503: panic: wrong dirclust using msdosfs in RELENG_6
Message-ID:  <20050903190632.S1788@atlantis.atlantis.dp.ua>
In-Reply-To: <20050902205456.S2885@delplex.bde.org>
References:  <20050901183311.D62325@atlantis.atlantis.dp.ua> <20050902205456.S2885@delplex.bde.org>


Hello!

On Fri, 2 Sep 2005, Bruce Evans wrote:
>> on maximum number of files for 32-bit architectures. E.g., on FreeBSD/ia64
>> u_int is 64 bits, and thus it would be no problem for its API to create
>> and handle more than 4G files/fs. But such a file system will be 
>> incompatible
>
> Actually u_int is 32 bits for ia64, and the ino_t API/ABI is independent
> of the size of u_int.  ino_t is uint32_t.

  Hmm, what about other 64-bit architectures (e.g. alpha)? I used to think that 
on 64-bit CPUs type int should have 64 bits.
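
  A quick way to check this on a given machine is to print the widths
directly. The sketch below is only an illustration; the function names are
made up for this example. On LP64 platforms (the data model used by the
64-bit FreeBSD ports) int stays at 32 bits and only long and pointers grow
to 64, but the C standard itself does not promise either width.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/*
 * Hypothetical helpers: width in bits of some basic types on the
 * machine that compiles this file.
 */
size_t int_bits(void)  { return sizeof(int) * CHAR_BIT; }
size_t long_bits(void) { return sizeof(long) * CHAR_BIT; }
size_t ptr_bits(void)  { return sizeof(void *) * CHAR_BIT; }
```

  Calling these from a small main() and printing the results with "%zu" is
enough to compare architectures.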

> Neither u_int nor off_t has anything to do with the correct storage
> size here.  off_t is a signed integer type suitable for representing
--------------^^^^^^^^^^^^^^^^^

  Yes, it is (at least on i386):

/usr/include/machine/_types.h:

typedef long long               __int64_t;

/usr/include/sys/_types.h:

typedef __int64_t       __off_t;        /* file offset */

> offsets within files.  Since off_t is unsigned, it is unsuitable for
-------------------------------^^^^^^^^^^^^^^^^^

  No, it's signed ;)
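
  The same can be checked without reading the headers. This is a minimal
sketch with made-up function names; it relies only on the fact that -1
converted to an unsigned type becomes a large positive value.

```c
#include <limits.h>     /* CHAR_BIT */
#include <sys/types.h>  /* off_t */

/* Returns 1 if off_t is signed: (off_t)-1 is below zero only for signed types. */
int off_t_is_signed(void)
{
    return (off_t)-1 < (off_t)0;
}

/* Width of off_t in bits on the machine that compiles this file. */
unsigned off_t_bits(void)
{
    return (unsigned)(sizeof(off_t) * CHAR_BIT);
}
```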

> representing offsets within file systems.  It just happens to work
> because it is 64 bits and an offset of 2^63-1 bytes is enough for
> anyone ;-).  (Actually it is not even enough for offsets within files
> since offsets in /dev/kmem are often > 2^63 on 64-bit systems.) ino_t

  I think that any file system should be flexible enough to support a maximum
file size as large as the file system itself (ideally). So if off_t is suitable
for representing an offset within a single file, it should also be suitable
for representing an offset within a file system, because their maximum sizes
are almost the same. Also, I don't understand why the signedness of the type
makes any difference: we're working with offsets from the start of
our media, so all offsets are non-negative.

  I'm trying to be as general as possible. The size of direct access media
(disks) tends to increase, so (in order not to rewrite disk layers every
5 years) we should have a basic data type which is suitable to hold the media
size in bytes. In fact I think that we already have this type (off_t). Also,
it's clear that no media can contain more files than its size in bytes,
so this data type should also represent the file number (inode number, if
this sounds better ;).

> is closer to being the correct type.  The type of v_hash certainly needs
> to be larger than ino_t.  My main point is that although it could be
> larger so that file systems can easily create a (unique) id from things
> like (dirclust, diroffset) pairs, it is not useful for it to be larger
> since file systems need to create an id for the inode number anyway.
> (Creation in some file system, e.g. ffs, is just copying the inode
> number from the inode.)

  Of course, the size of ino_t should also be increased. But I understand that
it isn't an easy task.

>>> So all current file systems need to generate unique 32-bit inode
>>> numbers.  This may be difficult, but once it is done I think the inode
>>                 ^^^^^^^^^^^^^^^^
>> 
>>  ...and may be close-to-impossible. What if e.g. Microsoft invents say
>> FAT-2005 with variable-length directory entries? I'm not sure that for
>> every third-party filesystem it would be possible to generate 32-bit
>> pseudoinode. And it's very bad that we can't handle >4Gfiles/fs at all.
>
> It already invented variable-length entries for long names in 1990-1995 :-).
> But the sizes of the entries are multiples of 32.  This is required for
> compatibility and won't change.

  But even this fact doesn't rescue us when we talk about FAT32 slices
larger than 128 GB...
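
  The 128 GB figure is just arithmetic: with 32-byte directory entries and a
32-bit id there are 2^32 possible ids, and 2^32 * 32 bytes = 2^37 bytes =
128 GB. The sketch below assumes the pseudo-inode is the byte offset of the
directory entry on the media divided by 32; the helper names are invented
for this example and are not the actual msdosfs code.

```c
#include <stdint.h>

#define DIRENT_SIZE 32u  /* bytes per FAT directory entry */

/*
 * Hypothetical helper: derive a 32-bit pseudo-inode from the byte offset
 * of a directory entry.  Returns 0 on success, or -1 if the entry lies
 * too far into the media for the result to fit in 32 bits.
 */
int pseudo_ino(uint64_t dirent_byte_offset, uint32_t *ino)
{
    uint64_t n = dirent_byte_offset / DIRENT_SIZE;

    if (n > UINT32_MAX)
        return -1;
    *ino = (uint32_t)n;
    return 0;
}

/* First byte offset that can no longer be numbered: 2^32 * 32 = 2^37. */
uint64_t pseudo_ino_limit(void)
{
    return ((uint64_t)UINT32_MAX + 1) * DIRENT_SIZE;
}
```

  Any directory entry at or beyond pseudo_ino_limit() bytes gets -1 here,
which is the case where a unique 32-bit id cannot be formed this way.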

> I think I said that the inode number in msdosfs should be the cluster
> number of the first cluster in the file.  This would be broken by
> variable-sized clusters (unlikely, and even less useful) or new file
> types like symlinks (useful and not so unlikely -- FreeBSD could add
> them as an extension).

  Yes, I agree with this. As long as this fs is called FAT32,
its cluster number will fit in a 32-bit word.

> Indeed.  The only important cases are ffs and some network file systems
> that already support >= 4G files.

  I think interoperability with other OSes is also important, and if, e.g.,
Microsoft invents FAT64, we will return to this topic ;)

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
