Large FAT32 Filesystems under FreeBSD

NOTE: Though I haven't tried it, it appears that someone has created a work-around for this problem: an alternate implementation of FAT32 for FreeBSD. It apparently uses more RAM, but breaks the 128 GB barrier. This is found in the latest versions of FreeBSD (I noticed it in 5.4-RELEASE, but I don't know when it first appeared).

So I have this 200 GB FireWire drive. It has a single FAT32 filesystem on it, and I want to use it that way (so I can also use the drive with MS Windows). But FreeBSD isn't able to mount a FAT32 filesystem that big.

The Problem

When I try to mount the filesystem, I get the following message:

	mountmsdosfs(): disk too big, sorry

Glancing through the kernel source (this is FreeBSD 4.8-RELEASE), I find the following check:

        if (pmp->pm_HugeSectors > 0xffffffff /
            (pmp->pm_BytesPerSec / sizeof(struct direntry)) + 1) {
                /*
                 * We cannot deal currently with this size of disk
                 * due to fileid limitations (see msdosfs_getattr and
                 * msdosfs_readdir)
                 */
                error = EINVAL;
                printf("mountmsdosfs(): disk too big, sorry\n");
                goto error_exit;
        }

Inodes and Clusters

Every file on a Unix filesystem has a unique file number associated with it. This number is commonly referred to as an inode number, since it is an index into a table of inode data structures on the disk.

On a FAT filesystem, disk blocks (or sectors) are grouped together in clusters (more recently referred to as allocation units). Disk space is assigned to a file one cluster at a time. A single cluster cannot have data from more than one file.

On a FAT filesystem, there is no inode table. Instead, there is a file allocation table (FAT). Each cluster has an entry in the FAT. A directory entry refers to the first cluster assigned to the file. In the FAT, each entry points to the next cluster, or has a special value indicating that it is the last cluster. In this way, a file occupies a chain of clusters.

So, on a FAT32 filesystem, there are X number of clusters, and each cluster is Y number of bytes. My 200 GB FireWire drive has 6,103,459 clusters, and each cluster is 32,768 bytes (64 sectors).

File number assignment with FreeBSD/msdosfs

So, a FAT32 filesystem doesn't have inode numbers (file numbers). But Unix rather depends on the concept. Software for mounting a FAT32 filesystem under Unix needs some way to generate file numbers. Using the starting cluster number would seem like a close enough approximation: it is right there in the directory entry, and no two files can share the same number.

When I first looked at the code snippet above, I came to the conclusion that FreeBSD was doing exactly that. I then assumed that the disk I was using must have more than 4,294,967,296 clusters. That didn't seem right, though, because the 32 in FAT32 is the number of bits in each FAT entry, so the cluster count can't be bigger than 4,294,967,296. So I mounted the disk under WinXP and found that it has only 6,103,459 clusters. What gives?

Well what IS it doing?

Hmm. The maximum value of pm_HugeSectors (which I guess is the cluster count) is 0xffffffff / (pm_BytesPerSec / 32). I guess pm_BytesPerSec must be the cluster size, which is 32 KB on my disk. That would only be 4,194,303, which doesn't seem very big. My assumptions must be wrong.

Okay, looking at struct bpb50, it's clear that Sectors really means sectors and not clusters. That would make the limit 0xffffffff / (512 / 32), which is 268,435,455 sectors. In essence, a FAT32 disk cannot be any bigger than 0xffffffff × 32 bytes, which is 137,438,953,440 (or 128 GB). But what exactly does the size of a directory entry (the 32 bytes) have to do with it?

I have to assume this limitation is bogus. WinXP mounts the drive just fine, and though I haven't tried it, the drive claims to work under Win98se. The problem boils down to the way FreeBSD makes up file numbers. So, what is it doing?

cntobn()

The fileno appears to come from cntobn(). If I had to guess from its name, I would say it converts cluster numbers to block numbers. The comment for the macro is "Map a cluster number into a filesystem relative block number." The code then takes that value and multiplies it by the number of direntry structures that fit in a block. Why would we be using that for a fileno?

struct direntry

I still don't understand what the size of struct direntry has to do with the limit. What does the number of direntry structures we could fit on the disk have to do with the size limit? It's not like we are going to fill up the disk with filenames. Is it? I just don't understand. If the goal is to create a number that uniquely identifies the file, isn't the starting cluster number good enough?

Oh, oh, oh. I know. An inode entry contains everything about a file except its name and the actual data: the file dates, permissions, etc. With a FAT filesystem, this data is actually kept in the directory entry for the file, which could exist anywhere on the disk. The starting cluster number is unique for the file, and will take you right to the file's data, but it won't take you to the file's dates and attributes. Very often, those things will be looked up given nothing but a file number.

What else can be done?

Yuck. Maybe we should keep a cache? That is, to say the least, a little nasty. It probably wouldn't get too huge. The question is, would there ever be a miss, and what happens when there is?

If there is a miss, we'll have to go hunting through the directory tree. That would be really bad. If a disk is greater than 128 GB, the directory tree is probably very extensive.