A packfile is simply a single file that contains one or more other files. Packfiles can greatly improve load times, and they can be used in several ways. We can group files that are loaded together into one packfile and load them all at once; for example, we can put all the data for a level into a single packfile and load it with a single read. We can also put our entire file system into one or more packfiles, which lets us handle compression cleanly and makes distribution easier.
So what do we need to store for each file in a packfile? Not much:
- File name and path, usually relative to a specific root directory
- File size and compressed size
In this article we’ll look at how to implement a packfile system.
Let’s look at one possible structure for a packfile’s metadata:
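#include <cstdint>

// PackFileEntry must be declared before PackFileHeader uses it
struct PackFileEntry
{
    uint32_t nameHash;       // hash of the full path of the file
    uint32_t size;           // size of the original file
    uint32_t sizeCompressed; // size of the compressed file, 0 if uncompressed
    uint32_t offset;         // offset of the file in the packfile
};

struct PackFileHeader
{
    int32_t version;  // packfile format version
    int32_t numFiles; // number of files in this packfile
    // entries sorted by nameHash; a flexible array member (a C99 feature
    // that MSVC, GCC and Clang accept in C++ as an extension)
    PackFileEntry fileEntries[];
};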
We don’t store the file name, only a hash of it. This keeps the metadata small and makes checking whether a file exists fast, since we can sort the PackFileEntry list by hash and binary search it. The catch is that no two paths in a packfile may hash to the same value, so the tool that builds packfiles should check for collisions. Because sizes and offsets are 32-bit, a packfile can be at most 4 GB, but that’s not a huge problem: we can use multiple packfiles if necessary. Also important is the version in PackFileHeader. I highly recommend adding a version to your data files, not just to support backwards compatibility but also to detect out-of-date data.
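To make the hashing and sorting concrete, here is a minimal sketch of the build-tool side, under assumptions the article doesn't make: the hash is 32-bit FNV-1a (any hash works as long as the tool and the game hash the same normalized path), files are stored uncompressed, the header is written as a plain two-field struct (a hypothetical `PackFileHeaderFixed`) followed directly by the entry table, and everything is in native byte order. `InputFile` and `BuildPackFile` are illustrative names, not part of the article's file manager. The sketch sorts entries by hash, rejects collisions, and assigns each file's offset as it lays out the data after the entry table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Same fields as PackFileEntry in the article.
struct PackFileEntry {
    uint32_t nameHash;
    uint32_t size;
    uint32_t sizeCompressed; // 0 = stored uncompressed
    uint32_t offset;         // from the start of the packfile
};

// Hypothetical fixed-size header; the entries follow it directly on disc.
struct PackFileHeaderFixed {
    int32_t version;
    int32_t numFiles;
};

// 32-bit FNV-1a (an assumption: the article does not name its hash).
uint32_t Hash(const std::string &path) {
    uint32_t h = 2166136261u;  // FNV offset basis
    for (unsigned char c : path) {
        h ^= c;
        h *= 16777619u;        // FNV prime
    }
    return h;
}

// Hypothetical build-tool input: a Unix-style relative path and the file's bytes.
struct InputFile {
    std::string path;
    std::vector<uint8_t> data;
};

// Builds an uncompressed packfile in memory: header, entries sorted by hash, then data.
std::vector<uint8_t> BuildPackFile(const std::vector<InputFile> &files, int32_t version) {
    // Sort file indices by hash so the game can binary search the entry table.
    std::vector<size_t> order(files.size());
    std::iota(order.begin(), order.end(), size_t(0));
    std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
        return Hash(files[a].path) < Hash(files[b].path);
    });

    // File data starts right after the header and the entry table.
    uint64_t offset = sizeof(PackFileHeaderFixed) + files.size() * sizeof(PackFileEntry);
    std::vector<PackFileEntry> entries;
    for (size_t i : order) {
        PackFileEntry e{Hash(files[i].path), uint32_t(files[i].data.size()), 0, uint32_t(offset)};
        // Sorted order puts equal hashes next to each other.
        if (!entries.empty() && entries.back().nameHash == e.nameHash)
            throw std::runtime_error("hash collision: " + files[i].path);
        entries.push_back(e);
        offset += files[i].data.size();
        if (offset > UINT32_MAX)
            throw std::runtime_error("packfile exceeds 4 GB");
    }

    PackFileHeaderFixed header{version, int32_t(entries.size())};
    std::vector<uint8_t> pack(sizeof header + entries.size() * sizeof(PackFileEntry));
    std::memcpy(pack.data(), &header, sizeof header);
    if (!entries.empty())
        std::memcpy(pack.data() + sizeof header, entries.data(),
                    entries.size() * sizeof(PackFileEntry));
    for (size_t i : order)  // data goes in the same order as the entries
        pack.insert(pack.end(), files[i].data.begin(), files[i].data.end());
    assert(pack.size() == offset);
    return pack;
}
```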
So how do we read a file from a packfile?
- Read the header
- Read the entry list
- Seek to the file entry’s offset
- Read the file
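A minimal sketch of those four steps, written with the C standard I/O functions so it runs anywhere (the article's own code uses Win32 handles). It assumes a packfile laid out as an 8-byte header (`version`, `numFiles`), then the `PackFileEntry` table, then the file data, all in native byte order. `PackFileHeaderFixed` and `ReadFromPackUncached` are hypothetical names. It returns the bytes as stored, so a compressed file would still need decompressing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

struct PackFileEntry {
    uint32_t nameHash;
    uint32_t size;
    uint32_t sizeCompressed; // 0 = stored uncompressed
    uint32_t offset;         // from the start of the packfile
};

// Hypothetical fixed-size header; the entry table follows it directly.
struct PackFileHeaderFixed {
    int32_t version;
    int32_t numFiles;
};

// Naive lookup: every call re-reads the header and the entry list before
// seeking to the file. Returns false if the file isn't in the pack.
bool ReadFromPackUncached(std::FILE *pack, uint32_t nameHash, std::vector<uint8_t> &out) {
    assert(pack != nullptr);
    PackFileHeaderFixed header;
    if (std::fseek(pack, 0, SEEK_SET) != 0 ||
        std::fread(&header, sizeof header, 1, pack) != 1 ||  // read 1: header
        header.numFiles < 0)
        return false;
    std::vector<PackFileEntry> entries(size_t(header.numFiles));
    if (!entries.empty() &&
        std::fread(entries.data(), sizeof(PackFileEntry), entries.size(), pack) !=
            entries.size())  // read 2: entry list
        return false;
    for (const PackFileEntry &e : entries) {  // linear scan keeps the sketch short
        if (e.nameHash != nameHash) continue;
        uint32_t stored = e.sizeCompressed ? e.sizeCompressed : e.size;
        out.resize(stored);
        if (std::fseek(pack, long(e.offset), SEEK_SET) != 0)  // seek to the file
            return false;
        return stored == 0 || std::fread(out.data(), 1, stored, pack) == stored;  // read 3: data
    }
    return false;
}
```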
Hmm, three reads (plus a seek) to access a single file? That’s not very good. If we cache the header and the entry list, each file access needs only a seek and a read. We can also open the packfile once and keep the file handle around, so we don’t have to open and close it for every file.
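Here is one way to apply both fixes, sketched with C standard I/O rather than the Win32 handles the article uses. The hypothetical `CachedPackFile` class takes ownership of an already-open packfile, reads the header and entry table once, and keeps the handle open; after that, each file costs one seek and one read. It assumes an 8-byte header (`version`, `numFiles`), an entry table sorted by `nameHash`, then the data, all in native byte order, and it finds files with `std::lower_bound` over the sorted entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

struct PackFileEntry {
    uint32_t nameHash;
    uint32_t size;
    uint32_t sizeCompressed; // 0 = stored uncompressed
    uint32_t offset;         // from the start of the packfile
};

struct PackFileHeaderFixed {
    int32_t version;
    int32_t numFiles;
};

class CachedPackFile {
public:
    // Takes ownership of an open packfile and caches its entry table.
    explicit CachedPackFile(std::FILE *file) : mFile(file) {
        PackFileHeaderFixed header{};
        if (mFile == nullptr || std::fseek(mFile, 0, SEEK_SET) != 0 ||
            std::fread(&header, sizeof header, 1, mFile) != 1 || header.numFiles < 0)
            return;  // leave the entry list empty on error
        mEntries.resize(size_t(header.numFiles));
        if (!mEntries.empty() &&
            std::fread(mEntries.data(), sizeof(PackFileEntry), mEntries.size(), mFile) !=
                mEntries.size())
            mEntries.clear();
    }
    ~CachedPackFile() { if (mFile != nullptr) std::fclose(mFile); }
    CachedPackFile(const CachedPackFile &) = delete;
    CachedPackFile &operator=(const CachedPackFile &) = delete;

    // Binary search over entries sorted by nameHash; -1 if absent.
    int FindFile(uint32_t nameHash) const {
        auto it = std::lower_bound(mEntries.begin(), mEntries.end(), nameHash,
                                   [](const PackFileEntry &e, uint32_t h) { return e.nameHash < h; });
        if (it == mEntries.end() || it->nameHash != nameHash) return -1;
        return int(it - mEntries.begin());
    }

    // One seek and one read; returns the bytes as stored (maybe compressed).
    bool Read(int index, std::vector<uint8_t> &out) {
        assert(index >= 0 && size_t(index) < mEntries.size());
        const PackFileEntry &e = mEntries[size_t(index)];
        uint32_t stored = e.sizeCompressed ? e.sizeCompressed : e.size;
        out.resize(stored);
        // fseek takes a long, so offsets are limited to 2 GB where long is 32-bit
        if (std::fseek(mFile, long(e.offset), SEEK_SET) != 0) return false;
        return stored == 0 || std::fread(out.data(), 1, stored, mFile) == stored;
    }

private:
    std::FILE *mFile;                     // kept open for the packfile's lifetime
    std::vector<PackFileEntry> mEntries;  // cached entry table
};
```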
Of course, if we know we’ll load every file inside a packfile, we can just read the entire packfile at once. That only works if we’ve carefully organized our data into separate packfiles. Organizing data that way is a very good idea, and ideally you should do it. However, for a very dynamic game, with customized characters or an open world, it’s not always feasible.
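When a packfile's contents are always loaded together, the lookup can happen entirely in memory after a single read. This sketch assumes an 8-byte header (`version`, `numFiles`), an entry table sorted by `nameHash`, then the data, all in native byte order; `ReadWholePack`, `FindInLoadedPack` and `PackFileHeaderFixed` are hypothetical names. Each file is just a pointer into the loaded buffer, so nothing is copied.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

struct PackFileEntry {
    uint32_t nameHash;
    uint32_t size;
    uint32_t sizeCompressed; // 0 = stored uncompressed
    uint32_t offset;         // from the start of the packfile
};

struct PackFileHeaderFixed {
    int32_t version;
    int32_t numFiles;
};

// A single read pulls the whole packfile into memory (empty on failure).
std::vector<uint8_t> ReadWholePack(std::FILE *file) {
    std::vector<uint8_t> buffer;
    if (std::fseek(file, 0, SEEK_END) != 0) return buffer;
    long size = std::ftell(file);
    if (size <= 0 || std::fseek(file, 0, SEEK_SET) != 0) return buffer;
    buffer.resize(size_t(size));
    if (std::fread(buffer.data(), 1, buffer.size(), file) != buffer.size()) buffer.clear();
    return buffer;
}

// Binary search the in-memory entry table; returns a pointer into `pack`
// and the stored size, or nullptr if the file isn't there.
const uint8_t *FindInLoadedPack(const std::vector<uint8_t> &pack, uint32_t nameHash,
                                uint32_t *storedSize) {
    assert(storedSize != nullptr);
    PackFileHeaderFixed header;
    if (pack.size() < sizeof header) return nullptr;
    std::memcpy(&header, pack.data(), sizeof header);
    if (header.numFiles < 0 ||
        pack.size() < sizeof header + size_t(header.numFiles) * sizeof(PackFileEntry))
        return nullptr;
    size_t lo = 0, hi = size_t(header.numFiles);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        PackFileEntry e;
        std::memcpy(&e, pack.data() + sizeof header + mid * sizeof e, sizeof e);
        if (e.nameHash < nameHash) {
            lo = mid + 1;
        } else if (e.nameHash > nameHash) {
            hi = mid;
        } else {
            uint32_t stored = e.sizeCompressed ? e.sizeCompressed : e.size;
            if (size_t(e.offset) + stored > pack.size()) return nullptr;  // corrupt entry
            *storedSize = stored;
            return pack.data() + e.offset;
        }
    }
    return nullptr;
}
```

Entries are copied out with `std::memcpy` rather than cast in place, which sidesteps alignment and aliasing concerns when the buffer is an arbitrary byte array.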
Using packfiles as part of your file system is straightforward. When your game starts, register each packfile: load its header and entry list, open the packfile, and keep the file handle around. Then, when a file is read, check the entries of each registered packfile and, if the file is found, read it from that packfile; otherwise, open it as a normal file.
But enough talk, let’s see how we could integrate packfiles into our file manager:
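#include <algorithm> // std::lower_bound
#include <cstdint>

class PackFile
{
public:
    PackFile( const char *packfile );
    ~PackFile();
    // returns index of file, -1 if not in packfile
    // (takes a wide string to match FileManager::LoadFile)
    int FindFile( const wchar_t *filename );
    // initializes a FileOperation to point to the given index
    void OpenFile( int index, FileOperation &operation );
private:
    PackFileHeader *mHeader; // pointer to the file header (& entries)
    HANDLE mHandle;          // open handle to packfile, shared by all reads
};

int PackFile::FindFile( const wchar_t *filename )
{
    // get hash of file name; assumes Unix-style paths and a Hash() overload
    // for wide strings that matches the one used to build the packfile
    uint32_t nameHash = Hash( filename );
    // entries are sorted by hash, so binary search them
    const PackFileEntry *begin = mHeader->fileEntries;
    const PackFileEntry *end = begin + mHeader->numFiles;
    const PackFileEntry *it = std::lower_bound( begin, end, nameHash,
        []( const PackFileEntry &entry, uint32_t hash ) { return entry.nameHash < hash; } );
    if ( it == end || it->nameHash != nameHash )
        return -1; // not in this packfile
    return int( it - begin );
}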
void PackFile::OpenFile( int index, FileOperation &operation )
{
    assert( index >= 0 && index < mHeader->numFiles );
    const PackFileEntry &entry = mHeader->fileEntries[ index ];
    operation.mHandle = mHandle; // shared packfile handle, not owned by the operation
    operation.mPosition = entry.offset;
    operation.mState = STATE_END;
    operation.mOperation = OP_OPEN;
    operation.mFileSize = entry.size;
    operation.mCompressedSize = entry.sizeCompressed;
}
We also need to change our LoadFile a little, so that it searches through a list of registered packfiles:
FileHandle FileManager::LoadFile( const wchar_t *filename, void **buffer, uint64_t *size )
{
    FileHandle file = GetFreeOperation();
    if ( INVALID_FILE_HANDLE == file ) return file;
    int packFileIndex = -1;
    for ( int iPackFile = 0; iPackFile < mNumPackFiles; iPackFile++ )
    {
        packFileIndex = mPackFiles[iPackFile].FindFile( filename );
        if ( -1 != packFileIndex )
        {
            mPackFiles[iPackFile].OpenFile( packFileIndex, mOperations[file] );
            break; // stop at the first packfile that contains the file
        }
    }
    // not in any packfile: open it as a normal file
    if ( -1 == packFileIndex )
    {
        mOperations[file].Open( filename );
    }
    mParams[file].mBufferPtr = buffer;
    mParams[file].mSizePtr = size;
    return file;
}
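Next, we need to add a seek stage, since we no longer always read from the beginning of a file. We also change compression to work only for files in packfiles: the entry’s compressed size tells us whether a file is compressed, which is much cleaner than checking for a magic number at the start of the file.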
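void FileManager::Update()
{
    for ( int i = 0; i < MAX_OPERATIONS; i++ )
    {
        FileOperation &op = mOperations[i];
        switch ( op.GetOperation() )
        {
        case OP_NONE:
            break;
        case OP_OPEN:
        {
            // read the bytes as stored: the compressed size if the file
            // is compressed, otherwise the original size
            uint64_t readSize = op.mCompressedSize ? op.mCompressedSize : op.mFileSize;
            *mParams[i].mBufferPtr = new uint8_t[readSize];
            *mParams[i].mSizePtr = readSize;
            op.Seek( op.mPosition );
            break;
        }
        case OP_SEEK:
            // locals from OP_OPEN are gone, so use the stored buffer and size
            op.Read( *mParams[i].mBufferPtr, *mParams[i].mSizePtr );
            break;
        case OP_READ:
        {
            // overlapped reads are asynchronous, so poll; if done, close
            DWORD bytesRead = 0;
            BOOL complete = GetOverlappedResult( op.mHandle, &op.mOverlapped, &bytesRead, FALSE );
            if ( complete )
            {
                op.mState = STATE_END;
                // for packfile reads, Close() must not call CloseHandle():
                // the packfile handle is shared and stays open
                op.Close();
                // the packfile entry, not a magic number, says whether the data
                // is compressed (DecompressData is assumed to replace the
                // buffer and size with the decompressed ones)
                if ( op.mCompressedSize )
                {
                    DecompressData( i );
                }
            }
            break;
        }
        case OP_CLOSE:
            // nothing to do here
            break;
        }
    }
}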
What we’ve implemented is a simple version of packfiles. There are a number of other tricks we can do with packfiles, but in practice this system gives you 90% of the advantages: cleaner compression support and the ability to load many files in a single read. An additional bonus is shorter seek times, since a packfile stores its files physically closer together than individual files would be.
Next, we’ll look at how to reduce seek times by optimizing the layout of our disc.