MinFS

MinFS is a simple, Unix-like filesystem built for Zircon.

It currently supports files up to 4 GB in size.

Using MinFS

Host Device (QEMU Only)

  • Create a disk image that stores MinFS
# (Linux)
$ truncate --size=16G blk.bin
# (Mac)
$ mkfile -n 16g blk.bin
  • Run the Zircon run script for your platform, using ‘--’ to pass arguments directly to QEMU and ‘-hda’ to point to the image file. To attach additional devices, supply them with ‘-hdb’, ‘-hdc’, and so on.
fx set bringup.x64
fx build
fx qemu -- -hda blk.bin
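
The two image-creation commands above differ only by host platform, so they can be wrapped in one helper. The following is a minimal sketch, not part of the Zircon tooling: make_image is a hypothetical function name, and the sketch assumes that uname -s reports Darwin on a Mac and that the non-Mac branch has GNU truncate available.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fuchsia): create a disk image file
# using the platform-specific command from the steps above.
# Usage: make_image <path> <size>    e.g. make_image blk.bin 16G
make_image() {
  path=$1
  size=$2
  case "$(uname -s)" in
    Darwin)
      # The Mac command above uses a lowercase suffix (16g), so lowercase the size.
      mkfile -n "$(printf '%s' "$size" | tr '[:upper:]' '[:lower:]')" "$path"
      ;;
    *)
      # ASSUMPTION: GNU coreutils truncate, as in the Linux command above.
      truncate --size="$size" "$path"
      ;;
  esac
}
```

Calling make_image blk.bin 16G should be equivalent to the per-platform commands above; the resulting file is then passed to QEMU with ‘-hda’.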

Target Device (QEMU and Real Hardware)

Warning: On real hardware, /dev/class/block/... refers to REAL storage devices (USBs, SSDs, etc).

BE CAREFUL NOT TO FORMAT THE WRONG DEVICE. If in doubt, only run the following commands through QEMU. The lsblk command can be used to see more information about the devices accessible from Zircon.

  • Within Zircon, lsblk can be used to list the block devices currently on the system. On the example system below, /dev/class/block/000 is a raw block device.
> lsblk
ID  DEV      DRV      SIZE TYPE           LABEL
000 block    block     16G
  • Let's add a GPT to this block device.
> gpt init /dev/class/block/000
...
> lsblk
ID  DEV      DRV      SIZE TYPE           LABEL
002 block    block     16G
  • Now that we have a GPT on this device, let's check what we can do with it. (NOTE: after manipulating the GPT, the device number may change. Use lsblk to keep track of how to refer to the block device.)
> gpt dump /dev/class/block/002
blocksize=512 blocks=33554432
Partition table is valid
GPT contains usable blocks from 34 to 33554398 (inclusive)
Total: 0 partitions
  • gpt dump tells us two important things: (1) how big the blocks are, and (2) which blocks we can actually use. Let's fill part of the disk with a MinFS filesystem.
> gpt add 34 20000000 minfs /dev/class/block/002
  • Within Zircon, format the partition as MinFS. Using lsblk you should see a block device, which is the whole disk, and a slightly smaller device, which is the partition. In this example, the partition is device 003, with the path /dev/class/block/003.
> mkfs <PARTITION_PATH> minfs
  • If you want the device to be mounted automatically on reboot, use the GPT tool to set its type. As before, use lsblk again to locate the entry for the disk. We want to edit the type of the zeroth partition. Here we use the keyword ‘fuchsia-data’ to set the type GUID; to use an arbitrary GUID, supply it in place of ‘fuchsia-data’.
> gpt edit 0 type fuchsia-data <DEVICE_PATH>
  • On any future boots, the partition will be mounted automatically at /data.

  • If you don't want the partition to be mounted automatically, you can update the visibility (or GUID) of the partition, and simply mount it manually.

> mount <PARTITION_PATH> /data
  • Any files written to /data (the mount point for this GUID) will persist across boots. To test this, try making a file on the new MinFS volume, rebooting, and observing it still exists.
> touch /data/foobar
> dm reboot
> ls /data
  • To find out which block device/file system is mounted at each subdirectory under a given path, use the following command:
> df <PATH>
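
The figures reported by gpt dump earlier in this section can be cross-checked with shell arithmetic. This is a sketch that only reuses the numbers from the example output. It assumes that the two numeric arguments to gpt add are a starting block and a block count, which this document does not state explicitly, and that the shell uses 64-bit arithmetic. The variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the example `gpt dump` output.
blocksize=512
blocks=33554432
first_usable=34
last_usable=33554398

# Arguments from the example `gpt add 34 20000000 ...` command.
# ASSUMPTION: these mean <start block> <block count>.
part_start=34
part_blocks=20000000

disk_bytes=$((blocksize * blocks))
usable_blocks=$((last_usable - first_usable + 1))
part_last=$((part_start + part_blocks - 1))
part_bytes=$((part_blocks * blocksize))

echo "disk size:     $disk_bytes bytes ($((disk_bytes / 1073741824)) GiB)"
echo "usable blocks: $usable_blocks"
echo "partition:     blocks $part_start to $part_last, $part_bytes bytes"

# The partition must lie entirely inside the usable range reported by gpt dump.
if [ "$part_start" -ge "$first_usable" ] && [ "$part_last" -le "$last_usable" ]; then
  fits=yes
else
  fits=no
fi
echo "fits in usable range: $fits"
```

Under that assumption the partition covers about 10.24 GB of the 16G disk and leaves the rest of the usable range unallocated.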

MinFS operations

The following section describes which IOs are performed to complete a simple end-user operation such as read() or write().

Assumptions

  • No operation, read or write, is cached or batched. Each operation behaves as if it were issued with sync and direct IO set.
  • For rename: the destination file does not exist. Rename can delete a file when the destination of the operation is an existing file; assuming it is not keeps the math simple.
  • The “Write” operation issues a single data block write to a previously unaccessed portion of the vnode.
  • The “Overwrite” operation issues a single data block write to a portion of the block that has previously been allocated from an earlier “Write” operation.

Key to the columns

  1. OPERATION: The action requested by a client of the filesystem.
  2. BLOCK TYPE: Each filesystem operation results in accessing one or more types of blocks.
    • Data: Contains user data and directory entries.
    • Indirect: Indirect block in file block map tree.
    • Dindirect: Double indirect block in file block map tree.
    • Inode table: Inode table block that holds one or more inodes.
    • Inode bitmap: Contains an array of bits each representing free/used state of inodes.
    • Data bitmap: Contains an array of bits each representing free/used state of data blocks.
    • Superblock: Contains data describing layout and state of the filesystem.
  3. IO TYPE: Whether the IO access is a read or a write.
  4. JOURNALED: Whether the IO will be journaled. Reads are not journaled, but some writes are.
  5. CONDITIONALLY ACCESSED: Depending on the OPERATION's input parameters and the state of the filesystem, some blocks are conditionally accessed.
    • No: IO is always performed.
    • Yes: Filesystem state and input parameters decide whether this IO is needed or not.
  6. READ COUNT: Number of filesystem blocks read.
  7. WRITE COUNT (IGNORING JOURNAL): Number of filesystem blocks written. Writes to the journal and journaling overhead are not counted toward this number.
  8. WRITE COUNT (WITH JOURNAL): Number of filesystem blocks written to the journal and then to the final location. This does not include the blocks that the journal writes to maintain its own state.

A row named “<operation> Total”, such as “Create Total”, gives the total number of blocks read and written. For operations involving journaling, the journal writes two more blocks per operation: a journal entry header and a commit block. The number under write count for a Total row is the sum of WRITE COUNT (WITH JOURNAL) and this journaling overhead of 2 blocks per operation.
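
The arithmetic behind a Total row can be sketched in shell. This is an illustration of the counting rule described above, not MinFS code: total_writes is a hypothetical helper, and it assumes each journaled block is written twice (once to the journal, once to its final location) and that the 2-block overhead applies once per operation whenever anything is journaled. The inputs are the minimum, unconditional writes listed for the Write and Create operations in the table.

```shell
#!/bin/sh
# Journal entry header + commit block, per operation (from the text above).
JOURNAL_OVERHEAD=2

# Hypothetical helper: takes pairs "<journaled: yes|no> <block count>" and
# prints "<ignoring journal> <with journal> <total including overhead>".
total_writes() {
  ignoring=0
  with_journal=0
  journaled_any=no
  while [ "$#" -ge 2 ]; do
    flag=$1
    count=$2
    shift 2
    ignoring=$((ignoring + count))
    if [ "$flag" = yes ]; then
      # Journaled blocks go to the journal and then to the final location.
      with_journal=$((with_journal + 2 * count))
      journaled_any=yes
    else
      with_journal=$((with_journal + count))
    fi
  done
  total=$with_journal
  if [ "$journaled_any" = yes ]; then
    total=$((total + JOURNAL_OVERHEAD))
  fi
  echo "$ignoring $with_journal $total"
}

# Write: data (not journaled), inode table, data bitmap, superblock.
write_min=$(total_writes no 1 yes 1 yes 1 yes 1)
# Create: data, inode table, inode bitmap, superblock (all journaled).
create_min=$(total_writes yes 1 yes 1 yes 1 yes 1)

echo "Write minimum:  $write_min"
echo "Create minimum: $create_min"
```

The last number on each line corresponds to the lower bound in the matching Total row: 9 for Write and 10 for Create.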

The Superblock, Inode table, Inode bitmap, Data bitmap, and a part of the Journal are cached in memory when the filesystem is started (mount/fsck). So, Read IOs are never issued for those BLOCK TYPES.

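| OPERATION | BLOCK TYPE | IO TYPE | JOURNALED | CONDITIONALLY ACCESSED | READ COUNT | WRITE COUNT (IGNORING JOURNAL) | WRITE COUNT (WITH JOURNAL) | COMMENTS |
|---|---|---|---|---|---|---|---|---|
| Lookup/Open | Data | Read | No | No | >=1 | 0 | 0 | If the directory is large, multiple blocks are read. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | Lookup can be served by direct blocks. So indirect is optional. |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | Lookup can be served by direct blocks. So dindirect is optional. |
| Lookup/Open Total | | | | | >=1 | 0 | 0 | |
| Create | Data | Read | No | No | >=1 | 0 | 0 | Create involves lookup first for name collisions. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | No | 0 | >=1 | >=2 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Inode for the new file. |
| | Inode bitmap | Write | Yes | No | 0 | 1 | 2 | Mark inode as allocated. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | Directory may grow to contain new directory entry. |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | Among other things, allocated inode number changes. |
| Create Total | | | | | >=1 | >=4 | >=10 | Includes 2 blocks for journal entry. |
| Rename | Data | Read | No | No | >=1 | 0 | 0 | Rename involves a lookup in source directory. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | No | 0 | >=1 | >=2 | Source directory entry. |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | To update source directory inode. |
| | Data | Read | No | No | >=0 | 0 | 0 | Rename involves a lookup in source directory. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | Yes | 0 | >=0 | >=0 | Writing destination directory entry. |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | Yes | 0 | 1 | 2 | To update destination directory inode. |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Renamed file’s mtime. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | In case we allocated data, indirect or Dindirect block(s). |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | |
| Rename Total | | | | | >=1 | >=5 | >=12 | Includes 2 blocks for journal entry. |
| Read | Data | Read | No | No | >=1 | 0 | 0 | |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| Read Total | | | | | >=1 | 0 | 0 | |
| Write | Indirect | Read | No | Yes | >=0 | 0 | 0 | Even if the write is not overwriting, we may share (D)indirect block with existing data. Leading to read modify write. |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | No | No | 0 | 1 | 1 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Inode's mtime update. |
| | Data bitmap | Write | Yes | No | 0 | 1 | 2 | For the allocated block. |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | Change in number of allocated blocks. |
| Write Total | | | | | >=0 | >=4 | >=9 | Includes 2 blocks for journal entry. |
| Overwrite | Data | Read | No | Yes | >=0 | 0 | 0 | Read modify write. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | No | No | 0 | 1 | 1 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | |
| | Data bitmap | Write | Yes | No | 0 | 1 | 2 | Write new allocation. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | Free old block. This block bit may belong to allocated block bitmap. |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | |
| Overwrite Total | | | | | >=0 | >=4 | >=9 | Includes 2 blocks for journal entry. |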