blob: e26f0b06b5b9fed8b12dfbfbe6f805a6e5433394 [file] [log] [blame] [view]
# MinFS
MinFS is a simple, unix-like filesystem built for Zircon.
It currently supports files up to 4 GB in size.
## Using MinFS
### Host Device (QEMU Only)
* Create a disk image that stores MinFS
```shell
# (Linux)
$ truncate --size=16G blk.bin
# (Mac)
$ mkfile -n 16g blk.bin
```
* Execute the run zircon script on your platform with the '--' to pass
arguments directly to QEMU and then use '-hda' to point to the file. If you
wish to attach additional devices, you can supply them with '-hdb', '-hdc,
and so on.
```shell
fx set bringup.x64
fx build
fx qemu -- -hda blk.bin
```
### Target Device (QEMU and Real Hardware)
Warning: On real hardware, `/dev/class/block/...` refers to **REAL** storage
devices (USBs, SSDs, etc).
**BE CAREFUL NOT TO FORMAT THE WRONG DEVICE.** If in doubt, only run the
following commands through QEMU.
The `lsblk` command can be used to see more information about the devices
accessible from Zircon.
* Within zircon, `lsblk` can be used to list the block devices currently on
the system. On this example system below, `/dev/class/block/000` is a raw
block device.
```
> lsblk
ID DEV DRV SIZE TYPE LABEL
000 block block 16G
```
* Let's add a GPT to this block device.
```
> gpt init /dev/class/block/000
...
> lsblk
ID DEV DRV SIZE TYPE LABEL
002 block block 16G
```
* Now that we have a GPT on this device, let's check what we can do with it.
(NOTE: after manipulating the gpt, the device number may change. Use `lsblk`
to keep track of how to refer to the block device).
```
> gpt dump /dev/class/block/002
blocksize=512 blocks=33554432
Partition table is valid
GPT contains usable blocks from 34 to 33554398 (inclusive)
Total: 0 partitions
```
* `gpt dump` tells us some important info: it tells us (1) How big blocks are,
and (2) which blocks we can actually use.
Let's fill part of the disk with a MinFS filesystem.
```
> gpt add 34 20000000 minfs /dev/class/block/002
```
* Within Zircon, format the partition as MinFS. Using `lsblk` you should see
a block device, which is the whole disk, and a slightly smaller device, which
is the partition. In the above output, the partition is device 003, and would
have the path `/dev/class/block/003`
```
> mkfs <PARTITION_PATH> minfs
```
* If you want the device to be mounted automatically on reboot, use the GPT
tool to set its type. As we did above, **you must** use `lsblk` **again**
to locate the entry for the disk. We want to edit the type of the zero-th
partition. Here we use the keyword 'fuchsia-data' to set the type GUID, but
if you wanted to use an arbitrary GUID you would supply it where
'fuchsia-data' is used.
```
> gpt edit 0 type fuchsia-data <DEVICE_PATH>
```
* On any future boots, the partition will be mounted automatically at `/data`.
* If you don't want the partition to be mounted automatically, you can update
the visibility (or GUID) of the partition, and simply mount it manually.
```
> mount <PARTITION_PATH> /data
```
* Any files written to `/data` (the mount point for this GUID) will persist
across boots. To test this, try making a file on the new MinFS volume,
rebooting, and observing it still exists.
```
> touch /data/foobar
> dm reboot
> ls /data
```
* To find out which block device/file system is mounted at each subdirectory
under a given path, use the following command:
```
> df <PATH>
```
## Minfs operations
The following section describes what IOs are performed to complete a simple end
user operation like read()/write().
### Assumptions
* No operation, read or write, is cached or batched. Each of these operations
are like calling with sync and direct io set.
* For rename: The destination file does not exist. Rename can delete a file if
the destination of the rename operation is a valid file. This assumption keeps
the math simple.
* The "Write" operation issues a single data block write to a previously
unaccessed portion of the vnode.
* The "Overwrite" operation issues a single data block write to a portion of the
block that has previously been allocated from an earlier "Write" operation.
### Keys to the columns.
1. OPERATION: The action requested by a client of the filesystem.
1. BLOCK TYPE: Each fileystem operation results in accessing one or more types
of blocks.
* Data: Contains user data and directory entries.
* Indirect: Indirect block in file block map tree
* Dindirect: Double indirect block in file block map tree.
* Inode table: Inode table block that holds one or more inodes.
* Inode bitmap: Contains an array of bits each representing free/used state
of inodes.
* Data bitmap: Contains an array of bits each representing free/used state
of data blocks.
* Superblock: Contains data describing layout and state of the filesystem.
1. IO TYPE: What type, read/write, of IO access it is.
1. JOURNALED: Whether the IO will be journaled. Reads are not journaled but some
of the writes are journaled.
1. CONDITIONALLY ACCESSED: Depending on the OPERATION's input parameter and
state of the filesystem, some blocks are conditionally accessed.
* No: IO is always performed.
* Yes: Filesystem state and input parameters decide whether this IO is
needed or not.
1. READ COUNT: Number of filesystem blocks read.
1. WRITE COUNT (IGNORING JOURNAL): Number of filesystem blocks written. Writing
to journal or journaling overhead are not counted towards this number.
1. WRITE COUNT (WITH JOURNAL): Number of filesystem blocks written to journal
and then to the final location. This does not include the blocks journal
writes to maintain the journal state.
A row \<operation\> Total, like "Create Total", gives the total number of blocks
read/written. For operations involving journaling, the journal writes two more
blocks, journal entry header and commit block, per operation. The number under
write count for <operations> Total is the sum of WRITE COUNT (WITH JOURNALING)
and journaling overhead, which is 2 blocks per operation.
Superblock, Inode table, Inode bitmap, Data bitmap, and a part of Journal are
cached in memory while starting(mount/fsck) the filesystem. So, Read IOs are
never issued for those BLOCK TYPES.
| OPERATION | BLOCK TYPE | IO TYPE | JOURNALED | CONDITIONALLY ACCESSED | READ COUNT | WRITE COUNT(IGNORING JOURNAL) | WRITE COUNT(WITH JOURNAL) | COMMENTS |
|--------------|--------------|---------|-----------|------------------------|------------:|------------:|---------------------------:|--------------------------------|
| Lookup/Open | Data | Read | No | No | >=1 | 0 | 0 | If the directory is large, multiple blocks are read. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | Lookup can be served by direct blocks. So indirect is optional. |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | Lookup can be served by direct blocks. So dindirect is optional. |
| **Lookup/Open Total** | | | | | >=1 | 0 | 0 | |
| Create | Data | Read | No | No | >=1 | 0 | 0 | Create involves lookup first for name collisions. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | No | 0 | >=1 | >=2 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Inode for the new file. |
| | Inode bitmap | Write | Yes | No | 0 | 1 | 2 | Mark inode as allocated. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | Directory may grow to contain new directory entry. |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | Among other things, allocated inode number changes. |
| **Create Total** | | | | | >=1 | >=4 | >=10 | Includes 2 blocks for journal entry. |
| Rename | Data | Read | No | No | >=1 | 0 | 0 | Rename involves a lookup in source directory. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | No | 0 | >=1 | >=2 | Source directory entry. |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | To update source directory inode. |
| | Data | Read | No | No | >=0 | 0 | 0 | Rename involves a lookup in source directory. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | Yes | Yes | 0 | >=0 | >=0 | Writing destination directory entry. |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | Yes | 0 | 1 | 2 | To update destination directory inode. |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Renamed files mtime. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | In case we allocated data, indirect or Dindirect block(s). |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | |
| **Rename Total** | | | | | >=1 | >=5 | >=12 | Includes 2 blocks for journal entry. |
| Read | Data | Read | No | No | >=1 | 0 | 0 | |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| **Read Total** | | | | | >=1 | 0 | 0 | |
| Write | Indirect | Read | No | Yes | >=0 | 0 | 0 | Even if the write is not overwriting, we may share (D)indirect block with existing data. Leading to read modify write. |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | No | No | 0 | 1 | 1 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | Inode's mtime update. |
| | Data bitmap | Write | Yes | No | 0 | 1 | 2 | For the allocated block. |
| | Superblock | Write | Yes | No | 0 | 1 | 2 | Change in number of allocated blocks. |
| **Write Total** | | | | | >=0 | >=4 | >=9 | Includes 2 blocks for journal entry. |
| Overwrite | Data | Read | No | Yes | >=0 | 0 | 0 | Read modify write. |
| | Indirect | Read | No | Yes | >=0 | 0 | 0 | |
| | DIndirect | Read | No | Yes | >=0 | 0 | 0 | |
| | Data | Write | No | No | 0 | 1 | 1 | |
| | Indirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | DIndirect | Write | Yes | Yes | 0 | >=0 | >=0 | |
| | Inode table | Write | Yes | No | 0 | 1 | 2 | |
| | Data bitmap | Write | Yes | No | 0 | 1 | 2 | Write new allocation. |
| | Data bitmap | Write | Yes | No | 0 | >=0 | >=0 | Free old block. This block bit may belong to allocated block bitmap.|
| | Superblock | Write | Yes | No | 0 | 1 | 2 | |
| **Overwrite Total** | | | | | >=0 | >=4 | >=9 | Includes 2 blocks for journal entry. |