Difference between revisions of "Zfs pool"
Latest revision as of 21:46, 15 June 2026
```
sudo zpool create -o ashift=12 tank-nvme mirror /dev/disk/by-id/nvme-eui.0025385281b1b872 /dev/disk/by-id/nvme-eui.0025385281b1b878
sudo wipefs -a /dev/disk/by-id/nvme-Samsung_SSD_960_EVO_1TB_S3X3NF0K204029J
sudo wipefs -a /dev/disk/by-id/nvme-Samsung_SSD_960_EVO_1TB_S3X3NF0K204035E
sudo wipefs -a /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RMS
sudo wipefs -a /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RPX
sudo wipefs -a /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN0339T
sudo wipefs -a /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RFB
sudo zpool create -o ashift=12 tank-hdd mirror /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RMS /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RPX mirror /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN0339T /dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN07RFB
```

```
sudo zfs create -o atime=off -o compression=lz4 -o xattr=sa -o acltype=posixacl tank-nvme/lxd
```
## ashift

`ashift` stands for **alignment shift**. It dictates the minimum block size ZFS will use when formatting and writing data to the physical storage devices in your pool.

The number you provide is an exponent of 2:

* `ashift=9` means $2^9=512$ bytes.
* `ashift=12` means $2^{12}=4096$ bytes (4K).
* `ashift=13` means $2^{13}=8192$ bytes (8K).
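
The exponent arithmetic above can be checked with a one-line shift. This is only a minimal sketch; `ashift_to_bytes` is a hypothetical helper name, not a ZFS command.

```shell
# Convert an ashift exponent to a sector size in bytes (2^ashift).
# ashift_to_bytes is a hypothetical helper, not part of ZFS.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Print the three values listed above.
for a in 9 12 13; do
    echo "ashift=$a -> $(ashift_to_bytes "$a") bytes"
done
```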

Here is why forcing `ashift=12` is critical for modern storage.

### **The 512-byte Lie (Emulation)**

Historically, hard drives used physical sectors that were exactly 512 bytes in size. However, almost all modern hard drives use **Advanced Format**, meaning their physical layout is built on 4096-byte (4K) sectors, and SSDs likewise work internally in units of 4K or larger. On hard drives, the larger sector allows for higher storage density and better error correction.

To avoid breaking older operating systems and legacy hardware controllers, many modern 4K drives "lie" to the host system. They use a firmware feature called **512e (512-byte emulation)** to report themselves as having old-school 512-byte sectors, even though their physical architecture is 4K.
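
On Linux, the sector sizes a drive reports can be read from sysfs before you create the pool. The sketch below assumes the standard `logical_block_size` and `physical_block_size` files under `/sys/block/<device>/queue`; `classify_sectors` is a hypothetical helper name. A drive that reports 512 bytes for both values may still be 4K internally, so treat that result as inconclusive.

```shell
# Classify a block device from the sector sizes the Linux kernel exposes in sysfs.
# classify_sectors is a hypothetical helper; its argument is a queue directory
# such as /sys/block/sda/queue.
classify_sectors() {
    logical=$(cat "$1/logical_block_size")
    physical=$(cat "$1/physical_block_size")
    if [ "$logical" -eq 512 ] && [ "$physical" -eq 512 ]; then
        # Either a true 512-byte drive or one that hides its 4K layout.
        echo "512n-or-unreported"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then
        echo "512e"
    elif [ "$logical" -eq 4096 ] && [ "$physical" -eq 4096 ]; then
        echo "4Kn"
    else
        echo "other"
    fi
}

# Report every queue directory that is present on this machine.
for q in /sys/block/*/queue; do
    [ -r "$q/logical_block_size" ] || continue
    echo "$q: $(classify_sectors "$q")"
done
```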

### **The Read-Modify-Write Penalty**

If you create a ZFS pool without specifying the `ashift` value, ZFS picks one from the sector size the drive reports. When the drive reports only 512-byte sectors, ZFS hears the 512-byte lie and sets `ashift=9`. This creates a mismatch between the smallest blocks ZFS writes and the drive's physical sectors.

If ZFS attempts to write a 512-byte block to a physical 4K sector, the storage drive is forced to execute a **Read-Modify-Write** operation:

1. **Read:** The drive reads the entire 4K physical sector into its internal memory.
2. **Modify:** The drive inserts the 512 bytes ZFS sent into the 4K block.
3. **Write:** The drive writes the whole 4K sector back to the disk.
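
The cost of the three steps above can be estimated with simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes each write starts on a physical sector boundary and counts only the bytes rewritten, ignoring the extra read. Both function names are hypothetical.

```shell
# Bytes the drive must physically rewrite for one aligned logical write.
# Arguments: logical write size and physical sector size, both in bytes.
physical_bytes_written() {
    write=$1
    sector=$2
    # Round up to a whole number of physical sectors.
    echo $(( (write + sector - 1) / sector * sector ))
}

# Ratio of bytes physically written to bytes requested (integer division).
write_amplification() {
    echo $(( $(physical_bytes_written "$1" "$2") / $1 ))
}

echo "512-byte write on a 4K sector:  $(write_amplification 512 4096)x"
echo "4096-byte write on a 4K sector: $(write_amplification 4096 4096)x"
```

Under this model a 512-byte write costs a full 4096-byte sector rewrite (8x), while a 4096-byte write maps one-to-one (1x).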

This overhead badly degrades write performance (especially random I/O) and causes "write amplification," which wears out SATA and NVMe SSDs faster than necessary.

### **The Solution**

By explicitly passing `-o ashift=12` to your `zpool create` command, you force ZFS to align all of its writes to 4K boundaries. ZFS writes then map onto whole physical sectors, which avoids the emulation penalty and protects both throughput and drive lifespan.
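
After creating a pool, you can confirm which `ashift` each vdev actually received. The sketch below filters the configuration dump from `zdb -C`; the `ashift: N` line format is an assumption about zdb's text output, and `vdev_ashifts` is a hypothetical helper name.

```shell
# Print the ashift value recorded for each vdev in `zdb -C` style output
# read from stdin. vdev_ashifts is a hypothetical helper, and the
# "ashift: N" line format is an assumption about zdb's text output.
vdev_ashifts() {
    sed -n 's/^[[:space:]]*ashift:[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Usage on a live system (the pool must be imported):
#   sudo zdb -C tank-hdd | vdev_ashifts
```

For the `tank-hdd` pool created above, which has two mirror vdevs, you would expect one value per vdev, each `12`.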

> **Note:** The `ashift` value is permanently baked into a top-level virtual device (vdev) at the moment of creation. If you create a pool with the wrong `ashift`, it cannot be changed later; you have to destroy the pool, wipe the drives, and start over.