Introduction
3FS, short for Fire-Flyer File System, is a high-performance distributed file system developed by DeepSeek AI to meet the requirements of data-intensive AI training and inference workloads.
3FS is used only in scenarios where models on compute nodes read sample data in batches during AI training. It accelerates model training through high-speed interaction between compute and storage. Featuring high throughput, low latency, and strong consistency, 3FS uses RDMA networking to provide a streamlined shared storage layer for developers of distributed applications. This document describes how to compile, install, and enable 3FS on openEuler 22.03 (Arm).

3FS consists of four core components: Cluster Manager, Client, Meta Service, and Storage Service. All components are interconnected through the RDMA network.
- Cluster Manager
  - As the control center of the cluster, it manages nodes and uses a multi-node hot backup mechanism to ensure high availability.
  - It uses FoundationDB for primary node election, ensuring the reliability and consistency of the primary node.
  - It monitors the status of Meta Service and Storage Service through the heartbeat mechanism, processes node status changes promptly, and notifies the cluster of the changes.
  - It manages the online status of clients and revokes the file write permission of disconnected clients.
- Client
  - Two access methods are provided: the Filesystem in Userspace (FUSE) client (hf3fs_fuse) and the native client (User Space Ring-Based I/O, USRBIO).
  - The FUSE client supports POSIX interfaces for quick integration with most applications.
  - USRBIO works as an SDK that provides high-performance access for scenarios with demanding performance requirements.
- Meta Service
  - It adopts a decoupled storage-compute design. Metadata is stored in FoundationDB, whose transaction mechanism maintains the file system's directory tree structure.
  - Stateless and scalable, it converts POSIX directory operations into FoundationDB transactions.
- Storage Service
  - It adopts a coupled storage-compute design. Each node manages its local SSDs to provide efficient read and write capabilities.
  - It stores data in three-replica mode and uses Chain Replication with Apportioned Queries (CRAQ) to optimize read performance.
  - Data is split into fragments that are stored separately to balance load and improve data availability.
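To illustrate why CRAQ helps read performance in the three-replica Storage Service described above, the following is a minimal in-memory sketch of the textbook CRAQ read rule. It is not 3FS code: the `Replica` and `Chain` classes, the `propagate` and `commit` methods, and the node and chunk names are all hypothetical, and the sketch assumes one write per key at a time with no failures. The idea is that any replica may answer a read when its copy is clean, and a replica holding an uncommitted (dirty) version defers to the tail.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage target in a chain (hypothetical in-memory model)."""
    name: str
    committed: dict = field(default_factory=dict)  # key -> (version, value)
    pending: dict = field(default_factory=dict)    # key -> (version, value)


class Chain:
    """Toy replication chain ordered head -> ... -> tail."""

    def __init__(self, names):
        if len(names) < 2:
            raise ValueError("this sketch needs at least two replicas")
        self.replicas = [Replica(n) for n in names]

    @property
    def tail(self):
        return self.replicas[-1]

    def propagate(self, key, value):
        """Phase 1: forward the write down the chain as a dirty version."""
        version = self.tail.committed.get(key, (0, None))[0] + 1
        for replica in self.replicas[:-1]:
            replica.pending[key] = (version, value)
        return version

    def commit(self, key):
        """Phase 2: the tail commits, then the ack travels back to the head."""
        self.tail.committed[key] = self.replicas[0].pending[key]
        for replica in reversed(self.replicas[:-1]):
            replica.committed[key] = replica.pending.pop(key)

    def write(self, key, value):
        """Run both phases back to back."""
        version = self.propagate(key, value)
        self.commit(key)
        return version

    def read(self, key, replica_index):
        """Serve a read from any replica; dirty replicas defer to the tail."""
        replica = self.replicas[replica_index]
        if key in replica.pending:
            replica = self.tail
        entry = replica.committed.get(key)
        return None if entry is None else entry[1]


chain = Chain(["node-a", "node-b", "node-c"])
chain.write("chunk-0", "v1")
chain.propagate("chunk-0", "v2")         # write in flight, not yet committed
in_flight = chain.read("chunk-0", 0)     # head is dirty, so the tail answers
chain.commit("chunk-0")
after_commit = chain.read("chunk-0", 1)  # clean again, served locally
```

In this model, clean reads are spread across all three replicas instead of landing on a single node, while the deferral to the tail keeps readers from observing a value that has not been committed.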