Series Index
3FS Series (1): Opening a New Era of Storage — Engineering Practice of Building, Tuning, and Deploying 3FS
Introduction
On February 24, DeepSeek's much-anticipated Open Source Week kicked off, and the heavyweight storage system 3FS (Fire-Flyer File System) closed it out, completing the puzzle beyond compute and networking: storage. Unlike earlier open-source releases of clever standalone algorithms, 3FS is a complete high-speed parallel file system spanning multiple node types and integrating several external components; its code is cleanly structured and well decoupled across modules, showcasing the DeepSeek engineers' command of complex engineering. As part of the DeepSeek open-source ecosystem, 3FS was officially open-sourced on GitHub on February 27, 2025, and immediately drew enormous attention across the industry. 3FS offers several key features that make it exceptionally well suited to AI workloads:
Disaggregated architecture
Strong consistency guarantees
Standard file interfaces
Support for diverse file workloads
Although the official 3FS design documents are thorough, their complexity still poses a real challenge for anyone trying to learn the system. As a frontline player in AI infrastructure software, the R&D team at DataCanvas (Zetyun) has been discussing 3FS intensely. Today we will not debate 3FS's product design itself; instead, drawing on our storage expertise, we will peel back the layers step by step, walk through what kind of storage systems the AGI era needs and their main application scenarios, and share some tips and techniques for compiling and deploying storage systems, hoping to spark further discussion.
This is the first article in the DataCanvas 3FS series: a hands-on walkthrough of building and deploying 3FS. It is a long read, so take your time.
The steps are as follows:
1. Compilation
Preliminaries
We build on the Ubuntu 22.04 distribution. The default build path is the current user's home directory:
export BUILD_DIR=$HOME
Step 1: Install Dependencies
1.1 Install packages
$ apt update
$ apt install -y cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \
libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev cargo git g++ wget meson
1.2 Install FoundationDB
$ cd ${BUILD_DIR}
$ wget https://github.com/apple/foundationdb/releases/download/7.1.67/foundationdb-server_7.1.67-1_amd64.deb \
https://github.com/apple/foundationdb/releases/download/7.1.67/foundationdb-clients_7.1.67-1_amd64.deb
$ dpkg -i foundationdb-server_7.1.67-1_amd64.deb foundationdb-clients_7.1.67-1_amd64.deb
1.3 Install FUSE
$ cd ${BUILD_DIR}
$ wget https://github.com/libfuse/libfuse/releases/download/fuse-3.16.2/fuse-3.16.2.tar.gz
$ tar -zxvf fuse-3.16.2.tar.gz
$ cd fuse-3.16.2; mkdir build; cd build
$ meson setup ..
$ ninja
$ ninja install
Step 2: Build 3FS
$ cd ${BUILD_DIR}
$ git clone https://github.com/deepseek-ai/3fs
$ cd 3fs
$ git submodule update --init --recursive
$ ./patches/apply.sh
$ cmake -S . -B build -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
$ cmake --build build -j 45
Once the build succeeds, the following binaries are produced:
$ ls -ls ${BUILD_DIR}/3fs/build/bin
total 2308428
355344 -rwxr-xr-x 1 root root 363871904 Mar 4 11:36 admin_cli
144976 -rwxr-xr-x 1 root root 148454880 Mar 4 11:30 hf3fs-admin
204336 -rwxr-xr-x 1 root root 209239320 Mar 4 11:32 hf3fs_fuse_main
277812 -rwxr-xr-x 1 root root 284476352 Mar 4 11:30 meta_main
174700 -rwxr-xr-x 1 root root 178892200 Mar 4 11:27 mgmtd_main
168300 -rwxr-xr-x 1 root root 172336688 Mar 4 11:26 migration_main
102740 -rwxr-xr-x 1 root root 105205000 Mar 4 11:19 monitor_collector_main
170628 -rwxr-xr-x 1 root root 174721688 Mar 4 11:26 simple_example_main
395964 -rwxr-xr-x 1 root root 405484072 Mar 4 11:34 storage_bench
313628 -rwxr-xr-x 1 root root 321173936 Mar 4 11:28 storage_main
Step 3: Package the Binaries
Since the services will be deployed across multiple machines, we pack the required binaries and configuration files into a tar archive for distribution. Everything needed for deployment goes into this archive:
$ cd ${BUILD_DIR}
$ mkdir -p /tmp/3fs/{conf,logs,misc/{deps,scripts}}
$ cp -r 3fs/build/bin /tmp/3fs
$ cp -r 3fs/configs/* /tmp/3fs/conf
$ cp -r 3fs/deploy/{data_placement,sql,systemd} /tmp/3fs/misc
$ cp 3fs/build/src/lib/api/libhf3fs_api_shared.so /tmp/3fs/misc/deps/
$ cp foundationdb-server_7.1.67-1_amd64.deb foundationdb-clients_7.1.67-1_amd64.deb fuse-3.16.2.tar.gz /tmp/3fs/misc/deps
$ vim /tmp/3fs/misc/scripts/setup.sh # see the setup script contents below
$ (cd /tmp; tar -zcvf 3fs-deploy.tar.gz 3fs); cp /tmp/3fs-deploy.tar.gz .
The contents of setup.sh:
#!/usr/bin/env bash
apt update
apt install -y cmake libuv1-dev liblz4-dev liblzma-dev libdouble-conversion-dev libdwarf-dev libunwind-dev \
libaio-dev libgflags-dev libgoogle-glog-dev libgtest-dev libgmock-dev clang-format-14 clang-14 clang-tidy-14 lld-14 \
libgoogle-perftools-dev google-perftools libssl-dev gcc-12 g++-12 libboost-all-dev cargo git g++ wget meson libjemalloc-dev
(
cd misc/deps
dpkg -i foundationdb-server_7.1.67-1_amd64.deb foundationdb-clients_7.1.67-1_amd64.deb
systemctl stop foundationdb
tar -zxvf fuse-3.16.2.tar.gz
cd fuse-3.16.2; mkdir build; cd build
meson setup ..
ninja
ninja install
)
2. Deployment
Machine Roles
We prepared 12 physical machines in total:
1 machine: monitoring, management service, and metadata service
5 machines: storage nodes (3 disks each)
6 machines: FUSE clients
Each machine has one 400 Gb RDMA-capable NIC with two network ports configured: ib7s400p0 and bond1.
Preliminary Steps
Distribute the 3fs-deploy.tar.gz archive built above to every machine that will run services, extract it to the target directory, and install the dependencies:
$ tar -zxvf 3fs-deploy.tar.gz -C /usr/local
$ cd /usr/local/3fs; bash misc/scripts/setup.sh
All service binaries, configs, and logs live under /usr/local/3fs. If anything goes wrong during deployment, check the logs under /usr/local/3fs/logs to troubleshoot.
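The extraction commands above are per-machine. A hedged sketch that prints the fan-out commands for all hosts over SSH (the host names and file locations here are placeholders, not from the original setup); review the output, then pipe it to `sh` to execute:

```shell
# Print the per-host distribution commands instead of running them directly,
# so they can be reviewed first. HOSTS and PKG are placeholder assumptions.
HOSTS=${HOSTS:-"meta-01 stor-01 fuse-01"}
PKG=${PKG:-3fs-deploy.tar.gz}
CMDS=""
for h in $HOSTS; do
  CMDS="${CMDS}scp $PKG root@$h:/tmp/
ssh root@$h 'tar -zxvf /tmp/$PKG -C /usr/local && cd /usr/local/3fs && bash misc/scripts/setup.sh'
"
done
printf '%s' "$CMDS"
```

Substituting the real machine names and piping the printed lines to `sh` performs the actual copy and setup.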
1. Monitoring Storage - ClickHouse
ClickHouse stores the monitoring data. This step runs on the metadata node.
1.1 Install ClickHouse
$ apt-get install -y apt-transport-https ca-certificates curl gnupg
$ curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
$ ARCH=$(dpkg --print-architecture)
$ echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
$ apt-get update
$ apt-get install -y clickhouse-server clickhouse-client # prompts for a password during installation; we use zetyun
1.2 Start the ClickHouse server
$ vim /etc/clickhouse-server/config.xml # change the tcp_port listen port to 19000
$ clickhouse start
1.3 Try the ClickHouse client
$ clickhouse-client --port 19000 --password 'zetyun'
ClickHouse client version 25.2.1.3085 (official build).
Connecting to localhost:19000 as user default.
Connected to ClickHouse server version 25.2.1.
Warnings:
* Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. You can enable it using `echo 1 > /proc/sys/kernel/task_delayacct` or by using sysctl.
zetyun-gpu-0001 :)
1.4 Create the metric table
After exiting the client, run the following to create the metric table:
$ clickhouse-client --port 19000 --password 'zetyun' -n < /usr/local/3fs/misc/sql/3fs-monitor.sql
2. Monitoring Service - Monitor
This step runs on the metadata node.
2.1 Edit the configuration
Edit the config: mainly the IB NIC, the per-level log file paths, and the ClickHouse address:
$ vim /usr/local/3fs/conf/monitor_collector_main.toml
[common]
cluster_id = 'zetyun'
[common.ib_devices]
device_filter = [ 'ib7s400p0' ]
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/monitor_collector_main.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/monitor_collector_main-err.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/monitor_collector_main-fatal.log'
[server.base.groups.listener]
filter_list = [ 'bond1' ]
[server.monitor_collector.reporter.clickhouse]
db = '3fs'
host = '127.0.0.1'
passwd = 'zetyun'
port = '19000'
user = 'default'
2.2 Start the service
$ cp /usr/local/3fs/misc/systemd/monitor_collector_main.service /usr/lib/systemd/system
$ vim /usr/lib/systemd/system/monitor_collector_main.service # adjust the file paths; see below
$ systemctl start monitor_collector_main
monitor_collector_main.service is modified as follows:
ExecStart=/usr/local/3fs/bin/monitor_collector_main --cfg /usr/local/3fs/conf/monitor_collector_main.toml
2.3 Check service status
Check that the service is running:
$ systemctl status monitor_collector_main
Check that the listen address is as expected:
$ netstat -antlp | grep LISTEN | grep monitor
tcp 0 0 172.30.12.61:10000 0.0.0.0:* LISTEN 399127/monitor_coll
Check the log for errors:
$ cat /usr/local/3fs/logs/monitor_collector_main-err.log
3. Storage Service - FoundationDB
FoundationDB stores the cluster configuration and the file system metadata (here we share a single instance for both). This step runs on the metadata node.
3.1 Start the service
$ systemctl start foundationdb
$ systemctl status foundationdb
3.2 Check service status
By default the cluster listens on local port 4500:
$ netstat -antlp | grep LISTEN | grep fdb
tcp 0 0 127.0.0.1:4500 0.0.0.0:* LISTEN 2336918/fdbserver
4. Admin Tool - AdminClient
This step runs on the metadata node.
4.1 Copy fdb.cluster
$ cp /etc/foundationdb/fdb.cluster /usr/local/3fs/conf/
This file holds the FoundationDB cluster address; clients use it to connect.
4.2 Edit the configuration
Edit admin_cli.toml:
$ vim /usr/local/3fs/conf/admin_cli.toml
cluster_id = 'zetyun'
log = 'DBG:normal; normal=file:path=/usr/local/3fs/logs/cli.log,async=true,sync_level=ERR'
[fdb]
clusterFile = '/usr/local/3fs/conf/fdb.cluster'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
4.3 Give it a try
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml help
If it prints the help output, you are all set.
5. Cluster Management Service - Mgmtd
This step runs on the metadata node.
5.1 Edit the configuration
Edit mgmtd_main_app.toml:
$ vim /usr/local/3fs/conf/mgmtd_main_app.toml
node_id = 1
Edit mgmtd_main_launcher.toml:
$ vim /usr/local/3fs/conf/mgmtd_main_launcher.toml
cluster_id = 'zetyun'
[fdb]
clusterFile = '/usr/local/3fs/conf/fdb.cluster'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
Edit mgmtd_main.toml:
$ vim /usr/local/3fs/conf/mgmtd_main.toml
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/mgmtd_main.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/mgmtd_main-err.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/mgmtd_main-fatal.log'
[common.monitor.reporters.monitor_collector]
remote_ip = '172.30.12.61:10000'
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 8000
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 9030
5.2 Initialize the cluster
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml "init-cluster --mgmtd /usr/local/3fs/conf/mgmtd_main.toml 1 1048576 16"
Init filesystem, root directory layout: chain table ChainTableId(1), chunksize 1048576, stripesize 16
Init config for MGMTD version 1
Here 1 is the chain table ID, 1048576 the chunk size, and 16 the file stripe size.
This step writes the data into the database, i.e. to port 4500 where FoundationDB listens.
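To make the two sizes concrete: a hedged arithmetic sketch (not 3FS code) showing that chunksize 1048576 is 1 MiB, and that under a stripe size of 16 a hypothetical 64 MiB file would be split into 64 chunks spread across 16 chains, 4 chunks per chain:

```shell
# Illustrative arithmetic only; the striping interpretation is a hedged
# reading of "chunksize 1048576, stripesize 16" from init-cluster above.
chunk=1048576                      # 1 MiB per chunk
stripe=16                          # file data striped across 16 chains
file_bytes=$((64 * 1024 * 1024))   # a hypothetical 64 MiB file
chunks=$((file_bytes / chunk))
per_chain=$((chunks / stripe))
echo "chunks=$chunks per_chain=$per_chain"
```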
5.3 Start the service
$ cp /usr/local/3fs/misc/systemd/mgmtd_main.service /usr/lib/systemd/system/
$ vim /usr/lib/systemd/system/mgmtd_main.service # adjust the file paths; see below
$ systemctl start mgmtd_main
mgmtd_main.service is modified as follows:
ExecStart=/usr/local/3fs/bin/mgmtd_main --launcher_cfg /usr/local/3fs/conf/mgmtd_main_launcher.toml --app-cfg /usr/local/3fs/conf/mgmtd_main_app.toml
5.4 Check service status
Check that the service is running:
$ systemctl status mgmtd_main
Check that the listen addresses are as expected:
$ netstat -antlp | grep LISTEN | grep 'mgmtd_main'
tcp 0 0 172.30.12.61:8000 0.0.0.0:* LISTEN 420329/mgmtd_main
tcp 0 0 172.30.12.61:9000 0.0.0.0:* LISTEN 420329/mgmtd_main
Check the log for errors:
$ cat /usr/local/3fs/logs/mgmtd_main-err.log
List the nodes:
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "list-nodes"
Id Type Status Hostname Pid Tags LastHeartbeatTime ConfigVersion ReleaseVersion
1 MGMTD PRIMARY_MGMTD hd03-gpu2-0061 420329 [] N/A 1(UPTODATE) 250228-dev-1-999999-33da0642
Extra: Cleanup
If mgmtd_main.toml needs changes, stop FoundationDB, wipe its data, and re-run step 5.2:
$ systemctl stop foundationdb
$ rm -rf /var/lib/foundationdb/data/* /var/log/foundationdb/* /etc/foundationdb/*
$ dpkg -P foundationdb-clients foundationdb-server
$ dpkg -i /usr/local/3fs/misc/deps/foundationdb-clients_7.1.67-1_amd64.deb /usr/local/3fs/misc/deps/foundationdb-server_7.1.67-1_amd64.deb
$ cp /etc/foundationdb/fdb.cluster /usr/local/3fs/conf/
$ systemctl status foundationdb
6. Metadata Service - Meta
This step runs on the metadata node.
6.1 Edit the configuration
meta_main_app.toml
$ vim /usr/local/3fs/conf/meta_main_app.toml
node_id = 100
meta_main_launcher.toml
$ vim /usr/local/3fs/conf/meta_main_launcher.toml
cluster_id = 'zetyun'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
[mgmtd_client]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
meta_main.toml
$ vim /usr/local/3fs/conf/meta_main.toml
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_meta_main.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_meta_main-err.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_meta_main-fatal.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_meta_main-event.log'
[common.monitor.reporters.monitor_collector]
remote_ip = '172.30.12.61:10000'
[server.mgmtd_client]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 8001
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 9001
[server.fdb]
clusterFile = '/usr/local/3fs/conf/fdb.cluster'
6.2 Push the configuration
Push the config to the management service:
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "set-config --type META --file /usr/local/3fs/conf/meta_main.toml"
Succeed
ConfigVersion 1
6.3 Start the service
$ cp /usr/local/3fs/misc/systemd/meta_main.service /usr/lib/systemd/system
$ vim /usr/lib/systemd/system/meta_main.service # adjust the file paths; see below
$ systemctl start meta_main
meta_main.service is modified as follows:
ExecStart=/usr/local/3fs/bin/meta_main --launcher_cfg /usr/local/3fs/conf/meta_main_launcher.toml --app-cfg /usr/local/3fs/conf/meta_main_app.toml
6.4 Check service status
Check that the service is running:
$ systemctl status meta_main
Check that the listen addresses are as expected:
$ netstat -antlp | grep LISTEN | grep meta_main
tcp 0 0 172.30.12.61:8001 0.0.0.0:* LISTEN 431374/meta_main
tcp 0 0 172.30.12.61:9001 0.0.0.0:* LISTEN 431374/meta_main
Check the log for errors:
$ cat /usr/local/3fs/logs/hf3fs_meta_main-err.log
List the nodes:
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "list-nodes"
Id Type Status Hostname Pid Tags LastHeartbeatTime ConfigVersion ReleaseVersion
1 MGMTD PRIMARY_MGMTD hd03-gpu2-0061 420329 [] N/A 1(UPTODATE) 250228-dev-1-999999-33da0642
100 META HEARTBEAT_CONNECTED hd03-gpu2-0061 431374 [] 2025-03-11 11:51:12 1(UPTODATE) 250228-dev-1-999999-33da0642
7. Data Service - Storage
The following steps run on the storage nodes.
7.1 Tune system parameters
Raise the maximum number of AIO requests:
$ sysctl -w fs.aio-max-nr=67108864
$ sysctl -n fs.aio-max-nr # verify the setting
7.2 Edit the configuration
storage_main_launcher.toml
$ vim /usr/local/3fs/conf/storage_main_launcher.toml
cluster_id = 'zetyun'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
[mgmtd_client]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
storage_main.toml
$ vim /usr/local/3fs/conf/storage_main.toml
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/storage_main.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/storage_main-err.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/storage_main-fatal.log'
[common.monitor.reporters.monitor_collector]
remote_ip = '172.30.12.61:10000'
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 8000
[server.base.groups.listener]
filter_list = [ 'bond1' ]
listen_port = 9000
[server.mgmtd]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
[server.targets]
target_paths = [ '/3fs/storage/data0', '/3fs/storage/data1', '/3fs/storage/data2' ]
storage_main_app.toml
$ vim /usr/local/3fs/conf/storage_main_app.toml
node_id = 10001 # 6 machines, configured as 10001~10006
admin_cli.toml
$ vim /usr/local/3fs/conf/admin_cli.toml
cluster_id = 'zetyun'
log = 'DBG:normal; normal=file:path=/usr/local/3fs/logs/cli.log,async=true,sync_level=ERR'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
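The node_id in storage_main_app.toml must be unique per storage machine. A hedged sketch that prints the per-host assignment commands in node-id order (the host names are placeholders; the sed pattern assumes the `node_id = ...` line shown above):

```shell
# Print (not run) one sed command per storage host, assigning 10001, 10002, ...
# STOR_HOSTS is a placeholder list; pipe the output to `sh` after review.
STOR_HOSTS=${STOR_HOSTS:-"stor-01 stor-02 stor-03 stor-04 stor-05"}
id=10001
for h in $STOR_HOSTS; do
  echo "ssh root@$h \"sed -i 's/^node_id *=.*/node_id = $id/' /usr/local/3fs/conf/storage_main_app.toml\""
  id=$((id + 1))
done
```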
7.3 Push the configuration
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "set-config --type STORAGE --file /usr/local/3fs/conf/storage_main.toml"
Succeed
ConfigVersion 1
7.4 Start the service
$ cp /usr/local/3fs/misc/systemd/storage_main.service /usr/lib/systemd/system
$ vim /usr/lib/systemd/system/storage_main.service # adjust the binary and config file paths
$ systemctl start storage_main
7.5 Check service status
Check that the service is running:
$ systemctl status storage_main
Check that the listen addresses are as expected:
$ netstat -antlp | grep LISTEN | grep -E 'storage'
tcp 0 0 172.30.12.48:19000 0.0.0.0:* LISTEN 3379918/storage_mai
tcp 0 0 172.30.12.48:8000 0.0.0.0:* LISTEN 3379918/storage_mai
Check the log for errors:
$ cat /usr/local/3fs/logs/storage_main-err.log
List the nodes:
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "list-nodes"
1 MGMTD PRIMARY_MGMTD hd03-gpu2-0061 50900 [] N/A 1(UPTODATE) 250228-dev-1-999999-33da0642
100 META HEARTBEAT_CONNECTED hd03-gpu2-0061 51569 [] 2025-03-11 19:26:09 1(UPTODATE) 250228-dev-1-999999-33da0642
10001 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0046 3382653 [] 2025-03-11 19:26:16 6(UPTODATE) 250228-dev-1-999999-33da0642
10002 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0047 3630232 [] 2025-03-11 19:26:16 6(UPTODATE) 250228-dev-1-999999-33da0642
10003 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0048 3379918 [] 2025-03-11 19:26:16 6(UPTODATE) 250228-dev-1-999999-33da0642
10004 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0049 3385727 [] 2025-03-11 19:26:16 6(UPTODATE) 250228-dev-1-999999-33da0642
10005 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0050 3631938 [] 2025-03-11 19:26:16 6(UPTODATE) 250228-dev-1-999999-33da0642
10006 STORAGE HEARTBEAT_CONNECTED hd03-gpu2-0060 253473 [] 2025-03-11 19:26:14 6(UPTODATE) 250228-dev-1-999999-33da0642
8. Configure 3FS
These steps run back on the metadata node.
8.1 Create the admin user
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "user-add --root --admin 0 root"
Uid 0
Name root
Token AABCB/x58QAyLhOJ2wCGqDu4(Expired at N/A)
IsRootUser true
IsAdmin true
Gid 0
SupplementaryGids
The terminal prints the newly created token; save it to /usr/local/3fs/conf/token.txt:
$ echo AABCB/x58QAyLhOJ2wCGqDu4 > /usr/local/3fs/conf/token.txt
8.2 Create the chain table
Install the dependencies first:
$ apt install -y python3-pip
$ pip3 install -r /usr/local/3fs/misc/data_placement/requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
Run the data placement script, setting num_nodes to the actual number of storage nodes:
$ cd /usr/local/3fs
$ python3 /usr/local/3fs/misc/data_placement/src/model/data_placement.py -ql -relax -type CR --num_nodes 5 --replication_factor 3 --min_targets_per_disk 6
...
2025-03-11 19:38:20.416 | SUCCESS | __main__:run:148 - saved solution to: output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1
Generate the chain table, adjusting node_id_begin, node_id_end, num_disks_per_node, and incidence_matrix_path (pointing at the directory produced in the previous step):
$ python3 /usr/local/3fs/misc/data_placement/src/setup/gen_chain_table.py \
--chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
--num_disks_per_node 3 --num_targets_per_disk 6 \
--target_id_prefix 1 --chain_id_prefix 9 \
--incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
Check the output files:
$ ls -ls output
4 drwxr-xr-x 2 root root 4096 Mar 11 19:38 DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1
12 -rw-r--r-- 1 root root 8714 Mar 11 19:38 appsi_highs.log
12 -rw-r--r-- 1 root root 10350 Mar 11 19:39 create_target_cmd.txt
4 -rw-r--r-- 1 root root 308 Mar 11 19:39 generated_chain_table.csv
4 -rw-r--r-- 1 root root 1505 Mar 11 19:39 generated_chains.csv
12 -rw-r--r-- 1 root root 9990 Mar 11 19:39 remove_target_cmd.txt
Create the storage targets:
$ /usr/local/3fs/bin/admin_cli --cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' --config.user_info.token $(<"/usr/local/3fs/conf/token.txt") < output/create_target_cmd.txt
...
Create target 101000100306 on disk 2 of 10001 succeeded
Create target 101000300306 on disk 2 of 10003 succeeded
Create target 101000500306 on disk 2 of 10005 succeeded
Upload the chains and the chain table to mgmtd:
$ /usr/local/3fs/bin/admin_cli --cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' --config.user_info.token $(<"/usr/local/3fs/conf/token.txt") "upload-chains output/generated_chains.csv"
Upload 30 chains succeeded
$ /usr/local/3fs/bin/admin_cli --cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' --config.user_info.token $(<"/usr/local/3fs/conf/token.txt") "upload-chain-table --desc zetyun 1 output/generated_chain_table.csv"
Upload ChainTableId(1) of ChainTableVersion(1) succeeded
Verify the upload with these two commands:
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "list-chains"
...
900300008 1 1 SERVING [] 101000100305(SERVING-UPTODATE) 101000300304(SERVING-UPTODATE) 101000400305(SERVING-UPTODATE)
900300009 1 1 SERVING [] 101000300305(SERVING-UPTODATE) 101000400306(SERVING-UPTODATE) 101000500305(SERVING-UPTODATE)
900300010 1 1 SERVING [] 101000100306(SERVING-UPTODATE) 101000300306(SERVING-UPTODATE) 101000500306(SERVING-UPTODATE)
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "list-chain-tables"
ChainTableId ChainTableVersion ChainCount ReplicaCount Desc
1 1 30 3 zetyun
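The chain count above can be cross-checked with simple arithmetic, assuming each target appears in exactly one chain as the placement output suggests:

```shell
# Sanity arithmetic for the chain table: 5 storage nodes x 3 disks x 6 targets
# per disk = 90 targets; with replication factor 3, 90 / 3 = 30 chains,
# matching "Upload 30 chains succeeded" and ChainCount 30 above.
nodes=5; disks=3; targets_per_disk=6; replication=3
targets=$((nodes * disks * targets_per_disk))
chains=$((targets / replication))
echo "targets=$targets chains=$chains"
```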
9. Client - FUSE Client
The following commands run on the client nodes.
9.1 Edit the configuration
Save the token (generated in step 8.1):
$ echo AABCB/x58QAyLhOJ2wCGqDu4 > /usr/local/3fs/conf/token.txt
hf3fs_fuse_main_launcher.toml
$ vim /usr/local/3fs/conf/hf3fs_fuse_main_launcher.toml
cluster_id = 'zetyun'
mountpoint = '/mnt/3fs'
token_file = '/usr/local/3fs/conf/token.txt'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
[mgmtd_client]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
hf3fs_fuse_main.toml
$ vim /usr/local/3fs/conf/hf3fs_fuse_main.toml
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_fuse_main.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_fuse_main-err.log'
[[common.log.handlers]]
file_path = '/usr/local/3fs/logs/hf3fs_fuse_main-fatal.log'
[mgmtd]
mgmtd_server_addresses = [ 'RDMA://172.30.12.61:8000' ]
[common.monitor.reporters.monitor_collector]
remote_ip = '172.30.12.61:10000'
admin_cli.toml
$ vim /usr/local/3fs/conf/admin_cli.toml
cluster_id = 'zetyun'
log = 'DBG:normal; normal=file:path=/usr/local/3fs/logs/cli.log,async=true,sync_level=ERR'
[ib_devices]
device_filter = [ 'ib7s400p0' ]
9.2 Push the configuration
$ /usr/local/3fs/bin/admin_cli -cfg /usr/local/3fs/conf/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.30.12.61:8000"]' "set-config --type FUSE --file /usr/local/3fs/conf/hf3fs_fuse_main.toml"
9.3 Mount FUSE
$ cp /usr/local/3fs/misc/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
$ vim /usr/lib/systemd/system/hf3fs_fuse_main.service # adjust the file paths; see below
$ systemctl start hf3fs_fuse_main
hf3fs_fuse_main.service is modified as follows:
ExecStart=/usr/local/3fs/bin/hf3fs_fuse_main --launcher_cfg /usr/local/3fs/conf/hf3fs_fuse_main_launcher.toml
9.4 Check service status
Check that the service is running:
$ systemctl status hf3fs_fuse_main
Check the log for errors:
$ cat /usr/local/3fs/logs/hf3fs_fuse_main-err.log
Check that the mount point exists:
$ mount | grep zetyun
Try a few operations:
$ cd /mnt/3fs
$ touch f1 f2 f3
$ ls -ls
$ seq 1 1000000 > f1
$ cat f1
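A slightly stronger end-to-end check is to write through the mount, read back, and compare checksums. This is a hedged sketch: MNT defaults to a local temp directory so the script runs anywhere; point it at /mnt/3fs on a client node for the real check:

```shell
# Write a known file, read it back, and compare md5 checksums.
MNT=${MNT:-$(mktemp -d)}          # set MNT=/mnt/3fs on a FUSE client node
seq 1 100000 > "$MNT/check.dat"
expected=$(seq 1 100000 | md5sum | awk '{print $1}')
actual=$(md5sum "$MNT/check.dat" | awk '{print $1}')
rm -f "$MNT/check.dat"
[ "$expected" = "$actual" ] && echo "read-back OK"
```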
If everything works, the cluster has been deployed successfully :)
That wraps up this tutorial.
| About Us
This hands-on tutorial was prepared by R&D engineers at DataCanvas.
DataCanvas (Beijing Zetyun Tech Co., Ltd.), founded in 2013, is dedicated to the large-scale application of AI infrastructure software. Combining cutting-edge AI technology with its self-developed "compute package" products and intelligent-computing operating system, it delivers integrated "compute + algorithm" AI services to its users.
We will be opening a dedicated column to keep sharing in the areas we work in; comments and discussion are welcome!
| Bonus
Finally, we would like to introduce another storage system that is more general-purpose and lower-cost: the DataCanvas DingoFS distributed storage system. Developed by Beijing Zetyun Tech Co., Ltd., DingoFS was first released on November 20, 2024 and registered on January 14, 2025. It stands out for efficient data storage and management, distributed storage at massive scale, high availability, and scalability, making it a strong fit for applications that process large volumes of data and demand high reliability. The upcoming DingoFS release will bring further improvements to metadata performance.
The core features of DingoFS:
POSIX compatibility
Offers the same experience as a local file system, enabling seamless integration
AI-native architecture
Deeply optimized for LLM workflows, efficiently managing massive training datasets and checkpoint workloads
S3 protocol compatibility
Supports the standard S3 interface for convenient access to the file system namespace
Fully distributed architecture
The metadata service (MDS), data storage layer, cache system, and client components all scale linearly
Outstanding performance
Combines local-SSD-class low latency with object-storage-class elastic throughput
Intelligent tiered caching
A three-level cache topology across memory, local SSD, and the distributed cluster delivers high-throughput, low-latency intelligent I/O acceleration for AI workloads
| Alaya NeW Compute Cloud: Making DeepSeek Deployment Easy!
With the powerful GPU resources of the Alaya NeW compute cloud service, you can easily deploy DeepSeek model inference in the cloud and flexibly consume compute on demand, powering innovation and research!
Three steps to one-click deployment and a quick start with DeepSeek!
Worried about a complicated setup process? Don't be! In just three steps you can complete one-click deployment of the DeepSeek large language model. Act now! Try it here:
| End
Fellow practitioners are welcome to try out the DingoFS storage system and offer feedback to guide its iteration, jointly advancing the evolution of home-grown large models.
| Coming Next
Benchmarking 3FS performance over a 400G network
An in-depth benchmark of 3FS metadata performance