HowTo:CS Stash
Introduction
CS Stash is a shared departmental storage system designed for sharing files to Linux-based servers and workstations. The service is based on Ceph, and more specifically CephFS: https://docs.ceph.com/en/reef/cephfs/ The goal of the service is to provide fault-tolerant, secure, high-speed file sharing to Linux machines in the department. This project is currently in its pilot phase, so space is limited right now.
How it Works
The service uses CephFS to share files primarily to Linux-based hosts. It works like an NFS share in that it allows multiple hosts to mount it read/write at the same time. File permissions also work similarly to NFS or a locally mounted file system on the host.
Creating an Allocation
All allocations are created by Techstaff. Contact Techstaff to make your request for storage space.
Ansible Script
Techstaff have provided an Ansible script to install and set up your central storage allocation at: https://version.cs.vt.edu/techstaff/ansible/-/tree/main/roles/csCentralStorage
You can also install manually following the instructions below.
Installing Ceph Client
To mount your allocation, you will need to install the Ceph client if it is not already installed. You will need root-level access to install Ceph. See: https://docs.ceph.com/en/latest/install/ for more details and the latest information on installing Ceph. Here are some brief instructions using the cephadm tool that should work on any supported Linux system (Red Hat, Rocky, CentOS, Ubuntu, Debian, Alma, etc.).
- Download the cephadm tool
curl --silent --remote-name --location https://download.ceph.com/rpm-reef/el9/noarch/cephadm
- Make the tool executable
chmod +x cephadm
- Install the reef Ceph repo (Note: Older OSs, for example Rocky Linux 8, may have to use the older quincy release)
sudo ./cephadm add-repo --release reef
- Install the ceph client
sudo ./cephadm install ceph-common
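- Optionally, verify the client tools are available (the exact version string will vary by release)
ceph --version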
Mounting Your Allocation
When your allocation is created, you will be given the following information needed to mount your filesystem:
- <username>
- <secret_key>
- <path>
You need to save the contents of the <secret_key> into a file and secure that file, for example in the file /etc/.<username>.secret with chmod mode 0600.
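For example, one way to create and secure the file, substituting your actual <username> and <secret_key>:
sudo sh -c 'echo "<secret_key>" > /etc/.<username>.secret'
sudo chmod 0600 /etc/.<username>.secret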
You can have the filesystem automatically mounted on boot by adding an entry to your /etc/fstab
file, substituting your information as needed. You can optionally mount a sub-path of your top-level path.
<username>@.cephfs=<path>[/<sub-path>] <mount_location> ceph mon_addr=stash.cs.vt.edu:3300,secretfile=<path_to_secret_file>,_netdev,noatime,rbytes,ms_mode=secure 0 0
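As an illustration only, with a hypothetical username of jdoe, a path of /volumes/jdoe, a mount location of /mnt/stash, and the secret saved in /etc/.jdoe.secret, the entry would look like:
# Example only: all values below are hypothetical placeholders, substitute your own
jdoe@.cephfs=/volumes/jdoe /mnt/stash ceph mon_addr=stash.cs.vt.edu:3300,secretfile=/etc/.jdoe.secret,_netdev,noatime,rbytes,ms_mode=secure 0 0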
Now you can mount the storage immediately by running: sudo mount <mount_location>
- You may get notices about files in your /etc/ceph directory, but they can be ignored.
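- To confirm the share is mounted, you can check it with standard tools, for example:
findmnt <mount_location>
df -h <mount_location>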
Mounting Your Allocation on CS Launch
Contact Techstaff to have your storage allocation added to your CS Launch project. Unlike the built-in CS Launch storage, a CS Stash allocation can be mounted read/write from multiple containers on CS Launch at the same time.
Security
- Your <secret_key> should be kept secret; anyone with access to the <username> and <secret_key> can mount your filesystem and has full access to it
- The CephFS mount works like a locally mounted POSIX filesystem with ACLs enabled. You can change file ownership and permissions like you would a local filesystem.
- All data is stored encrypted at rest
- If you use the "ms_mode=secure" mount option, the data will also be encrypted on the wire. This is the recommended option.
Resizing
Your storage allocation can be resized quickly and easily without the need to unmount and remount the share. A resize needs to be done by Techstaff.
Quotas
CephFS supports quotas. You can set a quota on any individual directory that affects that directory and all of its sub-directories. For full details, see: https://docs.ceph.com/en/latest/cephfs/quota/ Here is a quick example of setting a 1 GB quota on a directory of your mount:
setfattr -n ceph.quota.max_bytes -v $(numfmt --from=iec 1G) /mnt/ceph/test
The quota value is specified in bytes; the numfmt command is just a convenient way to convert larger units into bytes.
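If you want to check or remove a quota later, the same extended attribute can be read back with getfattr, or cleared by setting it to 0 (using the same example directory):
getfattr -n ceph.quota.max_bytes /mnt/ceph/test
setfattr -n ceph.quota.max_bytes -v 0 /mnt/ceph/test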
Performance
My benchmark tests show that performance over a 1 Gbit network connection is about equivalent to a single local spindle drive. Performance over a 10 Gbit network connection is much greater than a single local spindle drive.
Backups
Techstaff currently does not do any automatic backups of the data on the CS Stash service.
- The service is fault tolerant to hardware failures
- CephFS offers snapshot support that can help mitigate data loss due to accidental deletion. See: https://docs.ceph.com/en/reef/dev/cephfs-snapshots/ for more details.
- Here is a brief example of creating a snapshot of a subdirectory in your allocation:
user@localhost:/# cd /mnt/ceph/test
user@localhost:/mnt/ceph/test# echo "Version 1" > version.txt
user@localhost:/mnt/ceph/test# cd .snap
user@localhost:/mnt/ceph/test/.snap# mkdir my_snapshot
user@localhost:/mnt/ceph/test/.snap# cd ..
user@localhost:/mnt/ceph/test# echo "Version 2" > version.txt
user@localhost:/mnt/ceph/test# cat version.txt
Version 2
user@localhost:/mnt/ceph/test# cat .snap/my_snapshot/version.txt
Version 1
user@localhost:/mnt/ceph/test#
- Here is a very simple bash script that keeps a one-week rolling set of snapshots for a directory:
#!/bin/bash
# Directory to snapshot
DIR=/mnt/ceph/test
# Name the snapshot after the day of the week, so names roll over weekly
NAME=$(date +%A)
# Remove last week's snapshot with the same name, then create a new one
rmdir "$DIR/.snap/$NAME"
mkdir "$DIR/.snap/$NAME"
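- For example, if you saved that script to a hypothetical location such as /usr/local/sbin/stash-snapshot.sh, a root crontab entry could run it nightly:
# Hypothetical crontab entry: create/rotate the snapshot every night at 2:00 AM
0 2 * * * /usr/local/sbin/stash-snapshot.sh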
- If off-site backup is a priority, then an outside solution will need to be implemented
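- One possible outside approach is to periodically copy the data to a host you control with rsync; for example (the destination host and path below are purely hypothetical):
# Hypothetical off-site copy; adjust the source, host, and destination to your environment
rsync -a /mnt/ceph/ backupuser@offsite.example.org:/backups/cs-stash/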
Benchmarking
- I used the program fio to benchmark
- small io options:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --filename=/mnt/ceph/testfile
- This creates a 4 GB file. It performs 4 KB reads and writes using a 75%/25% split in the file, with 64 operations running at a time.
- large io options:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=1M --iodepth=8 --readwrite=randrw --rwmixread=75 --size=4G --filename=/mnt/ceph/testfile
- This creates a 4 GB file. It performs 1 MB reads and writes using a 75%/25% split in the file, with 8 operations running at a time.
Reference
- Results from running on a local spindle disk with xfs filesystem
- small io results:
- Read: 618 IOPs, 2.5 MB/s
- Write: 204 IOPs, 819 KB/s
- large io results:
- Read: 94 IOPs, 94.7 MB/s
- Write: 32 IOPs, 32.6 MB/s
- Results from running on a local SSD disk with xfs filesystem
- small io results:
- Read: 32.8k IOPs, 128 MB/s
- Write: 11k IOPs, 42.9 MB/s
- large io results:
- Read: 142 IOPs, 142 MB/s
- Write: 49 IOPs, 49.0 MB/s
- Conclusion on reference: The SSD drive far outperforms the spindle drive on small IO operations; for large IO, the read and write speeds of the two drives are more similar.
From a 1 gigabit network connection
- small io results no encryption on wire:
- Read: 1912 IOPs, 7.7 MB/s
- Write: 639 IOPs, 2.6 MB/s
- large io results no encryption on wire:
- Read: 91 IOPs, 91.5 MB/s
- Write: 31 IOPs, 31.5 MB/s
- small io results encrypted on wire:
- Read: 1929 IOPs, 7.7 MB/s
- Write: 644 IOPs, 2.6 MB/s
- large io results encrypted on wire:
- Read: 91 IOPs, 91.5 MB/s
- Write: 31 IOPs, 33.1 MB/s
- Conclusion: Small IO operations are faster than on a local spindle drive, and large IO performance is almost identical to a local spindle drive and is limited by the network speed. Encryption on the wire made very little difference in performance.
From a 10 gigabit network connection
- small io results no encryption on wire:
- Read: 1958 IOPs, 7.8 MB/s
- Write: 654 IOPs, 2.6 MB/s
- large io results no encryption on wire:
- Read: 358 IOPs, 376 MB/s
- Write: 123 IOPs, 130 MB/s
- small io results encrypted on wire:
- Read: 2168 IOPs, 8.7 MB/s
- Write: 724 IOPs, 2.9 MB/s
- large io results encrypted on wire:
- Read: 321 IOPs, 337 MB/s
- Write: 110 IOPs, 111 MB/s
- Conclusion: There is no real change in small IO performance compared to the 1 gigabit connection, but a huge gain in large IO operations. Encrypted was actually faster for small IO operations; I am not sure why, but it may be because it forces the use of the v2 protocol instead of v1.