HowTo:CS Stash

== Introduction ==
CS Stash is a shared departmental storage system designed for sharing files to Linux-based servers and workstations.  The service is based on Ceph, and more specifically CephFS: https://docs.ceph.com/en/reef/cephfs/  The goal of the service is to provide fault-tolerant, secure, high-speed file sharing to Linux machines in the department.  This project is currently in its pilot state, so space is limited right now.
 
== How it Works ==
The service uses CephFS to share files primarily to Linux-based hosts.  It works like an NFS share in that it allows multiple hosts to mount it R/W at the same time.  File permissions also work similarly to NFS or a locally mounted file system on the host.


== Creating an Allocation ==
All allocations are created by Techstaff.  [[Contact Techstaff]] to make your request for storage space.
== Ansible Script ==
Techstaff have provided an Ansible role to install and set up your central storage allocation: https://version.cs.vt.edu/techstaff/ansible/-/tree/main/roles/csCentralStorage
You can also install manually following the instructions below.
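If you use the role, a minimal playbook sketch is shown below.  The host group name <code>stash_clients</code> is only an example, and any role variables are documented in the repository above, so check its README before relying on this.
<pre>
# site.yml -- apply the techstaff csCentralStorage role to the hosts that
# should mount the allocation (the host group name is hypothetical)
- hosts: stash_clients
  become: true
  roles:
    - csCentralStorage
</pre>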


== Installing Ceph Client ==
To mount your allocation, you will need to install the Ceph client, if it is not already installed.  You will need root-level access to install Ceph.  See: https://docs.ceph.com/en/latest/install/ for more details and the latest information on installing Ceph.  Here are some brief instructions using the cephadm tool that should work on any supported Linux system (Red Hat, Rocky, CentOS, Ubuntu, Debian, Alma, etc.).
* Download the cephadm tool
* Make the tool executable
** <code>chmod +x cephadm</code>
* Install the reef Ceph repo (Note: older OSs might have to go down to the quincy release, for example Rocky Linux 8)
** <code>sudo ./cephadm add-repo --release reef</code>
* Install the Ceph client
** <code>sudo ./cephadm install ceph-common</code>
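Put together, the steps above amount to a short shell session like the sketch below.  The download URL is deliberately left as a placeholder; use whichever link the Ceph install documentation currently lists for your release.
<pre>
# Fetch the cephadm tool (substitute the URL from the Ceph install docs)
curl -LO <cephadm_download_url>
chmod +x cephadm

# Add the reef repository and install the client packages
sudo ./cephadm add-repo --release reef
sudo ./cephadm install ceph-common

# Confirm the client is available
ceph --version
</pre>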


== Mounting Your Allocation ==
When your allocation is created, you will be given the following information needed to mount your filesystem:
* <code><username></code>
* <code><secret_key></code>
* <code><path></code>
You need to save the contents of the <code><secret_key></code> into a file and secure that file, for example <code>/etc/.<username>.secret</code> with mode 0600.

You can have the filesystem automatically mounted on boot by adding an entry to your <code>/etc/fstab</code> file; substitute your information as needed.  You can optionally mount a sub-path of your top-level path.
* <code><username>@.cephfs=<path>[/<sub-path>] <mount_location> ceph mon_addr=stash.cs.vt.edu:3300,secretfile=<path_to_secret_file>,_netdev,noatime,rbytes,ms_mode=secure 0 0</code>


Now you can mount the storage immediately by running <code>sudo mount <mount_location></code>.
* You may get notices about files in your /etc/ceph directory, but they can be ignored.
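As a concrete end-to-end sketch, assume a hypothetical username of <code>myproject</code>, an allocation path of <code>/volumes/myproject</code>, and a mount point of <code>/mnt/stash</code>; substitute the values Techstaff gives you.
<pre>
# Store the secret key in a root-only file (mode 0600)
sudo install -m 0600 /dev/null /etc/.myproject.secret
echo '<secret_key>' | sudo tee /etc/.myproject.secret > /dev/null

# Create the mount point and add the fstab entry
sudo mkdir -p /mnt/stash
echo 'myproject@.cephfs=/volumes/myproject /mnt/stash ceph mon_addr=stash.cs.vt.edu:3300,secretfile=/etc/.myproject.secret,_netdev,noatime,rbytes,ms_mode=secure 0 0' | sudo tee -a /etc/fstab

# Mount it now and confirm
sudo mount /mnt/stash
df -h /mnt/stash
</pre>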
 
== Mounting Your Allocation on CS Launch ==
[[Contact Techstaff]] to have your storage allocation added to your [[HowTo:CS_Launch|CS Launch]] project.  Unlike the built-in CS Launch storage, a CS Stash allocation can be mounted R/W from multiple containers on CS Launch at the same time.


== Security ==
* Your <code><secret_key></code> should be kept secret; anyone with access to the <code><username></code> and <code><secret_key></code> can mount, and have full access to, your filesystem
* The CephFS mount works like a locally mounted POSIX filesystem with ACLs enabled.  You can change file ownership and permissions like you would on a local filesystem (see the example after this list).
* All data is stored encrypted at rest
* If you use the <code>ms_mode=secure</code> mount option, the data will also be encrypted on the wire.  This is the recommended option.
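As an illustration of the POSIX/ACL point above, ordinary ownership, permission, and ACL commands behave as they would on a local filesystem.  The paths, users, and group below are hypothetical.
<pre>
# Standard ownership and permission changes
sudo chown alice:csgroup /mnt/stash/shared
chmod 2775 /mnt/stash/shared

# POSIX ACLs can grant access to additional users
setfacl -m u:bob:rwX /mnt/stash/shared
getfacl /mnt/stash/shared
</pre>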
== Resizing ==
Your storage allocation can be resized quickly and easily without the need to unmount and remount the share.  A resize will need to be done by [[Contact Techstaff|Techstaff]].
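Assuming the allocation limit is enforced with a CephFS quota on your path, the new size should show up on already-mounted clients without any remount, for example:
<pre>
# The reported filesystem size reflects the allocation (hypothetical mount point)
df -h /mnt/stash
</pre>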


== Quotas ==
CephFS supports quotas.  You can set a quota on any individual directory, and it affects that directory and any sub-directories.  For full details, see: https://docs.ceph.com/en/latest/cephfs/quota/
Here is a quick example of setting a 1GB quota on a directory of your mount:
* <code>setfattr -n ceph.quota.max_bytes -v $(numfmt --from=iec 1G) /mnt/ceph/test</code>
The command expects the value in bytes; the ''numfmt'' command is just a convenient way to convert human-readable sizes such as 1G into bytes.
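Per the Ceph quota documentation, the matching commands to inspect or clear a quota look like this:
<pre>
# Show the current byte quota on the directory
getfattr -n ceph.quota.max_bytes /mnt/ceph/test

# Remove the quota by setting the value back to 0
setfattr -n ceph.quota.max_bytes -v 0 /mnt/ceph/test
</pre>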


== Performance ==
My benchmark tests show that performance from a 1 Gbit network connection is about equivalent to a single local spindle drive.  Performance from a 10 Gbit network connection is much greater than a single local spindle drive.


== Backups ==
Techstaff currently does not do any automatic backups of the data on the CS Stash service.
* The service is fault tolerant to hardware failures
* CephFS offers snapshot support that can help mitigate data loss due to accidental deletion.  See: https://docs.ceph.com/en/reef/dev/cephfs-snapshots/ for more details.
* Here is a brief example of creating a snapshot of a subdirectory in your allocation:
<pre>
user@localhost:/# cd /mnt/ceph/test
user@localhost:/mnt/ceph/test# echo "Version 1" > version.txt
user@localhost:/mnt/ceph/test# cd .snap
user@localhost:/mnt/ceph/test/.snap# mkdir my_snapshot
user@localhost:/mnt/ceph/test/.snap# cd ..
user@localhost:/mnt/ceph/test# echo "Version 2" > version.txt
user@localhost:/mnt/ceph/test# cat version.txt
Version 2
user@localhost:/mnt/ceph/test# cat .snap/my_snapshot/version.txt
Version 1
user@localhost:/mnt/ceph/test#
</pre>
* Here is a very simple bash script that keeps a one-week rolling set of snapshots for a directory (a cron example for scheduling it follows this list)
<pre>
#!/bin/bash
# Keep one rolling snapshot per weekday under $DIR/.snap
DIR=/mnt/ceph/test
NAME=$(date +%A)   # e.g. "Monday"

# Remove last week's snapshot with the same name (ignore the error if it
# does not exist yet), then create today's snapshot
rmdir "$DIR/.snap/$NAME" 2>/dev/null
mkdir "$DIR/.snap/$NAME"
</pre>
* If off-site backup is a priority, then an outside solution will need to be implemented
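To schedule the rolling-snapshot script above, a cron entry along these lines would work; the script location <code>/usr/local/sbin/stash-snapshot.sh</code> is only an example.
<pre>
# /etc/cron.d/stash-snapshot (hypothetical): take the snapshot every night at 01:00
0 1 * * * root /usr/local/sbin/stash-snapshot.sh
</pre>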
== Benchmarking ==
* I used the program '''fio''' to run the benchmarks
** ''small io'' Options: <code>fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --readwrite=randrw --rwmixread=75 --size=4G --filename=/mnt/ceph/testfile</code>
*** This creates a 4 GB file. It performs 4 KB reads and writes using a 75%/25% split in the file, with 64 operations running at a time.
** ''large io'' Options: <code>fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=1M --iodepth=8 --readwrite=randrw --rwmixread=75 --size=4G --filename=/mnt/ceph/testfile</code>
*** This also uses a 4 GB file, but performs 1 MB reads and writes with the same 75%/25% split and 8 operations running at a time.
=== Reference ===
* Results from running on a local spindle disk with xfs filesystem
** ''small io'' results:
*** Read: 618 IOPs, 2.5 MB/s
*** Write: 204 IOPs, 819 KB/s
** ''large io'' results:
*** Read: 94 IOPs, 94.7 MB/s
*** Write: 32 IOPs, 32.6 MB/s
* Results from running on a local SSD disk with xfs filesystem
** ''small io'' results:
*** Read: 32.8k IOPs, 128 MB/s
*** Write: 11k IOPs, 42.9 MB/s
** ''large io'' results:
*** Read: 142 IOPs, 142 MB/s
*** Write: 49 IOPs, 49.0 MB/s
* '''Conclusion''' on reference: the SSD drive greatly outperforms the spindle drive on small IO operations, while large IO read and write throughput is broadly similar between the two.
=== From 1 gigabit networking connection ===
* ''small io'' results '''no encryption on wire''':
** Read: 1912 IOPs, 7.7 MB/s
** Write: 639 IOPs, 2.6 MB/s
* ''large io'' results '''no encryption on wire''':
** Read: 91 IOPs, 91.5 MB/s
** Write: 31 IOPs, 31.5 MB/s
* ''small io'' results '''encrypted on wire''':
** Read: 1929 IOPs, 7.7 MB/s
** Write: 644 IOPs, 2.6 MB/s
* ''large io'' results '''encrypted on wire''':
** Read: 91 IOPs, 91.5 MB/s
** Write: 31 IOPs, 33.1 MB/s
* '''Conclusion''': Faster than a local spindle drive for small IO operations, and almost identical to a local spindle drive for large IO, where it is limited by network speed.  Encryption on the wire made very little difference in performance.
=== From 10 gigabit networking connection ===
* ''small io'' results '''no encryption on wire''':
** Read: 1958 IOPs, 7.8 MB/s
** Write: 654 IOPs, 2.6 MB/s
* ''large io'' results '''no encryption on wire''':
** Read: 358 IOPs, 376 MB/s
** Write: 123 IOPs, 130 MB/s
* ''small io'' results '''encrypted on wire''':
** Read: 2168 IOPs, 8.7 MB/s
** Write: 724 IOPs, 2.9 MB/s
* ''large io'' results '''encrypted on wire''':
** Read: 321 IOPs, 337 MB/s
** Write: 110 IOPs, 111 MB/s
* '''Conclusion''': Small IO performance is essentially unchanged from the 1 gigabit connection, but large IO operations see a huge gain.  Encryption on the wire was actually slightly faster for small IO operations; I am not sure why, but it could be that it forces the use of the v2 messenger protocol instead of v1.
