
Solution: TrueNAS, ZFS, 190T usable, RaidZ2 6 spares, read cache, 800G write cache, self healing, snapshots, compression on, deduplication off, encryption off, dual controllers (high availability), 64G RAM, 6x 1Gbe RJ45, SAS drives (not SATA), three-year warranty, SSH access
Henk 2019/11/16 12:15

Bought 3 Juniper EX3300-48P switches to support LACP bonding of Ethernet ports for one public and two private subnets serving up /home in the future.
Henk 2019/12/13 08:34

Home Dir Server

Looking for an appliance to replace my home directory server, which is a Supermicro 4U storage box using XFS and exporting /home via NFS v3 to compute nodes over a 1 Gbe Ethernet network (which seems fine for our usage; we leverage a lot of local scratch space).

We could continue that, but our biggest problem is point-in-time restores (snapshots). Using rsync to create snapshots of /home takes roughly 3 days and 3 hours (23 million files). Add to that our DR copy of active users and you have a backup window of 4.5 days.
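
For context, today's backups follow the classic rsync hard-link rotation pattern; below is a minimal sketch of that approach (the paths, retention count, and snapshot naming are illustrative assumptions, not our actual layout). The pain point is that even with --link-dest, rsync has to walk every inode on each pass, which is what drives the multi-day window; native snapshots skip that walk entirely.

<code python>
#!/usr/bin/env python3
# Sketch of an rsync hard-link snapshot rotation (the pattern in use today).
# Paths, retention count, and naming are illustrative assumptions only.
import datetime
import pathlib
import subprocess

SOURCE = "/home/"                          # trailing slash: copy contents of /home
SNAPDIR = pathlib.Path("/backup/home-snapshots")
KEEP = 14                                  # assumed retention, not our real policy

def take_snapshot():
    SNAPDIR.mkdir(parents=True, exist_ok=True)
    snaps = sorted(p for p in SNAPDIR.iterdir() if p.is_dir())
    dest = SNAPDIR / datetime.date.today().isoformat()
    cmd = ["rsync", "-a", "--delete"]
    if snaps:
        # unchanged files become hard links to the previous snapshot, but rsync
        # still walks all ~23 million inodes, hence the 3+ day window
        cmd += ["--link-dest", str(snaps[-1])]
    cmd += [SOURCE, str(dest)]
    subprocess.run(cmd, check=True)
    # prune the oldest snapshots beyond the retention window
    for old in snaps[: max(0, len(snaps) + 1 - KEEP)]:
        subprocess.run(["rm", "-rf", str(old)], check=True)

if __name__ == "__main__":
    take_snapshot()
</code>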

FreeNAS would be the answer, but I don't have time to build that from scratch. Trying to find an appliance within the $40K budget. Here is my “required” and “desired” functionality:

  • Required
    • native snapshots
    • self healing
    • 100T usable (expandable to 300T; see the sizing sketch after this list)
    • NFS v3
    • quotas
    • account ingestion via simple CSV file or something
    • raid 60 or better
  • Desired
    • “snapshots plus replication”, High Availability pair
    • SSD cache
    • start at 300T
    • Shell access
    • CIFS/Samba access
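
As a sanity check on the 100T-now, 300T-later requirement above, a back-of-the-envelope usable-capacity calculator. The ~3% overhead figure, 10-disk RAIDZ2 vdev width, and 14T drive size are assumptions for illustration, not any vendor's layout.

<code python>
# Back-of-the-envelope usable capacity for RAIDZ2 ("raid 60 or better") layouts.
# Overhead figure, vdev width, and drive size below are assumptions.
TB, TIB = 10**12, 2**40
FS_OVERHEAD = 0.03                        # assumed filesystem/metadata overhead

def usable_tib(vdevs, width, parity, drive_tb):
    data_disks = vdevs * (width - parity)
    return data_disks * drive_tb * TB * (1 - FS_OVERHEAD) / TIB

print(round(usable_tib(1, 10, 2, 14)))    # ~99 TiB: one 10-wide RAIDZ2 vdev to start
print(round(usable_tib(3, 10, 2, 14)))    # ~296 TiB: three vdevs after grow-in-place
</code>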

Appliances probably provide lots more functionality we will not be using (cloud sync, video, music, pics, printing, vmware, etc…). Domain/LDAP maybe…

“Snapshot+Replication” vs HyperBackup
https://www.synology.com/en-us/knowledgebase/DSM/tutorial/Backup/How_to_back_up_your_Synology_NAS

DataOnTap/WAFL

  • ConRes Solution
    • FAS2720 (within budget)
    • no SSD just 12x8T
    • very small storage footprint

Supermicro/ZFS

  • 1U server, 2x10G LAN, 500W 1+1
    • dual 960G SSD, Raid 1 for Linux OS
    • single Xeon 6242, 16-core, 2.8 GHz
    • 384G memory (finally!)
    • needs more NICs, dual bond 10.10 and 192.168 plus 129.133 access
    • also enable Samba/CIFS, Rsync and SSH direct access
    • ZFS 2×1.6T L2ARC SSD cache
  • 4U storage box, direct connected, 1280W 1+1
    • 44x4T 144T raw, 2 hot spares (or 6T for 216T or 8T for 288T)
  • ZFS … 3 vdevs RAIDZ2 (see the pool build sketch after this list)
    • standard Linux utilities for NFS mounts
    • self healing, 2 hot spares
    • compression (on), deduplication (off for performance), thin provisioning (no need)
    • snapshots (yes)
  • old sharptail as remote, simple backup target
  • Microway Solution:
    • 1U+4U boxes, 1.6T ZFS Cache, ~200T usable (44x8T), ZFS installed and configured
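
The pool build sketch referenced in the ZFS bullet above: a rough, hedged idea of what the Supermicro/ZFS configuration would look like from the Linux side. Device paths, the pool name, and the 12-disk vdev width are placeholders, not the quoted layout. Once the pool exists, snapshots are a near-instant `zfs snapshot` rather than a multi-day rsync walk.

<code python>
#!/usr/bin/env python3
# Sketch: build a pool of 3x RAIDZ2 vdevs with 2 hot spares and 2x L2ARC cache
# SSDs, compression on, dedup off. Device names and vdev width are placeholders.
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

POOL = "home"
VDEVS = [[f"/dev/disk/by-id/scsi-shelf{v}-disk{d}" for d in range(12)] for v in range(3)]
SPARES = ["/dev/disk/by-id/scsi-spare0", "/dev/disk/by-id/scsi-spare1"]
CACHE = ["/dev/disk/by-id/nvme-l2arc0", "/dev/disk/by-id/nvme-l2arc1"]

cmd = ["zpool", "create", POOL]
for vdev in VDEVS:
    cmd += ["raidz2"] + vdev
cmd += ["spare"] + SPARES + ["cache"] + CACHE
run(*cmd)

run("zfs", "set", "compression=lz4", POOL)     # compression on
run("zfs", "set", "dedup=off", POOL)           # deduplication off for performance
run("zfs", "create", POOL + "/home")           # then export via /etc/exports or sharenfs
</code>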

TrueNAS/ZFS

    • 1x120T raw, HA at $38K list price, probably breaks my budget
    • SAS drives though no cache needed
    • 2U, up to 500T (two ESX expansion shelves, 12T disks)
    • hybrid SSD cache
    • 6x1Gbe, single PSU (really?)
    • 64G memory max
  • ZFS
    • FreeNAS compliant
    • self healing
    • compression, deduplication, thin provisioning (no extra costs)
    • Unlimited file version retention, restoration, and replication
  • SMB, AFP, and NFS for file storage
  • iSCSI and Fibre Channel for block storage
  • S3-compatible APIs for object storage
  • IXsystems Solution:
    • dual single controller boxes with ~120T raw breaks our budget
  • no follow-up quote for single large box…but they did on 11/09
    • 2U TrueNAS X20 with Dual Controllers (HA)
    • NFS v3/4, CIFS/SMB, iSCSI, AFP, WebDav, FTP, S3 API, WebUI
    • Self-healing, Compression, Thin Provisioning, Snapshots, Clones, Replication
    • SSH access
    • 64GB RAM Cache, 12 vCPU (6-core?)
    • 6x 1Gbe Base-T (bond for 192.168 and 10.10 private networks and public 129.133?)
    • 1x High-Performance SSD Write Cache and 1x 800GB SAS SSD Read Cache (write cache 4 disks?)
    • 4U Rackmount Expansion Shelf (redundant power)
    • TrueNAS 8TB Enterprise Nearline SAS 7200RPM 128MB Cache (total is 36 bays 12+24)
      • 272T raw, 190T usable storage under ZFS (see the capacity check after this list)
      • Raid profile 9+2 (RaidZ2 needs two hot spares not one?)
    • one year warranty, NBD (get years 2+3 out of the devel fund?)
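
The capacity check referenced above: the quoted raw vs. usable numbers roughly reconcile if the layout is read as three 9+2 RAIDZ2 vdevs of 8T drives plus one hot spare in the 36 bays (my interpretation, not confirmed by iXsystems), with a few percent of ZFS overhead.

<code python>
# Reconciling "272T raw, 190T usable" assuming 3x (9 data + 2 parity) RAIDZ2
# vdevs of 8T drives plus 1 hot spare; the ~3% overhead figure is an assumption.
TB, TIB = 10**12, 2**40

drives = 3 * (9 + 2) + 1                        # 34 of the 36 bays populated
print(drives * 8)                               # 272 -> "272T raw"
data_tb = 3 * 9 * 8                             # parity and spare excluded
print(round(data_tb * TB * 0.97 / TIB))         # ~191 TiB, in line with "190T usable"
</code>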

Synology

    • 3U, 16 bays up to 28 expandable (16×12=192T, 28×12=336T)
    • single four-core CPU (a little surprising), 2.1 GHz
    • 4G memory up to 64G (advisable)
    • 4xGbe, 2x PSU 1+1 plus 2x10G for cluster and heartbeat
    • 2^64 max file size
    • metadata mirroring (2 copies)
    • self-healing on reads (checksums)
    • raid 6 (not 60?)
    • enables snapshots (point in time restores), replication
    • hot spares (yes), cold spares (CA shipping), SSD cache (yes)
  • Snapshots (schedulable, auto-delete, 2014 shared folders?, 65,000 system wide)
    • self-serve restores via Windows Explorer, File Station (Linux via Samba?)
    • snapshots go into Shared folder volume (all incremental)
  • User accounts (autoload from CSV username,passwd,group or via GUI: not confirmed yet)
    • StoragePool1 contains Volume1 contains shared folder “home” and “homes”
      • what's the diff, which one gets NFS exported (“homes”?) to compute nodes (chapter 9)
      • either one, will have to see where they pop up; also \\servername\%u not confirmed
    • User access via Samba \\homes\%u shares? (self reset passwords) Or…
    • No user access (random password) (music, pics, video uploads…these services can be disabled)
    • No quota as goal…or one huge one as protection (5T/user?)
  • Budget ($40K) can sustain HA (active/passive setup)?
    • “snapshot and replicate”, that would be great
    • auto fail over, auto fail back or reverse roles/replication (yes)
    • high-availability cluster will share a single server name
    • each server in the high-availability cluster has its own IP address; the cluster also generates a common IP address (private? public?)
  • Shell access (yes)
  • ExxactCorp Solution (within budget)
    • 2x RS2818RP+ appliance plus 2x RS2017RP expansion shelves
    • max memory at 2x 64G (so low!)
    • fully populated with disks 2x 336T raw
    • allowing for snapshots, replication and fail over
    • no SSD cache, GUI management
  • ConRes Solution (within budget)
    • 2x RS2818RP+ appliances
    • max memory at 2x 64G (so low!)
    • fully populated with disks, 16x12T = 192T raw
    • allowing for snapshots, replication and fail over
    • no SSD cache, GUI management

QNAP

    • 4U, 24 bays (288T; expansion up to 8x 12-bay shelves = 96 bays or 1,152T)
    • single four-core CPU (a little surprising), 3.4 GHz
    • 4G memory, up to 32 GB (very, very low for millions of files!)
    • 4x Gbe, 2x PSU 1+1
    • SSD cache acceleration (good!)
  • QTP file system (ready for millions of files?)
    • Raid 60
  • ConRes Solution:
    • NA

Homedirs Update

Starting an HPC project to replace our home directory storage server, which is now six years old. We have added several projects which bought their own storage, and we'd like to consolidate it all onto a single platform with a grow-in-place solution for the future. The current storage footprint for /home would total 40T. Not knowing snapshot storage requirements, we would like to start at 100T and grow in place to 300T usable.

We only use NFS mounts on a 1Gbe network (which seems fine for our workload), and new purchases may add 10Gbe. Most data is staged on scratch spaces local to the compute nodes, so we are phasing out InfiniBand (IPoIB). No replication at the current time (it may be added in the future). The old server will be expanded to a 100T backup target using rsync daemons.

We would like native snapshot functionality for point-in-time restores, plus checksumming on reads, writes, or both. Account management will remain on our login nodes, and some automated pathway will be designed to add user accounts to the storage appliance (if needed).
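
One plausible pathway (a sketch only: the username,passwd,group field order follows the Synology notes above, and the UID cutoff and output path are assumptions) is to dump accounts from a login node into the kind of simple CSV several of these appliances can import.

<code python>
#!/usr/bin/env python3
# Sketch: export local accounts to a "username,passwd,group" CSV for appliance
# import. Field order, UID cutoff, and output filename are assumptions.
import csv
import grp
import pwd
import secrets

MIN_UID = 1000                      # assumed cutoff to skip system accounts
OUT = "appliance_accounts.csv"

with open(OUT, "w", newline="") as f:
    writer = csv.writer(f)
    for user in pwd.getpwall():
        if user.pw_uid < MIN_UID or user.pw_name == "nobody":
            continue
        group = grp.getgrgid(user.pw_gid).gr_name
        # random throwaway password: users never log in to the appliance directly
        writer.writerow([user.pw_name, secrets.token_urlsafe(12), group])

print("wrote", OUT)
</code>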

We are loosely targeting a budget of $40K. Open to suggestions. Our sense is that BeeGFS is a little over the top for us and some form of FreeNAS appliance will be sufficient.


