
Article Title:
Recommended Hardware for a GridLink Farm - July 2014
Article Type:
General System  
Article ID #:
677
GGY Contacts :
Phil Gold
Last Modified:
30 Jul 14
Article Summary:

You may be considering a farm with a certain number of cores and job queues to satisfy your current processing requirements. We strongly recommend that you consider your future needs before setting up even a small farm, so that the hardware you choose now can continue to serve as part of the larger farm you may need tomorrow.
 


Article Detail:

Principles to Consider:

Multi-core Processors

Minimize the number of processors. You can best achieve this today by buying the highest speed multi-core processors available.

With the release of the E5 v2 Xeons, Intel now offers 4, 6, 8, 10 and 12 cores per processor in two-processor servers. In general, the higher the clock speed, the better each core will perform. Sixteen 3.5 GHz cores, for example, will outperform sixteen 2.7 GHz cores. However, 3.5 GHz clock speeds are only available on 6-core processors, while 2.7 GHz processors are available with 12 cores, so you would need twice as many 3.5 GHz processors to get the same number of cores. As a result, you should weigh other factors such as server and operating system costs, licensing fees, and your datacenter density and power consumption requirements alongside your speed requirements to determine the optimal processors for your environment.
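The trade-off between clock speed and core count can be worked through with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical and not part of any GGY tool, and the 48-core target is just an example figure.

```python
import math


def processors_needed(target_cores: int, cores_per_processor: int) -> int:
    """Number of processors required to reach the target core count."""
    return math.ceil(target_cores / cores_per_processor)


def two_processor_servers_needed(target_cores: int, cores_per_processor: int) -> int:
    """Number of two-processor servers required for the target core count."""
    return math.ceil(processors_needed(target_cores, cores_per_processor) / 2)


# Compare the two processor classes from the example above for a 48-core farm.
for label, cores in (("3.5 GHz, 6 cores", 6), ("2.7 GHz, 12 cores", 12)):
    cpus = processors_needed(48, cores)
    servers = two_processor_servers_needed(48, cores)
    print(f"{label}: {cpus} processors in {servers} servers")
```

For the same 48 cores, the 6-core parts need twice as many processors and servers as the 12-core parts, which is where server, operating system and power costs come into the decision.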

Our top recommendations are as follows: 

    Xeon E5-2697 v2   2.70 GHz   12 cores
    Xeon E5-2695 v2   2.40 GHz   12 cores
    Xeon E5-2690 v2   3.00 GHz   10 cores
    Xeon E5-2680 v2   2.80 GHz   10 cores
    Xeon E5-2658 v2   2.40 GHz   10 cores (if low power consumption is a requirement)
    Xeon E5-2667 v2   3.30 GHz   8 cores

We have been disappointed by the performance of 4-processor and 8-processor servers for high performance computing, so at this time we do not recommend their use in a GridLink farm.

Virtual Servers

We do not recommend using virtual machines as GridLink servers, because the virtualization layer reduces the performance of the hardware. We do, however, support virtual machines in environments where there is no direct access to “bare metal” hardware, such as clouds.

This support is conditional on meeting the following minimum requirements:

  • There is one-to-one correspondence between physical (non-hyperthreaded) cores on the host and virtual cores allocated to the VM running GridLink
  • There is a minimum of 2GB per core of physical RAM on the host backing the RAM allocated to the VM (4GB per core is recommended)
  • There is sufficient disk space allocated to the VM as a local drive (minimum 15 GB per core, 30 GB recommended)
  • The performance of the disks in the host is on a par with our recommended configuration for a physical machine (minimum 2-way striping across 15K disks; RAID 10 with a minimum of four 15K disks is recommended for a master)
  • Minimum 1Gbps network connectivity (single subnet)
  • Performance and stability can only be guaranteed if a single VM is hosted on a physical server with one-to-one correspondence between the size of the host and the size of the VM. If multiple VMs are running on the same host, the effect of their load on the GridLink VM is unpredictable. Note: this requirement from GGY is consistent with the approach chosen by all major cloud providers for compute intensive (HPC) applications like AXIS.

Virtual servers may, however, be appropriate for EnterpriseLink Servers or Front-End Servers, provided that the processing power and memory GGY requires for each concurrent user session are backed by an identical number of CPU cores and amount of RAM in the host machine.
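The measurable minimum requirements for a GridLink VM listed above can be expressed as a simple checklist. This is a minimal sketch under our own assumptions: the `VmSpec` fields and the `check_gridlink_vm` function are hypothetical names, not part of GridLink, and disk performance is left out because it cannot be reduced to a single number here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmSpec:
    host_physical_cores: int  # non-hyperthreaded cores on the host
    virtual_cores: int        # virtual cores allocated to the GridLink VM
    ram_gb: float             # physical RAM backing the VM
    local_disk_gb: float      # disk space allocated as a local drive
    network_gbps: float       # network connectivity
    vms_on_host: int          # number of VMs running on the physical host


def check_gridlink_vm(vm: VmSpec) -> List[str]:
    """Return the list of minimum requirements that the VM fails to meet."""
    problems = []
    if vm.virtual_cores != vm.host_physical_cores:
        problems.append("virtual cores do not match physical cores one-to-one")
    if vm.ram_gb < 2 * vm.virtual_cores:
        problems.append("less than 2GB of RAM per core")
    if vm.local_disk_gb < 15 * vm.virtual_cores:
        problems.append("less than 15GB of local disk per core")
    if vm.network_gbps < 1:
        problems.append("network slower than 1Gbps")
    if vm.vms_on_host != 1:
        problems.append("more than one VM on the host")
    return problems


# A 16-core VM that meets the recommended (not just minimum) figures.
print(check_gridlink_vm(VmSpec(16, 16, 64, 480, 10, 1)))
```

An empty list means the measurable minimums are met. The disk performance requirement still has to be verified separately.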

Memory and O/S

You will need at least 2GB of memory per processor core and we recommend 4GB per core or more if possible. The extra memory allows the disk subsystem to perform faster.

Please note that each Windows version has its own memory limit. For details, please review Microsoft KB article "Memory Limits for Windows Releases".

Standard 32-bit versions of Windows do not support more than 4GB of memory. To support more than 4GB, therefore, we recommend 64-bit operating systems.

With the release of Windows Server 2012, Microsoft has discontinued Windows HPC Server as a standalone operating system. For new installations, therefore, we recommend Windows Server 2008 R2 Standard (for configurations with up to 32GB of memory) or Windows Server 2008 R2 Enterprise (when using more than 32GB of memory).

We also support Windows Server 2012. Windows Server 2012 Standard edition supports up to 4TB of memory.
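As an illustration of the memory guidance above, the following sketch sizes memory per server and picks the matching Windows Server 2008 R2 edition. The function names are hypothetical, and the thresholds are simply the figures quoted in this section.

```python
def memory_guidance_gb(cores: int) -> dict:
    """Minimum (2GB per core) and recommended (4GB per core) memory for a server."""
    return {"minimum_gb": 2 * cores, "recommended_gb": 4 * cores}


def windows_2008_r2_edition(memory_gb: int) -> str:
    """Edition suggested in this article: Standard up to 32GB, Enterprise above."""
    if memory_gb <= 32:
        return "Windows Server 2008 R2 Standard"
    return "Windows Server 2008 R2 Enterprise"


# Example: a two-processor server with 8 cores per processor (16 cores).
guidance = memory_guidance_gb(16)
print(guidance)
print(windows_2008_r2_edition(guidance["recommended_gb"]))
```

A 16-core server at the recommended 4GB per core needs 64GB, which is above the 32GB limit of the Standard edition and therefore calls for Enterprise (or Windows Server 2012).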

Master Servers and Helper Servers

You can now choose how many cores you want in a server farm and how many Queues you want on that farm. Note that each Queue requires a separate server. The number of Queues should correspond to how you wish to use the farm. For example, you may wish to set up separate Queues for Pricing, Testing and Valuation. You can buy GridLink Core licenses ($1000 per core) and GridLink Queue licenses ($4000 per Queue) to support the configuration you require. Special pricing is available for Large Farms (256 cores or greater).
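To budget for a given configuration, the list prices quoted above can be combined as follows. This is a rough sketch: the function names are hypothetical, and the special pricing for Large Farms is not modelled because its terms are not stated here.

```python
CORE_LICENSE_USD = 1000    # GridLink Core license, per core
QUEUE_LICENSE_USD = 4000   # GridLink Queue license, per Queue
LARGE_FARM_CORES = 256     # farms this size or larger may get special pricing


def list_price_usd(cores: int, queues: int) -> int:
    """List price of the GridLink Core and Queue licenses for a farm."""
    if cores < 1 or queues < 1:
        raise ValueError("a farm needs at least one core and one Queue")
    return cores * CORE_LICENSE_USD + queues * QUEUE_LICENSE_USD


def may_qualify_for_large_farm_pricing(cores: int) -> bool:
    """True when the farm is large enough to ask GGY about special pricing."""
    return cores >= LARGE_FARM_CORES


# Example: a 64-core farm with 2 Queues.
print(list_price_usd(64, 2))
print(may_qualify_for_large_farm_pricing(64))
```

For 64 cores and 2 Queues the list price is 64 x $1000 + 2 x $4000 = $72,000.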

A server with a GridLink Queue license is a Master server; a server without one is a Helper server. The two types of server have different requirements, especially regarding hard drives. The GridLink Queue licenses should always be attached to the servers with the fastest processors and best disk subsystems in the farm.

We strongly recommend that the datasets you run be stored on a network share with a fast and reliable connection between the server hosting the share and the Master servers in your farm. The GridLink service will automatically copy the Dataset from its current location to the drive of the Master server before the job starts, and copy it back once the job is complete. This is a change from our earlier recommendation to store the datasets on the drive of the Master server. The reason for the change is that end users may open datasets located on the Master server's drives and run jobs or perform other disk-intensive tasks locally on their desktops or front-end/EnterpriseLink servers before or after the GridLink run. This may impose an additional load on the GridLink server's drive system that can cause a significant deterioration in performance, or in some cases instability, on both sides: in the GridLink runs as well as in the users’ interactive AXIS sessions.

Hard Drives for Master Servers

On a Master server you need large capacity, high speed and redundancy. We recommend a minimum of 600GB of capacity. The drives need to support the largest job you will run on that farm, as the Master may be receiving massive simultaneous input from all the Helpers working on that job. We currently support up to 512 cores per job, and if your farm is 256 cores or larger you need a minimum of six 15K 300GB drives in a RAID 10 configuration per server (eight such drives are preferred).

If you follow our new recommendation to store the datasets on a network share outside of the GridLink farm, your company may decide that mirroring is not necessary in your RAID configurations, because the Master only holds a temporary copy of the dataset for the duration of the run. If so, you could use three 15K 300GB drives in RAID 0 instead of six 15K 300GB drives in RAID 10, and so on.

You may also provide separate drives (RAID 1) for the O/S, although this is neither a GridLink requirement nor a GGY recommendation. You can save money and drive bays by creating an O/S partition of, say, 60GB on the main RAID array.
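The equivalence between three drives in RAID 0 and six drives in RAID 10 follows from how each RAID level uses its disks. The sketch below computes nominal usable capacity only (it ignores formatting overhead), and the function name is hypothetical.

```python
MASTER_MIN_CAPACITY_GB = 600  # minimum Master capacity recommended above


def usable_capacity_gb(raid_level: str, drives: int, drive_gb: int) -> int:
    """Nominal usable capacity of an array of identical drives."""
    if raid_level == "RAID 0":
        # Striping only: every drive contributes its full capacity.
        return drives * drive_gb
    if raid_level == "RAID 10":
        # Striped mirrors: half of the drives hold mirror copies.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Six 300GB drives in RAID 10 and three in RAID 0 give the same usable space.
for level, count in (("RAID 10", 6), ("RAID 0", 3), ("RAID 10", 4)):
    capacity = usable_capacity_gb(level, count, 300)
    print(level, count, capacity, capacity >= MASTER_MIN_CAPACITY_GB)
```

Both six-drive RAID 10 and three-drive RAID 0 yield 900GB, and a four-drive RAID 10 array of 300GB disks exactly meets the 600GB minimum. The RAID 0 option gives up redundancy, which is acceptable only when the Master holds a temporary copy of the dataset.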

You can set up drive arrays like this using local disks, or via a direct fiber connection to a suitable SAN.
 
We get a lot of questions about using SSD devices instead of traditional spinning disks. They have improved a lot in terms of reliability. They are very fast for random access reads and writes, but this does not necessarily translate into improved AXIS performance, since for most of the time we rely on sequential reads and writes, where they offer little advantage. They may be a very good idea if space is limited, but generally the prices are too high for us to recommend them in preference to regular disks at this time.


Hard Drives for Helper Servers

AXIS / GridLink supports full failure recovery for Helpers so we do not require you to provide expensive redundant storage. The capacity requirements are lower - we recommend a minimum of 15GB per processor core. For speed we recommend 15K RAID 0. You may also need separate drives (RAID 1) for the O/S, although this is not a GridLink requirement or a GGY recommendation. We strongly recommend against the use of a SAN for Helper servers. The data is temporary and connecting to a SAN adds expense and adds a very large workload to the SAN, potentially slowing down other operations on the SAN.
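As a quick check of the Helper capacity guideline, the sketch below compares the minimum of 15GB per core with the nominal capacity of a RAID 0 array. The function names are hypothetical, and the 16-core server with two 146GB drives is only an example.

```python
HELPER_MIN_GB_PER_CORE = 15  # minimum Helper capacity per processor core


def helper_min_capacity_gb(cores: int) -> int:
    """Minimum disk capacity recommended for a Helper with the given cores."""
    return cores * HELPER_MIN_GB_PER_CORE


def raid0_capacity_gb(drives: int, drive_gb: int) -> int:
    """Nominal capacity of a RAID 0 array (all drives contribute fully)."""
    return drives * drive_gb


# Example: a 16-core Helper with two 146GB 15K drives in RAID 0.
needed = helper_min_capacity_gb(16)
available = raid0_capacity_gb(2, 146)
print(needed, available, available >= needed)
```

A 16-core Helper needs at least 240GB, so two 146GB drives in RAID 0 (292GB nominal) are sufficient.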

SSD devices may be quite appropriate especially where drive bay capacity is limited.

Important - Hard Disk Performance and Reliability

Both master and helper servers perform a large number of time sensitive disk reads and writes. As a result, it is vital that you not use any software which interferes with native disk access. This software includes (but is not limited to):

  • Data deduplication software
  • Disk compression software
  • Real-time defragmentation software
  • Real-time backup software (e.g. Microsoft Volume Shadow Copy Service)
  • Real-time anti-virus scanners (please see the "Anti-Virus" section of this document for more info on anti-virus exclusions)

These types of software will cause severe instability and performance issues.

Network

Connect up all the boxes in your farm on the same subnet using at least 1 Gbps networking. This applies even if the rest of your data centre is using a 100Mbps network. InfiniBand or 10G Ethernet is preferred but not required and will have the most benefit on the Master servers.

At this time we require a registry entry to disable SMB 2.0, which automatically downgrades connections to the SMB1 level. See "AXIS GridLink Advisory - Windows 2008 with SMB 2.0 Severely Impacting AXIS GridLink" for more information.

AXIS EnterpriseLink

We now recommend you set up one or more AXIS EnterpriseLink front end servers to support the interactive clients who are using your farm, and also for those who do not require farm access. EnterpriseLink is included in your AXIS license.

For more information, please visit the AXIS EnterpriseLink web site or contact GGY.

Blades or Rack Mounted Servers

Some Blade servers offer excellent performance and flexible storage options, but many Blade servers provide very limited disk capacity and speed and do not support the fast processors, so you need to be careful that the recommended storage, memory and processors are available in the Blade chassis that you are considering. Some 1U rack servers are also limited to two 3.5" drives or four 2.5" drives. The storage requirements for Helper servers are lower than for Master servers so some blades and 1U servers may be suitable as Helpers but not as Masters unless a SAN is attached. We often recommend 2U rack servers for their storage abilities. If you are using Blade or other small servers you may get the disk speed you need by using SSD drives or Fusion IO cards.

Anti-Virus

Allow the drives to work unimpeded. This means you should not be running other disk intensive applications on these drives or this SAN while AXIS is processing and you must exclude the specific directories AXIS is writing to from any real time antivirus checking. Failure to do this will result in severe performance deterioration and will lead to disk errors. For more details see: www.ggy.com/support/kbase/kbdetails.asp?searchterm=&articleid=271

Backups

If you implement our current recommendations for storing the datasets outside of the GridLink farm you no longer have any data residing on the GridLink master’s or helper’s drives that needs backing up. Taking a snapshot or image of the server to be able to recover the operating system and installed executable files should be quite sufficient. Even this may not be needed as re-installing GridLink is usually a very quick process. You should consider backing up the farm profile after every configuration change for faster recovery. See this page for more information: https://www.ggy.com/gridlink/farm_recovery.htm

For the partitions outside of the GridLink farm that house the datasets and other user data files we now provide intelligent, safe and fully automated backup functionality built into the AXIS EnterpriseLink module. For more information please contact GGY System support.

Do not, under any circumstances, perform backups using third party tools on GridLink Master or Helper servers while AXIS is running. Such a backup may lock files that AXIS needs and produce errors, and the backup itself will be unusable since it may be made while the dataset is changing.

For the farms that still house the datasets on the local drives of the Masters we have built an intelligent backup facility into GridLink itself. It allows you to schedule backups like other backup software, but it will not attempt to back up a dataset that is running; it will wait for a suitable time. The backup it produces is simply a zip file for each Dataset or Database, located in a safe place for you to back up later using your normal backup software.

AXIS will store only temporary data on the Helper servers, so there is no point in backing these up.

You do not need to back up the various AXIS versions installed on the GridLink servers because they are always readily available for download from our website.

Current Hardware Recommendations:

Most major vendors (IBM, HP, Dell etc.) offer rack servers designed for the Xeon E5-2600 series.

The Dell PowerEdge R720 is a good example of a 2U server that supports two Xeon E5-2600 series chips and up to eight 3.5" 15K SAS drives.

In the sample configurations below, the hard drive specifications are minimum specs, and performance may be improved with faster drives, SSD drives or more drives per RAID array.

Sample Configurations:

Here are some typical configurations (without SAN). You may get better performance with more drives in the RAID 10 array.

2U Configurations based on 8 core chips:

16 Core farm - 1 Queue

One 2U server (e.g. Dell R720) with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory, 4x 300GB 15K SAS drives RAID 10.

32 Core farm - 1 Queue

Two 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. One Master server with 4x 300GB 15K SAS drives RAID 10, one Helper server with 2 x 146GB 15K SAS drives, RAID 0.

64 Core farm - 2 Queues

Four 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. Two Master servers with 4x 300GB 15K SAS drives RAID 10, two Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

128 Core farm - 4 Queues

Eight 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. Four Master servers with 4x 300GB 15K SAS drives RAID 10, four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

256 Core farm - 8 Queues

Sixteen 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. Eight Master servers with 6x 300GB 15K SAS drives RAID 10, eight Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

512 Core farm - 8 Queues

Thirty two 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, twenty four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

1024 Core farm - 8 Queues

Sixty four 2U servers (e.g. Dell R720) each with 2 eight-core 3.3GHz processors (Xeon E5-2667 v2), 64 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, fifty six Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

 

2U Configurations based on 10 core chips:

20 Core farm - 1 Queue

One 2U server (e.g. Dell R720) with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory, 4x 300GB 15K SAS drives RAID 10.

40 Core farm - 1 Queue

Two 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. One Master server with 4x 300GB 15K SAS drives RAID 10, one Helper server with 2 x 146GB 15K SAS drives, RAID 0.

80 Core farm - 2 Queues

Four 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. Two Master servers with 4x 300GB 15K SAS drives RAID 10, two Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

160 Core farm - 4 Queues

Eight 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. Four Master servers with 4x 300GB 15K SAS drives RAID 10, four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

320 Core farm - 8 Queues

Sixteen 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. Eight Master servers with 6x 300GB 15K SAS drives RAID 10, eight Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

640 Core farm - 8 Queues

Thirty two 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, twenty four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

1280 Core farm - 8 Queues

Sixty four 2U servers (e.g. Dell R720) each with 2 ten-core 3.0GHz processors (Xeon E5-2690 v2), 96 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, fifty six Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

 

2U Configurations based on 12 core chips:

24 Core farm - 1 Queue

One 2U server (e.g. Dell R720) with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory, 4x 300GB 15K SAS drives RAID 10.

48 Core farm - 1 Queue

Two 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. One Master server with 4x 300GB 15K SAS drives RAID 10, one Helper server with 2 x 146GB 15K SAS drives, RAID 0.

96 Core farm - 2 Queues

Four 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. Two Master servers with 4x 300GB 15K SAS drives RAID 10, two Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

192 Core farm - 4 Queues

Eight 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. Four Master servers with 4x 300GB 15K SAS drives RAID 10, four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

384 Core farm - 8 Queues

Sixteen 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. Eight Master servers with 6x 300GB 15K SAS drives RAID 10, eight Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

768 Core farm - 8 Queues

Thirty two 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, twenty four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.

1536 Core farm - 8 Queues

Sixty four 2U servers (e.g. Dell R720) each with 2 twelve-core 2.7GHz processors (Xeon E5-2697 v2), 96 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, fifty six Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
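The server counts in the sample configurations follow from simple arithmetic: two processors per server, and one Master server per Queue with the remaining servers acting as Helpers. The sketch below reproduces that arithmetic. The `farm_layout` function is a hypothetical helper, not part of GridLink, and it assumes every server in the farm uses the same processor.

```python
def farm_layout(total_cores: int, cores_per_chip: int, queues: int,
                chips_per_server: int = 2) -> dict:
    """Split a farm into Master and Helper servers for a given core count."""
    cores_per_server = cores_per_chip * chips_per_server
    if total_cores % cores_per_server != 0:
        raise ValueError("total cores must be a multiple of cores per server")
    servers = total_cores // cores_per_server
    if queues < 1 or queues > servers:
        # Each Queue requires its own server, so Queues cannot exceed servers.
        raise ValueError("need between 1 and 'servers' Queues")
    return {"servers": servers, "masters": queues, "helpers": servers - queues}


# Examples taken from the 8-core and 10-core configurations above.
print(farm_layout(64, 8, 2))
print(farm_layout(1024, 8, 8))
print(farm_layout(160, 10, 4))
```

A 64-core farm of 8-core chips with 2 Queues works out to four servers, two Masters and two Helpers. A 1024-core farm with 8 Queues works out to sixty-four servers, eight Masters and fifty-six Helpers.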
