Principles to Consider:
Minimize the number of processors. You can best achieve this today by buying the highest-speed multi-core processors available. Our current top recommendations are the Xeon E5-2690 and E5-2680 8-core processors. The Xeon X5600 series 6-core processors are also good performers and are recommended.
We have been disappointed by the performance of 4-processor and 8-processor servers for high performance computing, so at this time we do not recommend their use in a GridLink farm.
We do not support Virtual Servers for dedicated GridLink Farms at this time since we are using HPC (High Performance Computing) techniques which rely upon knowledge of the physical environment and can only be slowed down by extra layers of software.
Virtual servers may however be appropriate for EnterpriseLink Servers or Front-End Servers.
Memory and O/S
You will need at least 2GB of memory per processor core and we recommend 4GB per core or more if possible. The extra memory allows the disk subsystem to perform faster.
Please note that each Windows version has its own memory limit. For details, please review Microsoft KB article "Memory Limits for Windows Releases".
Standard 32-bit versions of Windows do not support more than 4GB of memory. To support more than 4GB, therefore, we recommend 64-bit operating systems.
With the release of Windows Server 2012, Microsoft has discontinued Windows HPC Server as a standalone operating system. For new installations, therefore, we recommend 2008 R2 Standard (for configurations with up to 32GB of memory) or 2008 R2 Enterprise (when using more than 32GB of memory).
We also support Windows Server 2012. Windows Server 2012 Standard edition supports up to 4TB of memory.
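The memory guidance above reduces to simple arithmetic. The following is a minimal sketch, not a GGY tool: the function names are hypothetical, and the edition check covers only the Windows Server 2008 R2 thresholds quoted in this section.

```python
# Illustrative helpers only; the names are hypothetical and not part of AXIS or GridLink.

MIN_GB_PER_CORE = 2          # stated minimum memory per processor core
RECOMMENDED_GB_PER_CORE = 4  # stated recommendation per processor core


def memory_requirements_gb(cores):
    """Return (minimum, recommended) memory in GB for a server with `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return MIN_GB_PER_CORE * cores, RECOMMENDED_GB_PER_CORE * cores


def windows_2008_r2_edition(memory_gb):
    """Pick the 2008 R2 edition using the 32GB threshold quoted in this section."""
    return "Standard" if memory_gb <= 32 else "Enterprise"


if __name__ == "__main__":
    # A dual eight-core server has 16 cores.
    minimum, recommended = memory_requirements_gb(16)
    print(f"minimum {minimum}GB -> 2008 R2 {windows_2008_r2_edition(minimum)}")
    print(f"recommended {recommended}GB -> 2008 R2 {windows_2008_r2_edition(recommended)}")
```

For a 16-core server this gives 32GB as the minimum and 64GB as the recommended amount, so a server built to the recommended level would need the Enterprise edition under 2008 R2.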
Master Servers and Helper Servers
You can now choose how many cores you want in a server farm and how many Queues you want on that farm. Note that each Queue requires a separate server. The number of Queues should correspond to how you wish to use the farm. For example, you may wish to set up separate Queues for Pricing, Testing and Valuation. You can buy GridLink Core licenses ($1000 per core) and GridLink Queue licenses ($4000 per Queue) to support the configuration you require. Special pricing is available for Large Farms (256 cores or greater).
A server with a GridLink Queue license is a Master server, and without it, it's a Helper server. The two types of server have different requirements, especially regarding hard drives. The GridLink Queue licenses should always be attached to the servers with the fastest processors and best disk subsystems in the farm.
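As a rough illustration of the licensing arithmetic, the sketch below computes a list price from the per-core and per-Queue figures quoted above. The function names are hypothetical. The special pricing for Large Farms is not published here, so the sketch only flags farms that qualify and does not attempt to estimate a discount.

```python
# Illustrative only; function names are hypothetical and prices are those quoted above.

CORE_LICENSE_USD = 1000      # GridLink Core license, per core
QUEUE_LICENSE_USD = 4000     # GridLink Queue license, per Queue
LARGE_FARM_MIN_CORES = 256   # farms this size or larger qualify for special pricing


def list_price_usd(cores, queues):
    """List price for a farm before any special pricing is applied."""
    if cores <= 0 or queues <= 0:
        raise ValueError("cores and queues must be positive")
    return cores * CORE_LICENSE_USD + queues * QUEUE_LICENSE_USD


def qualifies_for_special_pricing(cores):
    """True when the farm is large enough that GGY should be asked for a quote."""
    return cores >= LARGE_FARM_MIN_CORES


if __name__ == "__main__":
    for cores, queues in [(16, 1), (64, 2), (256, 8)]:
        price = list_price_usd(cores, queues)
        note = " (ask about Large Farm pricing)" if qualifies_for_special_pricing(cores) else ""
        print(f"{cores} cores, {queues} Queue(s): ${price:,}{note}")
```

A 64-core farm with 2 Queues, for example, comes to $64,000 in Core licenses plus $8,000 in Queue licenses.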
We strongly recommend that you store the datasets you run on a network share, with a fast and reliable connection between the server hosting this share and the Master servers in your farm. The GridLink service will automatically copy the Dataset from its current location to the drive of the Master server before the job starts, and copy it back once the job is complete. This is a change from our earlier recommendation to store the datasets on the drive of the Master server. The reason for this change is that end users may open datasets located on the Master server's drives and run jobs or perform other disk-intensive tasks locally on their desktop or front-end/EnterpriseLink server before or after the GridLink run. This may impose an additional load on the GridLink server's drive system that can cause a significant deterioration in performance, or in some cases instability, in both the GridLink runs and the users' interactive AXIS sessions.
Hard Drives for Master Servers
On a Master server you need large capacity, high speed and redundancy. We recommend a minimum of 600GB of capacity. The drives need to support the largest job you will run on that farm, as the Master may be receiving massive simultaneous input from all the Helpers working on that job. We currently support up to 512 cores per job (this has recently increased from 256), and if your farm is 256 cores or larger you need a minimum of six 15K 300GB drives in a RAID 10 configuration per server (eight such drives are preferred). If you follow our new recommendation to store the datasets on a network share outside of the GridLink farm, your company may decide that mirroring is not necessary in your RAID configurations, because the Master only holds a temporary copy of the dataset for the duration of the run. If so, you could use three 15K 300GB drives in RAID 0 instead of six 15K 300GB drives in RAID 10, and so on. You may also provide separate drives (RAID 1) for the O/S, although this is not a GridLink requirement or a GGY recommendation. You can save money and drive bays by creating an O/S partition of, say, 60GB on the main RAID array.
You can set up drive arrays like this using local disks, or via a direct fiber connection to a suitable SAN.
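The trade-off between RAID 10 and RAID 0 described above comes down to nominal usable capacity. The sketch below uses the standard capacity rules for each RAID level and ignores formatting overhead. The function name is hypothetical, and the RAID 1 case assumes a simple two-drive mirror.

```python
# Illustrative capacity arithmetic; the function name is hypothetical.

MASTER_MIN_GB = 600  # minimum Master capacity recommended in this section


def usable_capacity_gb(raid_level, drive_count, drive_gb):
    """Nominal usable capacity of an array of identical drives."""
    if raid_level == 0:
        # Striping only: every drive contributes its full capacity.
        return drive_count * drive_gb
    if raid_level == 1:
        # Assumed two-drive mirror: capacity of a single drive.
        if drive_count != 2:
            raise ValueError("this sketch models RAID 1 as a two-drive mirror")
        return drive_gb
    if raid_level == 10:
        # Striped mirrors: half of the drives hold mirror copies.
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least four")
        return (drive_count // 2) * drive_gb
    raise ValueError(f"unsupported RAID level: {raid_level}")


if __name__ == "__main__":
    for level, count in [(10, 6), (0, 3), (10, 4)]:
        capacity = usable_capacity_gb(level, count, 300)
        ok = capacity >= MASTER_MIN_GB
        print(f"RAID {level}, {count} x 300GB: {capacity}GB usable, meets minimum: {ok}")
```

Six 300GB drives in RAID 10 and three 300GB drives in RAID 0 both yield 900GB of usable space, which is why the two layouts are interchangeable when mirroring is not needed. The RAID 0 layout has no redundancy, so a single drive failure loses the array.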
Hard Drives for Helper Servers
AXIS / GridLink supports full failure recovery for Helpers so we do not require you to provide expensive redundant storage. The capacity requirements are lower - we recommend a minimum of 15GB per processor core. For speed we recommend 15K RAID 0. You may also need separate drives (RAID 1) for the O/S, although this is not a GridLink requirement or a GGY recommendation. We strongly recommend against the use of a SAN for Helper servers. The data is temporary and connecting to a SAN adds expense and adds a very large workload to the SAN, potentially slowing down other operations on the SAN.
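A quick way to check a Helper's drives against the 15GB-per-core minimum is sketched below. The function names are hypothetical, and the capacity figure is the nominal RAID 0 total with no allowance for formatting overhead or an O/S partition.

```python
# Illustrative check only; function names are hypothetical.

HELPER_MIN_GB_PER_CORE = 15  # minimum Helper capacity per core, from this section


def helper_min_capacity_gb(cores):
    """Minimum recommended Helper storage in GB for a server with `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return HELPER_MIN_GB_PER_CORE * cores


def raid0_capacity_gb(drive_count, drive_gb):
    """Nominal RAID 0 capacity: the sum of all member drives."""
    return drive_count * drive_gb


if __name__ == "__main__":
    cores = 16  # a dual eight-core Helper
    required = helper_min_capacity_gb(cores)
    available = raid0_capacity_gb(2, 146)
    print(f"required {required}GB, available {available}GB, sufficient: {available >= required}")
```

For a 16-core Helper the minimum is 240GB, so two 146GB drives in RAID 0 (292GB nominal) clear it.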
Important - Hard Disk Performance and Reliability
Both Master and Helper servers perform a large number of time-sensitive disk reads and writes. As a result, it is vital that you do not use any software that interferes with native disk access. This includes (but is not limited to):
- Data deduplication software
- Disk compression software
- Real-time defragmentation software
- Real-time backup software (e.g. Microsoft Volume Shadow Copy Service)
- Real-time anti-virus scanners (please see the "Anti-Virus" section of this document for more info on anti-virus exclusions)
These types of software will cause severe instability and performance issues.
Connect up all the boxes in your farm on the same subnet using at least 1 Gbps networking. This applies even if the rest of your data centre is using a 100Mbps network. InfiniBand or 10G Ethernet is preferred but not required and will have the most benefit on the Master servers.
At this time we require a registry entry that disables SMB 2.0, so that connections automatically fall back to SMB 1. For more information, see the advisory "AXIS GridLink Advisory - Windows 2008 with SMB 2.0 Severely Impacting AXIS GridLink".
We now recommend you set up one or more AXIS EnterpriseLink front end servers to support the interactive clients who are using your farm, and also for those who do not require farm access. EnterpriseLink is included in your AXIS license.
For more information please visit the AXIS EnterpriseLink web site or contact GGY.
Blades or Rack Mounted Servers
Some Blade servers offer excellent performance and flexible storage options, but many Blade servers provide very limited disk capacity and speed and do not support the fast processors, so you need to be careful that the recommended storage, memory and processors are available in the Blade chassis that you are considering. Some 1U rack servers are also limited to two 3.5" drives or four 2.5" drives. The storage requirements for Helper servers are lower than for Master servers so some blades and 1U servers may be suitable as Helpers but not as Masters unless a SAN is attached. We often recommend 2U rack servers for their storage abilities. If you are using Blade or other small servers you may get the disk speed you need by using SSD drives or Fusion IO cards.
Anti-Virus
Allow the drives to work unimpeded. This means you should not be running other disk-intensive applications on these drives or this SAN while AXIS is processing, and you must exclude the specific directories AXIS is writing to from any real-time anti-virus checking. Failure to do this will result in severe performance deterioration and will lead to disk errors. For more details see: www.ggy.com/support/kbase/kbdetails.asp?searchterm=&articleid=271
If you implement our current recommendations for storing the datasets outside of the GridLink farm you no longer have any data residing on the GridLink master’s or helper’s drives that needs backing up. Taking a snapshot or image of the server to be able to recover the operating system and installed executable files should be quite sufficient. Even this may not be needed as re-installing GridLink is usually a very quick process. You should consider backing up the farm profile after every configuration change for faster recovery. See this page for more information: https://www.ggy.com/gridlink/farm_recovery.htm
For the partitions outside of the GridLink farm that house the datasets and other user data files we now provide intelligent, safe and fully automated backup functionality built into the AXIS EnterpriseLink module. For more information please contact GGY System support.
Do not perform backups using third party tools on GridLink Master or Helper servers while AXIS is running under any circumstances. The backup you perform may lock files that AXIS needs and produce errors and the backup will also be unusable since it may be made while the dataset is changing.
For farms that still house the datasets on the local drives of the Masters, we have built an intelligent backup facility into GridLink itself. It allows you to schedule backups like other backup software, but it will not attempt to back up a dataset that is running - it will wait for a suitable time. The backup it produces is simply a zip file for each Dataset or Database, located in a safe place for you to back up later using your normal backup software.
AXIS will store only temporary data on the Helper servers, so there is no point in backing these up.
You do not need to back up the various AXIS versions installed on the GridLink servers because they are always readily available for download from our website.
Current Hardware Recommendations:
Most major vendors (IBM, HP, Dell etc.) offer rack servers designed for the Xeon E5-2600 series.
The Dell PowerEdge R720 is a good example of a 2U server that supports two Xeon E5-2600 series chips and up to eight 3.5" 15K SAS drives.
In the sample configurations below, the hard drive specifications are minimum specs, and performance may be improved with faster drives, SSD drives or more drives per RAID array.
Here are some typical configurations (without SAN). You may get better performance with more drives in the RAID 10 array.
16 Core farm - 1 Queue
One 2U server (e.g. Dell R720) with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory, 4x 300GB 15K SAS drives RAID 10.
32 Core farm - 1 Queue
Two 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. One Master server with 4x 300GB 15K SAS drives RAID 10, one Helper server with 2 x 146GB 15K SAS drives, RAID 0.
64 Core farm - 2 Queues
Four 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. Two Master servers with 4x 300GB 15K SAS drives RAID 10, two Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
128 Core farm - 4 Queues
Eight 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. Four Master servers with 4x 300GB 15K SAS drives RAID 10, four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
256 Core farm - 8 Queues
Sixteen 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. Four Master servers with 6x 300GB 15K SAS drives RAID 10, twelve Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
512 Core farm - 8 Queues
Thirty-two 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, twenty-four Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
1024 Core farm - 8 Queues
Sixty-four 2U servers (e.g. Dell R720) each with 2 eight-core 2.9GHz processors (Xeon E5-2690), 64 GB memory. Eight Master servers with 8x 300GB 15K SAS drives RAID 10, fifty-six Helper servers with 2 x 146GB 15K SAS drives, RAID 0.
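For farm sizes not listed here, the server counts can be derived in the same way. The sketch below is an illustration under two assumptions taken from this document: every server has two eight-core processors (16 cores), and each Queue is hosted on its own Master server. The function name and the returned field names are hypothetical.

```python
# Illustrative layout arithmetic; names are hypothetical and not part of GridLink.

CORES_PER_SERVER = 16  # assumed: two eight-core processors per server


def farm_layout(total_cores, queues, cores_per_server=CORES_PER_SERVER):
    """Split a farm into Master and Helper servers, one Master per Queue."""
    if total_cores <= 0 or total_cores % cores_per_server:
        raise ValueError("total_cores must be a positive multiple of cores_per_server")
    servers = total_cores // cores_per_server
    if not 1 <= queues <= servers:
        raise ValueError("each Queue needs its own server")
    return {"servers": servers, "masters": queues, "helpers": servers - queues}


if __name__ == "__main__":
    for cores, queues in [(64, 2), (128, 4), (512, 8), (1024, 8)]:
        print(cores, "cores:", farm_layout(cores, queues))
```

Drive counts per Master still need to be chosen separately, since larger farms call for more drives in each RAID 10 array.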