
Failure Recovery

When GridLink manages AXIS batch jobs, failure recovery is implemented at three levels so that a batch run can complete despite most failures.

Please note that Failure Recovery does not cover an AXIS master failure. If the master crashes, the user must resubmit the job. If a master hangs and makes no progress for a certain period, GridLink terminates it so that the next job in the queue can start.
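
The hung-master rule can be pictured with a small sketch. This is only an illustration of the behaviour described above, not GridLink's actual code: the `Scheduler` class, its method names, and the `hang_timeout` value are all hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scheduler:
    """Hypothetical job queue that terminates a master which stops making progress."""
    hang_timeout: float                      # assumed: seconds without progress before termination
    queue: deque = field(default_factory=deque)
    current: Optional[str] = None            # job whose master is running now
    last_progress: float = 0.0
    terminated: list = field(default_factory=list)

    def submit(self, job: str) -> None:
        self.queue.append(job)

    def report_progress(self, now: float) -> None:
        # Called whenever the running master shows signs of work.
        self.last_progress = now

    def tick(self, now: float) -> Optional[str]:
        # A master with no progress for longer than hang_timeout is terminated.
        # The job is not retried; the user has to resubmit it.
        if self.current is not None and now - self.last_progress > self.hang_timeout:
            self.terminated.append(self.current)
            self.current = None
        # Start the next queued job if nothing is running.
        if self.current is None and self.queue:
            self.current = self.queue.popleft()
            self.last_progress = now
        return self.current
```

In this sketch a terminated job is only recorded in `terminated`; it is never put back in the queue, which mirrors the statement that the user must resubmit a job whose master has failed.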

Failure in AXIS Distributed Processing

This is the same mechanism that AXIS Distributed Processing uses. The master monitors whether each helper is still working. If a helper stops working, the master recovers all Cells / Targets that the helper had taken and redistributes them to other AXIS copies.
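
As a rough sketch of this kind of recovery, the master can track which work items each helper has taken and when the helper was last heard from, then return the items of a silent helper to the pending pool. The `Master` class, the heartbeat approach, and the `timeout` value below are assumptions made for illustration; the source does not describe how AXIS detects a stopped helper.

```python
from collections import deque


class Master:
    """Hypothetical master that reclaims Cells / Targets from stalled helpers."""

    def __init__(self, cells, timeout=30.0):
        self.pending = deque(cells)   # work not yet handed out
        self.assigned = {}            # helper id -> set of cells it took
        self.last_seen = {}           # helper id -> time of last contact
        self.timeout = timeout        # assumed stall threshold in seconds

    def heartbeat(self, helper_id, now):
        self.last_seen[helper_id] = now

    def take(self, helper_id, now):
        """Hand the next pending cell to a helper, or None if nothing is left."""
        self.heartbeat(helper_id, now)
        if not self.pending:
            return None
        cell = self.pending.popleft()
        self.assigned.setdefault(helper_id, set()).add(cell)
        return cell

    def complete(self, helper_id, cell, now):
        self.heartbeat(helper_id, now)
        self.assigned[helper_id].discard(cell)

    def recover_stalled(self, now):
        """Return cells held by silent helpers to the pending queue."""
        stalled = [h for h, t in self.last_seen.items() if now - t > self.timeout]
        for helper_id in stalled:
            for cell in sorted(self.assigned.pop(helper_id, set())):
                self.pending.append(cell)
            del self.last_seen[helper_id]
        return stalled
```

After `recover_stalled` runs, any helper that is still working picks up the reclaimed cells through its normal `take` calls, so no separate redistribution step is needed in this sketch.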

Losing AXIS Helpers

If a helper crashes, GridLink automatically restarts a helper to continue working on the current job, so that CPU resources are not wasted.

Failure in GridLink Controllers

If a controller fails in the middle of a batch run (e.g. the server is rebooted or the GridLink service is stopped), it automatically launches helpers once it is restarted, as long as the master is still running and requires helpers.
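
The decision a restarted controller makes can be sketched as a single check. The function name and its three inputs are hypothetical; how a real controller learns whether the master is running, or how many helpers it needs, is not described in the source.

```python
def helpers_to_launch(master_running: bool, helpers_required: int, helpers_alive: int) -> int:
    """Hypothetical check run when a controller starts up again.

    Returns how many helpers to launch: none if the master is gone,
    otherwise just enough to reach the number the master requires.
    """
    if not master_running:
        return 0
    return max(0, helpers_required - helpers_alive)
```

For example, if the master is still running and requires 8 helpers but none survived the reboot, the function returns 8; if the master has already finished or crashed, it returns 0 and the controller stays idle.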
