When Things Go Wrong...
We all run into problems when running software. You will find that AXIS is no exception. It’s pretty demanding and it’s also not perfect. I’d like to take you through the kinds of things that can go wrong, why they happen, and what you can do to fix the problems you’re having and avoid them in the future. Some of the issues will be due to things at your end, under your control or under the control of your IT support, and some will be our fault and up to us to fix.
I’ll attempt to do this without getting overly technical. Let me know if I succeed.
We have a comprehensive set of guidelines on how to set up AXIS (and also GridLink and EnterpriseLink). The guidelines go into a lot of detail about what you need to do to avoid errors caused by locked files. They cover real-time antivirus exemptions, safe methods of backing up your computer, data deduplication (don’t ask) and so on. The guidelines are not just advice, they are also requirements. We expect AXIS to fail if these requirements are not met. Contact firstname.lastname@example.org for the latest requirements and guidelines.
What can go wrong?
During the normal course of its operations, AXIS will open and close files repeatedly, either to read from them or to write into them. Some of these operations are carried out explicitly by the AXIS code, and some are carried out by the database engines that we employ (Microsoft Jet, Microsoft SQL Server, and CodeBase). If we have written our code correctly, then there should be no cases where a file we wish to open is currently locked by another process. But if you are running another process which locks the files AXIS needs to open, then you can get this type of error. For example, if you are running a backup on an AXIS dataset while that dataset is running a batch job, you may get an error. Or if the real-time antivirus software is scanning the AXIS dataset while a job is running, it may also lock a vital AXIS file.
Sometimes we can recover from such problems. We can include logic around our file opening code so that if it finds a file is locked, it will wait and then try again, repeating as needed. So AXIS may run successfully, albeit much more slowly than it should. But in other cases we cannot incorporate such logic, perhaps because it is not our code that is trying to open the file but Jet, CodeBase or SQL Server.
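To make the wait-and-retry idea concrete, here is a minimal sketch in Python. It is an illustration only, not AXIS code: the function name open_with_retry, its parameters and the doubling delay are all assumptions made for the example. It also assumes that a file locked by another process shows up as a PermissionError, which is what typically happens on Windows.

```python
import time


def open_with_retry(path, mode="rb", attempts=5, delay=0.5, opener=open):
    """Try to open `path`, waiting and retrying if another process has it locked."""
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, mode)
        except PermissionError:
            # Assumption: a lock held by another process surfaces as
            # PermissionError. Give up only after the final attempt.
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2  # wait longer each time rather than hammering the file
```

The cost of this approach is visible in the sleep calls: every locked file adds waiting time, which is why a job can finish correctly but far more slowly than it should.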
So we will most likely ask you to check whether your system is set up in compliance with our minimum requirements before we investigate further. Other software that involves heavy use of relational database engines will often have similar guidelines. You may not have seen them because these applications (MS Exchange, MS SQL Server, Oracle etc.) are most often set up on servers rather than your own desktop. As AXIS users migrate to a server platform through EnterpriseLink and GridLink, these problems will likely disappear.
Out of Disk Space
This is one of the easiest problems to diagnose and fix. You may need to move things to a different drive, to upgrade to a larger drive, or simply to review the directories and files on that drive and remove the ones you don’t need any more. Disk drives are inexpensive and you always need more space than you think you will. We provide guidelines for EnterpriseLink and GridLink installations.
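If you want to check free space before starting a large job, a few lines of script will do it. The sketch below uses Python's standard shutil.disk_usage call; the helper name has_room and the idea of a required size are assumptions for illustration, and the amount of space a given AXIS job needs is not something this example can tell you.

```python
import shutil


def has_room(path, required_bytes):
    """Return True if the drive holding `path` has at least `required_bytes` free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


# Example: is there at least 10 GB free on the drive holding the current folder?
print(has_room(".", 10 * 1024**3))
```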
Out of Resources
Your desktop contains a lot of resources provided or managed by the operating system for application software to use. For example, there is a limited number of file handles, which caps the number of files that can be open at any one time. There is also a fixed amount of memory, although the operating system can sometimes recover from a lack of physical memory by using part of the disk drive as memory. That is OK in emergencies, but not a good idea in the long run because disk drives are very much slower than regular memory.
Sometimes you may get a message about running out of resources, and the message may not be very explicit. The first thing to look at is the list of other programs you are running at the same time. Perhaps some of these can be closed, releasing the resources they use back into the pool. That doesn’t always work because the operating system is not perfect and may not release the resources when you close the applications.
One particularly nasty problem is called a memory leak. When you close an application it is supposed to release all the memory it was using back into the pool. But if that application has a memory leak, each time you open and close the application, a bit more of the computer’s memory is set aside as unusable. The only solution is to reboot the computer. The memory leak could be in AXIS, and if you suspect that’s what you are seeing, please let us know and we will investigate. It is unlikely that AXIS has a memory leak because we have tools to test for this problem which we run on a regular basis. So most likely it is in another program you are running. It may not be one of the programs that you are calling directly but may instead be in some utility called directly by the operating system itself or by another application you are running.
Other resources can leak too, and these problems are very tough to diagnose. You are well advised to apply the operating system service packs and updates to your other applications when they are offered, because all vendors will try to patch their products as soon as the leaks are identified.
It is also possible to adjust the resources on your computer through registry settings. This is too complex an issue for this article, but have your people talk to my people!
If the machine you are running AXIS on is not the one the data (AXIS Dataset, DataLink Source File, Import/Export database) is sitting on, you will be at the mercy of your network connection. You should have a connection speed of at least 1 Gbps (gigabits per second). Some people use 10 Mbps (megabits per second) or 100 Mbps connections, or even an internet connection, and they see slow performance or even unreliable operation. You should review this with your IT support. EnterpriseLink is an excellent solution: the only data that has to travel through your network connection is your keystrokes and screen images, not the massive datasets themselves.
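A little arithmetic shows why the connection speed matters so much. The sketch below estimates the best-case time to move a file across a link. The 2 GB file size is an arbitrary example, and the function name is made up for illustration. Real database traffic involves many small reads and writes plus network overhead, so actual times will be worse than these figures.

```python
def transfer_seconds(size_gigabytes, link_megabits_per_sec):
    """Best-case time in seconds to move a file of the given size over a link."""
    bits = size_gigabytes * 8 * 1000**3           # decimal gigabytes to bits
    return bits / (link_megabits_per_sec * 1000**2)


for speed in (10, 100, 1000):
    print(f"{speed:>5} Mbps: {transfer_seconds(2, speed):,.0f} s")
```

At 10 Mbps the 2 GB example takes 1,600 seconds at best, against 16 seconds at 1 Gbps.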
We provide a safe way to cancel any batch job, either within an interactive AXIS session or in GridLink. It may take some time to stop safely because the current step must be completed before it is safe to stop. You may be tempted to use what we call a Brutal Stop – using the operating system to close the application directly. If you do this, your dataset may be irreparably damaged because the operating system has no idea how to close the files safely and may leave files in an incoherent state. So please avoid this method of terminating a job.
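The difference between a safe cancel and a Brutal Stop can be sketched in a few lines. This is a generic illustration of cooperative cancellation, not the AXIS implementation; run_batch and stop_requested are names invented for the example. The job looks for a stop request only between steps, so a step that has started always finishes and its files are closed properly.

```python
import threading


def run_batch(steps, stop_requested):
    """Run steps in order, checking for a stop request only between steps."""
    completed = []
    for step in steps:
        if stop_requested.is_set():
            break           # safe point: no step is half finished
        step()              # once started, a step always runs to completion
        completed.append(step.__name__)
    return completed
```

Killing the process from the operating system skips the safe point entirely, which is how files end up in an incoherent state.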
Normally when we write code, we set the initial value of each variable at the appropriate place in the code. This value may be reset once a year, once per cell, or once per model point. If we make a mistake in where we set the initial value or, worse still, we omit this step, then the results can be unreliable. You might see this as a change in the results for a given cell depending on whether the cell was run on its own or as part of a batch run of multiple cells. What can we do to avoid this? Well, we have developed a number of tools. One of them sets the initial value of a variable to an infinite number, which will cause a crash if the variable is ever used before being set. Another approach uses software to examine the source code, searching for this kind of error. I hope our code is now bug free, but we certainly have experienced such bugs in the past.
In Formula Tables, you can write your own code, and you can make a similar kind of error. The answer is to take extra care and develop some firm ground rules for your own coding that leave nothing to chance.
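Here is a small illustration of how a missing reset makes a result depend on what ran before. It is written in Python rather than in Formula Table code, and the function names and the running-total example are invented for the purpose.

```python
def project_cell(cashflows, state):
    """Buggy: relies on whoever ran last to have left state['total'] at zero."""
    for cf in cashflows:
        state["total"] += cf
    return state["total"]


def project_cell_fixed(cashflows, state):
    """Fixed: the running total is reset at the start of every cell."""
    state["total"] = 0.0
    for cf in cashflows:
        state["total"] += cf
    return state["total"]


state = {"total": 0.0}
project_cell([1.0, 2.0], state)        # first cell in the batch
print(project_cell([10.0], state))     # second cell picks up the first cell's total
print(project_cell_fixed([10.0], state))
```

Run on its own, the second cell would give 10.0. Run after the first cell, the buggy version gives 13.0, which looks plausible and is therefore easy to miss. Starting the state at a poison value such as float("nan") makes the buggy version return nan instead, which is the same idea as the tool described above.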
Overflow and Underflow Errors
AXIS uses two types of numbers (variables) in its calculations: Integer and Real.
The integer numbers are the whole numbers and they are held exactly. They are subdivided into 4 types.
The first type occupies a single byte of memory and can have a range of 0 to 255 (unsigned char) or -128 to +127 (char).
The second occupies two bytes of memory and has a range of 0 to 65535 (unsigned short) or -32768 to +32767 (short).
The third occupies 4 bytes of memory and has a range of 0 to 4,294,967,295 (unsigned int) or -2,147,483,648 to 2,147,483,647 (int).
The fourth (int 64) occupies 8 bytes of memory and has a much wider range still.
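All of the ranges above follow from the number of bytes. As a quick check, the sketch below computes them for the usual two's-complement representation; the function name is just for illustration.

```python
def integer_range(num_bytes, signed):
    """Return (min, max) for a two's-complement integer of the given width."""
    bits = 8 * num_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for n in (1, 2, 4, 8):
    print(n, integer_range(n, signed=False), integer_range(n, signed=True))
```

For the 8-byte type this gives a signed range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.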
Real (or floating point):
The real numbers include all values including decimals but they are never held exactly. They are held instead to a certain degree of precision. We use two types.
The first type, single precision, occupies 4 bytes and has a range of roughly 3.4 x 10^-38 to 3.4 x 10^38 and can have positive or negative sign. The values are generally correct to around 7 significant figures.
The second type, double precision, occupies 8 bytes and has a range of roughly 1.7 x 10^-308 to 1.7 x 10^308 and can have positive or negative sign. The values are generally correct to 15 significant figures.
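You can see the limited precision for yourself. Python's own numbers are double precision, and the standard struct module can round a value to single precision and back. The helper name to_single is an assumption for this example.

```python
import struct


def to_single(x):
    """Round a double-precision value to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(to_single(123456789.0))   # digits beyond roughly the 7th are not preserved
print(0.1 + 0.2 == 0.3)         # False: even double precision is not exact
```

The first line prints 123456792.0, not 123456789.0, because single precision cannot hold nine significant figures.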
What can go wrong?
When AXIS divides a number by zero or by a number really close to zero, the result will be larger than the variable type can store. We call that an overflow error. You can also get the same problem by multiplying two large numbers together, or by raising one large number to a sufficiently large power. We see this type of error most frequently when the assumptions in a model are unreasonable. For example, you may have a future scenario that extends for 30 years, but a projection that extends for 100 years. In that case, AXIS may use the last line in your scenario for all later years. If that line contains an equity growth rate of 50%, then the power of compound interest with 70 years of growth at 50% per year may easily lead to an overflow.
An underflow error is almost exactly the same, but instead of the number growing too large, it is growing too small – think of the reciprocal of a number that is growing too large.
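Both effects are easy to reproduce with double precision numbers. The sketch below is in Python, which quietly returns infinity or zero for these two multiplications; how AXIS detects and reports the condition is not shown here.

```python
import math

big = 1e308                     # close to the largest double
product = big * 10              # too large to store: becomes infinity
print(math.isinf(product))      # True

tiny = 1e-200
squashed = tiny * tiny          # too small to store: becomes zero
print(squashed == 0.0)          # True
```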
Usually these errors can be fixed by reviewing the assumptions, and you may need our help to know where in your assumptions you should be looking. The last row of the scenario is always a good starting point.
Sometimes it is something we can fix in the code. For example, if we use a 4-byte unsigned integer (maximum 4,294,967,295) to represent the dataset size in bytes and your dataset grows to over 4 GB, that would give an overflow. And yes, we did make that mistake about 15 years ago, and we had to fix it as a bug.
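To see what that kind of bug looks like, the sketch below models a size stored in a 4-byte unsigned field. The assumption that the field was unsigned is mine, based on the 4 GB figure, and the wrap-around is modelled with simple modulo arithmetic rather than taken from any AXIS code.

```python
FOUR_GB = 4 * 1024**3           # 4,294,967,296 bytes


def stored_size(actual_bytes):
    """Value left in a 4-byte unsigned field that silently wraps around."""
    return actual_bytes % 2**32


print(stored_size(FOUR_GB - 1))   # still fits
print(stored_size(FOUR_GB + 5))   # wraps: the field now claims a tiny dataset
```

One byte under 4 GB is stored correctly as 4294967295. Five bytes over 4 GB is stored as 5.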
Sometimes your dataset can become corrupted. It may be the result of a Brutal Stop, or it may be because of a Locked File or other AXIS crash. It may have nothing to do with AXIS. Perhaps you have a faulty hard disk or an unreliable network connection. It may also be due to an AXIS programming error, although this is very unlikely. Once your data is corrupted there may be nothing you can do, but please contact us and we’ll see if we can help. Sometimes the problem can be fixed by a database repair option.
The best practice is to recognize that this could happen to you at any time and to take the necessary precautions by backing up important data on a regular basis. We have built tools into AXIS, EnterpriseLink and GridLink to help you do this in a safe manner, and if you use the Dataset Version Control software built into EnterpriseLink, these backups can be performed in a very space efficient manner since only those files which have been changed are stored.
No hardware is 100% reliable, and neither is the operating system that controls the hardware. You may get a sudden voltage surge like when lightning strikes, and network interruptions are not infrequent. Disk drives and power supplies can fail or become unreliable, even the new solid state devices.
Being prepared for hardware failure requires that you carefully consider the value of the data you have on a given physical system. Ask yourself "if this computer were to fail and be unrecoverable, what would the damage be? Would I be able to recover my data from a backup and continue working?" If the answer is no, then it is important to develop a recovery strategy. Some elements of an effective recovery strategy include:
Don’t wait until a failure occurs to think about these "what-ifs". At that point it is too late.
Datasets Growing in Size and Processing Slowing Down
Some users will set up a dataset and then update that dataset on a regular basis by deleting all the existing Cells, Subfunds and Tables and then recreating them through DataLink. When you delete an object in a database (and the AXIS Dataset contains multiple databases) it leaves a hole where the object was and new objects do not overwrite those holes – they are attached at the end of the database. That’s true of modern relational databases but the early versions of AXIS (up to AXIS 8) used a database that I wrote which really did fill in the holes. That has some advantages but it means a delete can be a very slow process, and all objects have to be the same size. The new way is much better.
So as time passes, your database may become fragmented (another way of saying it has multiple holes in it, and therefore the file size is larger than it has to be) and, even worse, may exceed 2 GB, which is the size limit of an MS Jet database. Just like you may choose to defrag your hard disk, you may also tell the database to clean itself up. This process is called Compaction (a tool within AXIS), and it does more than just clean up the holes – it also performs many integrity checks and can often fix the errors it finds. After Compaction your dataset will have a smaller footprint and it will run faster. If there were errors, they may now be fixed so jobs that could not run before may now run smoothly.
Better yet, if you choose to run batches that create thousands of objects in a dataset automatically via logic in DataLink or by importing them, then design your processes so that you always start with a copy from a small template dataset containing just the necessary DataLink definitions, batch jobs and base cells etc. but not the thousands of objects that you need to remove anyway at the very beginning of your run sequence.
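The holes are easier to picture with a toy model. In the sketch below a Python list stands in for the database file, a deleted record becomes None, and new records are always appended at the end. This is a deliberately simplified illustration with invented names. The real Compaction tool also performs integrity checks, which are not modelled here.

```python
def delete(records, key):
    """Mark a record as deleted, leaving a hole instead of shrinking the store."""
    for i, rec in enumerate(records):
        if rec is not None and rec[0] == key:
            records[i] = None


def compact(records):
    """Return a new store holding only the live records, in their original order."""
    return [rec for rec in records if rec is not None]


store = [("cell_a", 1), ("cell_b", 2), ("cell_c", 3)]
delete(store, "cell_b")
store.append(("cell_d", 4))            # new objects go on the end, not in the hole
print(len(store), len(compact(store)))
```

The store has grown to four slots even though only three records are live. Compaction brings it back to three.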
Exceeding the Maximum Database Size or Width
As mentioned above, the maximum size of a Microsoft Jet database is 2 GB. To support the very large datasets that may be needed for large blocks of business we introduced the CodeBase engine, which has no such limitation, and we have moved many of the AXIS objects to CodeBase. CodeBase is not suitable for all database operations in AXIS, so we have started a project to introduce a new local version of MS SQL Server (SQL Server 2012 Local DB) to further improve scalability, and we hope that eventually we will be able to use this engine to replace both MS Jet and CodeBase. You should be able to run any number of policies through any number of scenarios in AXIS without running out of memory or exceeding maximum database size. If you run into an issue please let us know. We may advise you to switch the format of the Import/Export database from Jet (limited) to SQL Server or CodeBase format (unlimited).
Another database limit some users run into is the maximum number of columns allowed in a database table. MS Jet supports 255 columns, MS SQL Server supports 1024 columns and CodeBase supports double that. You may need a lot of columns if for example you wish to export reports with monthly information for 80 years. Once again we may advise you to switch the format of your Import/Export database away from Jet.
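The 80-year example is worth working through. The sketch below uses the column limits quoted above, taking "double that" for CodeBase to mean 2048. The function name is invented for illustration, and a real report would need a few extra key columns on top of the monthly ones.

```python
COLUMN_LIMITS = {"MS Jet": 255, "MS SQL Server": 1024, "CodeBase": 2048}


def formats_that_fit(columns_needed):
    """Return the database formats whose column limit covers the request."""
    return [name for name, limit in COLUMN_LIMITS.items() if columns_needed <= limit]


monthly_columns = 80 * 12       # one column per month for 80 years
print(monthly_columns, formats_that_fit(monthly_columns))
```

Monthly values for 80 years need 960 columns, which is far beyond Jet but within the limits of SQL Server and CodeBase.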
You Just Don’t Like the Answer
If AXIS gives you an answer that doesn’t match your expectations then one of three things may be happening:
My first suggestion is to simplify the cell as much as possible so it is easier to validate the calculations. Running a single policy without reinsurance may help. You can even remove lapses or mortality. This can help to isolate the problem. If you believe the problem is at our end, give us a call and we will try to isolate the issue. If it is a bug in AXIS, we will fix it. If it is a missing feature, you can request that an option be added to AXIS to give you the answer you need. If your settings are wrong, we can help you with that too. Our client support desk is standing by.
AXIS is a powerful and complex system. It incorporates multiple relational databases, supports High Performance Computing methods and deals with massive (virtually unlimited) amounts of computation and data, all within a fixed memory footprint. We want you to have the very best experience possible, and our aim is that you can always run smoothly without ever running into any of these issues. So please follow our minimum requirements in full and, if possible, also follow our recommendations. If something goes wrong, let us know and we’ll help you to sort it out.
If something does go wrong, it may be our fault. Since we started in 1989, we have solved every single problem which is our fault, and we have also found our way around many issues caused by errors in the operating system and even faults in client equipment. Even if the problem is at your end, we will help as much as we can and we will not close a case until the problem is solved.
We just want to get you going again as soon as we can. That’s our job.