Assessing Availability Options for Financial Services Companies

The financial services industry has been a leader in the drive to improve systems availability for many years. That’s not surprising when you look at the applications in a typical financial services business. Payment systems that transfer billions of dollars per day. Trading systems that can make or lose millions based on market movements that happen in less than a second. Online banking for customers who demand access to their money 24 hours a day, 7 days a week, anywhere in the world.

A financial services company’s operations are always under scrutiny. Customers will take their business elsewhere if they lose confidence that the bank is handling their transactions correctly. Competitors will take advantage of any availability issues that become public. And regulators will come down hard on any financial institution that doesn’t have the proper safeguards to ensure it doesn’t lose or corrupt transactions or customer data.

For all of these reasons, the financial services industry was one of the first to use proprietary fault-tolerant and high-availability solutions. Although these solutions were very expensive to acquire, implement and manage, they could be justified by the fact that just one failure in an unprotected system could have an impact far beyond the cost of deploying and managing these proprietary systems.

FAULT-TOLERANT FOR THE FEW

Fault-tolerant servers provide continuous availability through hardware failures by housing redundant components in a single chassis. These solutions are generally built on proprietary hardware and in some cases proprietary operating systems. This architecture requires additional training and maintenance procedures as well as additional hardware components to manage. It also comes at a cost that is often much higher than other solutions. Not only is the initial cost usually high, but replacement modules usually have to be purchased from the system manufacturer, at a much higher price than comparable parts on the open market. Solutions that use proprietary operating systems also severely restrict your ability to use standard software – databases, tools and applications.

Another major limiting factor of fault-tolerant servers is that they cannot protect against site failures, as all components are in a single chassis and cannot be separated into different locations. Should the datacenter or building suffer an outage, the fault-tolerant server will also become unavailable, resulting in downtime for your applications.

The cost and complexity of these solutions mean they are generally restricted to a few mission-critical applications where downtime and data loss carry huge risks. But as more and more applications within financial services companies have become business critical – consider email, for example – the drawbacks of proprietary fault-tolerant solutions have become more significant.

A BROAD RANGE OF AVAILABILITY OPTIONS

The need to improve systems availability at lower cost has resulted in a proliferation of high availability and disaster recovery solutions. Although they all may have a place in some organizations, very few of them tackle the two big availability issues faced by financial services companies:

  • How do we ensure all critical systems are up and running when they have to be available – whether this is 24x7, through the business day or just for a few critical hours every day?
  • How do we ensure that if there are failures, we can guarantee the integrity of our data?

There are dozens of “high availability” products that provide some level of application protection, and if you read the marketing literature, most of them sound about the same. To make sense of the myriad products in this space, it helps to group them into two categories: shared-disk clusters, and replication and failover solutions.

SHARED-DISK CLUSTERS

Shared-disk clusters are a common solution for providing application availability. There are only a few true shared-disk clustering solutions for Windows environments, as many that claim to be “cluster-like” actually use replication and failover architectures. The typical cluster model can consist of multiple servers (although clusters are often configured with just two) connected to a shared storage device. Each server is fully configured, including operating systems and applications; one is “active” and the other is the “standby.” When a failure occurs in the active server, the clustering software starts the application on the standby server to take over processing. While they can’t provide continuous availability, clusters can provide some level of improved availability by restarting applications on a standby server.
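
The active/standby behavior described above can be sketched in a few lines of Python. This is only an illustration of the general model, not any vendor's clustering software: the `Node` class, the `failover_if_needed` function and the server names are all hypothetical, and a real cluster would detect failures through heartbeats rather than a flag.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster member; names and fields are illustrative only."""
    name: str
    healthy: bool = True
    running_app: bool = False


def failover_if_needed(active: Node, standby: Node) -> Node:
    """Return the node that serves the application after a health check."""
    if active.healthy:
        return active
    # The active server has failed: the application is restarted on the
    # standby. Users see an interruption while the restart takes place.
    active.running_app = False
    standby.running_app = True
    return standby


primary = Node("server-a", running_app=True)
secondary = Node("server-b")

primary.healthy = False  # simulate a failure of the active server
serving = failover_if_needed(primary, secondary)
print(serving.name)  # server-b
```

Note that the sketch restarts the application rather than continuing it, which is why this model improves availability without making it continuous.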

Clustering solutions have been around for a number of years, but they still entail substantial downsides. For one, the shared-disk model does not provide data redundancy or any level of protection for the actual data, which makes the shared storage a single point of failure. In order to protect the data, you must implement a RAID system or a complex and expensive SAN.

Clusters don’t support just any application. Applications must be “cluster aware” to work in the clustered environment and to fail over and restart when necessary. Of course, applications can be made “cluster aware,” but doing so is very costly. It can take months of development and implementation as well as ongoing testing and maintenance.

Finally, clusters are complex to set up and maintain. They require very specialized expertise to be implemented and properly managed, and this expertise has to be available full time whenever the system is running. If this level of expertise does not exist in all locations where the systems will be deployed (e.g. branch offices), cluster solutions are not a good option.

REPLICATION AND FAILOVER

This is the category that has caused the most confusion, yet it provides the lowest level of availability with a fairly heavy implementation process. Replication products are primarily designed to move data from one server to another using an asynchronous model that allows for unlimited distances between the two servers. For business needs that require disk-to-disk backup over a long-distance WAN, replication is certainly a viable choice. But since we are talking about achieving availability and not simply performing backup, a replication product is unlikely to satisfy the requirement. As with clusters, replication products require two fully configured servers and simply restart the application on the standby server should the active server fail.

Applications using replication don’t have to be made “cluster aware,” but they do require extensive scripting to allow the failover process to happen and for applications to restart on the standby server. Because of the way the failover mechanism is implemented, restarts can take upwards of 90 minutes. A 90-minute interruption is probably not what most financial institutions would consider “high availability.”
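
To put a 90-minute restart in perspective, the short calculation below converts downtime into an annual availability percentage. It is a rough sketch under stated assumptions: it assumes a single such outage in a 365-day year, and the 99.99% target used for comparison is an illustrative figure, not one taken from this paper. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability_pct(downtime_minutes: float) -> float:
    """Annual availability, in percent, for a given total downtime."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_YEAR)


def downtime_budget_minutes(target_pct: float) -> float:
    """Total downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - target_pct / 100.0)


# Assumption: exactly one 90-minute failover in the year.
print(f"{availability_pct(90):.3f}%")                    # 99.983%

# Assumption: an illustrative 99.99% availability target.
print(f"{downtime_budget_minutes(99.99):.1f} minutes")   # 52.6 minutes
```

Under these assumptions, a single 90-minute failover uses more than the entire yearly downtime allowance of a 99.99% target.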

Claims such as “real-time” and “2-minute failovers” sound reasonable, but it’s important to understand the realities of replication technologies so that you know how they will actually perform in your environment. Asynchronous replication solutions inherently lose data upon failure and failover, because any writes not yet copied to the standby server are gone when the active server fails. Even if a few minutes of data loss is acceptable to the business, the missing transactions can corrupt the underlying database. A corrupted database can take hours to rebuild manually, during which time the application is unavailable to your users and customers. While a few minutes of lost data may sound acceptable at first glance, a closer look reveals the potential for excessive downtime and manual data recovery. Disk-to-disk replication has its place, but it’s not the best option for providing availability for your business applications.
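
The data-loss window of asynchronous replication can be illustrated with a small simulation. This is a simplified sketch, not a model of any particular replication product: the `AsyncReplicatedStore` class, its methods and the transaction names are all hypothetical. The key assumption is that a write is acknowledged as soon as the primary has it, and is copied to the standby only later.

```python
from collections import deque


class AsyncReplicatedStore:
    """Hypothetical primary/standby pair with asynchronous replication."""

    def __init__(self) -> None:
        self.primary: list[str] = []
        self.standby: list[str] = []
        self.pending: deque[str] = deque()  # written but not yet replicated

    def write(self, txn: str) -> None:
        # The write is acknowledged once the primary has it; the copy to
        # the standby happens later, in the background.
        self.primary.append(txn)
        self.pending.append(txn)

    def replicate(self, max_items: int) -> None:
        # Ship up to max_items queued transactions to the standby, in order.
        for _ in range(min(max_items, len(self.pending))):
            self.standby.append(self.pending.popleft())

    def fail_primary(self) -> list[str]:
        # The primary is lost: whatever was still queued never arrives.
        lost = list(self.pending)
        self.pending.clear()
        return lost


store = AsyncReplicatedStore()
for i in range(1, 6):
    store.write(f"txn-{i}")

store.replicate(max_items=3)   # replication lags behind the writes
lost = store.fail_primary()    # the active server fails mid-stream

print(store.standby)  # ['txn-1', 'txn-2', 'txn-3']
print(lost)           # ['txn-4', 'txn-5']
```

In this sketch the standby restarts with three of the five acknowledged transactions. Whether the resulting gap corrupts a real database depends on the application, which is why the recovery effort is hard to predict in advance.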

AUTOMATED APPLICATION AVAILABILITY

Marathon’s everRun™ software enables financial services companies of all sizes to improve the availability of key systems without the drawbacks of proprietary hardware, clustering or failover/replication scripting. Whether you are a financial services application developer or a financial services IT executive running key business applications, everRun offers high availability, even continuous availability, for standard, unmodified Windows applications running on standard server hardware. No special hardware, no need to develop “cluster-aware” applications, no scripting, no complex failover scenarios, no need for extensive special training. By eliminating the costs and complexities associated with other availability solutions, everRun allows you to provide improved availability for a wider range of applications – applications that have become more critical over the past few years, but that have not in the past justified the investment required for improved availability.

The range of financial services Windows applications that can take advantage of the everRun software is almost endless, but it can be divided into three main categories. First, applications that need to operate 24 hours a day and 7 days a week. This includes most customer-facing applications such as ATM and POS systems, internet banking and business cash management, as well as key internal applications such as Exchange. Second, applications that must have uninterrupted availability during business hours, such as trading systems, payments and message handling systems, trade finance and branch network support. And third, applications that must be available to complete processing by specific times and in particular time windows, including item processing and imaging solutions. Indeed, using everRun you can now cost-effectively improve the availability of almost any financial services application.

Financial services companies around the globe are using everRun to gain these unique advantages:

Superior availability – We’re up. Always up. Client services, trading, payment and message handling systems run non-stop. No loss of customer data.

Automated application availability – Easy to operate and maintain. All fault handling and policy management are automated for you.

Affordability – Getting started costs up to 36% less than with other availability options. Administration and maintenance costs are up to 55% lower than they are with clusters.

Non-intrusive – Works with standard x86 servers, no application or OS modifications required. Works with any financial services application – no cluster awareness or scripting required.

Remote availability – Install our SplitSite option to geographically disperse your SQL servers.


Want to keep your financial services applications up and running through faults, failures and disasters? Contact us for more information or to take a test drive.

