Success metrics for shared infrastructure in a BPM CoE

The success metrics for this shared infrastructure organization resemble those of most traditional IT organizations that manage shared infrastructure for enterprise platforms. The key difference is that this organization focuses on the success of the business solutions that are hosted on these platforms, and not only on the platform itself.
The success metrics are as follows:

Solution uptime/availability
Solution responsiveness
Scalability of operational services including logging and error handling

Areas of responsibility in depth

One of the initial steps of planning for a BPM Center of Excellence is to design the eventual platform that will host your BPM process server. This step can be difficult because at this early stage, you might not have firm requirements for the applications that you will host. A BPM CoE must prepare to support current process applications and plan to expand the BPMS to support future applications.

Availability

Availability is the ability of the platform to survive system-level failures. It is measured both by the time that a system functions normally and by the time that the recovery process requires after a system component fails. Reducing downtime is the most critical aspect of highly available systems. A highly available system is therefore one that can recover quickly from system failures with little or no impact to users during such events. Availability depends on the ability of replicated components to fail over efficiently.
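The measure described above can be made concrete with a small calculation. The following is a minimal sketch, not part of any IBM BPM tooling: the function names are hypothetical, and the second function assumes the commonly used steady-state ratio of mean time between failures (MTBF) to MTBF plus mean time to recover (MTTR).

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed time that the system functioned normally."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR).

    A shorter recovery time (MTTR) raises availability even when
    failures occur at the same rate.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)
```

For example, with the same hypothetical failure rate, cutting recovery time from 4 hours to 1 hour gives a higher result from availability_from_mtbf, which is why fast failover matters as much as preventing failures.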

Topology

The first step is to determine the topology of the deployment, because the topology largely determines the availability that the deployment can achieve. Topology can include clustering, load balancers, web servers, proxies, database replication, and other items. Regardless of system load, you want to create an environment that can sustain quality of service, even in the event of a component failure.

IBM BPM provides a network deployment installation option, sometimes called the golden topology, that provides a high level of availability. A network deployment environment contains a collection of interconnected servers and clusters to run your business process applications.

Clustering across more than one node provides availability in the event that a node stops for any reason. A best practice is to have nodes running on separate hardware, or on separate LPARs or VMs that are running on separate hardware, to ensure availability in the event of a hardware error.

Mitigating other points of failure

Availability is not limited to the application tier. There are many other potential areas for failure in a typical enterprise application server infrastructure.

Deploying your BPM platform within an infrastructure that tolerates the loss of components is important. For example, you must be prepared for the loss of an IP sprayer, an HTTP server, a firewall, or a router. Building in redundancies for these components can eliminate single points of failure and reduce subsequent downtime.
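The effect of removing a single point of failure can be estimated with basic reliability arithmetic. The following is a rough sketch under a strong assumption: components fail independently of one another, which real infrastructure only approximates. The function names and the sample figures are hypothetical and are not measurements of any product.

```python
from math import prod


def redundant(component_availability: float, copies: int) -> float:
    """Availability of a tier that works if at least one copy works.

    Assumes independent failures: the tier is down only when
    every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - component_availability) ** copies


def chain(*tier_availabilities: float) -> float:
    """Availability of tiers in series: every tier must be up."""
    return prod(tier_availabilities)


# Hypothetical request path: IP sprayer -> HTTP server -> application cluster.
single = chain(0.99, 0.99, 0.99)
doubled = chain(redundant(0.99, 2), redundant(0.99, 2), redundant(0.99, 2))
```

In this sketch, doubled is higher than single because each tier no longer depends on one component, which is the reasoning behind building redundancy into every tier and not only the application tier.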

Scalability

From the beginning, invest in enough initial system capacity. False economies in the early stages can cause far greater cost and pain later. A two-node cluster running on 4-core hardware provides most organizations with more than enough initial capacity to handle a successful CoE program.

A goal is to be able to increase system capacity when new intensive process applications are onboarded into the CoE and when existing process applications become widely adopted and demand more resources. Scalability refers to a system’s ability to readily adapt to these increasing demands while still meeting business objectives. Taking scalability into account when you initially design a business process management platform is critical. You must select appropriate hardware, operating system, topologies, and virtualization technologies to optimize your potential to meet future load requirements.

The two types of scalability are vertical and horizontal:

Vertical scalability is the ability to add more resources (cores, CPUs, memory) to gain performance.
Horizontal scalability is the ability to add more hardware (machines) to gain performance.

By its nature, the network deployment (ND) option for IBM BPM lends itself to horizontal scalability. The clustering of nodes allows you to expand system capacity by adding more machines. Ideally, you can scale out a topology arbitrarily to match the required load. The WebSphere Application Server Network Deployment infrastructure provides this capability.
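A simple way to reason about horizontal scaling is to estimate how many cluster members a given load needs. The following is an illustrative sketch only: the function name, the load units, and the idea of keeping one spare node for failover are assumptions for this example, and real sizing should come from load testing the actual process applications.

```python
import math


def nodes_required(peak_load: float, per_node_capacity: float,
                   spare_nodes: int = 1) -> int:
    """Estimate cluster size for a peak load.

    peak_load and per_node_capacity use the same unit
    (for example, requests per second). spare_nodes adds headroom
    so that the cluster can lose a node and still carry the peak.
    """
    if peak_load < 0 or per_node_capacity <= 0 or spare_nodes < 0:
        raise ValueError("invalid sizing inputs")
    # At least one node is needed even when the load is very small.
    base = max(1, math.ceil(peak_load / per_node_capacity))
    return base + spare_nodes
```

With hypothetical inputs of a peak load of 250 units and 100 units per node, nodes_required returns 4: three nodes to carry the load plus one spare.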

Vertical scalability is more often achieved by using features of the underlying operating system or virtual machine. Vertical scalability is an attractive option because it does not involve adding servers in response to demand for new capacity, and it can reduce the need for more datacenter space, power, cooling, network cabling, data storage, and administrative resources.