Ensuring Continuous Availability

Most businesses have a Disaster Recovery Plan that they can execute if a crisis occurs at the primary site, such as replicating all updates to a secondary site and periodically creating gold copies of those updates so that they can recover to a certain point in time. Budget tradeoffs determine the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO).

However, no matter how good your Disaster Recovery Plan is, your Disaster Recovery Execution determines the outcome.

Customized Disaster Recovery Assessment for your z/OS Infrastructure

It is essential to first determine the disaster recovery requirements of your organization. For starters, in some countries you must have controls in place to ensure that (financial) data is accurate and protected against loss. Other requirements include questions such as:

  • What is the Recovery Point Objective (RPO): the maximum amount of data loss?
  • What is the Recovery Time Objective (RTO): the maximum recovery time?
  • What are the minimum I/O response times?
  • What are the requirements for the application performance?
  • What are the maximum costs of the disaster recovery technology?

At the beginning of the disaster recovery assessment the IntelliMagic expert will formulate the disaster recovery requirements for your z/OS infrastructure that need to be assessed.

Request a Free Consultation

Synchronous Data Replication Health

In a synchronous replication environment such as IBM Metro Mirror, Dell-EMC Symmetrix Remote Data Facility/Synchronous (SRDF/S), or Hitachi TrueCopy, there is a direct performance impact on the primary application I/Os. Specifically, write operations incur disconnect time while the writes are sent to the secondary storage system and acknowledged back.

Synchronous replication allows for a zero RPO at the expense of this performance impact and is, for all practical purposes, limited to a ‘metro’ area (i.e. less than 100 km).
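
As a rough illustration of why distance limits synchronous replication, the sketch below estimates the propagation delay added to each write. It assumes about 5 microseconds of one-way delay per kilometer of fiber and a configurable number of round trips per write. Both are simplifying assumptions, and the function name is hypothetical; real links add switch, protocol and controller overhead on top of this.

```python
# Assumption: roughly 5 microseconds of one-way propagation delay per km of fiber.
FIBER_DELAY_US_PER_KM = 5.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Estimate the propagation delay (in ms) added to one synchronous write.

    Only the time the signal spends in the fiber is counted; queueing,
    protocol and controller overhead are ignored.
    """
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    if round_trips < 1:
        raise ValueError("round_trips must be at least 1")
    one_way_us = distance_km * FIBER_DELAY_US_PER_KM
    # Each round trip crosses the link twice: data out, acknowledgement back.
    return 2 * one_way_us * round_trips / 1000.0


if __name__ == "__main__":
    for km in (10, 50, 100, 300):
        print(f"{km:>4} km: +{sync_write_penalty_ms(km):.2f} ms per write")
```

Under these assumptions, a write at the 100 km limit carries about one extra millisecond for a single round trip, which is significant next to typical sub-millisecond cache-hit write times.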

IntelliMagic Vision can help identify where bottlenecks exist and when the storage system is reaching the limits of its capability, including link and bandwidth constraints, overloaded secondary controllers, and saturated host adapters, among several other performance exceptions.

zOS Synchronous Data Replication Health

zOS Asynchronous Data Replication Health

Asynchronous Data Replication Health

In an asynchronous replication environment such as IBM Global Mirror or Dell-EMC Symmetrix Remote Data Facility/Asynchronous (SRDF/A), some data loss is accepted in case of a disaster in return for minimal direct impact on the performance of the primary applications and the ability to place the sites farther apart. The goal is to maintain a reasonable RPO, typically on the order of 5 to 30 seconds.

IntelliMagic Vision supports several detailed metrics, such as the specified and achieved RPO times. It can help identify where bottlenecks exist and when the storage system is reaching the limits of its capability, including link and bandwidth constraints, overloaded secondary controllers, saturated host adapters, and response time degradation, among several other performance exceptions.
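
To make the relationship between write load, link capacity and RPO concrete, here is a minimal sketch that tracks the replication backlog second by second and uses the time needed to drain it as a rough proxy for the achieved RPO. It is not how IntelliMagic Vision or any replication product computes RPO; the function name and the sample numbers are assumptions for illustration.

```python
from typing import List, Sequence


def backlog_lag_seconds(write_mb_per_s: Sequence[float],
                        link_mb_per_s: float) -> List[float]:
    """Return, for each one-second sample, the time needed to drain the backlog.

    The backlog grows whenever writes exceed the link rate and shrinks
    otherwise. Drain time (backlog / link rate) is only a proxy for RPO.
    """
    if link_mb_per_s <= 0:
        raise ValueError("link_mb_per_s must be positive")
    backlog_mb = 0.0
    lags = []
    for written in write_mb_per_s:
        # Data written this second minus what the link can ship this second.
        backlog_mb = max(0.0, backlog_mb + written - link_mb_per_s)
        lags.append(backlog_mb / link_mb_per_s)
    return lags


if __name__ == "__main__":
    # Hypothetical workload: a two-second write burst on a 100 MB/s link.
    writes = [50, 50, 200, 200, 50, 50]
    for second, lag in enumerate(backlog_lag_seconds(writes, 100.0)):
        print(f"t={second}s  estimated lag={lag:.1f}s")
```

In this sketch a short write burst above the link rate is enough to push the estimated lag to a couple of seconds, and the lag only recovers once the write rate falls back below the link rate.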

Recovery Point Objective

IntelliMagic Vision monitors all the vital statistics of an IBM Global Mirror configuration, such as the average RPO for all GM Sessions.

As an asynchronous data mirroring solution, Global Mirror has the potential to save bandwidth for certain workloads. The IntelliMagic consultant will use IntelliMagic Vision to explore specific workload characteristics to determine if bandwidth savings are possible and what the net bandwidth capacity requirement is.

Average Recovery Point Objective over Interval

Write Throughput and Global Mirror Optimized Write Throughput

Global Mirror for zSeries

Global Mirror for zSeries (zGM), previously known as XRC, is the only form of mainframe data replication that is not vendor-proprietary: IBM, HDS and Dell-EMC all sell disk storage systems that support the zGM architecture. While zGM is asynchronous like Global Mirror, it is designed for a much tighter RPO. That RPO is not zero, as with synchronous replication, but it is better than that of controller-based asynchronous replication.

For zGM sites, IntelliMagic will verify whether the inter-site bandwidth is constrained, in which case zGM will actually slow down selected workloads at the primary site to maintain tight consistency.

Sizing the Inter-Site Bandwidth

One of the most significant cost factors is mirroring data over long distances, so sizing the inter-site bandwidth is a high economic priority. Without sufficient bandwidth, the data mirroring process will be unstable and may disrupt the production systems.

IntelliMagic will investigate the throughput for write operations, down to the level of individual devices. In stringent replication environments such as any synchronous replication, zGM or HDS environments, we will verify that sufficient bandwidth is planned for to avoid costly performance delays.

In SRDF/A or GM environments, IntelliMagic Vision can calculate where bandwidth can be reduced while maintaining consistency, thereby lowering network costs.
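
The difference between sizing for stringent and for asynchronous replication can be sketched as follows. Stringent environments are sized here for the peak write rate, while an asynchronous link is sized for the smallest rate that keeps the backlog drain time within a tolerated lag. This is a simplified model under assumed inputs (one-second write samples in MB/s), not the calculation IntelliMagic Vision performs, and all names are hypothetical.

```python
from typing import Sequence


def max_drain_time_s(write_mb_per_s: Sequence[float], link_mb_per_s: float) -> float:
    """Worst-case time (s) needed to drain the backlog at the given link rate."""
    backlog_mb = 0.0
    worst = 0.0
    for written in write_mb_per_s:
        backlog_mb = max(0.0, backlog_mb + written - link_mb_per_s)
        worst = max(worst, backlog_mb / link_mb_per_s)
    return worst


def min_link_for_lag(write_mb_per_s: Sequence[float], max_lag_s: float,
                     step_mb_per_s: float = 1.0) -> float:
    """Smallest link rate (in multiples of step) that keeps drain time <= max_lag_s.

    Falls back to the peak write rate, at which no backlog builds up at all.
    """
    if step_mb_per_s <= 0:
        raise ValueError("step_mb_per_s must be positive")
    peak = float(max(write_mb_per_s, default=0.0))
    k = 1
    while step_mb_per_s * k < peak:
        candidate = step_mb_per_s * k
        if max_drain_time_s(write_mb_per_s, candidate) <= max_lag_s:
            return candidate
        k += 1
    return peak


if __name__ == "__main__":
    writes = [50, 50, 200, 200, 50, 50]  # hypothetical samples, MB/s
    print("peak-sized link (stringent):", float(max(writes)), "MB/s")
    print("link for <= 1 s lag (async):", min_link_for_lag(writes, 1.0), "MB/s")
```

The linear scan is deliberately simple; for long measurement intervals a binary search over the link rate would give the same answer faster, since a higher rate never increases the drain time.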

Sizing Inter-Site Bandwidth

zOS GDPS Global Mirror Sessions

GDPS Active/Active

GDPS Active/Active reduces RTO on a site failure but incurs some data loss because it is asynchronous. There is a zero data loss (ZDL) option for GDPS Active/Active that uses synchronous PPRC to get changed data to the other site, where the application reads the PPRC secondary logs and applies the updates. This is supported up to 300 km without an RPQ.

For Active/Active sites, consultants will use application-based monitoring together with TCP/IP and storage monitoring in IntelliMagic Vision to examine bottlenecks and early warning indicators for GDPS Active/Active installations. IntelliMagic closely follows the announcements of new options that IBM provides for GDPS Active/Active.

3-Site and 4-Site Solutions

As availability requirements become more and more stringent, 3-site solutions such as Metro Global Mirror (MGM) or Metro z/OS Global Mirror (MzGM) have become more prevalent with GDPS. The idea is to use local synchronous replication with HyperSwap for high availability in case of storage failures, and long-distance asynchronous replication for disaster recovery.

Additionally, one can have symmetric configurations in both local and remote locations, referred to as 4-site solutions for both MGM and MzGM.

In these complex solutions, performance issues may occur at the source, secondary, tertiary or even the quaternary storage. IntelliMagic will search for potential bottlenecks in this configuration in order to proactively avoid performance problems.

3 Site and 4 Site Solutions

TS7700 Tape Data Replication

Tape Data Replication

For TS7700 grids it is not sufficient to simply set up replication; replication goals require constant monitoring.

IntelliMagic Vision processes BVIR statistics and SMF data using built-in intelligence about specific hardware and workloads.

For TS7700 sites, IntelliMagic will assess replication including:

  • The replication backlog
  • The average immediate queue age
  • The average deferred queue age
  • The average deferred copy throttle

We will also monitor the outbound and inbound data rates. These data rates can drop to zero before replication thresholds are exceeded. When this occurs, remote cluster availability can be investigated to avoid replication problems.
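
The early-warning idea described above can be sketched as a simple check over interval samples: flag any interval in which the outbound data rate is zero while a replication backlog is still waiting. The record layout and field names below are hypothetical and do not correspond to actual BVIR or SMF field names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GridSample:
    """One monitoring interval for a cluster (hypothetical field names)."""
    timestamp: str
    outbound_mb_per_s: float
    backlog_gb: float


def stalled_intervals(samples: Sequence[GridSample]) -> List[str]:
    """Return timestamps where data is queued for copy but nothing is being sent."""
    return [
        s.timestamp
        for s in samples
        if s.outbound_mb_per_s == 0 and s.backlog_gb > 0
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    samples = [
        GridSample("10:00", 120.0, 5.0),
        GridSample("10:15", 0.0, 8.0),   # backlog waiting, nothing sent
        GridSample("10:30", 0.0, 0.0),   # idle: nothing to replicate
        GridSample("10:45", 95.0, 6.5),
    ]
    for ts in stalled_intervals(samples):
        print(f"{ts}: outbound rate is zero with backlog pending; check remote cluster")
```

Requiring a non-zero backlog avoids flagging intervals in which the grid is simply idle.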

Third-Party Expert Assessment to Ensure Proper Disaster Recovery Implementation and Execution

IntelliMagic’s renowned z/OS experts have several decades of experience working with disaster recovery, performance, and availability. Combined with IntelliMagic Vision (software Cheryl Watson says “We [Watson and Walker] wouldn’t go anywhere without”), our experts support your existing team and assess the relative effectiveness and health of an installation’s existing or proposed replication method and disaster recovery plan for the z/OS environment.

Sizing and monitoring the replication environment are often among the most challenging parts of any Disaster Recovery Plan. In this tailored services engagement, our expert consultants will investigate the effectiveness and efficiency of your current or proposed replication method and automation software in your z/OS environment. If needed, the consultant will also determine the impact of a planned configuration change, such as relocating a data center, on the replication performance.

IntelliMagic experts have experience in all major data replication techniques and automation software, such as:

  • Dell-EMC Symmetrix Remote Data Facility (SRDF), synchronous and asynchronous
  • Hitachi TrueCopy (HTC)
  • Hitachi Universal Replicator (HUR)
  • IBM Global Mirror (GM)
  • IBM Metro Mirror (MM)
  • IBM Metro Global Mirror (MGM)
  • IBM z/OS Global Mirror (zGM)
  • Geographically Dispersed Parallel Sysplex (GDPS)
  • IBM TS7700 Replication Monitoring

Speak to a Consultant or Book a Free Consultation

Discuss Your Technical or Sales-Related Questions with Our Mainframe Experts Today