The deceptive simplicity of the save-while-active feature

The most accurate description of the Save-While-Active (SWA) function that I could locate on the web is as follows: “The Save-While-Active function enables you to utilize your system while concurrently saving your IBM i system objects, in addition to integrating them seamlessly into your broader backup and recovery protocols.”

For some reason, comparisons between SWA and the SAN FlashCopy function are scarce, despite the fact that both essentially serve the same purpose and, more importantly, share very similar implementation methods.

While the save-while-active feature for backups has been available for many years, a significant number of customers remain hesitant to incorporate this parameter into their save commands. Tom Huntington noted this hesitance back in 2010 in his article for MCPress Online, and regrettably it persists today.

There are at least two reasons for this hesitance. The first is a lack of understanding of the “point-in-time” copy concept: many IBM i administrators believe that, although SWA backups can run concurrently with database updates by applications, the resulting backup does not accurately represent the data as it was at the moment the backup operation started. The second is the deceptive simplicity of the SWA feature itself.

Let’s consider an example of a traditional, long-running batch process.

/* BEFORE backup */
SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx)
/* Run batch process */
CALL PGM(BATCH)
/* AFTER backup */
SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx)

If our primary concern is the duration of batch runtime, and a substantial portion of this duration is attributable to the time consumed by backups, then, setting aside any unwarranted doubts about the backup content, we might be tempted to merely add the SWA parameter to our backup requests:

/* BEFORE backup */
SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB)
/* Run batch process */
CALL PGM(BATCH)
/* AFTER backup */
SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB)

Unfortunately, this wouldn’t yield any improvement, and the batch runtime would remain unchanged. Although each of the backup commands may employ different algorithms internally, the batch components would still execute sequentially. The issue lies in the fact that while SWA aids in running backups asynchronously in relation to OTHER jobs, it does not alter the sequence of operations within the jobs from which they are invoked.

So, would a solution like this be effective then?

/* BEFORE backup */
SBMJOB CMD(SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB))
/* Run batch proces */
CALL PGM(BATCH)
/* AFTER backup */
SBMJOB CMD(SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB))

This approach wouldn’t work either, but for an entirely different reason. Our BATCH program is likely to start running before the first backup does, because a newly submitted job takes some time to initialize. Consequently, even if the backup completes without any issues, there is no guarantee it will capture the authentic “before” image of the objects being saved. Furthermore, if BATCH happens to finish quickly on one of the runs, the second backup may clash with the first, which could still be in progress when the second is submitted.

The key lies in establishing proper synchronization among the three components of the process. A version of the batch job capable of preserving the integrity of the process while reducing its runtime might take on the following form:

/* Create SWA message queue */
CRTMSGQ MSGQ(SWA)
/* Clear it if it already exists */
MONMSG CPF2112 EXEC(DO)
CLRMSGQ SWA
ENDDO
/* Submit the BEFORE backup so that it runs alongside the batch */
SBMJOB CMD(SAVOBJ OBJ(OBJ1 OBJ2… OBJn) +
             LIB(APPLIB) +
             DEV(xxxxxxx) +
             SAVACT(*LIB) +
             SAVACTMSGQ(SWA))
/* Wait for the SWA checkpoint and process possible error conditions */
CALL WAITANDER1
/* Run batch process */
CALL PGM(BATCH)
/* Wait for the BEFORE backup to complete and process possible error conditions */
CALL WAITANDER2
/* Submit the AFTER backup */
SBMJOB CMD(SAVOBJ OBJ(OBJ1 OBJ2… OBJn) +
             LIB(APPLIB) +
             DEV(xxxxxxx) +
             SAVACT(*LIB) +
             SAVACTMSGQ(SWA))
/* Wait for the SWA checkpoint and process possible error conditions */
CALL WAITANDER1

This approach would indeed be effective. However, it’s worth noting that, on one hand, the script has significantly grown in size and may have lost its original clarity. On the other hand, creating the WAITANDER1 and WAITANDER2 programs is not a trivial task. In essence, while SWA can be quite valuable, its implementation is far more intricate than simply adding a related parameter to the backup commands.
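To give an idea of what such a wait program involves, here is a minimal sketch of WAITANDER1, assuming the backup was submitted with SAVACTMSGQ(SWA) and that CPI3712 is the checkpoint-complete message ID — both the message IDs and the error handling shown are illustrative and should be verified against the SAVACTMSGQ parameter documentation for your release:

/* WAITANDER1 - illustrative sketch only                             */
PGM
DCL        VAR(&MSGID) TYPE(*CHAR) LEN(7)
/* Block until the save job posts a message to the SWA queue        */
RCVMSG     MSGQ(SWA) WAIT(*MAX) MSGID(&MSGID)
/* Anything other than the checkpoint message means the save        */
/* ended or failed before reaching its checkpoint                   */
IF         COND(&MSGID *NE 'CPI3712') THEN( +
SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
             MSGDTA('SWA checkpoint not reached') +
             MSGTYPE(*ESCAPE))
ENDPGM

A production version would also need to handle timeouts and distinguish between the various termination messages, which is precisely why these helpers are not trivial to write.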

Fortunately, the story doesn’t end here. The iSTREAM option 1 (Flash Execution) comes to the rescue, facilitating the establishment of the requisite level of parallelism while upholding the integrity of backup data.

To set up SWA in iSTREAM, the SAVOBJ command needs to be configured for transformation:

ENAASYEXE COMMAND(SAVOBJ)

Also, the iSTREAM mode (see https://cyprolics.co.uk/rmvjrnchg-rollback-dilemma/ for a description) for the job would have to be defined as follows:

STRISTMOD UNIT(BKP) ASYEXE(*YES)

The first parameter identifies a group of jobs that share similar command transformation requirements; the second activates SWA. From then on, each time the command processor runs SAVOBJ, iSTREAM generates a message queue, appends the necessary SWA parameters to the command, initiates the backup as a batch job, and monitors the message queue for the “checkpoint taken” message. Only after receiving this message does control return to the next command in the program being executed.
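For illustration only — iSTREAM’s actual implementation is internal to the product, and the queue name and library below are hypothetical — the net effect of this transformation on a single SAVOBJ is roughly:

/* Illustrative equivalent of the iSTREAM transformation             */
CRTMSGQ    MSGQ(QGPL/SWAQ)  /* per-request queue; name is made up   */
SBMJOB     CMD(SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) +
             DEV(xxxxxx) SAVACT(*LIB) SAVACTMSGQ(QGPL/SWAQ))
/* Control returns to the caller only after the checkpoint message  */
RCVMSG     MSGQ(QGPL/SWAQ) WAIT(*MAX)

The difference is that iSTREAM performs this rewriting transparently at command-processing time, together with the monitoring and error handling that the hand-written version delegates to WAITANDER1 and WAITANDER2.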

The key point is that the original CL program can remain unaltered, while iSTREAM takes care of all the essential transformations, monitoring, and error processing. iSTREAM also offers the flexibility of the WAITASYRQS command, which allows jobs to wait for the completion of previously submitted SWA backups, and it provides a DSPASYRQS dashboard to manage active backups.

Of course, iSTREAM provides a significantly broader functional range when it comes to SWA. For instance, it allows you to submit multiple backups either serially or in parallel. You can also run BRMS and ROBOT/SAVE backups in SWA mode with the same level of control as regular IBM i backups. Furthermore, you can even send backups from one partition to another for execution, although this function is supported only for configurations using Assure MIMIX for High Availability and/or Disaster Recovery.

All of the functionalities mentioned above can be achieved without making any changes to the original batch, which can remain completely untouched:

/* BEFORE backup */
SAVOBJ OBJ(OBJ1 OBJ2… OBJn) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB)
/* Run batch process */
CALL PGM(BATCH)
/* AFTER backup */
SAVOBJ OBJ(OBJm OBJo… OBJp) LIB(APPLIB) DEV(xxxxxx) SAVACT(*LIB)

iSTREAM essentially transforms the above program into a multifunctional logical script that can be interpreted in various ways at runtime, depending on the configuration defined.