RMVJRNCHG rollback dilemma

RMVJRNCHG (Remove Journaled Changes) is a handy little CL command. The first time IBM i administrators see it, they can’t help but think “Great! Down with backups!” OK, maybe they don’t: there’s more to backups than the ability to roll back database updates to a point in the past.

But let’s think about it. Imagine that you are an application or database administrator on IBM i and the system is in the middle of executing a long-running daily batch. Then, as sometimes happens, the batch crashes. You call in an expert, they find the cause of the problem and fix it. Now it’s time to restart the batch. But how do you do it?

Although some applications are intelligent enough to be restarted from the point of failure, most are not. More importantly, even the most intelligent of applications can hardly be expected to restart after ANY error. This is exactly where backups come in: you restore and rerun. The issue is that this may take quite a long time, which is a problem for the business.

In an ideal world, RMVJRNCHG should help solve such problems. Naturally, you would need to make sure that your application objects are properly journaled, but that’s quite common these days. Then, you would have to find the target journal entry, i.e. the point in time that you would like to recover the system to. It could be either a moment immediately preceding the kick-off of the batch, or a checkpoint that the batch can restart from.
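As a rough sketch of the preparation described above, the commands below start journaling a file with both before and after images (RMVJRNCHG needs the before images) and write a user journal entry just before the batch starts, so that the rollback target is easier to find later. The library, journal and file names and the entry type BS are illustrative assumptions, and the journal LIB1/JRN is assumed to exist already.

/* Journal a file with before and after images (names are hypothetical) */

STRJRNPF FILE(LIB1/FILE1) JRN(LIB1/JRN) IMAGES(*BOTH)

/* Write a user marker entry right before the batch is kicked off */

SNDJRNE JRN(LIB1/JRN) TYPE('BS') ENTDTA('Daily batch start')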

Finding the right journal entry can be a non-trivial task, given that there could be millions of them. But suppose you find it, select the objects for the rollback, type the RMVJRNCHG command and press Enter. Well, I can think of at least two reasons you may not want to do this.

Firstly, there’s a chance that the RMVJRNCHG command will fail, leaving you with no option but to restore from backup. This is because your batch may include operations that cannot be rolled back, such as CLRPFM, CPYF with the MBROPT(*REPLACE) parameter, RGZPFM, or similar. Each of those leaves only a simple audit entry in the journal, crippling the potential for a subsequent rollback.

The second reason is performance-related. If your batch is well designed and makes use of the multiprocessor architecture of IBM POWER servers, RMVJRNCHG, being a single-threaded process, could take up to n times longer to roll back than it took the batch to reach the point of failure, where n is the number of cores in the configuration. For example, a batch that ran for one hour across eight cores could, in the worst case, need around eight hours to roll back. In the end, you may still be better off restoring from backups.
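For illustration, the search and the rollback attempt might look roughly like this. The sketch assumes that a user journal entry of the hypothetical type BS was written to the journal with SNDJRNE before the batch started, that the journal receiver has not been changed since, and that 12345 stands in for whatever sequence number the search actually returns. All object names are placeholders.

/* Look for the user marker entry and note its sequence number */

DSPJRN JRN(LIB1/JRN) RCVRNG(*CURCHAIN) JRNCDE((U)) ENTTYP(BS)

/* Remove changes from the last entry back to the marker (12345 is a placeholder) */

RMVJRNCHG JRN(LIB1/JRN) FILE((LIB1/*ALL)) RCVRNG(*CURRENT) FROMENT(*LAST) TOENT(12345)

It is worth checking the command documentation for how the boundary entries are treated, and for the restrictions on objects that are still in use, before relying on a command like this in production.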

So, the “to use or not to use” dilemma in relation to the RMVJRNCHG command is very real.

However, there is a way to avoid this dilemma altogether by using a piece of software called iSTREAM. It is a multifunctional tool, and the full range of its functionality will be explored in other articles, but for now, let’s focus on how it can help make the RMVJRNCHG command more usable.

iSTREAM can define a special job execution mode, which gives system administrators more control over the way CL commands are executed in any selected group of jobs (a good technical description of the iSTREAM mode can be found in the iSTREAM CL Command Transformer (CCT) Guide). This mode can, for example, be defined as follows:

/* Configure and start iSTREAM mode */

STRISTMOD UNIT(TST) CTLLIB(LIB1) JRN(LIB1/JRN) LIBLIST(LIB1 LIB2 LIB3)

Here, TST is the name of the configuration; LIB1, LIB2 and LIB3 are the application libraries; and JRN is the journal that the application uses to record both before and after images of its files and data areas.

For a job executing in the iSTREAM mode, commands of the above “compromising” type, e.g. CLRPFM, CPYF with the MBROPT(*REPLACE) parameter, or RGZPFM, would be intercepted and replaced with their recoverable equivalents. For CLRPFM, for example, this would be a function that explicitly deletes all the records in the file. There is much more to this command transformation than can be addressed in a short article, but, hopefully, the concept is clear: iSTREAM helps prevent commands such as those above from disabling the RMVJRNCHG functionality.

Symbolic checkpoints can be defined to make it easier to specify the target of a rollback:

/* Create checkpoint for rollback */

CRTCKPRLB UNIT(TST) CKPT(BATCH)

The performance problem caused by batch multiprocessing on IBM i can also be addressed if the administrator performs the rollback with the related iSTREAM command, as follows:

/* Rollback to checkpoint */

ISTSSYS/RLBTOCKP UNIT(TST) CKPT(BATCH) STREAMS(10)

The above command references the configuration created earlier, names the target checkpoint for the rollback, and splits the rollback into 10 parallel streams (multistreaming of the rollback process for a single file/table is also supported).

The bottom line is that the long-standing RMVJRNCHG dilemma, which IBM i application administrators often resolve by deciding against the command, may well be settled the other way with the help of additional tooling.