DR (Disaster Recovery) fascinates me, although I hope no one ever faces an actual disaster on their mainframe or storage systems. Every year I take part in the DR activity for a small shop that supports a database of less than 100 GB. Yes, you may wonder why they need a mainframe. I may write another blog post about that, but for now I am going to share my experience from the 2013 DR drill, which took place a couple of days ago.
The proceedings started as usual: the MML bridge, and people in a huddle working through the tasks in the DR schedule queue as per the agenda. It was a normal day and a normal monitoring task for me, as my team had completed the pre-drill activities in advance (we don't keep any tasks in the pending queue). I was not expecting to see any errors during the drill, but little did I know that I was about to face one of the weirdest errors of my DBA career.
The error occurred for one particular table, which was 1.8 GB in size with a row count of 11 million.
Here is the error message:
DSN1989I DSN1COPY IS PROCESSED WITH THE FOLLOWING OPTIONS:
NO CHECK/NO PRINT/ 4K/FULLCOPY /NON-SEGMENT/NUMPARTS = 0/ OBIDXLAT/NO VALUE/NO RESET/ /LOB/PIECESIZ= /
DSSIZE=
DSN1998I INPUT DSNAME = DISASTER.TABLE001.IMAGE.COPY , SEQ
DSN1997I OUTPUT DSNAME = VCATNAME.DSNDBC.PRDNDB00.PRDNTS00.I0001.A001 , VSAM
DSN1992I VSAM PUT ERROR, RPLERREG = 008, RPLERRCD = 028
DSN1993I DSN1COPY TERMINATED, 0019835 PAGES PROCESSED
OK, VSAM PUT ERROR: the name itself suggests that there was a problem copying the DR image copy onto the LDS. So why and how did it manage to copy 19835 pages? This number changed every time I restarted the job step, but the VSAM PUT error always showed up. Stranger still, the table did have data in it, just not all of it, only a part. This indicates that DSN1COPY does not roll back its work when it hits an error. I had never seen this error before, so I quickly turned to Google (I also refer to the online publib site for manuals) to see if anyone else had faced it. The first 10 pages of Google results gave me pretty good information, and I was able to track the problem down to a SPACE issue.
Resolution:
The tablespace was SMS managed, and DSN1COPY does not have a mechanism to add extents or allocate more space (volumes) to the LDS. You have to issue an ALTER statement against the LDS manually using the IDCAMS utility, as shown below:
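For anyone scanning job output for this condition, here is a minimal sketch in Python that pulls the two codes out of a DSN1992I line. It assumes the codes are printed in decimal, and the lookup table is deliberately partial: it only holds the one pair seen in this post (return code 8, reason code 28), which the VSAM return and reason code documentation describes as a data set that cannot be extended because no more space could be allocated. The function and table names are made up for illustration.

```python
import re

# Partial lookup of VSAM reason codes for return code 8 (logical errors).
# Only the code seen in this post is listed; consult the VSAM manual for others.
VSAM_LOGICAL_ERRORS = {
    28: "data set cannot be extended: VSAM could not allocate more DASD space",
}

# Matches the DSN1992I line as it appears in the DSN1COPY output above.
MSG_PATTERN = re.compile(
    r"DSN1992I\s+VSAM PUT ERROR,\s*RPLERREG\s*=\s*(\d+),\s*RPLERRCD\s*=\s*(\d+)"
)


def decode_put_error(line):
    """Return (return_code, reason_code, meaning), or None if the line does not match."""
    match = MSG_PATTERN.search(line)
    if match is None:
        return None
    return_code = int(match.group(1))   # assumption: printed in decimal
    reason_code = int(match.group(2))   # assumption: printed in decimal
    meaning = "unknown, check the VSAM return and reason codes manual"
    if return_code == 8:
        meaning = VSAM_LOGICAL_ERRORS.get(reason_code, meaning)
    return return_code, reason_code, meaning


sample = "DSN1992I VSAM PUT ERROR, RPLERREG = 008, RPLERRCD = 028"
print(decode_put_error(sample))
```

Had I known this mapping up front, the search would have been a lot shorter.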
ALTER VCATNAME.DSNDBD.PRDNDB00.PRDNTS00.I0001.A001 ADDVOLUMES(* * * * *)
Each * adds one volume, so the more asterisks you code, the more volumes (free space) are added. I recommend keeping the count at about 6. If you can calculate the size of the tablespace and you know your volume/storage settings, you can increase or decrease the number of asterisks. At my shop 1 SMS volume = 2200 cylinders; that is just an example and may vary at your shop.
Resubmit the job from the abended step and it should run fine, as it did for me. Since it was a DR site, I did not release the volumes, as they would be released as part of the checklist. If you are running DSN1COPY jobs at your local/live site, then I recommend releasing the volumes from the LDS by issuing
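If you want a rough starting point for the number of asterisks, the arithmetic can be sketched in Python as below. This is only an estimate under stated assumptions: 3390 geometry (15 tracks per cylinder, twelve 4K pages per track) and completely empty volumes of 2200 cylinders, which real SMS volumes rarely are. Treat the result as a lower bound and add a margin, which is why I lean towards a higher count. The function name is made up for illustration.

```python
import math

PAGE_SIZE = 4096          # 4K pages, as shown in the DSN1989I options line
PAGES_PER_TRACK = 12      # assumption: a 3390 track holds twelve 4K pages
TRACKS_PER_CYL = 15       # assumption: 3390 geometry
CYLS_PER_VOLUME = 2200    # volume size quoted for my shop; yours may differ


def volumes_needed(size_bytes, cyls_per_volume=CYLS_PER_VOLUME):
    """Return (cylinders, volumes) needed to hold size_bytes of 4K pages.

    Assumes every volume is completely free, so this is a lower bound.
    """
    pages = math.ceil(size_bytes / PAGE_SIZE)
    cylinders = math.ceil(pages / (PAGES_PER_TRACK * TRACKS_PER_CYL))
    volumes = math.ceil(cylinders / cyls_per_volume)
    return cylinders, volumes


# The 1.8 GB table from this post
cyls, vols = volumes_needed(int(1.8 * 1024**3))
print(cyls, vols)   # 2622 cylinders, 2 volumes under the assumptions above
```

The extra asterisks beyond this figure cover volumes that are already partly full.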
ALTER VCATNAME.DSNDBD.PRDNDB00.PRDNTS00.I0001.A001 REMOVEVOLUMES(*)
Special thanks to the IDUG thread and to my colleagues who helped me out on this.
Note: Restarting the job without applying the above resolution will not fix the issue; it will only cost you precious time.