catalog损坏情况下的数据库恢复实例
一.环境描述
1.运行环境:hp小型机,hpux操作系统,oracle 9.2.0.1,openview data protector备份管理软件,esl9000带库。
数据库全备脚本:
run {
allocate channel dev_0 type sbt_tape
parms env=(ob2bartype=oracle8,ob2appname=oratest,ob2barlist=test2oratest);
backup incremental level <incr_level> filesperset 1
format test2oratest<oratest_%s:%t:%p>.dbf
database
include current controlfile;
sql alter system archive log current;
}
归档日志备份脚本:
run {
allocate channel dev_0 type sbt_tape
parms env=(ob2bartype=oracle8,ob2appname=oratest,
ob2barlist=test2oratestarchivelog);
backup incremental level <incr_level> filesperset 1
format test2oratestarchivelog<oratest_%s:%t:%p>.dbf
archivelog all delete input;
}
2.故障描述:oracle数据库(测试库)运行在归档模式下。正常情况下,备份管理软件调用oracle rman进行全库和归档日志备份。备份文件存储在esl9000磁带库上。由于磁盘阵列故障,造成目标数据库控制文件和数据文件全部损坏;同时,存放catalog的数据库也被损坏。
由于catalog损坏,rman无法通过catalog找到备份集,restore无法成功。由于没有控制文件,数据库仅能启动到nomount状态下。
二.背景知识
在oracle 816 以后的版本中,oracle提供了一个包:dbms_backup_restore。 dbms_backup_restore 包是由dbmsbkrs.sql 和 prvtbkrs.plb 这两个脚本创建的。 catproc.sql 脚本运行后会调用这两个包。所以每个数据库都有的这个包作为oracle服务器和操作系统之间io操作的接口,由rman直接调用。我们可以在数据库nomount 情况下调用这些包 ,来达到数据库恢复的目的。
三.恢复步骤
1.查找备份集
备份集可以在data protector的internal database中或日志记录中找到。具体内容(根据需要做了剪裁)如下:
sbt-25032 (?) 10/08/04 10:42:54 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 10:42:54
starting ob2bar backup: 05 test2:test2oratest<oratest_1:539001683:1>.dbf // oracle8
sbt-25032 (?) 10/08/04 10:46:14 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 10:46:14
completed ob2bar backup: 05 test2:test2oratest<oratest_1:539001683:1>.dbf // oracle8
sbt-25032 (?) 10/08/04 10:46:22 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 10:46:22
starting ob2bar backup: 05 test2:test2oratest<oratest_2:539001979:1>.dbf // oracle8
sbt-25032 (?) 10/08/04 11:02:45 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:02:45
completed ob2bar backup: 05 test2:test2oratest<oratest_2:539001979:1>.dbf // oracle8
sbt-28813 (?) 10/08/04 11:34:57 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:34:57
starting ob2bar backup: 05 test2:test2oratestarchivelog<oratest_3:539004793:1>.dbf // oracle8
sbt-28813 (?) 10/08/04 11:35:17 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:35:17
completed ob2bar backup: 05 test2:test2oratestarchivelog<oratest_3:539004793:1>.dbf // oracle8
。。。
sbt-28813 (?) 10/08/04 11:41:57 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:41:57
starting ob2bar backup: 05 test2:test2oratestarchivelog<oratest_26:539005316:1>.dbf // oracle8
sbt-28813 (?) 10/08/04 11:42:08 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:42:08
completed ob2bar backup: 05 test2:test2oratestarchivelog<oratest_26:539005316:1>.dbf // oracle8
sbt-28813 (?) 10/08/04 11:42:13 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:42:13
starting ob2bar backup: 05 test2:test2oratestarchivelog<oratest_27:539005332:1>.dbf // oracle8
sbt-28813 (?) 10/08/04 11:42:16 [normal] from: ob2bar@test2 “oratest” time: 10/08/04 11:42:16
completed ob2bar backup: 05 test2:test2oratestarchivelog<oratest_27:539005332:1>.dbf // oracle8
从上述内容可以看到存储在带库上的全备的备份集文件为:
test2:test2oratest<oratest_1:539001683:1>.dbf
test2:test2oratest<oratest_2:539001979:1>.dbf
archivelog的备份集文件为:
test2:test2oratestarchivelog<oratest_3:539004793:1>.dbf
。。。
test2:test2oratestarchivelog<oratest_26:539005316:1>.dbf
test2:test2oratestarchivelog<oratest_27:539005332:1>.dbf
2.恢复控制文件
test2:/backup/test$sqlplus /nolog
sql*plus: release 9.2.0.1.0 – production on sat oct 9 14:30:54 2004
copyright (c) 1982, 2002, oracle corporation. all rights reserved.
sql> connect / as sysdba
connected.
sql>startup nomount
sql>declare
2 devtype varchar2(256);
3 done boolean;
4 begin
5 devtype:=sys.dbms_backup_restore.deviceallocate(type=>sbt_tape,
ident=>t1,params=>env=(ob2bartype=oracle8,ob2appname=oratest,ob2barlist=test2oratest));
6 sys.dbms_backup_restore.restoresetdatafile;
7 sys.dbms_backup_restore.restorecontrolfileto(
cfname=>/backup/test/control01.ctl);
8 sys.dbms_backup_restore.restorebackuppiece(done=>done,
handle=>test2oratest<oratest_2:539001979:1>.dbf, params=>null);
9 sys.dbms_backup_restore.devicedeallocate;
10 end;
11 /
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:23:28
starting ob2bar restore: 05 test2:test2oratest<oratest_2:539001979:1>.dbf // oracle8
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:23:32
completed ob2bar restore: 05 test2:test2oratest<oratest_2:539001979:1>.dbf // oracle8
sql>
由于rman做全库备份的时候,最后备份控制文件,应此可以从最后一个全备的备份集文件中恢复控制文件。恢复控制文件时,需将数据库启动到nomount状态。恢复的控制文件应与init文件中定义的控制文件的路径、文件名和数量相一致,否则无法把数据库启动到mount状态。
下面介绍程序包的内容:
第五行 分配一个device channel ,”sbt_tape”说明是从磁带上恢复。params参数与data protector中rman备份脚本中的参数一致。
第六行 初始化恢复过程,准备进行控制文件或数据文件恢复。
第七行 指出待恢复文件和恢复文件的存储位置。本处说明是恢复控制文件及存放存放控制文件的路径、文件名。
第八行 说明从哪个备份片中恢复
第九行 释放设备通道
3.恢复数据文件
首先需要知道每个备份集中包含哪些数据文件:
sql>alter database mount;
sql>select file# from v$backup_datafile where set_stamp=539001683;
file#
———-
2
0
2 rows selected.
sql>
其中,set_stamp是备份集文件名中的时间戳;
0表示控制文件;
2是数据文件号。
文件号对应的文件名可以在v$datafile中查到。
知道备份集中包含的数据文件号和文件名,就可以恢复数据文件了:
sql>declare
2 devtype varchar2(256);
3 done boolean;
4 begin
5 devtype:=sys.dbms_backup_restore.deviceallocate(type=>sbt_tape,
ident=>t1,params=>env=(ob2bartype=oracle8,ob2appname=oratest,ob2barlist=test2oratest));
6 sys.dbms_backup_restore.restoresetdatafile;
7 sys.dbms_backup_restore.restoredatafileto(dfnumber=>2,
toname=>/backup/test/undotbsldb01.dbf);
8 sys.dbms_backup_restore.restorebackuppiece(done=>done,
handle=>test2:test2oratest<oratest_1:539001683:1>.dbf, params=>null);
9 sys.dbms_backup_restore.devicedeallocate;
10 end;
11 /
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:20:10
starting ob2bar restore: 05test2:test2oratest<oratest_1:539001683:1>.dbf // oracle8
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:20:24
completed ob2bar restore: 05 test2:test2oratest<oratest_2:539001683:1>.dbf // oracle8
pl/sql procedure successfully completed.
sql>
重复上述操作,可以恢复所有数据文件。恢复的数据文件的存储目录可以不是原存储目录,因此,甚至可以在数据库open状态下从备份集中restore数据文件。
4.恢复archive log文件
sql>declare
2 devtype varchar2(256);
3 done boolean;
4 begin
5 devtype:=sys.dbms_backup_restore.deviceallocate(type=>sbt_tape,ident=>t1,
params=>env=(ob2bartype=oracle8,ob2appname=oratest,
ob2barlist=test2oratestarchivelog));
6 sys.dbms_backup_restore.restoresetarchivedlog;
7 sys.dbms_backup_restore.restorearchivedlogrange;
8 sys.dbms_backup_restore.restorebackuppiece(done=>done,
handle=>test2:test2oratestarchivelog<oratest_26:539005316:1>.dbf, params=>null);
9 sys.dbms_backup_restore.devicedeallocate;
10 end;
11 /
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:40:13
starting ob2bar restore: 05 test2:test2oratestarchivelog<oratest_27:539005316:1>.dbf // oracle8
[normal] from: ob2bar@test2 “oratest” time: 10/09/04 14:40:13
completed ob2bar restore: 05 test2:test2oratestarchivelog<oratest_27:539005316:1>.dbf // oracle8
sql>
根据第一步中获得的archive log备份集的文件名,恢复所有archive log.
5. recover数据库
sql> recover database until cancel using backup controlfile;
ora-00279: change 54285 generated at 10/08/2004 10:41:24 needed for thread 1
ora-00289: suggestion : /backup/test/archive/1_24.dbf
ora-00280: change 54285 for thread 1 is in sequence #24
specify log: {<ret>=suggested | filename | auto | cancel}
ora-00279: change 55331 generated at 10/08/2004 11:02:48 needed for thread 1
ora-00289: suggestion : /backup/test/archive/1_25.dbf
ora-00280: change 55331 for thread 1 is in sequence #25
ora-00278: log file /backup/test/archive/1_24.dbf no longer needed for this recovery
specify log: {<ret>=suggested | filename | auto | cancel}
ora-00279: change 56789 generated at 10/08/2004 11:33:12 needed for thread 1
ora-00289: suggestion : /backup/test/archive/1_26.dbf
ora-00280: change 56789 for thread 1 is in sequence #26
ora-00278: log file /backup/test/archive/1_25.dbf no longer needed for this recovery
specify log: {<ret>=suggested | filename | auto | cancel}
ora-00308: cannot open archived log /backup/test/archive/1_26.dbf
ora-27037: unable to obtain file status
hp-ux error: 2: no such file or directory
additional information: 3
sql> alter database open resetlogs;
database altered.
sql>
现在数据库恢复已经完成了。由于online redo log也损坏了,数据库仅恢复到最后一个归档的日志文件。
一个小技巧:在上一步恢复archive log时,archive log文件可能非常多。如果不知道最早应该恢复到哪一个archive log文件,可以先执行一次本步操作,系统会提示最先需要的archive log文件,然后从最后的一个archive log备份集往前恢复直到该archive log文件被恢复。示例如下:
sql> recover database until cancel using backup controlfile;
ora-00279: change 54285 generated at 10/08/2004 10:41:24 needed for thread 1
ora-00289: suggestion : /backup/test/archive/1_24.dbf
ora-00280: change 54285 for thread 1 is in sequence #24
specify log: {<ret>=suggested | filename | auto | cancel}
ora-00308: cannot open archived log /backup/test/archive/1_24.dbf
ora-27037: unable to obtain file status
hp-ux error: 2: no such file or directory
additional information: 3
ora-01547: warning: recover succeeded but open resetlogs would get error below
ora-01152: file 1 was not restored from a sufficiently old backup
ora-01110: data file 1: /backup/test/oradata/system01.dbf
sql>
显然,第一个需要的archive log文件是/backup/test/archive/1_24.dbf
五.小结
1. 只要保存了完整的全库备份和完整的归档日志备份,即使控制文件和恢复目录全部损坏,数据库还是可以恢复的。
2. 根据上面一条可以看出,无论备份时是否采用了恢复目录,恢复目录在数据库恢复时,并不是必须的。
3. rman做的备份可以不用rman恢复。
4. hp data protector的备份脚本在做全库备份时,是一个一个备份数据文件和日志文件并形成单独的备份集。数据库全备期间的日志文件必须保存完整。
参考资料:
dba工作备忘录之三:rman备份,没用catalog ,但控制文件丢失,怎么解决? fenng