KingbaseES数据库 通过sys_resetwal恢复被删除数据()

KingbaseES恢复被删除数据

生产环境操作请先备份整个data目录或者cp 个新的data目录在进行此操作恢复被删除数据,通过备份的新路径恢复数据之后,确认数据完整性。在把数据从备份环境导出重新导入到生产环境。

使用到的系统工具:
sys_resetwal
sys_waldump

准备测试环境:

[kingbase@postgres ~]$ sys_ctl -D data1 start
waiting for server to start....2022-11-11 10:40:58.095 CST [30813] LOG:  sepapower extension initialized
2022-11-11 10:40:58.103 CST [30813] LOG:  starting KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-11-11 10:40:58.103 CST [30813] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2022-11-11 10:40:58.103 CST [30813] LOG:  listening on IPv6 address "::", port 54321
2022-11-11 10:40:58.106 CST [30813] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2022-11-11 10:40:58.219 CST [30813] LOG:  redirecting log output to logging collector process
2022-11-11 10:40:58.219 CST [30813] HINT:  Future log output will appear in directory "sys_log".
 done
server started
[kingbase@postgres ~]$ ksql -Usystem -dtest
ksql (V8.0)
Type "help" for help.

WARNING:License file will expire in 14 days.


test=# create table t2 (id int);
CREATE TABLE
test=# insert into t2 values (generate_series(1,5));        
INSERT 0 5
test=# insert into t2 values (generate_series(1,1000));        
INSERT 0 1000
test=# checkpoint;
CHECKPOINT
test=# 
test=# select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/730F790          | 000000010000000000000007 | (000000010000000000000007,3209104)
(1 row)

test=# 
test=# checkpoint;
CHECKPOINT
test=# select count(*) from t2;
 count 
-------
  1005
(1 row)

test=# delete from t2 where id>995;
DELETE 5
test=# select count(*) from t2;    
 count 
-------
  1000
(1 row)

test=# select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/7311F80          | 000000010000000000000007 | (000000010000000000000007,3219328)
(1 row)

确认当前日志文件 + 事物(LSN)号

select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/730F790          | 000000010000000000000007 | (000000010000000000000007,3209104)
(1 row)

根据查询的事务号以及日志文件,查找WAL日志确定恢复数据的范围

lsn数据类型可以被用来存储 LSN(日志序列号)数据,LSN 是一个指向WAL中的位置的指针。
这个类型是XLogRecPtr的一种表达并且是数据库内部系统类型。

在数据库内部,一个 LSN 是一个 64 位整数,表示在预写式日志流中的一个字节位置。
它由两个长度为8位的十六进制数组成,中间用斜线分隔,如6/7311F80。 
lsn类型支持标准的比较操作符 = 和  >
两个 LSN 可以做相减操作, 结果将是分隔两个预写式日志位置的字节数


通过wal日志事物号找到delete的大概位置,这里位置是 1034

sys_waldump: fatal: could not find a valid record after 0/7311F80
[kingbase@postgres sys_wal]$ sys_waldump 000000010000000000000007 00000001000000000000000C -s 0/730F790
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/0730F790, prev 0/0730F760, desc: RUNNING_XACTS nextXid 1034 latestCompletedXid 120649616 oldestRunningXid 1034
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/0730F7C0, prev 0/0730F790, desc: CHECKPOINT_ONLINE redo 0/730F790; tli 1; prev tli 1; fpw true; xid 0:1034; oid 24669; multi 1; offset 0; oldest xid 912 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1034; online
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/0730F838, prev 0/0730F7C0, desc: RUNNING_XACTS nextXid 1034 latestCompletedXid 120649664 oldestRunningXid 1034
rmgr: XLOG        len (rec/tot):     49/  5917, tx:          0, lsn: 0/0730F868, prev 0/0730F838, desc: FPI_FOR_HINT , blkref #0: rel 1663/16269/2619 blk 23 FPW
rmgr: Heap        len (rec/tot):     59/  3751, tx:       1034, lsn: 0/07310FA0, prev 0/0730F868, desc: DELETE off 97 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311E48, prev 0/07310FA0, desc: DELETE off 98 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311E80, prev 0/07311E48, desc: DELETE off 99 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311EB8, prev 0/07311E80, desc: DELETE off 100 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311EF0, prev 0/07311EB8, desc: DELETE off 101 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Transaction len (rec/tot):     34/    34, tx:       1034, lsn: 0/07311F28, prev 0/07311EF0, desc: COMMIT 2022-11-11 12:15:09.240177 CST
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/07311F50, prev 0/07311F28, desc: RUNNING_XACTS nextXid 1035 latestCompletedXid 120659752 oldestRunningXid 1035
sys_waldump: fatal: error in WAL record at 0/7311F50: invalid record length at 0/7311F80: wanted 24, got 0	
通过以下命令进行在wal日志精确定位事务号

[kingbase@postgres sys_wal]$ sys_waldump 000000010000000000000007 00000001000000000000000C | grep DELETE

停止数据库

[kingbase@postgres ~]$ sys_ctl -D /home/kingbase/data1/ stop
waiting for server to shut down.... done
server stopped

通过指定事务号1034,进行恢复数据

[kingbase@postgres sys_wal]$ sys_resetwal -x 1034 -D /home/kingbase/data1/
Write-ahead log reset
[kingbase@postgres sys_wal]$

启动数据库并且查看数据是否恢复

[kingbase@postgres ~]$ sys_ctl -D /home/kingbase/data1/ start
waiting for server to start....2022-11-11 13:12:39.468 CST [7239] LOG:  sepapower extension initialized
2022-11-11 13:12:39.481 CST [7239] LOG:  starting KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-11-11 13:12:39.481 CST [7239] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2022-11-11 13:12:39.481 CST [7239] LOG:  listening on IPv6 address "::", port 54321
2022-11-11 13:12:39.485 CST [7239] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2022-11-11 13:12:39.615 CST [7239] LOG:  redirecting log output to logging collector process
2022-11-11 13:12:39.615 CST [7239] HINT:  Future log output will appear in directory "sys_log".
 done
server started
[kingbase@postgres ~]$ ksql -Usystem -dtest
ksql (V8.0)
Type "help" for help.

WARNING:License file will expire in 14 days.

test=# select count(*) from t2;
 count 
-------
  1005
(1 row)

test=# 
经确认数据已经恢复.

在进行了sys_resetwal操作后,设置的LSN 号后的数据都取消了,相当于回滚到指定的事务LSN 号的位置.

WAL 日志整体被清理,原有的日志都会被清除

恢复之前的wal日志

 [kingbase@postgres sys_wal]$ ll
total 98304
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:15 000000010000000000000007
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 000000010000000000000008
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 000000010000000000000009
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000A
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000B
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000C
drwx------ 2 kingbase kingbase        6 Nov 11 12:08 archive_status

恢复之后的日志:

[kingbase@postgres sys_wal]$ ll
total 16384
-rw------- 1 kingbase kingbase 16777216 Nov 11 13:17 00000001000000000000000D
drwx------ 2 kingbase kingbase        6 Nov 11 12:08 archive_status
[kingbase@postgres sys_wal]$ 

WAL日志切换

自之前的WAL日志文件切换以后如果没有新的WAL日志生成,就不会创建新的 WAL 日志文件,也就是说手动切换wal日志,需要有新数据生成才会发生切换,如果没有新数据生成,手动执行sys_switch_wal也无法进行wal日志切换。
使 WAL 文件活动,会发生 WAL 文件切换:
SELECT sys_walfile_name(sys_switch_wal()), now(), sys_walfile_name(sys_switch_wal());
————————

KingbaseES恢复被删除数据

生产环境操作请先备份整个data目录或者cp 个新的data目录在进行此操作恢复被删除数据,通过备份的新路径恢复数据之后,确认数据完整性。在把数据从备份环境导出重新导入到生产环境。

使用到的系统工具:
sys_resetwal
sys_waldump

准备测试环境:

[kingbase@postgres ~]$ sys_ctl -D data1 start
waiting for server to start....2022-11-11 10:40:58.095 CST [30813] LOG:  sepapower extension initialized
2022-11-11 10:40:58.103 CST [30813] LOG:  starting KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-11-11 10:40:58.103 CST [30813] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2022-11-11 10:40:58.103 CST [30813] LOG:  listening on IPv6 address "::", port 54321
2022-11-11 10:40:58.106 CST [30813] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2022-11-11 10:40:58.219 CST [30813] LOG:  redirecting log output to logging collector process
2022-11-11 10:40:58.219 CST [30813] HINT:  Future log output will appear in directory "sys_log".
 done
server started
[kingbase@postgres ~]$ ksql -Usystem -dtest
ksql (V8.0)
Type "help" for help.

WARNING:License file will expire in 14 days.


test=# create table t2 (id int);
CREATE TABLE
test=# insert into t2 values (generate_series(1,5));        
INSERT 0 5
test=# insert into t2 values (generate_series(1,1000));        
INSERT 0 1000
test=# checkpoint;
CHECKPOINT
test=# 
test=# select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/730F790          | 000000010000000000000007 | (000000010000000000000007,3209104)
(1 row)

test=# 
test=# checkpoint;
CHECKPOINT
test=# select count(*) from t2;
 count 
-------
  1005
(1 row)

test=# delete from t2 where id>995;
DELETE 5
test=# select count(*) from t2;    
 count 
-------
  1000
(1 row)

test=# select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/7311F80          | 000000010000000000000007 | (000000010000000000000007,3219328)
(1 row)

确认当前日志文件 + 事物(LSN)号

select sys_current_wal_lsn(),sys_walfile_name(sys_current_wal_lsn()),sys_walfile_name_offset(sys_current_wal_lsn());
 sys_current_wal_lsn |     sys_walfile_name      |       sys_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 0/730F790          | 000000010000000000000007 | (000000010000000000000007,3209104)
(1 row)

根据查询的事务号以及日志文件,查找WAL日志确定恢复数据的范围

lsn数据类型可以被用来存储 LSN(日志序列号)数据,LSN 是一个指向WAL中的位置的指针。
这个类型是XLogRecPtr的一种表达并且是数据库内部系统类型。

在数据库内部,一个 LSN 是一个 64 位整数,表示在预写式日志流中的一个字节位置。
它由两个长度为8位的十六进制数组成,中间用斜线分隔,如6/7311F80。 
lsn类型支持标准的比较操作符 = 和  >
两个 LSN 可以做相减操作, 结果将是分隔两个预写式日志位置的字节数


通过wal日志事物号找到delete的大概位置,这里位置是 1034

sys_waldump: fatal: could not find a valid record after 0/7311F80
[kingbase@postgres sys_wal]$ sys_waldump 000000010000000000000007 00000001000000000000000C -s 0/730F790
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/0730F790, prev 0/0730F760, desc: RUNNING_XACTS nextXid 1034 latestCompletedXid 120649616 oldestRunningXid 1034
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/0730F7C0, prev 0/0730F790, desc: CHECKPOINT_ONLINE redo 0/730F790; tli 1; prev tli 1; fpw true; xid 0:1034; oid 24669; multi 1; offset 0; oldest xid 912 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1034; online
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/0730F838, prev 0/0730F7C0, desc: RUNNING_XACTS nextXid 1034 latestCompletedXid 120649664 oldestRunningXid 1034
rmgr: XLOG        len (rec/tot):     49/  5917, tx:          0, lsn: 0/0730F868, prev 0/0730F838, desc: FPI_FOR_HINT , blkref #0: rel 1663/16269/2619 blk 23 FPW
rmgr: Heap        len (rec/tot):     59/  3751, tx:       1034, lsn: 0/07310FA0, prev 0/0730F868, desc: DELETE off 97 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311E48, prev 0/07310FA0, desc: DELETE off 98 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311E80, prev 0/07311E48, desc: DELETE off 99 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311EB8, prev 0/07311E80, desc: DELETE off 100 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Heap        len (rec/tot):     54/    54, tx:       1034, lsn: 0/07311EF0, prev 0/07311EB8, desc: DELETE off 101 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/16269/16410 blk 4
rmgr: Transaction len (rec/tot):     34/    34, tx:       1034, lsn: 0/07311F28, prev 0/07311EF0, desc: COMMIT 2022-11-11 12:15:09.240177 CST
rmgr: Standby     len (rec/tot):     42/    42, tx:          0, lsn: 0/07311F50, prev 0/07311F28, desc: RUNNING_XACTS nextXid 1035 latestCompletedXid 120659752 oldestRunningXid 1035
sys_waldump: fatal: error in WAL record at 0/7311F50: invalid record length at 0/7311F80: wanted 24, got 0	
通过以下命令进行在wal日志精确定位事务号

[kingbase@postgres sys_wal]$ sys_waldump 000000010000000000000007 00000001000000000000000C | grep DELETE

停止数据库

[kingbase@postgres ~]$ sys_ctl -D /home/kingbase/data1/ stop
waiting for server to shut down.... done
server stopped

通过指定事务号1034,进行恢复数据

[kingbase@postgres sys_wal]$ sys_resetwal -x 1034 -D /home/kingbase/data1/
Write-ahead log reset
[kingbase@postgres sys_wal]$

启动数据库并且查看数据是否恢复

[kingbase@postgres ~]$ sys_ctl -D /home/kingbase/data1/ start
waiting for server to start....2022-11-11 13:12:39.468 CST [7239] LOG:  sepapower extension initialized
2022-11-11 13:12:39.481 CST [7239] LOG:  starting KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-11-11 13:12:39.481 CST [7239] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2022-11-11 13:12:39.481 CST [7239] LOG:  listening on IPv6 address "::", port 54321
2022-11-11 13:12:39.485 CST [7239] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2022-11-11 13:12:39.615 CST [7239] LOG:  redirecting log output to logging collector process
2022-11-11 13:12:39.615 CST [7239] HINT:  Future log output will appear in directory "sys_log".
 done
server started
[kingbase@postgres ~]$ ksql -Usystem -dtest
ksql (V8.0)
Type "help" for help.

WARNING:License file will expire in 14 days.

test=# select count(*) from t2;
 count 
-------
  1005
(1 row)

test=# 
经确认数据已经恢复.

在进行了sys_resetwal操作后,设置的LSN 号后的数据都取消了,相当于回滚到指定的事务LSN 号的位置.

WAL 日志整体被清理,原有的日志都会被清除

恢复之前的wal日志

 [kingbase@postgres sys_wal]$ ll
total 98304
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:15 000000010000000000000007
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 000000010000000000000008
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 000000010000000000000009
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000A
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000B
-rw------- 1 kingbase kingbase 16777216 Nov 11 12:11 00000001000000000000000C
drwx------ 2 kingbase kingbase        6 Nov 11 12:08 archive_status

恢复之后的日志:

[kingbase@postgres sys_wal]$ ll
total 16384
-rw------- 1 kingbase kingbase 16777216 Nov 11 13:17 00000001000000000000000D
drwx------ 2 kingbase kingbase        6 Nov 11 12:08 archive_status
[kingbase@postgres sys_wal]$ 

WAL日志切换

自之前的WAL日志文件切换以后如果没有新的WAL日志生成,就不会创建新的 WAL 日志文件,也就是说手动切换wal日志,需要有新数据生成才会发生切换,如果没有新数据生成,手动执行sys_switch_wal也无法进行wal日志切换。
使 WAL 文件活动,会发生 WAL 文件切换:
SELECT sys_walfile_name(sys_switch_wal()), now(), sys_walfile_name(sys_switch_wal());