合 MSSQL数据库一直显示“正在恢复”或“可疑”的解决办法
Tags: 故障处理MSSQLSQL Server可疑正在恢复
现象
正在恢复
由于磁盘空间满了,数据库处于“恢复挂起”状态,清理磁盘后,重启SQL Server服务器后,数据库一直显示“正在恢复”,用sp_who2查询显示该库处于“DB STARTUP”状态。
查询数据库阻塞,发现有个系统进程9在阻塞,然后使用kill 9
报错:只能终止用户进程
1 2 | 消息 6107,级别 14,状态 1,第 3 行 只能终止用户进程。 |
数据库可疑
若一直“正在恢复”,或显示“可疑”(Suspect),在 SQL Server 中,如果某个数据库被标记为“可疑”(Suspect),这通常意味着该数据库的事务日志文件(.ldf)可能损坏或丢失。当一个数据库的日志文件不可用时,SQL Server 无法保证数据库的一致性,因此会将数据库状态设置为“可疑”。
可以查询告警日志是否有如下输出:
1 2 3 4 | An error occurred during recovery, preventing the database 'AB_STORE' (database ID 5) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support. During redoing of a logged operation in database 'AB_STORE', an error occurred at log record ID (1886659:12025:60). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database. |
若有,则说明恢复出现问题,可以使用本文的方法2进行恢复。
分析
1 | SELECT name, state_desc FROM sys.databases; |
恢复状态通常是“RECOVERING”。
首先查看进程的状态变化,物理IO和逻辑IO,如果没有变化,则先做1次SQL Server实例的关闭和启动,然后再查看进程的状态变化。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | SELECT top 1000 (CASE WHEN ActualElapsedTime < 36000000 THEN '0' ELSE '' END + RTRIM(ActualElapsedTime/1000/3600) + ':' + RIGHT('0' + RTRIM((ActualElapsedTime/1000)%3600/60), 2) + ':' + RIGHT('0' + RTRIM((ActualElapsedTime/1000)%60), 2)) ActualElapsedTime, a.wait_time2, a.spid, a.thread_cnt, -- a.dop, blocking_session_id blocking_spid, a.database_id, db_name(a.database_id) dbname, start_time, sp.last_batch, wait_type, last_wait_type, wait_resource, open_transaction_count, a.status, a.command, a.total_elapsed_time/1000 total_elapsed_time_s, a.cpu_time/1000 cpu_time, a.writes, a.logical_reads, sp.physical_io, a.row_count, a.granted_query_memory*8/1024 granted_query_memory_MB, a.percent_complete, a.Time_Remaining, -- a.sql_handle, SUBSTRING (b.text,a.statement_start_offset/2, (CASE WHEN a.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX), b.text)) * 2+ 2 ELSE a.statement_end_offset END - a.statement_start_offset)/2+ 2) Internal_SQL_TEXT, b.text Parent_SQL_TEXT, -- a.plan_handle, -- c.query_plan, (select query_plan from sys.dm_exec_query_plan(a.plan_handle) n1) query_plan, @@spid current_spid, 'kill '+cast(a.spid as varchar) kill1 FROM ( select nb.session_id spid, nb.total_elapsed_time/1000 total_elapsed_time_s, (CASE WHEN estimated_completion_time < 36000000 THEN '0' ELSE '' END + RTRIM(estimated_completion_time/1000/3600) + ':' + RIGHT('0' + RTRIM((estimated_completion_time/1000)%3600/60), 2) + ':' + RIGHT('0' + RTRIM((estimated_completion_time/1000)%60), 2)) AS [Time_Remaining], (SELECT count(*) FROM master.dbo.sysprocesses sp where sp.spid=nb.session_id) thread_cnt, DATEDIFF(MILLISECOND, nb.start_time, getdate()) ActualElapsedTime, (CASE WHEN wait_time < 36000000 THEN '0' ELSE '' END + RTRIM(wait_time/1000/3600) + ':' + RIGHT('0' + RTRIM((wait_time/1000)%3600/60), 2) + ':' + RIGHT('0' + RTRIM((wait_time/1000)%60), 2)) AS wait_time2, nb.* from sys.dm_exec_requests nb WITH(NOLOCK) ) a LEFT JOIN master.dbo.sysprocesses sp WITH(NOLOCK) on sp.spid=a.spid and sp.ecid=0 LEFT JOIN ( SELECT n1.session_id ,n2.text FROM sys.dm_exec_requests n1 WITH(NOLOCK) CROSS apply sys.dm_exec_sql_text ( n1.sql_handle ) n2 ) b on a.spid=b.session_id WHERE a.spid <> @@spid and (a.status not in ('sleeping','background') or (a.status in ('sleeping','background') and a.spid IN (select blocking_session_id from sys.dm_exec_requests nb)) or (a.status in ('sleeping','background') and a.open_transaction_count > 0 ) or ( a.status in ('background') and a.command='DB STARTUP' ) ) -- and a.spid in (141,135) order by cast(ActualElapsedTime as numeric) desc GO |
如果物理IO和逻辑IO有变化,则说明在恢复中,只需要等待即可,不需要做任何操作。
恢复进入到后期时,percent_complete列也会指示恢复的进度。
若没有变化,则可能需要按照如下的几种方式来恢复。
请不要轻易使用如下方法。
解决
方法1
1 2 3 4 | -- RESTORE database <数据库名> with norecovery; -- 执行以下命令来强制结束数据库的恢复: ALTER DATABASE [数据库名] SET OFFLINE; |