This article about fixing tempdb growth due to Version Store in Always On Secondary replica. This is in continuation of the article Tempdb space issue in AlwaysOn Secondary Replica. Hence please go through the first article and then comeback here so that it make more sense to you.
To summarize till now we have found that version store is consuming most of the space in tempdb. Hence we tried to find out the transaction consuming version store and detected an old system session id having lastwaittype as “REDO_THREAD_PENDING_WORK”.
As I mentioned in the first article this is a hint of some transaction in primary causing this. Furthermore you can check the below performance counters:
- Longest transaction that’s currently running: SQLServer:Transactions\Longest Transaction Running Time.
- The current size of all the version stores: SQLServer:Transactions\Version Store Size (KB) .
- Rate of version store cleanup in tempdb: SQLServer:Transactions\Version Cleanup rate (KB/s).
- Rate of version store generation: SQLServer:Transactions\Version Generation rate (KB/s).
Conversely you can use the below query in secondary replica facing the issue.
Query to check the above mentioned performance counters
SELECT * FROM sys.dm_os_performance_counters WHERE counter_name IN ( 'Longest Transaction Running Time' ,'Version Store Size (KB)' ,'Version Cleanup rate (KB/s)' ,'Version Generation rate (KB/s)' )
Sample output:
As you can see the “Longest Transaction Running Time” counter has a very high Cntr_Value. This is another key point and compelling evidence that there is long running transaction in primary replica.
Additionally the other three counters related to Version store shows that the the long running transaction is consuming the version store.
Now execute the below query again. This time in Primary Replica.
Query to find the transaction utilizing version store:
SELECT GETDATE() AS runtime ,a.* ,b.kpid ,b.blocked ,b.lastwaittype ,b.waitresource ,db_name(b.dbid) AS database_name ,b.cpu ,b.physical_io ,b.memusage ,b.login_time ,b.last_batch ,b.open_tran ,b.STATUS ,b.hostname ,b.program_name ,b.cmd ,b.loginame ,request_id FROM sys.dm_tran_active_snapshot_database_transactions a INNER JOIN sys.sysprocesses b ON a.session_id = b.spid
Sample Output:
In the above sample output the most important columns are marked in red. If you have noticed, the “login_time and “last_batch” time is very old. Additionally the status is “sleeping”. Now you need to kill the corresponding session ids one by one and monitor the version store utilization as shown in the first article.
Points often overlooked is to kill one session at a time and then monitor the version store utilization in secondary replica. The reason behind this is, next time the version clean up process runs it will be able to clear the version store if you have killed the right one. This way you will be able to find the culprit transaction as well.
Also not to mention that you may need to contact the application team before killing any spid depending upon your environment.
Hope this helps. If you find any new scenario please put that in the comment section.