Skip to content

postgresql building lock tree from text log

postgresql building lock tree from text log published on Комментариев к записи postgresql building lock tree from text log нет

1) собрать все логи с бд по локам

[postgres@p38rmisdb01 postgres]$ grep -i 'lock:' postgresql-2018-04-06.log

2) made a with query enclosing every row with ‘ and with ; at the end
—-best of all do this on different server

with b as (
select unnest(array[
 
 
'2018-06-05 00:03:33 KRAT [22266]: [17-1] db=lsd,appname=[unknown],user=app_group_master,client=192.168.5.111 DETAIL:  Process holding the lock: 24957. Wait queue: 22266.',
'2018-06-05 09:00:36 KRAT [5922]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 17343. Wait queue: 5922.',
'2018-06-05 09:01:10 KRAT [5922]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 13309. Wait queue: 5922.',
'2018-06-05 09:06:56 KRAT [9196]: [6-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 10136. Wait queue: .',
'2018-06-05 09:09:55 KRAT [28905]: [4-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL:  Process holding the lock: 24188. Wait queue: 28905.',
'2018-06-05 09:10:08 KRAT [28905]: [11-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL:  Process holding the lock: 24188. Wait queue: 28905.',
'2018-06-05 09:11:07 KRAT [28905]: [15-1] db=lsd,appname=192.168.4.22,user=lsd,client=192.168.5.111 DETAIL:  Process holding the lock: 24188. Wait queue: 28905.',
'2018-06-05 10:00:06 KRAT [342]: [5-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 23155. Wait queue: 342.',
'2018-06-05 10:00:41 KRAT [342]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 9430. Wait queue: 342.',
'2018-06-05 10:01:21 KRAT [342]: [17-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 1941. Wait queue: 342.',
'2018-06-05 10:09:23 KRAT [27511]: [411-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 18429. Wait queue: .',
'2018-06-05 10:09:24 KRAT [18429]: [6-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 30049. Wait queue: .',
'2018-06-05 10:09:24 KRAT [30049]: [4-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 18429. Wait queue: 30049.',
'2018-06-05 11:00:43 KRAT [29866]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Processes holding the lock: 2667, 27024. Wait queue: 29866.',
'2018-06-05 11:00:56 KRAT [29866]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 29718. Wait queue: 29866.',
'2018-06-05 11:14:49 KRAT [13709]: [902-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 4172. Wait queue: .',
'2018-06-05 11:14:49 KRAT [4172]: [4-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 13719. Wait queue: .',
'2018-06-05 12:00:07 KRAT [24431]: [5-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 26119. Wait queue: 24431.',
'2018-06-05 12:00:41 KRAT [24431]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 23751. Wait queue: 24431.',
'2018-06-05 12:01:02 KRAT [24431]: [17-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 11215. Wait queue: 24431.',
'2018-06-05 13:00:43 KRAT [19265]: [6-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 7261. Wait queue: 19265.',
'2018-06-05 13:00:46 KRAT [19265]: [12-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 22841. Wait queue: 19265.',
'2018-06-05 14:00:02 KRAT [13762]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 32362. Wait queue: 13762.',
'2018-06-05 14:00:06 KRAT [13762]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 17264. Wait queue: 13762.',
'2018-06-05 14:00:34 KRAT [13762]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 28253. Wait queue: 13762.',
'2018-06-05 14:00:37 KRAT [13762]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 14070. Wait queue: 13762.',
'2018-06-05 14:12:13 KRAT [30079]: [702-1] db=lsd,appname=192.168.4.72,user=n2o,client=192.168.5.111 DETAIL:  Process holding the lock: 4418. Wait queue: .',
'2018-06-05 15:00:02 KRAT [8188]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 17264. Wait queue: 8188.',
'2018-06-05 15:00:07 KRAT [8188]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 1981. Wait queue: 8188.',
'2018-06-05 15:00:40 KRAT [8188]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Processes holding the lock: 13534, 27378. Wait queue: 8188.',
'2018-06-05 15:01:16 KRAT [8188]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 16850. Wait queue: 8188.',
'2018-06-05 16:00:02 KRAT [2756]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Processes holding the lock: 1981, 25953. Wait queue: 2756.',
'2018-06-05 16:00:07 KRAT [2756]: [10-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 23839. Wait queue: 2756.',
'2018-06-05 16:00:34 KRAT [2756]: [16-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 8448. Wait queue: 2756.',
'2018-06-05 16:01:07 KRAT [2756]: [22-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 6456. Wait queue: 2756.',
'2018-06-05 17:00:02 KRAT [29439]: [4-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 23839. Wait queue: 29439.',
'2018-06-05 17:00:29 KRAT [29439]: [11-1] db=lsd,appname=psql,user=postgres,client=[local] DETAIL:  Process holding the lock: 3378. Wait queue: 29439.'
 
              ])
as txt),
analyze_lock as (
select substring(txt, 1,19)::timestamp without time zone as ttime,
split_part(split_part(txt, 'holding the lock:', 2), '. Wait', 1) as holding_lock,
split_part(split_part(txt, 'Wait queue: ', 2), '.', 1) as wait_queue,
txt from b
---where txt like '%3103%'
--- здесь можно оганичить по пользователю, по процессу
),
analyze_lock_normalize as ( --нормализованное предсталвение блокировок (все перечисленные через запятую разбиты на отдельные строки)
SELECT ttime,
       unnest(string_to_array(regexp_replace(regexp_replace(cast((holding_lock) AS CHARACTER VARYING), '^\(',''), '\)$', ''), ','))::int AS holding_lock,
       unnest(string_to_array(regexp_replace(regexp_replace(cast((wait_queue) AS CHARACTER VARYING), '^\(',''), '\)$', ''), ','))::int AS wait_queue,
             txt
from analyze_lock
--WHERE ttime::time between '14:45:00'::time and  '15:15:00'::time
--- здесь можно оганичить по времени
)
,
--- обычная паррент таблица
pp as (
select distinct holding_lock::int as id, null::int as parent_id, null::timestamp without time zone as ttime from analyze_lock_normalize t where not exists (select from analyze_lock_normalize t2 where t2.wait_queue = t.holding_lock )
union all
select distinct wait_queue::int as h_id, holding_lock::int as parrent_id, ttime from analyze_lock_normalize t --where not exists (select from analyze_lock t2 where t2.wait_queue like '%'||t.holding_lock||'%' )
),
 
ltr as (
                 WITH RECURSIVE ltr_p AS (
                         SELECT id,
                            parent_id,
                            ttime,
                                                        id::text::ltree as ltree_lock,
                                                        id::text::ltree as main_lock,
                            1 as llevel
                           FROM pp
                          WHERE parent_id IS NULL
                        UNION ALL
                         SELECT n.id,
                            n.parent_id,
                            n.ttime,
                                                        r.ltree_lock || (n.id::text::ltree),
                                                        r.main_lock,
                            r.llevel + 1
                           FROM pp n
                             JOIN ltr_p r ON n.parent_id = r.id
                        where r.llevel < 10 -- ограничиваем глубину погружения в локи (главное понять от кого они идут а не кого блокируют)
                        )
                 SELECT *, min(ttime) over (partition by main_lock) as main_lock_time, max(ttime) over (partition by main_lock) as main_lock_time_max
                   FROM ltr_p
                  ORDER BY ltree_lock
)
 
select replace(ltree_lock::text,'.',' --> ') as ltree_lock, min(ttime)/*-INTERVAL '5h'*/ as time_lock, main_lock, main_lock_time/*-INTERVAL '5h'*/, main_lock_time_max/*-INTERVAL '5h'*/ from ltr group by 1, 3, 4, 5 order by 4, 3, min(ttime) nulls first, 1
--- you can change  " -INTERVAL '5h'" to any suitable value

Result is:

10040	
10040	06.04.2018 4:38	06.04.2018 4:38
10040 --> 10018	06.04.2018 4:38	10040	06.04.2018 4:38	06.04.2018 4:38
10040 --> 10100	06.04.2018 4:38	10040	06.04.2018 4:38	06.04.2018 4:38
10040 --> 27005	06.04.2018 4:38	10040	06.04.2018 4:38	06.04.2018 4:38
20930	
20930	06.04.2018 5:07	06.04.2018 5:07
20930 --> 21075	06.04.2018 5:07	20930	06.04.2018 5:07	06.04.2018 5:07
8548	
8548	06.04.2018 6:41	06.04.2018 6:41
8548 --> 30636	06.04.2018 6:41	8548	06.04.2018 6:41	06.04.2018 6:41
27429	
27429	06.04.2018 7:01	06.04.2018 7:01
27429 --> 3837	06.04.2018 7:01	27429	06.04.2018 7:01	06.04.2018 7:01
30857	
30857	06.04.2018 7:02	06.04.2018 7:02
30857 --> 3837	06.04.2018 7:02	30857	06.04.2018 7:02	06.04.2018 7:02
10219	
10219	06.04.2018 7:49	06.04.2018 7:57
10219 --> 31116	06.04.2018 7:49	10219	06.04.2018 7:49	06.04.2018 7:57
10219 --> 31116 --> 9138	06.04.2018 7:53	10219	06.04.2018 7:49	06.04.2018 7:57
10219 --> 31116 --> 31119	06.04.2018 7:55	10219	06.04.2018 7:49	06.04.2018 7:57
10219 --> 31116 --> 31119 --> 3410	06.04.2018 7:56	10219	06.04.2018 7:49	06.04.2018 7:57
10219 --> 31116 --> 9138 --> 31118	06.04.2018 7:57	10219	06.04.2018 7:49	06.04.2018 7:57
30944	
30944	06.04.2018 7:56	06.04.2018 7:56
30944 --> 31114	06.04.2018 7:56	30944	06.04.2018 7:56	06.04.2018 7:56
31110	
31110	06.04.2018 7:56	06.04.2018 7:56
31110 --> 12204	06.04.2018 7:56	31110	06.04.2018 7:56	06.04.2018 7:56
31110 --> 12204 --> 3411	06.04.2018 7:56	31110	06.04.2018 7:56	06.04.2018 7:56
15286	
15286	06.04.2018 8:03	06.04.2018 8:03
15286 --> 13436	06.04.2018 8:03	15286	06.04.2018 8:03	06.04.2018 8:03
18662	
18662	06.04.2018 8:04	06.04.2018 8:04
18662 --> 13436	06.04.2018 8:04	18662	06.04.2018 8:04	06.04.2018 8:04
22066	
22066	06.04.2018 8:04	06.04.2018 8:04
22066 --> 13436	06.04.2018 8:04	22066	06.04.2018 8:04	06.04.2018 8:04
28885	
28885	06.04.2018 9:02	06.04.2018 9:02
28885 --> 22676	06.04.2018 9:02	28885	06.04.2018 9:02	06.04.2018 9:02

as you can see the PID 10219 is the cause of cascade locks

4) also you can check what that PID do before locking

[postgres@p38rmisdb01 postgres]$ grep -i ‘\[10219\]’ postgresql-2018-04-06.log

ps: thx to Amir for material !

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *

Этот сайт использует Akismet для борьбы со спамом. Узнайте, как обрабатываются ваши данные комментариев.

Primary Sidebar