合 PG并发创建和重建索引之CONCURRENTLY

发布日期 2021年7月16日 · 已更新 2023年6月25日

0 1906 5

Tags： PG PostgreSQL 优化索引性能并发 CONCURRENTLY reindex

👉 本文共约3758个字，系统预计阅读时间或需15分钟。

简介
示例
总结
从pg12开始，REINDEX语句带有CONCURRENTLY参数
总结

简介

创建索引可能会干扰数据库的常规操作。通常 PostgreSQL会锁住要被索引的表，让它不能被写入，并且用该表上的一次扫描来执行整个索引的创建。其他事务仍然可以读取表，但是如果它们尝试在该表上进行插入、更新或者删除，它们会被阻塞直到索引创建完成。如果系统是一个生产数据库，这可能会导致严重的后果。索引非常大的表可能会需要很多个小时，而且即使是较小的表，在创建索引过程中阻塞写入者一段时间在生产系统中也是不能接受的。

PostgreSQL支持创建索引时不阻塞写入。这种方法通过指定CREATE INDEX的CONCURRENTLY选项实现。当使用这个选项时，PostgreSQL必须执行该表的两次扫描，此外它必须等待所有现有可能会修改或者使用该索引的事务终止。因此这种方法比起标准索引创建过程来说要做更多工作并且需要更多时间。不过，由于它允许在创建索引时继续普通操作，这种方式对于在生产环境中增加新索引很有用。当然，由索引创建带来的额外 CPU 和 I/O 开销可能会拖慢其他操作。

当使用了CONCURRENTLY这个选项时，PostgreSQL在创建索引时不会取得任何会阻止该表上并发插入、更新或者删除的锁。而标准的索引创建将会把表锁住以阻止对表的写（但不阻塞读），这种锁定会持续到索引创建完毕。

对于临时表，CREATE INDEX始终是非并发的，因为没有其他会话可以访问它们，并且创建非并发索引的成本更低。

在并发索引创建中，索引实际上在一个事务中被录入到系统目录，然后在两个事务中发生两次表扫描。在每一次表扫描之前，索引创建必须等待已经修改了表的现有事务终止。在第二次扫描之后，索引创建必须等待任何持有早于第二次扫描的快照（见第 13 章）的事务终止。然后该索引最终能被标记为准备好使用，并且CREATE INDEX命令终止。不过即便那样，该索引也不是立刻可以用于查询：在最坏的情况下，只要早于索引创建开始时存在的事务存在，该索引就无法使用。

如果在扫描表时出现问题，例如死锁或者唯一索引中的唯一性被违背， CREATE INDEX将会失败，但留下一个“不可用” 的索引。这个索引会被查询所忽略，因为它可能不完整。不过它仍将消耗更新开销。psql的\d命令将把这类索引报告为 INVALID：

postgres=# \d tab
       Table "public.tab"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 col    | integer |           |          | 
Indexes:
    "idx" btree (col) INVALID

postgres=# \d tab

Table "public.tab"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+---------

col | integer | | |

Indexes:

"idx" btree (col) INVALID

这种情况下推荐的恢复方法是删除该索引并且尝试再次执行CREATE INDEX CONCURRENTLY。（另一种可能性是用REINDEX INDEX CONCURRENTLY重建该索引）。

如果是索引重建，可以再在原基础上建立一个不同名的相同索引，然后取消老的索引。

并发创建一个唯一索引时需要注意的另一点是，当第二次表扫描开始时，唯一约束已经被强制在其他事务上。这意味着在该索引变得可用之前，其他查询中可能就会报告该约束被违背，或者甚至在索引创建最终失败的情况中也是这样。还有，如果在第二次扫描时发生失败，“无效的”索引也会继续强制它的唯一性约束。

表达式索引和部分索引的并发创建也被支持。在这些表达式计算过程中发生的错误可能导致和上述唯一约束违背类似的行为。

常规索引创建允许在同一个表上同时创建其他常规索引，但是在一个表上同时只能有一个并发索引创建发生。在两种情况下，在索引被创建时不允许表的模式修改。另一个不同是，一个常规CREATE INDEX 命令可以在一个事务块中执行，但是 CREATE INDEX CONCURRENTLY不行。

当前不支持在分区表上并发生成索引。然而，你可以在每个分区上单独的并发创建索引，然后最终以非并发的方式创建分区索引，以减少对分区表的写入被锁定的时间。在这种情况下，生成分区索引仅是元数据操作。

示例

PG从较早的版本就支持在线并发创建索引（不长时间阻塞DML），这对于如今停机时间越来越少的OLTP系统来说是非常重要的特性之一：

1、先不使用concurrently参数创建索引，此时insert操作会阻塞，但是创建索引所消耗的时间较短：

表数据量约为500万

postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
postgres=#
postgres=# \timing
postgres=# create table test_idx(id serial primary key, note text);
CREATE TABLE
Time: 59.128 ms
Timing is on.
postgres=# insert into test_idx(note) select generate_series(1,5000000);
INSERT 0 5000000
Time: 37967.783 ms (00:37.968)

postgres=# create index idx_test_idx on test_idx(note);
CREATE INDEX
Time: 19801.654 ms (00:19.802)

postgres=# select version();

version

---------------------------------------------------------------------------------------------------------

PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

(1 row)

postgres=#

postgres=# \timing

postgres=# create table test_idx(id serial primary key, note text);

CREATE TABLE

Time: 59.128 ms

Timing is on.

postgres=# insert into test_idx(note) select generate_series(1,5000000);

INSERT 0 5000000

Time: 37967.783 ms (00:37.968)

postgres=# create index idx_test_idx on test_idx(note);

CREATE INDEX

Time: 19801.654 ms (00:19.802)

可见创建索引消耗时间为19秒。

但是在创建索引的过程中，另开一个窗口来执行的insert操作会被阻塞，直到索引创建完成才会执行：

delete  from test_idx where id<=10 ;

1 2	delete from test_idx where id<=10 ;

2、使用concurrently参数，并发创建索引：

postgres=# create index concurrently idx_test_idx2 on test_idx(note);
CREATE INDEX
Time: 21853.658 ms (00:21.854)

postgres=# create index concurrently idx_test_idx2 on test_idx(note);

CREATE INDEX

Time: 21853.658 ms (00:21.854)

可见所消耗的时间约为21秒，相比非concurrently方式来说，时间变长。

但是在创建索引的过程中，另开一个窗口来执行的insert操作等均是无阻塞的：

postgres=#  insert into test_idx(note) select generate_series(5000000,5000010);
INSERT 0 11

1 2	postgres=# insert into test_idx(note) select generate_series(5000000,5000010); INSERT 0 11

两种方法创建的索引大小并无区别：

postgres=# select pg_size_pretty(pg_relation_size('idx_test_idx')),pg_size_pretty(pg_relation_size('idx_test_idx2'));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 107 MB         | 107 MB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('idx_test_idx')),pg_size_pretty(pg_relation_size('idx_test_idx2'));

pg_size_pretty | pg_size_pretty

----------------+----------------

107 MB | 107 MB

(1 row)

3、在使用concurrently选项时，若强行取消创建操作，那么会留下一个无效的索引。

ostgres=# \d test_idx
                            Table "public.test_idx"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 id     | integer |           | not null | nextval('test_idx_id_seq'::regclass)
 note   | text    |           |          |
Indexes:
    "test_idx_pkey" PRIMARY KEY, btree (id)

postgres=# create unique index concurrently idx_test_idx6 on test_idx(note);
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2110.970 ms (00:02.111)
postgres=# \d test_idx
                            Table "public.test_idx"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 id     | integer |           | not null | nextval('test_idx_id_seq'::regclass)
 note   | text    |           |          |
Indexes:
    "test_idx_pkey" PRIMARY KEY, btree (id)
    "idx_test_idx6" UNIQUE, btree (note) INVALID

postgres=# insert into test_idx(note) select generate_series(1,10);
INSERT 0 10    

postgres=# select * from test_idx where note::int=10;
    id    | note
----------+------
 14777480 | 10
 19777480 | 10
(2 rows)

ostgres=# \d test_idx

Table "public.test_idx"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+--------------------------------------

id | integer | | not null | nextval('test_idx_id_seq'::regclass)

note | text | | |

Indexes:

"test_idx_pkey" PRIMARY KEY, btree (id)

postgres=# create unique index concurrently idx_test_idx6 on test_idx(note);

Cancel request sent

ERROR: canceling statement due to user request

Time: 2110.970 ms (00:02.111)

postgres=# \d test_idx

Table "public.test_idx"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+--------------------------------------

id | integer | | not null | nextval('test_idx_id_seq'::regclass)

note | text | | |

Indexes:

"test_idx_pkey" PRIMARY KEY, btree (id)

"idx_test_idx6" UNIQUE, btree (note) INVALID

postgres=# insert into test_idx(note) select generate_series(1,10);

INSERT 0 10

postgres=# select * from test_idx where note::int=10;

id | note

----------+------

14777480 | 10

19777480 | 10

(2 rows)

这种情况下推荐的恢复方法是

1、删除该索引并且尝试再次执行CREATE INDEX CONCURRENTLY。如果是索引重建，可以再在原基础上建立一个不同名的相同索引，然后删除老的索引。

2、用REINDEX INDEX CONCURRENTLY重建该索引。

postgres=# REINDEX INDEX CONCURRENTLY idx_test_idx6;
ERROR:  could not create unique index "idx_test_idx6_ccnew"
DETAIL:  Key (note)=(1) is duplicated.
Time: 2813.730 ms (00:02.814)

-- 有重复数据，我们删除重复数据后，再重建索引
postgres=# select note,count(*) from test_idx group by note having count(*) > 1;
 note | count
------+-------
 6    |     2
 7    |     2
 2    |     2
 4    |     2
 10   |     2
 1    |     2
 9    |     2
 8    |     2
 5    |     2
 3    |     2
(10 rows)

Time: 8327.027 ms (00:08.327)

# 注意这里的优化写法，加上note过滤
postgres=# delete from test_idx where ctid not in (select min(ctid) from test_idx where note::int<=10 group by note) and note::int<=10;
DELETE 10
Time: 3069.750 ms (00:03.070)

postgres=# REINDEX INDEX CONCURRENTLY idx_test_idx6;
REINDEX
Time: 21578.902 ms (00:21.579)
postgres=# \d test_idx
                            Table "public.test_idx"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 id     | integer |           | not null | nextval('test_idx_id_seq'::regclass)
 note   | text    |           |          |
Indexes:
    "test_idx_pkey" PRIMARY KEY, btree (id)
    "idx_test_idx6" UNIQUE, btree (note)
    "idx_test_idx6_ccnew" UNIQUE, btree (note) INVALID

postgres=# drop index idx_test_idx6_ccnew;
DROP INDEX
Time: 78.543 ms
postgres=# \d test_idx
                            Table "public.test_idx"
 Column |  Type   | Collation | Nullable |               Default
--------+---------+-----------+----------+--------------------------------------
 id     | integer |           | not null | nextval('test_idx_id_seq'::regclass)
 note   | text    |           |          |
Indexes:
    "test_idx_pkey" PRIMARY KEY, btree (id)
    "idx_test_idx6" UNIQUE, btree (note)

postgres=# REINDEX INDEX CONCURRENTLY idx_test_idx6;

ERROR: could not create unique index "idx_test_idx6_ccnew"

DETAIL: Key (note)=(1) is duplicated.

Time: 2813.730 ms (00:02.814)

-- 有重复数据，我们删除重复数据后，再重建索引

postgres=# select note,count(*) from test_idx group by note having count(*) > 1;

note | count

------+-------

6 | 2

7 | 2

2 | 2

4 | 2

10 | 2

1 | 2

9 | 2

8 | 2

5 | 2

3 | 2

(10 rows)

Time: 8327.027 ms (00:08.327)

# 注意这里的优化写法，加上note过滤

postgres=# delete from test_idx where ctid not in (select min(ctid) from test_idx where note::int<=10 group by note) and note::int<=10;

DELETE 10

Time: 3069.750 ms (00:03.070)

postgres=# REINDEX INDEX CONCURRENTLY idx_test_idx6;

REINDEX

Time: 21578.902 ms (00:21.579)

postgres=# \d test_idx

Table "public.test_idx"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+--------------------------------------

id | integer | | not null | nextval('test_idx_id_seq'::regclass)

note | text | | |

Indexes:

"test_idx_pkey" PRIMARY KEY, btree (id)

"idx_test_idx6" UNIQUE, btree (note)

"idx_test_idx6_ccnew" UNIQUE, btree (note) INVALID

postgres=# drop index idx_test_idx6_ccnew;

DROP INDEX

Time: 78.543 ms

postgres=# \d test_idx

Table "public.test_idx"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+--------------------------------------

id | integer | | not null | nextval('test_idx_id_seq'::regclass)

note | text | | |

Indexes:

"test_idx_pkey" PRIMARY KEY, btree (id)

"idx_test_idx6" UNIQUE, btree (note)

记得，需要删除INVALID状态的索引。

总结

最后对并发创建索引需要注意的地方做一个总结：

1.并发创建索引需要扫描表两次，等待表事务三次，需要消耗更多的资源以及等待更长的时间。

本人提供Oracle(OCP、OCM)、MySQL(OCP)、PostgreSQL(PGCA、PGCE、PGCM)等数据库的培训和考证业务，私聊QQ646634621或微信dbaup66，谢谢！

后续精彩内容已被站长无情隐藏，请输入验证码解锁本文！

获取验证码：请先关注本站微信公众号，然后回复“验证码”，获取验证码。在微信里搜索“AiDBA”或者“dbaup6”或者微信扫描右侧二维码都可以关注本站微信公众号。

打赏赞(5)

标签： PG PostgreSQL 优化索引性能并发 CONCURRENTLY reindex

小麦苗

学习或考证，均可联系麦老师，请加微信db_bao或QQ646634621

发表回复取消回复

要发表评论，您必须先登录。

合 PG并发创建和重建索引之CONCURRENTLY

简介

示例

总结

相关文章

您可能还喜欢...

发表回复取消回复

网站公告

网站寄语

本站其它工具

搜索本网站

标签云☁

网站日历

网站归档

网站分类

2024 年 10 月
一	二	三	四	五	六	日
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

合 PG并发创建和重建索引之CONCURRENTLY

简介

示例

总结

相关文章

您可能还喜欢...

使用OGG传统模式将PG同步到kafka(全量+增量)

PG修改参数文件并重启后参数不生效

【PG】小麦苗PGCM第4期证书邮寄

发表回复 取消回复

网站公告

网站寄语

本站其它工具

搜索本网站

标签云☁

网站日历

网站归档

网站分类

发表回复取消回复