Skip to content

Commit 08f54d1

Browse files
author
xiongcc
committed
添加第29章:How to work with arrays, part 2
1 parent 56c5eb8 commit 08f54d1

File tree

1 file changed

+110
-0
lines changed

1 file changed

+110
-0
lines changed

How to work with arrays, part 2.md

Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
## How to work with arrays, part 2
2+
3+
> 我每天都会发布一篇新的 PostgreSQL "howto" 文章。加入我的旅程吧 — 订阅、提供反馈、分享!
4+
5+
第一部分可以在[此处](https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0028_how_to_work_with_arrays_part_1.md)找到。
6+
7+
## 如何搜索
8+
9+
数组搜索最有趣的操作符之一是 `<@` — 表示"被包含":
10+
11+
```sql
12+
nik=# select array[2, 1] <@ array[1, 3, 2];
13+
?column?
14+
----------
15+
t
16+
(1 row)
17+
```
18+
19+
当你需要检查某个标量值是否存在于数组中时,只需创建一个单元素数组并将 `<@` 应用于两个数组。例如,检查 "2" 是否包含在要分析的数组中:
20+
21+
```sql
22+
nik=# select array[2] <@ array[1, 3, 2];
23+
?column?
24+
----------
25+
t
26+
(1 row)
27+
```
28+
29+
你可以通过创建索引来加速使用 `<@` 的搜索操作。让我们创建一个 GIN 索引,并比较一个包含一百万行的表的查询计划:
30+
31+
```sql
32+
nik=# create table t1 (val int8[]);
33+
CREATE TABLE
34+
35+
nik=# insert into t1
36+
select array(
37+
select round(random() * 1000 + i)
38+
from generate_series(1, 10)
39+
limit (1 + random() * 10)::int
40+
)
41+
from generate_series(1, 1000000) as i;
42+
INSERT 0 1000000
43+
44+
nik=# select * from t1 limit 3;
45+
val
46+
-----------------------
47+
{390,13,405,333,358,592,756,677}
48+
{463,677,585,191,425,143}
49+
{825,918,303,602}
50+
(3 rows)
51+
52+
nik=# vacuum analyze t1;
53+
VACUUM
54+
```
55+
56+
我们创建了一个有 100 万行的单列表,每行包含具有各种数字的 `int8[]` 数组。
57+
58+
现在,搜索数组值中包含 "123" 的所有行:
59+
60+
```sql
61+
nik=# explain (analyze, buffers) select * from t1 where array[123]::int8[] <@ val;
62+
QUERY PLAN
63+
---------------------------------------------------------------------------------------------------------------------
64+
Gather (cost=1000.00..18950.33 rows=5000 width=68) (actual time=0.212..100.572 rows=2 loops=1)
65+
Workers Planned: 2
66+
Workers Launched: 2
67+
Buffers: shared hit=12554
68+
-> Parallel Seq Scan on t1 (cost=0.00..17450.33 rows=2083 width=68) (actual time=61.293..94.212 rows=1 loops=3)
69+
Filter: ('{123}'::bigint[] <@ val)
70+
Rows Removed by Filter: 333333
71+
Buffers: shared hit=12554
72+
Planning:
73+
Buffers: shared hit=6
74+
Planning Time: 0.316 ms
75+
Execution Time: 100.586 ms
76+
(12 rows)
77+
```
78+
79+
共有 12554 次缓冲区命中,约为 12554 * 8 / 1024 ~= 98 MiB,只需找到包含 "123" 的两行数据 — 注意 "rows=2"。效率不高,因为这里是 Seq Scan。
80+
81+
现在,创建一个 GIN 索引:
82+
83+
```sql
84+
nik=# create index on t1 using gin(val);
85+
CREATE INDEX
86+
```
87+
88+
然后再运行相同的查询:
89+
90+
```sql
91+
nik=# explain (analyze, buffers) select * from t1 where array[123]::int8[] <@ val;
92+
QUERY PLAN
93+
-----------------------------------------------------------------------------------------------------------------------
94+
Bitmap Heap Scan on t1 (cost=44.75..4260.25 rows=5000 width=68) (actual time=0.021..0.022 rows=2 loops=1)
95+
Recheck Cond: ('{123}'::bigint[] <@ val)
96+
Heap Blocks: exact=1
97+
Buffers: shared hit=5
98+
-> Bitmap Index Scan on t1_val_idx (cost=0.00..43.50 rows=5000 width=0) (actual time=0.016..0.016 rows=2 loops=1)
99+
Index Cond: (val @> '{123}'::bigint[])
100+
Buffers: shared hit=4
101+
Planning:
102+
Buffers: shared hit=16
103+
Planning Time: 0.412 ms
104+
Execution Time: 0.068 ms
105+
(11 rows)
106+
```
107+
108+
没有顺序扫描,并且只有 5 次缓冲区命中,大约 40 KiB 的内存就能找到所需的 2 行数据。这解释了为什么执行时间从 ~100ms 缩短到 ~0.07ms,这快了 ~1400 倍。
109+
110+
更多操作符请参见[官方文档](https://www.postgresql.org/docs/current/functions-array.html#FUNCTIONS-ARRAY)

0 commit comments

Comments
 (0)