|
| 1 | +## How to work with arrays, part 2 |
| 2 | + |
| 3 | +> 我每天都会发布一篇新的 PostgreSQL "howto" 文章。加入我的旅程吧 — 订阅、提供反馈、分享! |
| 4 | +
|
| 5 | +第一部分可以在[此处](https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0028_how_to_work_with_arrays_part_1.md)找到。 |
| 6 | + |
| 7 | +## 如何搜索 |
| 8 | + |
| 9 | +数组搜索最有趣的操作符之一是 `<@` — 表示"被包含": |
| 10 | + |
| 11 | +```sql |
| 12 | +nik=# select array[2, 1] <@ array[1, 3, 2]; |
| 13 | + ?column? |
| 14 | +---------- |
| 15 | + t |
| 16 | +(1 row) |
| 17 | +``` |
| 18 | + |
| 19 | +当你需要检查某个标量值是否存在于数组中时,只需创建一个单元素数组并将 `<@` 应用于两个数组。例如,检查 "2" 是否包含在要分析的数组中: |
| 20 | + |
| 21 | +```sql |
| 22 | +nik=# select array[2] <@ array[1, 3, 2]; |
| 23 | + ?column? |
| 24 | +---------- |
| 25 | + t |
| 26 | +(1 row) |
| 27 | +``` |
| 28 | + |
| 29 | +你可以通过创建索引来加速使用 `<@` 的搜索操作。让我们创建一个 GIN 索引,并比较一个包含一百万行的表的查询计划: |
| 30 | + |
| 31 | +```sql |
| 32 | +nik=# create table t1 (val int8[]); |
| 33 | +CREATE TABLE |
| 34 | + |
| 35 | +nik=# insert into t1 |
| 36 | +select array( |
| 37 | + select round(random() * 1000 + i) |
| 38 | + from generate_series(1, 10) |
| 39 | + limit (1 + random() * 10)::int |
| 40 | +) |
| 41 | +from generate_series(1, 1000000) as i; |
| 42 | +INSERT 0 1000000 |
| 43 | + |
| 44 | +nik=# select * from t1 limit 3; |
| 45 | + val |
| 46 | +----------------------- |
| 47 | + {390,13,405,333,358,592,756,677} |
| 48 | + {463,677,585,191,425,143} |
| 49 | + {825,918,303,602} |
| 50 | +(3 rows) |
| 51 | + |
| 52 | +nik=# vacuum analyze t1; |
| 53 | +VACUUM |
| 54 | +``` |
| 55 | + |
| 56 | +我们创建了一个有 100 万行的单列表,每行包含具有各种数字的 `int8[]` 数组。 |
| 57 | + |
| 58 | +现在,搜索数组值中包含 "123" 的所有行: |
| 59 | + |
| 60 | +```sql |
| 61 | +nik=# explain (analyze, buffers) select * from t1 where array[123]::int8[] <@ val; |
| 62 | + QUERY PLAN |
| 63 | +--------------------------------------------------------------------------------------------------------------------- |
| 64 | + Gather (cost=1000.00..18950.33 rows=5000 width=68) (actual time=0.212..100.572 rows=2 loops=1) |
| 65 | + Workers Planned: 2 |
| 66 | + Workers Launched: 2 |
| 67 | + Buffers: shared hit=12554 |
| 68 | + -> Parallel Seq Scan on t1 (cost=0.00..17450.33 rows=2083 width=68) (actual time=61.293..94.212 rows=1 loops=3) |
| 69 | + Filter: ('{123}'::bigint[] <@ val) |
| 70 | + Rows Removed by Filter: 333333 |
| 71 | + Buffers: shared hit=12554 |
| 72 | + Planning: |
| 73 | + Buffers: shared hit=6 |
| 74 | + Planning Time: 0.316 ms |
| 75 | + Execution Time: 100.586 ms |
| 76 | +(12 rows) |
| 77 | +``` |
| 78 | + |
| 79 | +共有 12554 次缓冲区命中,约为 12554 * 8 / 1024 ~= 98 MiB,只需找到包含 "123" 的两行数据 — 注意 "rows=2"。效率不高,因为这里是 Seq Scan。 |
| 80 | + |
| 81 | +现在,创建一个 GIN 索引: |
| 82 | + |
| 83 | +```sql |
| 84 | +nik=# create index on t1 using gin(val); |
| 85 | +CREATE INDEX |
| 86 | +``` |
| 87 | + |
| 88 | +然后再运行相同的查询: |
| 89 | + |
| 90 | +```sql |
| 91 | +nik=# explain (analyze, buffers) select * from t1 where array[123]::int8[] <@ val; |
| 92 | + QUERY PLAN |
| 93 | +----------------------------------------------------------------------------------------------------------------------- |
| 94 | + Bitmap Heap Scan on t1 (cost=44.75..4260.25 rows=5000 width=68) (actual time=0.021..0.022 rows=2 loops=1) |
| 95 | + Recheck Cond: ('{123}'::bigint[] <@ val) |
| 96 | + Heap Blocks: exact=1 |
| 97 | + Buffers: shared hit=5 |
| 98 | + -> Bitmap Index Scan on t1_val_idx (cost=0.00..43.50 rows=5000 width=0) (actual time=0.016..0.016 rows=2 loops=1) |
| 99 | + Index Cond: (val @> '{123}'::bigint[]) |
| 100 | + Buffers: shared hit=4 |
| 101 | + Planning: |
| 102 | + Buffers: shared hit=16 |
| 103 | + Planning Time: 0.412 ms |
| 104 | + Execution Time: 0.068 ms |
| 105 | +(11 rows) |
| 106 | +``` |
| 107 | + |
| 108 | +没有顺序扫描,并且只有 5 次缓冲区命中,大约 40 KiB 的内存就能找到所需的 2 行数据。这解释了为什么执行时间从 ~100ms 缩短到 ~0.07ms,这快了 ~1400 倍。 |
| 109 | + |
| 110 | +更多操作符请参见[官方文档](https://www.postgresql.org/docs/current/functions-array.html#FUNCTIONS-ARRAY)。 |
0 commit comments