@@ -105,6 +105,17 @@
the explanation below.
</entry>
</row>
+ <row>
+ <entry>
+ <function>strict_word_similarity(text, text)</function>
+ <indexterm><primary>strict_word_similarity</primary></indexterm>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Same as <function>word_similarity(text, text)</function>, but forces
+ extent boundaries to match word boundaries.
+ </entry>
+ </row>
<row>
<entry><function>show_limit()</function><indexterm><primary>show_limit</primary></indexterm></entry>
<entry><type>real</type></entry>
@@ -157,6 +168,29 @@
a part of the word.
</para>

+ <para>
+ At the same time, <function>strict_word_similarity(text, text)</function>
+ has to select an extent that matches word boundaries. In the example above,
+ <function>strict_word_similarity(text, text)</function> would select the
+ extent <literal>{"  w"," wo","wor","ord","rds","ds "}</literal>, which
+ corresponds to the whole word <literal>'words'</literal>.
+
+ <programlisting>
+ # SELECT strict_word_similarity('word', 'two words'), similarity('word', 'words');
+  strict_word_similarity | similarity
+ ------------------------+------------
+                0.571429 |   0.571429
+ (1 row)
+ </programlisting>
+ </para>
+
+ <para>
+ Thus, the <function>strict_word_similarity(text, text)</function> function
+ is useful for finding similar subsets of whole words, while
+ <function>word_similarity(text, text)</function> is more suitable for
+ searching similar parts of words.
+ </para>
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -196,6 +230,24 @@
Commutator of the <literal>&lt;%</literal> operator.
</entry>
</row>
+ <row>
+ <entry><type>text</type> <literal>&lt;&lt;%</literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Returns <literal>true</literal> if its second argument has a continuous
+ extent of an ordered trigram set that matches word boundaries,
+ and its similarity to the trigram set of the first argument is greater
+ than the current strict word similarity threshold set by the
+ <varname>pg_trgm.strict_word_similarity_threshold</varname> parameter.
+ </entry>
+ </row>
+ <row>
+ <entry><type>text</type> <literal>%&gt;&gt;</literal> <type>text</type></entry>
+ <entry><type>boolean</type></entry>
+ <entry>
+ Commutator of the <literal>&lt;&lt;%</literal> operator.
+ </entry>
+ </row>
<row>
<entry><type>text</type> <literal>&lt;-&gt;</literal> <type>text</type></entry>
<entry><type>real</type></entry>
@@ -223,6 +275,25 @@
Commutator of the <literal>&lt;&lt;-&gt;</literal> operator.
</entry>
</row>
+ <row>
+ <entry>
+ <type>text</type> <literal>&lt;&lt;&lt;-&gt;</literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Returns the <quote>distance</quote> between the arguments, that is
+ one minus the <function>strict_word_similarity()</function> value.
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <type>text</type> <literal>&lt;-&gt;&gt;&gt;</literal> <type>text</type>
+ </entry>
+ <entry><type>real</type></entry>
+ <entry>
+ Commutator of the <literal>&lt;&lt;&lt;-&gt;</literal> operator.
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
@@ -322,12 +393,19 @@ SELECT t, t &lt;-&gt; '<replaceable>word</replaceable>' AS dist

<para>
Also you can use an index on the <structfield>t</structfield> column for word
- similarity. For example:
+ similarity or strict word similarity. Typical queries are:
<programlisting>
SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
FROM test_trgm
WHERE '<replaceable>word</replaceable>' &lt;% t
ORDER BY sml DESC, t;
+ </programlisting>
+ and
+ <programlisting>
+ SELECT t, strict_word_similarity('<replaceable>word</replaceable>', t) AS sml
+ FROM test_trgm
+ WHERE '<replaceable>word</replaceable>' &lt;&lt;% t
+ ORDER BY sml DESC, t;
</programlisting>
This will return all values in the text column for which there is a
continuous extent in the corresponding ordered trigram set that is
@@ -337,11 +415,17 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
</para>

<para>
- A variant of the above query is
+ Possible variants of the above queries are:
<programlisting>
SELECT t, '<replaceable>word</replaceable>' &lt;&lt;-&gt; t AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;
+ </programlisting>
+ and
+ <programlisting>
+ SELECT t, '<replaceable>word</replaceable>' &lt;&lt;&lt;-&gt; t AS dist
+ FROM test_trgm
+ ORDER BY dist LIMIT 10;
</programlisting>
This can be implemented quite efficiently by GiST indexes, but not
by GIN indexes.
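
The `0.571429` in the `strict_word_similarity('word', 'two words')` example from this diff can be checked by hand: `word` has five trigrams, `words` has six, and they share four, so the similarity is 4 / (5 + 6 - 4) = 4/7. The Python sketch below reproduces that arithmetic. It is an illustration under stated assumptions, not pg_trgm's implementation: all function names are hypothetical, only ASCII letters and digits are treated as word characters, and the strict variant simply tries every contiguous run of whole words, which only approximates the extent search the documentation describes.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set of text. Each word is padded with two leading
    spaces and one trailing space before being cut into trigrams."""
    result = set()
    # Assumption: only ASCII letters and digits form words.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def strict_word_similarity_sketch(query: str, text: str) -> float:
    """Best similarity between query and any contiguous run of whole
    words in text (a simplification of the extent search)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    best = 0.0
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            candidate = " ".join(words[start:end])
            best = max(best, similarity(query, candidate))
    return best


def strict_word_distance_sketch(query: str, text: str) -> float:
    """One minus the strict word similarity, as the distance operator is described."""
    return 1.0 - strict_word_similarity_sketch(query, text)


if __name__ == "__main__":
    print(sorted(trigrams("words")))
    print(round(similarity("word", "words"), 6))
    print(round(strict_word_similarity_sketch("word", "two words"), 6))
```

For `'two words'` the candidate runs are `two`, `words`, and `two words`. Only `words` reaches 4/7, which is why the strict function and `similarity('word', 'words')` print the same value in the documented example.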