An example implementation of autocomplete using the FTS engine of PostgreSQL tsearch2

Introduction


Once upon a time I made a calendar for doing business, notes, and the recording of movements in different tasks. It was made originally on a bunch of PHP + Kohana 2 + PostgreSQL. Eventually I rewrote everything in Yii (the first and only version). For full-text search was involved the built-in PostgreSQL tsearch2 engine. For many years I used the system, it gradually developed and came to the conclusion that the volume of texts it has amassed a decent. The search has to be used very often to enhance its convenience, I decided to tie it to autocomplete from the JQuery UI package.

the

Implementation


To do it right, the choice of tips should be based on the same index as the search. All the texts I have stored in a separate table called "texts". Here its structure:

the
 Table "public.texts"
Column | Type | Modifiers 
-------------+-----------------------------+----------------------------------------------------------
txt_id | bigint | not null default nextval(('gen_txt_id'::text)::regclass)
user_id | integer | not null
txt | text | not null
fti_txt | tsvector | 
last_update | timestamp without time zone | default now()
format | textformat | default 'wiki'::textformat
Indexes:
"texts_pkey" PRIMARY KEY, btree (txt_id)
"texts_txt_id_key" UNIQUE CONSTRAINT, btree (txt_id)
"fti_texts_idx" gist (fti_txt)
"last_update_idx" btree (last_update)
"texts_uid_idx" btree (user_id)

For the task of forming the list of suggestions for the current search string was written Action in separate, connected action. The source protected/extensions/actions/SearchAutocompleteAction.php:

the
<?php

SearchAutocompleteAction class extends CAction
{
public $model;
public $attribute;
public $fts_field;

public function run()
{
// Initialize variables
$_uid = Yii::app()->user->id;
$_model = new $this->model;
$_tableName = $_model->tableName();

// Split the search query into words, separated from him the last word
// and store separately the word and the  rest  of the query
$_query_array = explode(' ', trim(Yii::app()->db->quoteValue($_GET['term']), " '\t\n\r\0\x0B"));
$_word = array_pop($_query_array);
$_preQuery = implode(' ', $_query_array);
$_suggestions = array();

/*
* The request of obtaining tsvector of the desired records. The recordset must belong to the current user
* and it includes only records that match the first part of the query (without the last word).
*/
$_sub_sql = "SELECT $this->fts_field FROM $_tableName WHERE user_id="$_uid"";
if (count($_query_array) > 0)
$_sub_sql .= "AND $this->fts_field @@ to_tsquery("English", "$_preQuery")";

/*
* The final query that returns the list of words to complement the last word of the query.
* Use the function ts_stat of tsearch2. It returns a list of the words in the records selected by the subquery above
* sorted in descending order of frequency of occurrence of words in texts. You can add the sorting attribute of the ndoc describing
* the number of documents where the word.
*/
$_sql = "SELECT word AS $this->attribute FROM ts_stat('$_sub_sql') WHERE word LIKE '$_word%' ORDER BY nentry DESC LIMIT 15;";

foreach(Yii::app()->db->createCommand($_sql)->query() as $_m)
$_suggestions[] = count($_query_array) > 0 ? $_preQuery.' '.$_m[$this->attribute] : $_m[$this- > attribute];

echo CJSON::encode($_suggestions);
}
}

To parse the sequence of actions here is an example SQL query for the search string "Hello hub", formed by Action-Ohm:

the
SELECT 
word AS txt 
FROM 
ts_stat('SELECT fti_txt FROM texts WHERE user_id="1" AND fti_txt @@ to_tsquery("English", "Hello")') 
WHERE 
word LIKE 'hub%' 
ORDER BY nentry DESC 
LIMIT 15;

The essence of tsearch2 in General is the creation of a record of type tsvector in addition to the text, in our example, this field fti_txt. In it are recorded the words of the text, indicating their positions and their appearance in the text. On this record construction index (gin or gist) and further searches. For debugging and monitoring the status of tsearch2 index is a function ts_stat. As a parameter it takes the text of a SQL query that returns a set of fields of type tsvector. For this set of statistics is constructed in the form of a list of words indicating the number of entries (nentry) and the number of documents (records) where the word occurs (ndoc).

In my example, if the word in the search query one — the search is similar to him in all user accounts. If the words in the query a few — the last word is removed from the query, the recordset is limited to full-text search on the first part of the query (without the last word).
the

joining the project


This part is Yii 1 specific, no magic here. Provides for the integrity of the notes. Just be two steps. The first step is the connection of Action-and to the controller, in my case DiaryController. To do this, its the actions() method add the line:

the
 public function actions()
{
return array(
...
'acsearch' => array(
'class' => 'application.extensions.actions.SearchAutocompleteAction',
'model' => 'Texts',
'attribute' => 'txt',
'fts_field' => 'fti_txt',
),
...
);
}

Now in the corresponding view model old search text box:

the
<?php echo CHtml::textField('sh', $search->sh, array('size' = > 60,'maxlength' = > 255)); ?>

JQuery UI widget:

the
 <?php $this->widget('zii.widgets.jui.CJuiAutoComplete', array(
'attribute'=> 'sh',
'sourceUrl' => array('acsearch'),
'name' => 'sh',
'value' => $search->sh,
'options' => array(
'minLength' = > '2',
),
'htmlOptions' => array(
'size' = > 60,
'maxlength' = > 255,
),
)); ?>

The result is something similar to the image:

image

the

Disadvantages


The whole system has one major drawback — the words in a field of type tsvector is written after stemming. Simply put, most of the words "cut off" the end to consider in the search for their different shapes. Look at the picture above and note the word "organized". Therefore, this solution is applicable for personal/internal use. Without a solution to this problem is to show this to people. Perhaps someone will find a decent solution or at least idea. Welcome to the comments.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Fresh hay from the cow, or 3000 icons submitted!

Knowledge base. Part 2. Freebase: make requests to the Google Knowledge Graph

Group edit the resources (documents) using MIGXDB