An example implementation of autocomplete using the FTS engine of PostgreSQL tsearch2
Introduction
Once upon a time I made a calendar for doing business, notes, and the recording of movements in different tasks. It was made originally on a bunch of PHP + Kohana 2 + PostgreSQL. Eventually I rewrote everything in Yii (the first and only version). For full-text search was involved the built-in PostgreSQL tsearch2 engine. For many years I used the system, it gradually developed and came to the conclusion that the volume of texts it has amassed a decent. The search has to be used very often to enhance its convenience, I decided to tie it to autocomplete from the JQuery UI package.
the
Implementation
To do it right, the choice of tips should be based on the same index as the search. All the texts I have stored in a separate table called "texts". Here its structure:
the
Table "public.texts"
Column | Type | Modifiers
-------------+-----------------------------+----------------------------------------------------------
txt_id | bigint | not null default nextval(('gen_txt_id'::text)::regclass)
user_id | integer | not null
txt | text | not null
fti_txt | tsvector |
last_update | timestamp without time zone | default now()
format | textformat | default 'wiki'::textformat
Indexes:
"texts_pkey" PRIMARY KEY, btree (txt_id)
"texts_txt_id_key" UNIQUE CONSTRAINT, btree (txt_id)
"fti_texts_idx" gist (fti_txt)
"last_update_idx" btree (last_update)
"texts_uid_idx" btree (user_id)
For the task of forming the list of suggestions for the current search string was written Action in separate, connected action. The source protected/extensions/actions/SearchAutocompleteAction.php:
the
<?php
SearchAutocompleteAction class extends CAction
{
public $model;
public $attribute;
public $fts_field;
public function run()
{
// Initialize variables
$_uid = Yii::app()->user->id;
$_model = new $this->model;
$_tableName = $_model->tableName();
// Split the search query into words, separated from him the last word
// and store separately the word and the rest of the query
$_query_array = explode(' ', trim(Yii::app()->db->quoteValue($_GET['term']), " '\t\n\r\0\x0B"));
$_word = array_pop($_query_array);
$_preQuery = implode(' ', $_query_array);
$_suggestions = array();
/*
* The request of obtaining tsvector of the desired records. The recordset must belong to the current user
* and it includes only records that match the first part of the query (without the last word).
*/
$_sub_sql = "SELECT $this->fts_field FROM $_tableName WHERE user_id="$_uid"";
if (count($_query_array) > 0)
$_sub_sql .= "AND $this->fts_field @@ to_tsquery("English", "$_preQuery")";
/*
* The final query that returns the list of words to complement the last word of the query.
* Use the function ts_stat of tsearch2. It returns a list of the words in the records selected by the subquery above
* sorted in descending order of frequency of occurrence of words in texts. You can add the sorting attribute of the ndoc describing
* the number of documents where the word.
*/
$_sql = "SELECT word AS $this->attribute FROM ts_stat('$_sub_sql') WHERE word LIKE '$_word%' ORDER BY nentry DESC LIMIT 15;";
foreach(Yii::app()->db->createCommand($_sql)->query() as $_m)
$_suggestions[] = count($_query_array) > 0 ? $_preQuery.' '.$_m[$this->attribute] : $_m[$this- > attribute];
echo CJSON::encode($_suggestions);
}
}
To parse the sequence of actions here is an example SQL query for the search string "Hello hub", formed by Action-Ohm:
the
SELECT
word AS txt
FROM
ts_stat('SELECT fti_txt FROM texts WHERE user_id="1" AND fti_txt @@ to_tsquery("English", "Hello")')
WHERE
word LIKE 'hub%'
ORDER BY nentry DESC
LIMIT 15;
The essence of tsearch2 in General is the creation of a record of type tsvector in addition to the text, in our example, this field fti_txt. In it are recorded the words of the text, indicating their positions and their appearance in the text. On this record construction index (gin or gist) and further searches. For debugging and monitoring the status of tsearch2 index is a function ts_stat. As a parameter it takes the text of a SQL query that returns a set of fields of type tsvector. For this set of statistics is constructed in the form of a list of words indicating the number of entries (nentry) and the number of documents (records) where the word occurs (ndoc).
In my example, if the word in the search query one — the search is similar to him in all user accounts. If the words in the query a few — the last word is removed from the query, the recordset is limited to full-text search on the first part of the query (without the last word).
the
joining the project
This part is Yii 1 specific, no magic here. Provides for the integrity of the notes. Just be two steps. The first step is the connection of Action-and to the controller, in my case DiaryController. To do this, its the actions() method add the line:
the
public function actions()
{
return array(
...
'acsearch' => array(
'class' => 'application.extensions.actions.SearchAutocompleteAction',
'model' => 'Texts',
'attribute' => 'txt',
'fts_field' => 'fti_txt',
),
...
);
}
Now in the corresponding view model old search text box:
the
<?php echo CHtml::textField('sh', $search->sh, array('size' = > 60,'maxlength' = > 255)); ?>
JQuery UI widget:
the
<?php $this->widget('zii.widgets.jui.CJuiAutoComplete', array(
'attribute'=> 'sh',
'sourceUrl' => array('acsearch'),
'name' => 'sh',
'value' => $search->sh,
'options' => array(
'minLength' = > '2',
),
'htmlOptions' => array(
'size' = > 60,
'maxlength' = > 255,
),
)); ?>
The result is something similar to the image:
the
Disadvantages
The whole system has one major drawback — the words in a field of type tsvector is written after stemming. Simply put, most of the words "cut off" the end to consider in the search for their different shapes. Look at the picture above and note the word "organized". Therefore, this solution is applicable for personal/internal use. Without a solution to this problem is to show this to people. Perhaps someone will find a decent solution or at least idea. Welcome to the comments.
Комментарии
Отправить комментарий