Search in Postgres using ZomboDb and elasticsearch

At some point in the development of the project the question arose of the large number of texts. Moreover, the texts are of different length: from tweets to large articles. First, the main search engine was chosen as the built-in Postgres _tsvector. To search for simple rules it was quite enough. An array of texts grew with great speed, and search rules become more complex, so a built engine is not covered claims.


Yes, there is a sphinx, it has great integration with Postgres, but the goal was to find a solution to use elasticsearch with Postgres. Why? elasticsearch has shown good results in some case-Ah project. Yes, and were server with him to keep logs logstash. It was also the desire to find a tool that will take data synchronization.


as a result of all the vast network has been found, the project ZomboDb, which fit the requirements.


the project Page on github.


the

install the extension


This section is a paraphrase of the official instructions.


the currently Supported versions of the packages:


the the the
Package Version
Elasticsearch 1.7.1+ (not 2.0)
Postgres 9.3, 9.4, 9.5

My configuration is: Postgres 9.4, elasticsearch 1.7.5


    the
  1. With page you need to download and install a package with a plugin for Postgres (deb or rpm)
  2. the
  3. In postgresql.conf to add the line:


    the
    local_preload_libraries = 'zombodb.so"

  4. the
  5. Restart the database and create the extension:


    the
    psql db_name -c "CREATE EXTENSION zombodb;"

  6. the
  7. Next, with the same page you need to download a plugin for elasticsearch and install it:


    the
    bin/plugin -i zombodb -u file:///path/to/zombodb-plugin-X.X.X.zip

  8. the
  9. Add elasticsearch.yml:


    the
    threadpool.bulk.queue_size: 1024
    threadpool.bulk.size: 12
    http.max_content_length: 1024mb
    index.query.bool.max_clause_count: 1000000

  10. the
  11. to Restart elasticsearch.

the installation is finished.


the

Creating test index


Suppose there is a table which contains the tags:


the
CREATE TABLE public.tags (
id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('tags_id_seq'::regclass),
word CHARACTER VARYING(100) NOT NULL
);

Create index:


the
CREATE INDEX tags_zdb_search_idx
ON tags
USING zombodb (zdb('tags', tags.ctid), zdb(tags))
WITH (url='http://localhost:9200/');

the result of this query to create the index and the data go straight to elasticsearch.
A query that will find the word mom and daddy:


the
SELECT *
FROM tags
WHERE zdb('tags', ctid) ==> 'word:(mother,father)';

Where word is the name of the field that will be searched. Search is implemented using operator ==>.


Also, ZomboDb provides domains phrase and fulltext, based on the type text. Using your own domains, you can define a mapping to elasticsearch.


the

query Language


by using queries, you can search on individual fields of the indexed table, and all fields.
Queries support logical operations (and, or, not), parentheses.
It is possible to use various search operators. For example, the query


the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'text:dad';

where the operator is a colon, and returns texts that contain the word dad.
Also supports more like this and fuzzy like this using the operators@ and :@~, respectively.
Example:


the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> '(text:@папа and title:@мама) or text:aunt';

Also, there is support for the comparison operators:


the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'comments > 10';

a Detailed description of the query language in documentation.


the

Insights


the Project is a good product that works out of the box. Well documented, updated (the latest supported version of Postgres, the last commit at the time of writing, 27-day-old). If well and consistently will show itself in production, write a wrapper for sqlalchemy.


UPD Written in haste sqlalchemy extension

Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Tactoom. How about the middle of blogging?

SumIT Weekend of 18-19 February, the idea for iPad and Hackathon

Knowledge base. Part 2. Freebase: make requests to the Google Knowledge Graph