Search in Postgres using ZomboDb and elasticsearch
At some point in the development of the project the question arose of the large number of texts. Moreover, the texts are of different length: from tweets to large articles. First, the main search engine was chosen as the built-in Postgres _tsvector. To search for simple rules it was quite enough. An array of texts grew with great speed, and search rules become more complex, so a built engine is not covered claims.
Yes, there is a sphinx, it has great integration with Postgres, but the goal was to find a solution to use elasticsearch with Postgres. Why? elasticsearch has shown good results in some case-Ah project. Yes, and were server with him to keep logs logstash. It was also the desire to find a tool that will take data synchronization.
as a result of all the vast network has been found, the project ZomboDb, which fit the requirements.
the project Page on github.
the
install the extension
This section is a paraphrase of the official instructions.
the currently Supported versions of the packages:
Package | Version |
---|---|
Elasticsearch | 1.7.1+ (not 2.0) |
Postgres | 9.3, 9.4, 9.5 |
My configuration is: Postgres 9.4, elasticsearch 1.7.5
-
the
- With page you need to download and install a package with a plugin for Postgres (deb or rpm) the
In postgresql.conf to add the line:
thelocal_preload_libraries = 'zombodb.so"
the Restart the database and create the extension:
thepsql db_name -c "CREATE EXTENSION zombodb;"
the Next, with the same page you need to download a plugin for elasticsearch and install it:
thebin/plugin -i zombodb -u file:///path/to/zombodb-plugin-X.X.X.zip
the Add elasticsearch.yml:
thethreadpool.bulk.queue_size: 1024 threadpool.bulk.size: 12 http.max_content_length: 1024mb index.query.bool.max_clause_count: 1000000
the - to Restart elasticsearch.
the installation is finished.
the
Creating test index
Suppose there is a table which contains the tags:
the
CREATE TABLE public.tags (
id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('tags_id_seq'::regclass),
word CHARACTER VARYING(100) NOT NULL
);
Create index:
the
CREATE INDEX tags_zdb_search_idx
ON tags
USING zombodb (zdb('tags', tags.ctid), zdb(tags))
WITH (url='http://localhost:9200/');
the result of this query to create the index and the data go straight to elasticsearch.
A query that will find the word mom and daddy:
the
SELECT *
FROM tags
WHERE zdb('tags', ctid) ==> 'word:(mother,father)';
Where word is the name of the field that will be searched. Search is implemented using operator ==>.
Also, ZomboDb provides domains phrase and fulltext, based on the type text. Using your own domains, you can define a mapping to elasticsearch.
the
query Language
by using queries, you can search on individual fields of the indexed table, and all fields.
Queries support logical operations (and, or, not), parentheses.
It is possible to use various search operators. For example, the query
the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'text:dad';
where the operator is a colon, and returns texts that contain the word dad.
Also supports more like this and fuzzy like this using the operators@ and :@~, respectively.
Example:
the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> '(text:@папа and title:@мама) or text:aunt';
Also, there is support for the comparison operators:
the
SELECT *
FROM texts
WHERE zdb('texts', ctid) ==> 'comments > 10';
a Detailed description of the query language in documentation.
the
Insights
the Project is a good product that works out of the box. Well documented, updated (the latest supported version of Postgres, the last commit at the time of writing, 27-day-old). If well and consistently will show itself in production, write a wrapper for sqlalchemy.
UPD Written in haste sqlalchemy extension
Комментарии
Отправить комментарий