The decrease in the volume occupied by the data in PostgreSQL on disk

Typically, when compiling data structures and tables no bother order of columns. Actually, what's the point? If necessary, you can change the order of columns in the SELECT, so nothing to worry about? So, nothing to worry about, as the order of the columns can significantly influence the size of the table. Yes, the table size may depend on the order of the columns, even if the data is the same.

What is the explanation? There is such a thing as alignment data to the CPU, and it depends on the size of the low-level data structures. A conscious choice of the order of the columns makes it possible to optimize the size of the data. Don't believe? Let's try:

the
test=# CREATE TABLE t_test (
int i1,
i2 int,
i3 int,
v1 varchar(100),
v2 varchar(100),
v3 varchar(100)
);
CREATE TABLE

In this example, 6 columns. 3 integer column and 3 varchar column, one after the other. Add the table with 10 million rows:

the
test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd'
FROM generate_series(1, 10000000);
INSERT 0 10000000

Overall table size — 574 MB

the
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
574 MB
(1 row)

Try to change the location of the columns. In the following example, after the varchar column is integer column. It is repeated three times:

the
test=# CREATE TABLE t_test (
v1 varchar(100),
int i1,
v2 varchar(100),
i2 int,
v3 varchar(100),
i3 int
);
CREATE TABLE

Now add 10 million rows...

the
test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30
FROM generate_series(1, 10000000);
INSERT 0 10000000

... and the table considerably will increase:

the
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
651 MB
(1 row)

The data in the table has not changed – they are specially selected for illustration of this effect. If I write “abc” instead of “abcd”, we would not see the difference in size, but the string 4 characters do not fit in the smaller buffer.

the

Conclusion


An important conclusion that can be drawn from this experiment is packaging the same types of data next to each other definitely makes sense. In addition, I found that it makes sense to pack the columns integer to the beginning of the tables. Often this can add another few percent to the performance, simply due to the fact that the data structure is slightly less than if this is not done.

From the translator:

Author: Hans-Jürgen Schönig. The original is available at the link.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Tactoom. How about the middle of blogging?

SumIT Weekend of 18-19 February, the idea for iPad and Hackathon

Knowledge base. Part 2. Freebase: make requests to the Google Knowledge Graph