The decrease in the volume occupied by the data in PostgreSQL on disk

ноября 09, 2017

Typically, when compiling data structures and tables no bother order of columns. Actually, what's the point? If necessary, you can change the order of columns in the SELECT, so nothing to worry about? So, nothing to worry about, as the order of the columns can significantly influence the size of the table. Yes, the table size may depend on the order of the columns, even if the data is the same.

What is the explanation? There is such a thing as alignment data to the CPU, and it depends on the size of the low-level data structures. A conscious choice of the order of the columns makes it possible to optimize the size of the data. Don't believe? Let's try:

the

test=# CREATE TABLE t_test (
int i1,
i2 int,
i3 int,
v1 varchar(100),
v2 varchar(100),
v3 varchar(100)
);
CREATE TABLE

In this example, 6 columns. 3 integer column and 3 varchar column, one after the other. Add the table with 10 million rows:

the

test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd'
FROM generate_series(1, 10000000);
INSERT 0 10000000

Overall table size — 574 MB

the

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
574 MB
(1 row)

Try to change the location of the columns. In the following example, after the varchar column is integer column. It is repeated three times:

the

test=# CREATE TABLE t_test (
v1 varchar(100),
int i1,
v2 varchar(100),
i2 int,
v3 varchar(100),
i3 int
);
CREATE TABLE

Now add 10 million rows...

the

test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30
FROM generate_series(1, 10000000);
INSERT 0 10000000

... and the table considerably will increase:

the

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
651 MB
(1 row)

The data in the table has not changed – they are specially selected for illustration of this effect. If I write “abc” instead of “abcd”, we would not see the difference in size, but the string 4 characters do not fit in the smaller buffer.

the

Conclusion

An important conclusion that can be drawn from this experiment is packaging the same types of data next to each other definitely makes sense. In addition, I found that it makes sense to pack the columns integer to the beginning of the tables. Often this can add another few percent to the performance, simply due to the fact that the data structure is slightly less than if this is not done.

From the translator:

Author: Hans-Jürgen Schönig. The original is available at the link.

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express

The decrease in the volume occupied by the data in PostgreSQL on disk

Conclusion

Комментарии

Отправить комментарий

Популярные сообщения из этого блога

Kiddy.me — diary of your baby

Tactoom. How about the middle of blogging?

SumIT Weekend of 18-19 February, the idea for iPad and Hackathon