Fun query with Django ORM Annotate and Query Expressions

there Was once a time when the Django ORM was considered very cute, but totally stupid. Although the ability to produce Annotate and Aggregate was in her from time immemorial. Version 1.8 added ability to use database functions within Query Expressions. And, of course, if a novice diarist was not afraid and read the entry to these lines, it can safely read on: the article is focused on beginners.


Some time ago I faced a problem: to select from the table values by user. Moreover, these values should match the specified regular expression. But this is not the end conditions of the selected expressions have to pull out a substring. Again, for the regular season. I did it pretty fast and wanted to share the experience with those who cannot use Annotate and Query Expressions in practice


Try to describe the situation more precisely:


we Have almost the standard model Users. Some users have different usernames. For example, manager, vasyaTheDirector, vovaProg, etc. But commercial users have names in the format {CountryCode}{RandomUniqueNumber}. For example, RU2525 or ES1672. Here we need to pull from the database all commercial users, but not to get all the information, and only the unique numbers without country codes.


the Problem, of course, interesting for beginners djangirov. Although, for developers, middle managers, it may not be quite typical.


we will Start with simple: to retrieve all users whose names begin with the two-letter country code, you can use a simple filter operation with a key __iregex a field name.


the
from django.contrib.auth import get_user_model

User = get_user_model()

queryset = User.objects.filter(username__iregex=r'^[A-Z]{2}\d+$')

Get this list:


[<User: RU123>, <User: RU124>, <User: RU125>, <User: EN123>, <User: EN124>, <User: EN125>, <User: EN126>, <User: UK123>, <User: UK124>, <User: UK1234>, <User: UK12345>]


More interesting. Django allows you to create annotations for the obtained values. For example, we need to count the number of Books that are associated with the User by ForeignKey. We can perform User.books.all()count () or get the value directly in a Queryset, using Annotate. We will announce the field books_count to be available to us as a property of the received instance of User, or as a dictionary key. Let's see how it will look not at the abstract example with books, and in the context of our task.


the
from django.db.models import Func

queryset = User.objects.annotate(username_index=Func()).filter(username__iregex=r'^[A-Z]{2}\d+$')

In Django there are different functions for annotation values. For Example, Max, Min, Avg, Count. They are part of the mechanism of Query Expressions. These special expressions can be used to describe the request and to change the values when they are received. With version 1.8 we have the opportunity to use built-in functions of the database. For example, we need to modify the received string. So, we will use the functions associated with regular expressions.


I am using PostgreSQL version 9.5, therefore I need to find a function that will get me the substring from a string. Find this function in the official documentation. The function is called: substring.


the
from django.db.models import Func, F, Value

queryset = User.objects.annotate(username_index=Func(F('username'), Value('(\d+)'), function='substring'))).filter(username__iregex=r'^[A-Z]{2}\d+$')

As you can see, the Func takes three arguments:


    the
  1. Wrapped in F() the name of the field that we modify (actually, the value of this field will be passed to substring)
  2. the
  3. Template on which occurs the substring
  4. the
  5. the Name of a function in PostgreSQL which will be transferred to the previous arguments

Well, we have to get the values in a list:


the
from django.db.models import Func, F, Value

queryset = User.objects.annotate(username_index=Func(F('username'), Value('(\d+)'), function='substring'))).filter(username__iregex=r'^[A-Z]{2}\d+$').values_list('username_index', flat=True)

Received the following output:


['123', '124', '125', '123', '124', '125', '126', '123', '124', '1234', '12345']


the
username__iregex=r'^[A-Z]{2}\d+$'


the
username__iregex=r'^EN\d+$'.

and now the most interesting. Do you think any SQL query executes our code?


the
SELECT substring("my_users_user"."username", (\d+)) AS "username_index" FROM "my_users_user WHERE my_users_user"."username"::text ~* ^[A-Z]{2}\d+$

As you can see, the query is beautiful and is in urgent need intensive care optimization.


Returning to the subject of DJango ORM problems identified in the beginning of this article, I want to emphasize that Annotate and Aggregate exist in Django for a very long time. And, it turns out, just not all know how to cook them. Although, the ability to perform the functions of the Database without writing SQL queries appeared relatively recently. And we can do even more beautiful things.


P. S.
If you want to get data in a specific format, you can modify the code as follows:


the
from django.db.models import IntegerField, ExpressionWrapper
from django.db.models import Func, F, Value

queryset = User.objects.annotate(username_index=ExpressionWrapper(Func(F('username'), Value('(\d+)'), function='substring'), output_field=IntegerField()))).filter(username__iregex=r'^[A-Z]{2}\d+$').values_list('username_index', flat=True)

the Output will be:


[123, 124, 125, 123, 124, 125, 126, 123, 124, 1234, 12345]


We wrapped Func() ExpressionWrapper and pointed to the expected data type in the output_field=IntegerField(). As a result, received a list of integers, not strings.

Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Tactoom. How about the middle of blogging?

SumIT Weekend of 18-19 February, the idea for iPad and Hackathon

Knowledge base. Part 2. Freebase: make requests to the Google Knowledge Graph