In Django, what is the most efficient way to check for an empty query set?


I've heard suggestions to use the following:

if qs.exists():
    ...

if qs.count():
    ...

try:
    qs[0]
except IndexError:
    ...

Copied from comment below: "I'm looking for a statement like 'In MySQL and PostgreSQL, count() is faster for short queries, exists() is faster for long queries, and use QuerySet[0] when it's likely that you're going to need the first element and you want to check that it exists. However, when count() is faster it's only marginally faster, so it's advisable to always use exists() when choosing between the two.'"

5 solutions

#1


14  

query.exists() is the most efficient way.

count() can be very expensive, especially on PostgreSQL, sometimes more expensive than a normal select query.

exists() runs a query with no select_related, field selections or sorting and only fetches a single record. This is much faster than counting the entire query with table joins and sorting.

qs[0] would still include select_related, field selections and sorting, so it would be more expensive.

The Django source code is here (django/db/models/sql/query.py, Query.has_results):

https://github.com/django/django/blob/60e52a047e55bc4cd5a93a8bd4d07baed27e9a22/django/db/models/sql/query.py#L499

def has_results(self, using):
    q = self.clone()
    if not q.distinct:
        q.clear_select_clause()
    q.clear_ordering(True)
    q.set_limits(high=1)
    compiler = q.get_compiler(using=using)
    return compiler.has_results()
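
To make the difference between the three checks concrete, here is a rough sketch using the standard-library sqlite3 module rather than Django itself. The table name, columns and sample data are made up for illustration, and the SQL strings only approximate the shape of the queries that exists(), count() and qs[0] produce; the exact statements depend on your Django version and backend.

```python
import sqlite3

# Stand-in table; the name, columns and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, ip TEXT)")
conn.executemany(
    "INSERT INTO session (ip) VALUES (?)",
    [("10.0.0.%d" % (i % 250),) for i in range(1000)],
)


def exists_like(ip):
    # No column selection, no ordering, stop at the first matching row.
    row = conn.execute(
        "SELECT 1 FROM session WHERE ip = ? LIMIT 1", (ip,)
    ).fetchone()
    return row is not None


def count_like(ip):
    # The database has to visit every matching row to produce the total.
    return conn.execute(
        "SELECT COUNT(*) FROM session WHERE ip = ?", (ip,)
    ).fetchone()[0]


def first_like(ip):
    # Keeps the column list and ordering, but still stops after one row.
    row = conn.execute(
        "SELECT id, ip FROM session WHERE ip = ? ORDER BY id LIMIT 1", (ip,)
    ).fetchone()
    if row is None:
        raise IndexError("no matching row")
    return row
```

The existence check returns as soon as one row matches, while the count has to account for every matching row, which is where the gap grows on large or heavily joined queries.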

Another gotcha that got me the other day is using a QuerySet directly as the condition of an if statement. That executes the query and fetches the whole result set!

If the variable query_set may be None (unset argument to your function) then use:

if query_set is None:
    # identity check only; no query is run
    ...

not:

if query_set:
    # you just hit the database
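
The difference between the two checks can be illustrated without a database. The class below is a hypothetical stand-in, not Django's QuerySet; it only mimics the relevant behaviour, which is that truth-testing forces evaluation while an identity comparison against None does not.

```python
class FakeQuerySet:
    """Hypothetical stand-in that records how often it gets evaluated."""

    def __init__(self, rows):
        self._rows = rows
        self.evaluations = 0

    def __bool__(self):
        # Truth-testing loads the results, as a real QuerySet would.
        self.evaluations += 1
        return bool(self._rows)


def describe(query_set=None):
    # "is None" compares identity and never calls __bool__,
    # so nothing is evaluated here.
    if query_set is None:
        return "no queryset given"
    return "queryset given"


qs = FakeQuerySet(rows=[1, 2, 3])
label = describe(qs)
evaluations_after_none_check = qs.evaluations

if qs:  # this is the line that triggers evaluation
    pass
evaluations_after_truth_test = qs.evaluations
```

After the call to describe() the counter is still zero; it only increases once the object is used as an if condition.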

#2


13  

It looks like qs.count() and qs.exists() are effectively equivalent. Therefore I have not discovered a reason to use exists() over count(). The latter is not slower and it can be used to check for both existence and length. It's possible that both exists() and count() evaluate to the same query in MySQL.

Only use qs[0] if you actually need the object. It's significantly slower if you're just testing for existence.

On Amazon SimpleDB, 400,000 rows:

  • bare qs: 325.00 usec/pass
  • qs.exists(): 144.46 usec/pass
  • qs.count(): 144.33 usec/pass
  • qs[0]: 324.98 usec/pass

On MySQL, 57 rows:

  • bare qs: 1.07 usec/pass
  • qs.exists(): 1.21 usec/pass
  • qs.count(): 1.16 usec/pass
  • qs[0]: 1.27 usec/pass

I used a random query for each pass to reduce the risk of db-level caching. Test code:

import timeit

base = """
import random
from plum.bacon.models import Session
ip_addr = '.'.join(str(random.randint(0, 255)) for _ in range(4))
try:
    session = Session.objects.filter(ip=ip_addr)%s
    if session:
        pass
except:
    pass
"""

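query_variations = [
    base % "",
    base % ".exists()",
    base % ".count()",
    base % "[0]",
]

for s in query_variations:
    t = timeit.Timer(stmt=s)
    # Note: number=100 but the divisor is 100000, so the printed figure is
    # 1000x smaller than the true microseconds per pass.
    print("%.2f usec/pass" % (1000000 * t.timeit(number=100)/100000))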
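
Since timeit() returns the total time in seconds for all passes, the per-pass figure is only meaningful when the divisor equals the number argument. A minimal sketch of that arithmetic follows; the helper names are made up for illustration, and the timed statement is a placeholder to be swapped for the query under test.

```python
import timeit


def to_usec_per_pass(total_seconds, number):
    # Convert total seconds for `number` passes into microseconds per pass.
    return 1000000 * total_seconds / number


def usec_per_pass(stmt, setup="pass", number=100):
    # timeit() returns the total seconds taken by `number` executions.
    timer = timeit.Timer(stmt=stmt, setup=setup)
    return to_usec_per_pass(timer.timeit(number=number), number)


# Placeholder statement; replace it with the code you want to measure.
result = usec_per_pass("sum(range(1000))", number=200)
print("%.2f usec/pass" % result)
```

Passing the same number to both the timer and the conversion keeps the reported unit honest when the pass count is changed later.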

#3


7  

It depends on the context in which you use it.

According to documentation:

Use QuerySet.count()

...if you only want the count, rather than doing len(queryset).

Use QuerySet.exists()

...if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse count() and exists()

If you are going to need other data from the QuerySet, just evaluate it.

So, I think that QuerySet.exists() is the most recommended way if you just want to check for an empty QuerySet. On the other hand, if you want to use results later, it's better to evaluate it.

I also think that your third option is the most expensive, because it has to retrieve and build a full model instance just to check whether any record exists.
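
The documentation's point about not overusing count() and exists() can be sketched with plain sqlite3. Everything here (the table, the run helper, the function names) is hypothetical and only illustrates the query count: checking for existence and then loading the rows costs two round trips, whereas loading the rows once and testing the resulting list costs one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, ip TEXT)")
conn.executemany(
    "INSERT INTO session (ip) VALUES (?)", [("10.0.0.1",), ("10.0.0.2",)]
)
conn.commit()

executed = []  # every statement sent through run() is recorded here


def run(sql):
    executed.append(sql)
    return conn.execute(sql)


def check_then_fetch():
    # Existence check first, then a second query for the actual rows.
    if run("SELECT 1 FROM session LIMIT 1").fetchone() is None:
        return []
    return run("SELECT id, ip FROM session").fetchall()


def fetch_once():
    # Load the rows once; the list's truthiness answers "is it empty?".
    rows = run("SELECT id, ip FROM session").fetchall()
    return rows if rows else []


rows_a = check_then_fetch()
two_step_queries = len(executed)
del executed[:]
rows_b = fetch_once()
one_step_queries = len(executed)
```

Both functions return the same rows, so the extra existence query only pays off when the rows are not needed afterwards.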

#4


4  

@Sam Odio's solution was a decent starting point, but there are a few flaws in the methodology, namely:

  1. The random IP address could end up matching 0 or very few results
  2. An exception would skew the results, so we should aim to avoid handling exceptions

So instead of filtering something that might match, I decided to exclude something that definitely won't match, hopefully still avoiding the DB cache, but also ensuring the same number of rows.

I only tested against a local MySQL database, with the dataset:

>>> Session.objects.all().count()
40219

Timing code:

import timeit
base = """
import random
import string
from django.contrib.sessions.models import Session
never_match = ''.join(random.choice(string.ascii_uppercase) for _ in range(10))
sessions = Session.objects.exclude(session_key=never_match){}
if sessions:
    pass
"""

query_variations = [
    "",
    ".exists()",
    ".count()",
    "[0]",
]

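for variation in query_variations:
    t = timeit.Timer(stmt=base.format(variation))
    # Note: number=100 but the divisor is 100000, so the printed figure is
    # 1000x smaller than the true microseconds per pass.
    print("{} => {:02f} usec/pass".format(variation.ljust(10), 1000000 * t.timeit(number=100)/100000))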

outputs:

           => 1390.177710 usec/pass
.exists()  => 2.479579 usec/pass
.count()   => 22.426991 usec/pass
[0]        => 2.437079 usec/pass

So you can see that count() is roughly 9 times slower than exists() for this dataset.

[0] is also fast, but it needs exception handling.

#5


1  

I would imagine that the first method is the most efficient way (you could easily implement it in terms of the second method, so perhaps they are almost identical). The last one requires actually getting a whole object from the database, so it is almost certainly the most expensive.

But, like all of these questions, the only way to know for your particular database, schema and dataset is to test it yourself.
