PostgreSQL HAVING子句

正文

了解GROUP BY和HAVING子句的工作原理可以帮助写出更加高效的SQL。这里用一个真实例子来说明这一点。

假设由2个表：country、invoice：

select * from country;
ctry_code ctry_name
US USA
CA Canada
CN China
.. ..
7 rows

select * from invoice;
invoice_nbr ctry_cd year_nbr month_nbr item_cnt invoice_amt
2014001 BR 2014 3 13 162875
2021172 CA 2021 10 200 1299355
2020435 CN 2020 1 12 145654
2016201 US 2016 8 900 7125125
2021662 US 2021 4 100 800135
2018743 MX 2018 11 5 76124
.. .. .. .. .. ..
4 million rows

我想写一个在invoice上进行聚合的SQL：

1) 对于每个country和year，除了USA外的所有countries

2) 对于每个country，除了USA外的所有countries

3) 对于每year，包括USA的

4) 总计，包括USA的

这是我们想要从查询中得到的：

这是给我们预期的一个结果。使用6个SQL用于6个country（USA除外）中的每个年度总计，6个SQL用于所有年份的国家统计，一个SQL用于所有country的年度统计，最有一个SQL用于全部总计。然后将这14个SQL于13个UNION操作组合在一起：

-- country, year totals
select c.ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name, i.year_nbr -- grouping by country and year
UNION
....
.... -- unions for each of the 6 countries excluding USA
UNION
-- country totals
select c.ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name = 'Mexico'
group by c.ctry_name -- grouping by just country
UNION
....
.... -- unions for each of the 6 countries excluding USA
UNION
-- totals for all countries by year
select 'All countries' as ctry_name, i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by i.year_nbr -- grouping by just year for all countries
UNION
-- totals for all countries and all years
select 'All countries' as ctry_name, null as year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)

该查询虽然返回了预期结果，但是难以维护，并且效率低下。从invoice中获取14次数据，从而进行14次聚合，最后执行一组昂贵的UNION操作。

理解group by和having后，我们可以写一个更简单更加高效的SQL。和WHERE子句过滤原始数据一样，我们也可以使用having从group by结果集中进行过滤。下面的SQL使用having过滤掉了USA的汇总行，产生contry和year组合的聚合：

select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
group by cube(c.ctry_name, i.year_nbr)
having coalesce(c.ctry_name,'x') != 'USA' -- exclude USA summary rows

这个查询在做什么？在group by中使用CUBE来获取：

1) 每个country的每一行，year组合

2) 每个country的一行，包括所有year的数据

3) 每个year的一行，包括所有country

4) 所有year、所有country的一行

然后使用having子句排除ctry_name是USA的所有行。这个查询从invoice表仅获取一次数据，比原始SQL快很多。

注意，country总计的有一个null year，同样，year总计的有一个null country。我们使用coalesce函数将all countries进行转换，在having中使用coalesce，不会删除country名为NULL的行。

在这个例子中，我们只需要在4个聚合组的2个中排除USA的数据。如果要求在所有汇总行中排除USA的数据，我们将在agg之前使用WHERE子句过滤USA数据，而不是在having子句中，下面的SQL满足条件：

select coalesce(c.ctry_name,'All countries') as ctry_name,
i.year_nbr,
sum(i.item_cnt) as tot_cnt,
sum(i.invoice_amt) as tot_amt
from country c
inner join invoice i on (i.ctry_code = c.ctry_code)
where c.ctry_name != 'USA' -- exclude USA in all summaries
group by cube(c.ctry_name, i.year_nbr)

GROUP BY支持cube、rollup，值得花时间更好地理解。

此示例基于的实际 sql 有 24 个小查询，union每个查询中结合了两个以上的表。与原始代码的 200 多行相比，重写的 sql 仅 10 行长，并且在几秒钟内运行，而原始查询则需要半个多小时。

原文

https://smallthingssql.com/having-a-less-understood-sql-clause

网站首页 > 基础教程正文

PostgreSQL HAVING子句

正文

原文

猜你喜欢

网站首页 > 基础教程 正文

PostgreSQL HAVING子句

正文

原文

猜你喜欢

网站首页 > 基础教程正文