聚合与分组
聚合函数和 GROUP BY 是数据统计分析的核心功能。
聚合函数
聚合函数对一组值进行计算并返回单个值。
常用聚合函数
| 函数 | 功能 | 示例 |
|---|---|---|
COUNT() | 计数 | COUNT(*), COUNT(column) |
SUM() | 求和 | SUM(salary) |
AVG() | 平均值 | AVG(salary) |
MAX() | 最大值 | MAX(salary) |
MIN() | 最小值 | MIN(salary) |
COUNT 计数
-- 统计总员工数
SELECT COUNT(*) AS total_employees FROM employees;
-- 统计有部门的员工数(NULL 不计入)
SELECT COUNT(department_id) AS employees_with_dept FROM employees;
-- 统计不同部门的数量
SELECT COUNT(DISTINCT department_id) AS dept_count FROM employees;SUM 求和
-- 计算薪资总和
SELECT SUM(salary) AS total_salary FROM employees;
-- 计算技术部薪资总和
SELECT SUM(salary) AS tech_salary
FROM employees
WHERE department_id = 1;AVG 平均值
-- 计算平均薪资
SELECT AVG(salary) AS avg_salary FROM employees;
-- 计算平均薪资(四舍五入)
SELECT ROUND(AVG(salary), 2) AS avg_salary FROM employees;MAX / MIN 最值
-- 查询最高和最低薪资
SELECT
MAX(salary) AS max_salary,
MIN(salary) AS min_salary,
MAX(salary) - MIN(salary) AS salary_gap
FROM employees;
-- 查询最早和最晚入职日期
SELECT
MIN(hire_date) AS first_hire,
MAX(hire_date) AS last_hire
FROM employees;组合使用
-- 综合统计
SELECT
COUNT(*) AS total,
SUM(salary) AS sum_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM employees;GROUP BY 分组
GROUP BY 将数据按指定列分组,然后对每组应用聚合函数。
基本语法
SELECT 列, 聚合函数(列)
FROM 表名
GROUP BY 列;单列分组
-- 统计各部门员工数量
SELECT
department_id,
COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
-- 结合 JOIN 显示部门名称
SELECT
d.name AS department,
COUNT(e.id) AS employee_count,
AVG(e.salary) AS avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;多列分组
-- 按部门和年份统计入职人数
SELECT
department_id,
YEAR(hire_date) AS hire_year,
COUNT(*) AS count
FROM employees
GROUP BY department_id, YEAR(hire_date)
ORDER BY department_id, hire_year;GROUP BY 规则
重要规则:SELECT 中的非聚合列必须出现在 GROUP BY 中。
-- ✅ 正确
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;
-- ❌ 错误(name 没有在 GROUP BY 中)
SELECT department_id, name, COUNT(*) FROM employees GROUP BY department_id;
-- ✅ 修正:将 name 也加入 GROUP BY 或使用聚合函数
SELECT department_id, MAX(name), COUNT(*) FROM employees GROUP BY department_id;HAVING 过滤分组
HAVING 用于过滤分组后的结果(WHERE 用于过滤分组前的行)。
WHERE vs HAVING
| 子句 | 执行时机 | 过滤对象 | 能否使用聚合函数 |
|---|---|---|---|
| WHERE | 分组前 | 原始行 | 不能 |
| HAVING | 分组后 | 分组结果 | 能 |
-- 查询平均薪资大于 15000 的部门
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 15000;
-- WHERE 和 HAVING 配合使用
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
WHERE salary > 10000 -- 先过滤薪资大于 10000 的员工
GROUP BY department_id
HAVING AVG(salary) > 15000; -- 再过滤平均薪资大于 15000 的部门常见场景
-- 查询员工数大于 2 的部门
SELECT department_id, COUNT(*) AS count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 2;
-- 查询有订单的员工及其订单总额
SELECT
employee_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY employee_id
HAVING SUM(amount) > 10000;执行顺序
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
- FROM: 选择表
- WHERE: 过滤行
- GROUP BY: 分组
- HAVING: 过滤分组
- SELECT: 选择列和聚合
- ORDER BY: 排序
- LIMIT: 限制行数
WITH ROLLUP 汇总
ROLLUP 在分组结果后添加汇总行。
-- 各部门薪资统计 + 总计
SELECT
COALESCE(d.name, '总计') AS department,
COUNT(e.id) AS count,
SUM(e.salary) AS total_salary
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
GROUP BY d.name WITH ROLLUP;结果示例:
| department | count | total_salary |
|------------|-------|--------------|
| 人事部 | 1 | 10000 |
| 技术部 | 3 | 51000 |
| 市场部 | 2 | 30000 |
| 总计 | 7 | 102000 |
窗口函数(分析函数)
窗口函数在不减少行数的情况下进行聚合计算。
基本语法
函数() OVER (
[PARTITION BY 分组列]
[ORDER BY 排序列]
[ROWS/RANGE 窗口范围]
)排名函数
-- ROW_NUMBER: 连续排名(1, 2, 3, 4)
-- RANK: 跳跃排名(1, 2, 2, 4)
-- DENSE_RANK: 密集排名(1, 2, 2, 3)
SELECT
name,
department_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank_num,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num
FROM employees;
-- 各部门内薪资排名
SELECT
name,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;聚合窗口函数
-- 显示每个员工薪资及部门平均薪资
SELECT
name,
department_id,
salary,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
-- 累计求和
SELECT
name,
salary,
SUM(salary) OVER (ORDER BY id) AS running_total
FROM employees;LEAD / LAG 偏移函数
-- 查看前一个和后一个员工的薪资
SELECT
name,
salary,
LAG(salary, 1) OVER (ORDER BY id) AS prev_salary,
LEAD(salary, 1) OVER (ORDER BY id) AS next_salary
FROM employees;
-- 计算薪资环比变化
SELECT
name,
salary,
salary - LAG(salary, 1) OVER (ORDER BY id) AS salary_change
FROM employees;FIRST_VALUE / LAST_VALUE
-- 各部门最高薪资
SELECT
name,
department_id,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS dept_max_salary
FROM employees;综合示例
部门统计报表
SELECT
d.name AS department,
COUNT(e.id) AS employee_count,
ROUND(AVG(e.salary), 2) AS avg_salary,
MAX(e.salary) AS max_salary,
MIN(e.salary) AS min_salary,
SUM(e.salary) AS total_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING COUNT(e.id) > 0
ORDER BY total_salary DESC;Top N 查询
-- 各部门薪资 Top 2 的员工
SELECT * FROM (
SELECT
name,
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 2;总结
| 功能 | 语法 | 说明 |
|---|---|---|
| 聚合函数 | COUNT/SUM/AVG/MAX/MIN | 对一组值计算返回单值 |
| GROUP BY | GROUP BY 列 | 按列分组 |
| HAVING | HAVING 条件 | 过滤分组结果 |
| 窗口函数 | OVER(PARTITION BY … ORDER BY …) | 不减少行的聚合 |
上一节:04-高级查询技巧 下一节:06-数据定义语言 — CREATE、ALTER、DROP