聚合与分组

聚合函数和 GROUP BY 是数据统计分析的核心功能。

聚合函数

聚合函数对一组值进行计算并返回单个值。

常用聚合函数

函数功能示例
COUNT()计数COUNT(*), COUNT(column)
SUM()求和SUM(salary)
AVG()平均值AVG(salary)
MAX()最大值MAX(salary)
MIN()最小值MIN(salary)

COUNT 计数

-- 统计总员工数
SELECT COUNT(*) AS total_employees FROM employees;
 
-- 统计有部门的员工数(NULL 不计入)
SELECT COUNT(department_id) AS employees_with_dept FROM employees;
 
-- 统计不同部门的数量
SELECT COUNT(DISTINCT department_id) AS dept_count FROM employees;

SUM 求和

-- 计算薪资总和
SELECT SUM(salary) AS total_salary FROM employees;
 
-- 计算技术部薪资总和
SELECT SUM(salary) AS tech_salary 
FROM employees 
WHERE department_id = 1;

AVG 平均值

-- 计算平均薪资
SELECT AVG(salary) AS avg_salary FROM employees;
 
-- 计算平均薪资(四舍五入)
SELECT ROUND(AVG(salary), 2) AS avg_salary FROM employees;

MAX / MIN 最值

-- 查询最高和最低薪资
SELECT 
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary,
    MAX(salary) - MIN(salary) AS salary_gap
FROM employees;
 
-- 查询最早和最晚入职日期
SELECT 
    MIN(hire_date) AS first_hire,
    MAX(hire_date) AS last_hire
FROM employees;

组合使用

-- 综合统计
SELECT 
    COUNT(*) AS total,
    SUM(salary) AS sum_salary,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary
FROM employees;

GROUP BY 分组

GROUP BY 将数据按指定列分组,然后对每组应用聚合函数。

基本语法

SELECT 列, 聚合函数(列)
FROM 表名
GROUP BY 列;

单列分组

-- 统计各部门员工数量
SELECT 
    department_id,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
 
-- 结合 JOIN 显示部门名称
SELECT 
    d.name AS department,
    COUNT(e.id) AS employee_count,
    AVG(e.salary) AS avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name;

多列分组

-- 按部门和年份统计入职人数
SELECT 
    department_id,
    YEAR(hire_date) AS hire_year,
    COUNT(*) AS count
FROM employees
GROUP BY department_id, YEAR(hire_date)
ORDER BY department_id, hire_year;

GROUP BY 规则

重要规则:SELECT 中的非聚合列必须出现在 GROUP BY 中。

-- ✅ 正确
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;
 
-- ❌ 错误(name 没有在 GROUP BY 中)
SELECT department_id, name, COUNT(*) FROM employees GROUP BY department_id;
 
-- ✅ 修正:将 name 也加入 GROUP BY 或使用聚合函数
SELECT department_id, MAX(name), COUNT(*) FROM employees GROUP BY department_id;

HAVING 过滤分组

HAVING 用于过滤分组后的结果(WHERE 用于过滤分组前的行)。

WHERE vs HAVING

子句执行时机过滤对象能否使用聚合函数
WHERE分组前原始行不能
HAVING分组后分组结果
-- 查询平均薪资大于 15000 的部门
SELECT 
    department_id,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 15000;
 
-- WHERE 和 HAVING 配合使用
SELECT 
    department_id,
    AVG(salary) AS avg_salary
FROM employees
WHERE salary > 10000           -- 先过滤薪资大于 10000 的员工
GROUP BY department_id
HAVING AVG(salary) > 15000;    -- 再过滤平均薪资大于 15000 的部门

常见场景

-- 查询员工数大于 2 的部门
SELECT department_id, COUNT(*) AS count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 2;
 
-- 查询有订单的员工及其订单总额
SELECT 
    employee_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY employee_id
HAVING SUM(amount) > 10000;

执行顺序

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
  1. FROM: 选择表
  2. WHERE: 过滤行
  3. GROUP BY: 分组
  4. HAVING: 过滤分组
  5. SELECT: 选择列和聚合
  6. ORDER BY: 排序
  7. LIMIT: 限制行数

WITH ROLLUP 汇总

ROLLUP 在分组结果后添加汇总行。

-- 各部门薪资统计 + 总计
SELECT 
    COALESCE(d.name, '总计') AS department,
    COUNT(e.id) AS count,
    SUM(e.salary) AS total_salary
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
GROUP BY d.name WITH ROLLUP;

结果示例:

| department | count | total_salary |
|------------|-------|--------------|
| 人事部     | 1     | 10000        |
| 技术部     | 3     | 51000        |
| 市场部     | 2     | 30000        |
| 总计       | 7     | 102000       |

窗口函数(分析函数)

窗口函数在不减少行数的情况下进行聚合计算。

基本语法

函数() OVER (
    [PARTITION BY 分组列]
    [ORDER BY 排序列]
    [ROWS/RANGE 窗口范围]
)

排名函数

-- ROW_NUMBER: 连续排名(1, 2, 3, 4)
-- RANK: 跳跃排名(1, 2, 2, 4)
-- DENSE_RANK: 密集排名(1, 2, 2, 3)
 
SELECT 
    name,
    department_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK() OVER (ORDER BY salary DESC) AS rank_num,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num
FROM employees;
 
-- 各部门内薪资排名
SELECT 
    name,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_rank
FROM employees;

聚合窗口函数

-- 显示每个员工薪资及部门平均薪资
SELECT 
    name,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS dept_avg,
    salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
 
-- 累计求和
SELECT 
    name,
    salary,
    SUM(salary) OVER (ORDER BY id) AS running_total
FROM employees;

LEAD / LAG 偏移函数

-- 查看前一个和后一个员工的薪资
SELECT 
    name,
    salary,
    LAG(salary, 1) OVER (ORDER BY id) AS prev_salary,
    LEAD(salary, 1) OVER (ORDER BY id) AS next_salary
FROM employees;
 
-- 计算薪资环比变化
SELECT 
    name,
    salary,
    salary - LAG(salary, 1) OVER (ORDER BY id) AS salary_change
FROM employees;

FIRST_VALUE / LAST_VALUE

-- 各部门最高薪资
SELECT 
    name,
    department_id,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS dept_max_salary
FROM employees;

综合示例

部门统计报表

SELECT 
    d.name AS department,
    COUNT(e.id) AS employee_count,
    ROUND(AVG(e.salary), 2) AS avg_salary,
    MAX(e.salary) AS max_salary,
    MIN(e.salary) AS min_salary,
    SUM(e.salary) AS total_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
HAVING COUNT(e.id) > 0
ORDER BY total_salary DESC;

Top N 查询

-- 各部门薪资 Top 2 的员工
SELECT * FROM (
    SELECT 
        name,
        department_id,
        salary,
        DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn <= 2;

总结

功能语法说明
聚合函数COUNT/SUM/AVG/MAX/MIN对一组值计算返回单值
GROUP BYGROUP BY 列按列分组
HAVINGHAVING 条件过滤分组结果
窗口函数OVER(PARTITION BY … ORDER BY …)不减少行的聚合

上一节04-高级查询技巧 下一节06-数据定义语言 — CREATE、ALTER、DROP


sql 聚合 分组 窗口函数