SQL绝对被人低估了,至少被我低估了。SQL是那种你可以投入少量时间却能获得较大收益的技术,也就是说边际效用极大。SQL具有很强的稳定性,尽管随着时间的推移有不少新的版本发布,但是基本功能几乎没怎么变化,两个主要版本SQL92和SQL99距今都二三十年了。SQL的稳定性能够让你一次性投入而长时间获得收益,相比某些前端技术或某些短命的框架而言,简直是一本万利的买卖。SQL又具有一定的难度,对于某些复杂操作,或者某些常见的坑(比如三值逻辑运算),不经过一定时间的学习,很难轻松掌握和面对。

如果把编程范式粗略地划分成命令式和函数式的话,SQL应该归于函数式这一类;但SQL处理的对象是集合,其数学基础是关系代数和集合论。这一点让SQL跟其他语言非常不同,你没法从其他编程经验中直接获得SQL的编写经验,也就是说,SQL是编程世界中比较特立独行的一个分支,需要花费时间认真对待。

下面以例子解释SQL的编写,主要参考日本作者Mick的《SQL进阶教程(第2版)》,一个容易被人看不上的非常土的书名,但其实内容极好,豆瓣9.1评分。下面的所有例子和SQL均在MySQL8上执行验证过。

[例1]有一城市人口表如下

DROP TABLE IF EXISTS dec_popu;
CREATE TABLE dec_popu (
  id INT PRIMARY KEY AUTO_INCREMENT,
  city VARCHAR (64) NOT NULL,
  gender VARCHAR(1) NOT NULL DEFAULT 'M',
  population INT NOT NULL DEFAULT 0
);

INSERT INTO dec_popu(city, gender, population) VALUES('贵阳市', 'M' , 26714);
INSERT INTO dec_popu(city, gender, population) VALUES('贵阳市', 'F' , 23134);
INSERT INTO dec_popu(city, gender, population) VALUES('六枝特区', 'M' , 3026);
INSERT INTO dec_popu(city, gender, population) VALUES('六枝特区', 'F' , 2926);
INSERT INTO dec_popu(city, gender, population) VALUES('德州市', 'M' , 23256);
INSERT INTO dec_popu(city, gender, population) VALUES('青岛市', 'M' , 24770);
INSERT INTO dec_popu(city, gender, population) VALUES('潍坊市', 'M' , 32005);
INSERT INTO dec_popu(city, gender, population) VALUES('潍坊市', 'F' , 12000);
INSERT INTO dec_popu(city, gender, population) VALUES('乐山市', 'M' , 19200);
INSERT INTO dec_popu(city, gender, population) VALUES('乐山市', 'F' , 10800);
id city gender population
1 贵阳市 M 26714
2 贵阳市 F 23134
3 六枝特区 M 3026
4 六枝特区 F 2926
5 德州市 M 23256
6 青岛市 M 24770
7 潍坊市 M 32005
8 潍坊市 F 12000
9 乐山市 M 19200
10 乐山市 F 10800
11 长春市 F 5230

要求:虽然表格中并没有省份信息,但是需要按照省份统计总人口。省份作为一种常识存在。这里练习使用CASE子句。

SELECT CASE
  WHEN city IN ('贵阳市','六枝特区') THEN '贵州'
  WHEN city IN ('德州市','青岛市','潍坊市' ) THEN '山东'
  WHEN city IN ('乐山市') THEN '四川'
  ELSE '未知'
  END AS province,
  SUM(population) AS Total
FROM dec_popu
GROUP BY province;

要求:以城市/男性人口/女性人口的格式统计数据。这里练习把CASE放到SUM函数中。

SELECT city,
  SUM(CASE WHEN gender = 'M' THEN population ELSE 0 END) AS population_male,
  SUM(CASE WHEN gender = 'F' THEN population ELSE 0 END) AS population_female
FROM dec_popu
GROUP BY city;

要求:按性别统计,表格中各性别总人数,山东省各性别总人数,山东省下面各市各性别总人数。

SELECT gender
  , SUM(population) AS Total
  , SUM(CASE WHEN city IN ('德州市') THEN population ELSE 0 END) AS DeZhouShi
  , SUM(CASE WHEN city IN ('青岛市') THEN population ELSE 0 END) AS QingDaoShi
  , SUM(CASE WHEN city IN ('潍坊市') THEN population ELSE 0 END) AS WeiFangShi
  , SUM(CASE WHEN city IN ('德州市','青岛市','潍坊市') 
        THEN population ELSE 0 END) AS ShanDongSheng
FROM dec_popu
GROUP BY gender;

要求:如果城市中只有1种性别的人口记录,那么选它,如果有2种性别的人口记录,那么就选男性的人口数据。这里可以看出CASE子句极大的灵活性。

SELECT city, 
  CASE WHEN COUNT(*) = 1 THEN MAX(population) 
  ELSE MAX(CASE WHEN gender = 'M' THEN population ELSE -1 END) END AS PopuMain
FROM dec_popu
GROUP BY city;

[例2]下面是课程表以及课程时间表

DROP TABLE IF EXISTS dec_course;
CREATE TABLE dec_course (
  course_id INT PRIMARY KEY AUTO_INCREMENT,
  course_name VARCHAR (32) NOT NULL
);

INSERT INTO dec_course(course_name) VALUES('计算机基础');
INSERT INTO dec_course(course_name) VALUES('会计学');
INSERT INTO dec_course(course_name) VALUES('财务管理学');
INSERT INTO dec_course(course_name) VALUES('英语');

DROP TABLE IF EXISTS dec_course_schedule;
CREATE TABLE dec_course_schedule (
  id INT PRIMARY KEY AUTO_INCREMENT,
  course_id INT NOT NULL,
  schedule_month  VARCHAR(10)
);

INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-05');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-06');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-07');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(2, '2022-06');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(2, '2022-07');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(3, '2022-05');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(4, '2022-07');
id course_id schedule_month
1 1 2022-05
2 1 2022-06
3 1 2022-07
4 2 2022-06
5 2 2022-07
6 3 2022-05
7 4 2022-07

要求:把每项课程在每个月是否开课表示出来。这里使用case把课程时间的行信息,转为列信息。

SELECT course_id, course_name,
  CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-05') THEN 'Yes' ELSE '-' END AS 'May',
  CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-06') THEN 'Yes' ELSE '-' END AS 'June',
  CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-07') THEN 'Yes' ELSE '-' END AS 'July'
FROM dec_course;

另一种实现方式:

SELECT c.course_name,
  CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-05') THEN 'Yes' ELSE '-' END AS May,
  CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-06') THEN 'Yes' ELSE '-' END AS June,
  CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-07') THEN 'Yes' ELSE '-' END AS July
FROM dec_course c;

第三种方法:

SELECT schedule.course_id, course.course_name,
  CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-05' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'May',
  CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-06' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'JUNE',
  CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-07' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'JULY'
FROM dec_course_schedule schedule 
LEFT OUTER JOIN dec_course course
  ON schedule.course_id = course.course_id
GROUP BY course_id;

[例3]下面是学生与他们多次模拟考试成绩的一个记录表,成绩记录在列中

DROP TABLE IF EXISTS dec_student_scores;
CREATE TABLE dec_student_scores (
  name VARCHAR (16) NOT NULL,
  score1 int,
  score2 int,
  score3 int
);

INSERT INTO dec_student_scores VALUES('sam', 123, 202, 153);
INSERT INTO dec_student_scores VALUES('andy', 251, 185, 132);
INSERT INTO dec_student_scores VALUES('tom', 214, 207, 111);
INSERT INTO dec_student_scores VALUES('judy', 203, 223, 198);
name score1 score2 score3
sam 123 202 153
andy 251 185 132
tomy 214 207 111
judy 203 223 198

要求:找出各个同学的最高分数。这个可以直接使用MySQL支持的GREATEST(score1, score2, score3)函数实现,但我们尝试使用CASE的方式。

SELECT name, 
  CASE WHEN score_bigger > score3 THEN score_bigger ELSE score3 END AS max_score 
FROM (
  SELECT name, score3, 
    CASE WHEN score1 > score2 THEN score1 ELSE score2 END AS score_bigger
  FROM dec_student_scores
) tmp;

另一种实现方式:

SELECT name, 
  CASE WHEN score1 > (CASE WHEN score2 > score3 THEN score2 ELSE score3 END)
  THEN score1
  ELSE (CASE WHEN score2 > score3 THEN score2 ELSE score3 END) END AS max_score
FROM dec_student_scores;

要求:将各位同学的成绩以sam, judy, tom, andy的顺序排序。这里练习将CASE子句放到ORDER BY之中。

SELECT name, score1, score2, score3
FROM dec_student_scores
ORDER BY (
  CASE name 
    WHEN 'sam' THEN 1
    WHEN 'judy' THEN 2
    WHEN 'tom' THEN 3
    WHEN 'andy' THEN 4
  ELSE NULL 
  END  
);

要求:求表中name的笛卡尔积,去除(a, a)这样的元组。使用CROSS JOIN来求笛卡尔积。

SELECT A.name AS name1, B.name AS name2
FROM dec_student_scores A  
CROSS JOIN dec_student_scores B
  ON A.name <> B.name
ORDER BY A.name;

更进一步,要求(a, b)和(b, a) 这样的数据,只保留一条。这其实是在找两个元素的组合。

SELECT A.name AS name1, B.name AS name2
FROM dec_student_scores A 
INNER JOIN dec_student_scores B
  ON A.name < B.name
ORDER BY A.name;

[例4]下面是各个服务器在某些日期的工作负载

 
DROP TABLE IF EXISTS dec_server_loads;
CREATE TABLE dec_server_loads (
  server VARCHAR (4) NOT NULL,
  sample_date date NOT NULL,
  loads int NOT NULL default 0
);

insert into dec_server_loads VALUES ('A', '2022-05-01', 1000);
insert into dec_server_loads VALUES ('A', '2022-05-02', 1200);
insert into dec_server_loads VALUES ('A', '2022-05-03', 1300);
insert into dec_server_loads VALUES ('A', '2022-05-04', 900);
insert into dec_server_loads VALUES ('A', '2022-05-05', 1220);
insert into dec_server_loads VALUES ('B', '2022-05-01', 1100);
insert into dec_server_loads VALUES ('B', '2022-05-02', 1200);
insert into dec_server_loads VALUES ('B', '2022-05-06', 1500);
insert into dec_server_loads VALUES ('B', '2022-05-07', 2000);
insert into dec_server_loads VALUES ('B', '2022-05-12', 1000);
insert into dec_server_loads VALUES ('C', '2022-05-01', 2000);
insert into dec_server_loads VALUES ('C', '2022-05-03', 1300);
insert into dec_server_loads VALUES ('C', '2022-05-04', 1200);
insert into dec_server_loads VALUES ('C', '2022-05-05', 1600);
server sample_date loads
A 2022-05-01 1000
A 2022-05-02 1200
A 2022-05-03 1300
A 2022-05-04 900
A 2022-05-05 1220
B 2022-05-01 1100
B 2022-05-02 1200
B 2022-05-06 1500
B 2022-05-07 2000
B 2022-05-12 1000
C 2022-05-01 2000
C 2022-05-03 1300
C 2022-05-04 1200
C 2022-05-05 1600

要求:每条记录都有一个日期,如何把这个日期的前一天找出来?这里需要用到Window函数。

SELECT server, sample_date,
  MIN(sample_date) OVER (
    PARTITION BY server ORDER BY sample_date ASC 
    ROWS between 1 preceding AND 1 preceding
  ) AS last_date
FROM dec_server_loads;

更进一步,把这个日期找出来,并且把这个日期对应的负载数据查出来。这里可以把Window单独定义出来,这样可以在SQL查询中直接使用,保持SQL的简洁。

SELECT server, sample_date, loads,
  MIN(sample_date) OVER W AS last_date,
  MIN(loads) OVER W AS last_loads
FROM dec_server_loads
Window W AS (PARTITION BY server ORDER BY sample_date ASC 
  ROWS between 1 preceding AND 1 preceding);

[例5]下面是一个产品表,里面包含产品信息和价格信息

DROP TABLE IF EXISTS dec_fruits;
CREATE TABLE dec_fruits (
  id INT PRIMARY KEY AUTO_INCREMENT,
  fruit_name VARCHAR(32) NOT NULL,
  price DECIMAL(10,2) 
);

INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Watermelon', 2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Banana', 5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Banana', 5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 8);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 8.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Watermelon', 2.2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Grape', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Orange', 12);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Carrot', 2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Tomato', 1);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Cherry', 20);
id fruit_name price
1 Apple 7.50
2 Watermelon 2.00
3 Banana 5.00
4 Apple 7.50
5 Apple 7.50
6 Banana 5.00
7 Apple 8.00
8 Apple 8.50
9 Watermelon 2.20
10 Grape 7.50
11 Orange 12.00
12 Carrot 2.00
13 Tomato 1.00
14 Cherry 20.00

要求:上面表格中有多个重复数据(fruit_name & price),选出那些重复的数据,选出id较大的作为重复数据。

SELECT * FROM dec_fruits p1
WHERE id > (
  SELECT MIN(p2.id) 
  FROM dec_fruits p2 
  WHERE p1.fruit_name = p2.fruit_name 
    AND p1.price = p2.price
);

要求:选出重复的数据,选出id较小的作为重复数据。练习使用EXCEPT来做集合的减法。

SELECT id FROM dec_fruits
EXCEPT
SELECT MAX(id) FROM dec_fruits
GROUP BY fruit_name, price;

要求:只要是重复了就选出来,比如有2条数据重复了,把这2条都选出来,而不是像上面那样只选出一条。这里注意EXISTS的用法,EXISTS非常神奇。

SELECT * FROM dec_fruits p1
WHERE EXISTS (
  SELECT 1 FROM (
    SELECT fruit_name, price, COUNT(*) AS cnt FROM dec_fruits 
    GROUP BY fruit_name, price
  ) p2
  WHERE p2.cnt > 1 
    AND p1.fruit_name = p2.fruit_name 
    AND p1.price = p2.price
);

要求:对所有产品按价格从高到低排序,注意这里有2种排序方式,如果第一名有2个,下一名是第三名的叫一般rank,下一名是第二的叫dense_rank.

SELECT fruit_name, price, 
  rank() over (ORDER BY price DESC) AS rank_1,
  dense_rank() over (ORDER BY price DESC) AS rank_2_dense
FROM dec_fruits;

另外一种实现方式,这里巧妙的使用了Count和Distinct Count作为排序序号。

SELECT p1.fruit_name, p1.price,
  (SELECT COUNT(p2.price) FROM dec_fruits p2 WHERE p2.price > p1.price) + 1 
    AS rank_1,
  (SELECT COUNT(DISTINCT p2.price) FROM dec_fruits p2 WHERE p2.price > p1.price) + 1 
    AS rank_2_dense
FROM dec_fruits p1
ORDER BY rank_1;

[例6]下面定义了一个用户表

DROP TABLE IF EXISTS dec_user;
CREATE TABLE dec_user  ( 
  id INT PRIMARY KEY AUTO_INCREMENT,
  nickname VARCHAR(32) NOT NULL,
  age int,
  address VARCHAR(200) DEFAULT ''
);

INSERT INTO dec_user(nickname, age, address) VALUES ('Liubei', 34, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Guanyu', 32, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Zhangfei', 31, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Caocao', 35, 'Wei');
INSERT INTO dec_user(nickname, age, address) VALUES ('Zhangliao', 32, 'Wei');
INSERT INTO dec_user(nickname, address) VALUES ('Guojia', 'Wei');
id nickname age address
1 Liubei 34 Shu
2 Guanyu 32 Shu
3 Zhangfei 31 Shu
4 Caocao 35 Wei
5 Zhangliao 32 Wei
6 Guojia null Wei

要求:每个用户都有一个Age,查询蜀国中有某个年龄的人而在魏国中没有这个年龄的人。正常应该查出id 是 1和3的蜀国用户,年龄分别为31和34. 由于魏国有年龄字段是NULL的用户,所以下面的NOT IN 方式并不能得到正确结果。

--incorrect due to NULL age in 'Wei'
SELECT * FROM dec_user
WHERE address = 'Shu'
  AND age NOT IN (
    SELECT age FROM dec_user
    WHERE address = 'Wei'
  );

正确的做法应该使用NOT EXISTS而不是NOT IN:

SELECT * FROM dec_user u1
WHERE address = 'Shu'
  AND NOT EXISTS (
    SELECT * FROM dec_user u2 
    WHERE u1.age = u2.age AND u2.address = 'Wei'
  );

注意:使用ORDER BY Age,NULL的数据将会排在前面。使用COALESCE函数将NULL数据转换为其他数据。

SELECT id, nickname, age FROM dec_user 
ORDER BY age;

SELECT id, nickname, COALESCE(age, -1) FROM dec_user;

注意:使用concat函数连接一个字符串和NULL,将会得到NULL

SELECT concat('Hello ', NULL) AS result FROM dual;

[例7]下面是一个会议出席人员表

DROP TABLE IF EXISTS dec_meeting;
CREATE TABLE dec_meeting (
  meeting_id CHAR(4) not null,
  person VARCHAR(32) not null
);

INSERT INTO dec_meeting VALUES('M1', 'ZhangFei');
INSERT INTO dec_meeting VALUES('M1', 'LuSu');
INSERT INTO dec_meeting VALUES('M1', 'Hu Dong');
INSERT INTO dec_meeting VALUES('M2', 'ZhangFei');
INSERT INTO dec_meeting VALUES('M2', 'XiaHouYuan');
INSERT INTO dec_meeting VALUES('M3', 'LuSu');
INSERT INTO dec_meeting VALUES('M3', 'Hu Dong');
INSERT INTO dec_meeting VALUES('M3', 'XiaHouYuan');
meeting_id person
M1 ZhangFei
M1 LuSu
M1 Hu Dong
M2 ZhangFei
M2 XiaHouYuan
M3 LuSu
M3 Hu Dong
M3 XiaHouYuan

要求:选出每次会议缺席的人。以这个表的Distinct person作为所有人员的集合。这里使用这个表自身的CROSS JOIN得到所有会议与所有人员的一个笛卡尔积,里面当然有重复的。使用这个集合去减原始的参会人员表,就会得到未参会人员表。这里可以看出来,NOT EXISTS与集合减法(EXCEPT)概念相近。

SELECT DISTINCT T1.meeting_id, T2.person 
FROM dec_meeting T1 CROSS JOIN dec_meeting T2  
WHERE NOT EXISTS (
  SELECT 1 FROM dec_meeting T3
  WHERE T1.meeting_id = T3.meeting_id 
    AND T2.person = T3.person
);

于是,你直接使用EXCEPT也可以:

SELECT DISTINCT T1.meeting_id, T2.person 
FROM dec_meeting T1 CROSS JOIN dec_meeting T2  
EXCEPT SELECT meeting_id, person FROM dec_meeting;

[例8]下面是又一个学生科目与成绩的表格

DROP TABLE IF EXISTS dec_test_scores;
CREATE TABLE dec_test_scores (
  student_id INT NOT NULL,
  gender CHAR(1) NOT NULL,
  subject VARCHAR (16) NOT NULL,
  score int
);

INSERT INTO dec_test_scores VALUES (1, 'F', '数学', 100);
INSERT INTO dec_test_scores VALUES (1, 'F', '语文', 80);
INSERT INTO dec_test_scores VALUES (1, 'F', '理化', 80);
INSERT INTO dec_test_scores VALUES (2, 'F', '数学', 80);
INSERT INTO dec_test_scores VALUES (2, 'F', '语文', 95);
INSERT INTO dec_test_scores VALUES (3, 'M', '数学', 40);
INSERT INTO dec_test_scores VALUES (3, 'M', '语文', 90);
INSERT INTO dec_test_scores VALUES (3, 'M', '社会', 55);
INSERT INTO dec_test_scores VALUES (4, 'M', '数学', 80);
INSERT INTO dec_test_scores VALUES (5, 'F', '数学', 56);
INSERT INTO dec_test_scores VALUES (5, 'F', '语文', 68);
INSERT INTO dec_test_scores VALUES (5, 'F', '社会', 88);
INSERT INTO dec_test_scores VALUES (6, 'M', '数学', 99);
INSERT INTO dec_test_scores VALUES (6, 'M', '社会', 55); 
INSERT INTO dec_test_scores VALUES (7, 'M', '数学', 92);
INSERT INTO dec_test_scores VALUES (7, 'M', '语文', 85);
student_id gender subject score
1 F 数学 100
1 F 语文 80
1 F 理化 80
2 F 数学 80
2 F 语文 95
3 M 数学 40
3 M 语文 90
3 M 社会 55
4 M 数学 80
5 F 数学 56
5 F 语文 68
5 F 社会 88
6 M 数学 99
6 M 社会 55
7 M 数学 92
7 M 语文 85

要求:查找学生,学生的全部科目成绩均大于等于60. 此处依然练习使用NOT EXISTS.

SELECT DISTINCT student_id 
FROM dec_test_scores t1
WHERE NOT EXISTS (
  SELECT 1 FROM dec_test_scores t2
  WHERE t1.student_id = t2.student_id
    AND t2.score < 60
);

要求:查找学生,学生拥有语文和数学成绩,并且语文成绩>=60, 数学成绩>=80.

SELECT student_id FROM dec_test_scores t1
WHERE subject IN ('数学', '语文') 
  AND NOT EXISTS (
    SELECT 1 FROM dec_test_scores t2
    WHERE t1.student_id = t2.student_id
      AND 1 = CASE WHEN subject = '数学' AND score < 80 THEN 1 
          WHEN subject = '语文' AND score < 60 THEN 1 
          ELSE 0 END
  )
GROUP BY student_id
HAVING COUNT(1) > 1;

ChatGPT给出的答案:

SELECT student_id
FROM dec_test_scores
WHERE subject IN ('语文', '数学')
  GROUP BY student_id HAVING COUNT(DISTINCT subject) = 2
  AND MIN(CASE WHEN subject = '数学' THEN score END) >= 80
  AND MIN(CASE WHEN subject = '语文' THEN score END) >= 60;

要求:哪个分数是出现次数最多的,出现几次? 深入体验GROUP BY的法.

SELECT score, COUNT(*) AS max_cnt 
FROM dec_test_scores  
GROUP BY score
HAVING max_cnt >= (
  SELECT MAX(cnt) FROM (
    SELECT COUNT(*) AS cnt FROM dec_test_scores 
    GROUP BY score
  ) T1
);

要求:哪门课程,其中至少75%的学生成绩大于等于80分?

SELECT subject FROM dec_test_scores
GROUP by subject 
HAVING SUM(CASE WHEN score >= 80 then 1 else 0 END) >= COUNT(*) * 3 / 4;

要求:哪门课程,男生及格的人数大于女生及格人数?

SELECT subject FROM dec_test_scores
GROUP BY subject
HAVING SUM(CASE WHEN score >=60 AND gender = 'M' THEN 1 ELSE 0 END) 
  > SUM(CASE WHEN score >=60 AND gender = 'F' THEN 1 ELSE 0 END) ;

要求:哪门课程,女生的平均分大于男生的平均分?注意:此处else部分用NULL而不是0,用NULL可以确保在计算avg时候不包含在内;即求avg时,对于NULL的记录,不会包含在分母里。

SELECT subject FROM dec_test_scores
GROUP BY subject
HAVING AVG(CASE WHEN gender = 'F' THEN score ELSE NULL END) 
  > AVG(CASE WHEN gender = 'M' THEN score ELSE NULL END);

[例9]下面是一个seq表

DROP TABLE IF EXISTS dec_seq;
CREATE TABLE dec_seq (
  seq int NOT NULL,
  name VARCHAR(16) NOT NULL default ''
);

要求:快速插入1~100这100个数字.

INSERT INTO dec_seq(seq)
SELECT 1 + ones.id + 10 * tens.id AS number 
FROM (
    SELECT 0 AS id
    UNION SELECT 1  
    UNION SELECT 2 
    UNION SELECT 3  
    UNION SELECT 4  
    UNION SELECT 5  
    UNION SELECT 6  
    UNION SELECT 7  
    UNION SELECT 8  
    UNION SELECT 9
  ) ones CROSS JOIN (
    SELECT 0 AS id
    UNION SELECT 1  
    UNION SELECT 2 
    UNION SELECT 3  
    UNION SELECT 4  
    UNION SELECT 5  
    UNION SELECT 6  
    UNION SELECT 7  
    UNION SELECT 8  
    UNION SELECT 9
  ) tens;

[例10]下面是城市仓库储存原材料的表,列是城市,入库日期和材料

DROP TABLE IF EXISTS dec_materials;
CREATE TABLE dec_materials(
  center VARCHAR(16) not null,
  in_date date, 
  material VARCHAR(4)
);

INSERT INTO dec_materials VALUES ('Beijing', '2018-04-01', '锡');
INSERT INTO dec_materials VALUES ('Beijing', '2018-04-12', '锌');
INSERT INTO dec_materials VALUES ('Beijing', '2018-05-17', '铝');
INSERT INTO dec_materials VALUES ('Beijing', '2018-05-20', '锌');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-20', '铜');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-22', '镍');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-29', '铅');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-03-15', '钛');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-04-01', '钢');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-04-24', '钢');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-05-02', '镁');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-05-10', '钛');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-10', '锌');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-28', '锡');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-29', '铝');
center in_date material
Beijing 2018-04-01
Beijing 2018-04-12
Beijing 2018-05-17
Beijing 2018-05-20
Shanghai 2018-04-20
Shanghai 2018-04-22
Shanghai 2018-04-29
Tianjin 2018-03-15
Tianjin 2018-04-01
Tianjin 2018-04-24
Tianjin 2018-05-02
Tianjin 2018-05-10
Dalian 2018-05-10
Dalian 2018-05-28
Dalian 2018-05-29

要求:哪些城市对某种原料有多于1次的入库?观察此处Count(Distinct AAA)的巧妙用法:

SELECT center FROM DEC_MATERIALS
GROUP BY center
HAVING COUNT(material) > COUNT(DISTINCT material);

--Another way
SELECT center, 
  CASE WHEN COUNT(material) > COUNT(DISTINCT material) 
    THEN 'Duplicated Material' ELSE 'No' END AS duplicated
FROM DEC_MATERIALS
GROUP BY center;

要求:哪些中心储存的原材料完全一样?

WITH citym AS (
  SELECT DISTINCT center, material FROM dec_materials
)
SELECT citymt1.center AS city1, '-' , citymt2.center AS city2 
FROM citym citymt1, citym citymt2
WHERE citymt1.center < citymt2.center 
  AND citymt1.material = citymt2.material
GROUP BY citymt1.center, citymt2.center
HAVING COUNT(*) = (SELECT COUNT(*) FROM citym citymt3 WHERE citymt1.center = citymt3.center)
   AND COUNT(*) = (SELECT COUNT(*) FROM citym citymt4 WHERE citymt2.center = citymt4.center);

[例11]下面是各个商店以及商店所售卖商品的表

DROP TABLE IF EXISTS dec_items;
CREATE TABLE dec_items(
  item varchar(16) not null  
);
INSERT INTO dec_items VALUES ('啤酒');
INSERT INTO dec_items VALUES ('纸尿裤');
INSERT INTO dec_items VALUES ('自行车');

DROP TABLE IF EXISTS dec_shop_items;
CREATE TABLE dec_shop_items(
  shop varchar(8) not null,
  item varchar(16)   
);
INSERT INTO dec_shop_items VALUES ('A', '啤酒'); 
INSERT INTO dec_shop_items VALUES ('A', '纸尿裤'); 
INSERT INTO dec_shop_items VALUES ('A', '自行车'); 
INSERT INTO dec_shop_items VALUES ('A', '漫画书'); 
INSERT INTO dec_shop_items VALUES ('B', '啤酒');
INSERT INTO dec_shop_items VALUES ('B', '纸尿裤'); 
INSERT INTO dec_shop_items VALUES ('B', '自行车'); 
INSERT INTO dec_shop_items VALUES ('C', '电视'); 
INSERT INTO dec_shop_items VALUES ('C', '纸尿裤'); 
INSERT INTO dec_shop_items VALUES ('C', '自行车'); 
INSERT INTO dec_shop_items VALUES ('D', '台灯'); 
INSERT INTO dec_shop_items VALUES ('D', '牙膏'); 
shop item
A 啤酒
A 纸尿裤
A 自行车
A 漫画书
B 啤酒
B 纸尿裤
B 自行车
C 电视
C 纸尿裤
C 自行车
D 台灯
D 牙膏

要求:哪所商店售卖items表中的所有商品?

SELECT s.shop FROM dec_shop_items s 
INNER JOIN dec_items i
  ON s.item = i.item
GROUP BY s.shop
HAVING COUNT(s.item) = (SELECT COUNT(item) FROM dec_items);

要求:哪所商店不多不少刚好拥有items表中的所有商品?

SELECT s.shop FROM dec_shop_items s 
LEFT OUTER JOIN dec_items i
  ON s.item = i.item 
GROUP BY s.shop
HAVING COUNT(s.item) = (SELECT COUNT(item) FROM dec_items)
   AND COUNT(i.item) = (SELECT COUNT(item) FROM dec_items);

要求:列出所有商店的已经拥有的在items表中的商品数量,以及还差的数量?这里还是使用left join,在计算COUNT时,NULL值不算。

SELECT shop, 
  COUNT(item) AS shop_having_count, 
  (SELECT COUNT(*) FROM dec_items) - COUNT(item) AS diff_count 
FROM (
  SELECT s.shop, i.item FROM dec_shop_items s 
  LEFT OUTER JOIN dec_items i
    ON s.item = i.item
) t1
GROUP BY shop ;

[例12]下面是来自不同学院的学生他们提交论文的日期的表

DROP TABLE IF EXISTS dec_submission;
CREATE TABLE dec_submission (
  student_id char(4) PRIMARY KEY,
  dpt varchar(32) not null, 
  sbmt_date date 
);
INSERT INTO dec_submission VALUES('100', '理学院', '2018-10-01');
INSERT INTO dec_submission VALUES('101', '理学院', '2018-09-22');
INSERT INTO dec_submission(student_id, dpt) VALUES('102', '文学院');
INSERT INTO dec_submission VALUES('103', '文学院', '2018-09-10');
INSERT INTO dec_submission VALUES('200', '文学院', '2018-09-22'); 
INSERT INTO dec_submission(student_id, dpt) VALUES('201', '工学院');
INSERT INTO dec_submission VALUES('202', '经济学院', '2018-09-25');
student_id dpt sbmt_date
100 理学院 2018-10-01
101 理学院 2018-09-22
102 文学院 null
103 文学院 2018-09-10
200 文学院 2018-09-22
201 工学院 null
202 经济学院 2018-09-25

要求:哪个学院的学生全部在9月份提交?

SELECT dpt FROM dec_submission
GROUP BY dpt
HAVING COUNT(*) = SUM(
  CASE WHEN sbmt_date between '2018-09-01' and '2018-09-30' THEN 1 
  ELSE 0 END
);

[例13]下表是销售表,包含销售年份和销售数量

DROP TABLE IF EXISTS dec_sale;
CREATE TABLE dec_sale (
  sale_year int,
  sale int
);

INSERT INTO dec_sale VALUES(2010, 50);
INSERT INTO dec_sale VALUES(2011, 51);
INSERT INTO dec_sale VALUES(2012, 52);
INSERT INTO dec_sale VALUES(2013, 52);
INSERT INTO dec_sale VALUES(2014, 50);
INSERT INTO dec_sale VALUES(2015, 50);
INSERT INTO dec_sale VALUES(2016, 49);
INSERT INTO dec_sale VALUES(2017, 55);

要求:统计本年度与上年度的销售记录之差,因为表中的年份是连续的,所以这里使用了按年份与年份+1的左连接:

select s1.sale_year, 
  s1.sale as current_sale, 
  s2.sale as last_year_sale,
  s1.sale - s2.sale as diff
FROM dec_sale s1 LEFT OUTER JOIN dec_sale s2
ON s1.sale_year = s2.sale_year + 1;

另外,很自然地,我们可以使用Window函数的方式:

SELECT sale_year, 
  sale AS current_sale, 
  SUM(sale) OVER (ORDER BY sale_year rows between 1 preceding AND 1 preceding)
    AS last_year_sale,
  sale - SUM(sale) OVER (ORDER BY sale_year rows between 1 preceding AND 1 preceding) 
    AS diff
FROM dec_sale;

上面SQL的窗口定义使用的是 rows between…and…, 使用这种方式可以不必在乎年份是否连续,它只考虑它物理上临近的上一行即可;还可以使用 range between…and…, 使用range 则表明使用值来限定,这就要求sale_year 必须连续,如果不连续,则会出现找不到上年的情况。比如我们update 一下该表:

UPDATE dec_sale SET sale_year = 2019 WHERE sale_year = 2015;
SELECT * FROM dec_sale ORDER BY sale_year;
sale_year sale
2010 50
2011 51
2012 52
2013 52
2014 50
2016 49
2017 55
2019 50

这种情况下,使用上面的rows between…and… 没有问题,但如果使用 range between…and…则会出现空的行。仔细体会两种不同窗口定义的区别。

sale_year sale last_year_sale diff
2010 50 null null
2011 51 50 1
2012 52 51 1
2013 52 52 0
2014 50 52 -2
2016 49 null null
2017 55 49 6
2019 50 null null

[例14]下面是一个产品表,包括产品类别以及产品价格

DROP TABLE IF EXISTS dec_products;
CREATE TABLE dec_products (
  product_id char(4),
  product_name varchar(32),
  product_type varchar(16),  
  price DECIMAL(10,2)
);

INSERT INTO dec_products VALUES('0001', 'T-shirt', 'Clothes', 50);
INSERT INTO dec_products VALUES('0002', 'Printer', 'Office', 800);
INSERT INTO dec_products VALUES('0003', 'Sports shirt', 'Clothes', 120);
INSERT INTO dec_products VALUES('0004', 'Knife', 'Kitchen', 69);
INSERT INTO dec_products VALUES('0005', 'Pot', 'Kitchen', 339);
INSERT INTO dec_products VALUES('0006', 'Chopsticks', 'Kitchen', 16);
INSERT INTO dec_products VALUES('0007', 'Board', 'Kitchen', 88);
INSERT INTO dec_products VALUES('0008', 'Pen', 'Office', 2);
product_id product_name product_type price
0001 T-shirt Clothes 50.00
0002 Printer Office 800.00
0003 Sports shirt Clothes 120.00
0004 Knife Kitchen 69.00
0005 Pot Kitchen 339.00
0006 Chopsticks Kitchen 16.00
0007 Board Kitchen 88.00
0008 Pen Office 2.00

要求:对于每个产品类别,找出其中大于该类别平均价格的产品:

SELECT * FROM dec_products A
WHERE price > (SELECT avg(price)
  FROM dec_products B
  WHERE A.product_type = B.product_type
  GROUP BY product_type
);

同样,可以使用窗口函数:

SELECT product_name, product_type, price, avg_price
FROM (
  SELECT product_name, product_type, price,
    avg(price) OVER (partition by product_type) AS avg_price  
  FROM dec_products 
) ALT1
WHERE price > avg_price;

要求:对所有产品按类别和名称排序后,取出第2~6条,这通常是一个分页取数据的需求:

SELECT product_name, product_type, price, row_num, total_count 
FROM (
  SELECT product_name, product_type, price,
    ROW_NUMBER() OVER(ORDER BY product_type, product_name) AS row_num,
    COUNT(*) OVER () AS total_count
  FROM dec_products ) T1
WHERE row_num between 2 AND 6;

[例15]下面是一个旅馆预定表,里面包含预定开始和结束日期

DROP TABLE IF EXISTS dec_reservations;
CREATE TABLE dec_reservations (
  reserver char(16) not null,
  start_date DATE,
  end_date DATE
);

INSERT INTO dec_reservations VALUES ('Zhang', '2018-10-26', '2018-10-27');
INSERT INTO dec_reservations VALUES ('Wang', '2018-10-28', '2018-10-31');
INSERT INTO dec_reservations VALUES ('Li', '2018-10-31', '2018-11-01');
INSERT INTO dec_reservations VALUES ('Zhao', '2018-11-03', '2018-11-04');
INSERT INTO dec_reservations VALUES ('Qian', '2018-11-03', '2018-11-05');
INSERT INTO dec_reservations VALUES ('Sun', '2018-11-04', '2018-11-06');
reserver start_date end_date
Zhang 2018-10-26 2018-10-27
Wang 2018-10-28 2018-10-31
Li 2018-10-31 2018-11-01
Zhao 2018-11-03 2018-11-04
Qian 2018-11-03 2018-11-05
Sun 2018-11-04 2018-11-06

要求:哪些是重复预订的,重复预订是指预订日期重叠。

SELECT * FROM dec_reservations A
WHERE EXISTS(
 SELECT * FROM dec_reservations B
 WHERE A.reserver <> B.reserver
   AND (A.start_date between B.start_date AND B.end_date
   OR A.end_date between  B.start_date AND B.end_date)
);

另一种解决方案,这种方式会显示预订冲突的两个人:

SELECT * FROM ( 
  SELECT reserver, start_date, end_date, 
    MAX(start_date) OVER (ORDER BY start_date rows between 1 following AND 1 following) 
      AS next_start_date,
    MAX(reserver) OVER (ORDER BY start_date rows between 1 following AND 1 following) 
      AS next_reserver
  FROM dec_reservations) T1
WHERE next_start_date between start_date AND end_date;

[例16]这是一个按日期记录的账户余额表

DROP TABLE IF EXISTS dec_accounts;
CREATE TABLE dec_accounts (
  prc_date DATE,
  prc_amt Decimal(10,2)
);

INSERT INTO dec_accounts VALUES ('2018-10-26', 100);
INSERT INTO dec_accounts VALUES ('2018-10-28', 200);
INSERT INTO dec_accounts VALUES ('2018-10-31', 300);
INSERT INTO dec_accounts VALUES ('2018-11-03', 400);
INSERT INTO dec_accounts VALUES ('2018-11-04', -500);
INSERT INTO dec_accounts VALUES ('2018-11-06', 600);
INSERT INTO dec_accounts VALUES ('2018-11-11', 700);
prc_date prc_amt
2018-10-26 100.00
2018-10-28 200.00
2018-10-31 300.00
2018-11-03 400.00
2018-11-04 -500.00
2018-11-06 600.00
2018-11-11 700.00

要求:查询当前行以及前2行这总共3行的平均值。

SELECT prc_date, prc_amt, AVG(prc_amt) 
  OVER (ORDER BY prc_date ROWS BETWEEN 2 preceding AND current ROW) 
  AS avg_amt
FROM dec_accounts;

可以看出,使用window函数可以很轻松地实现这个目标,如果不使用window函数该怎么做呢?

----define a temporary table acct with ROW_NUM
WITH acct AS ( 
  SELECT (@row_number:=@row_number + 1) AS seq, t.prc_date, t.prc_amt
  FROM dec_accounts t, (SELECT @row_number:=0) AS r
  ORDER BY prc_date
)
SELECT alt.*, CASE 
  WHEN alt.amt2 is NULL THEN alt.amt1
  WHEN alt.amt3 is NULL THEN (alt.amt1 + alt.amt2)/2
  ELSE (alt.amt1 + alt.amt2 + alt.amt3)/3 END AS avg_amt
FROM (SELECT acct.seq, 
    acct.prc_date, 
	acct.prc_amt AS amt1, 
	acct2.prc_amt AS amt2,
	acct3.prc_amt AS amt3
  FROM acct LEFT OUTER JOIN acct acct2
    ON acct.seq = acct2.seq + 1
  LEFT OUTER JOIN acct acct3
   ON acct.seq = acct3.seq +2
) alt;

上面,我们利用with首先定义一个tmp表,我们需要给每行加上行号,然后通过行号来做左外连接,这个方式比之前window方式麻烦地多。还有一种写法稍微简单一些、但逻辑比较复杂的:

SELECT prc_date, A1.prc_amt, (
  SELECT avg(prc_amt) FROM dec_accounts A2
  WHERE A1.prc_date >= A2.prc_date
    AND(
      SELECT COUNT(*) FROM dec_accounts A3 
      WHERE A3.prc_date between A2.prc_date AND A1.prc_date) <=3) AS mvg_SUM
FROM dec_accounts A1
ORDER BY prc_date;

[例17]有如下的家庭信息表,这个表中的员工子女信息存放在各个列中;为了方便,我们把子女信息存为多个行,所以建立了一个view

DROP TABLE IF EXISTS dec_family_info;
CREATE TABLE dec_family_info (
  name varchar(32),
  child_1 varchar(32),
  child_2 varchar(32),
  child_3 varchar(32)
);
INSERT INTO dec_family_info VALUES ('嘉鸿', '小阳', '小卡', '小光');
INSERT INTO dec_family_info VALUES ('赤加', '阿春', '阿夏', NULL);
INSERT INTO dec_family_info VALUES ('谢非', '海海', NULl , NULL);
INSERT INTO dec_family_info VALUES ('龙田', NULL, NULL, NULL);

DROP VIEW IF EXISTS dec_v_family_info;
CREATE VIEW dec_v_family_info(child) as 
  SELECT chd from(
    SELECT child_1 as chd from dec_family_info
    UNION
    SELECT child_2 as chd from dec_family_info
    UNION
    SELECT child_3 as chd from dec_family_info
  ) tmp
  WHERE tmp.chd IS NOT NULL;
name child_1 child_2 child_3
嘉鸿 小阳 小卡 小光
赤加红 阿春 阿夏 null
谢非 海海 null null
龙田 null null null

所以,如何把各个子女转换成行信息?下面演示 列 => 行:

SELECT name, child_1 as chd from dec_family_info WHERE child_1 IS NOT NULL
UNION
SELECT name, child_2 as chd from dec_family_info WHERE child_2 IS NOT NULL
UNION
SELECT name, child_3 as chd from dec_family_info WHERE child_3 IS NOT NULL;

或者,通过之前创建的VIEW来:

SELECT p.name, vc.child
FROM dec_family_info p LEFT OUTER JOIN dec_v_family_info vc
ON vc.child IN (p.child_1, p.child_2, p.child_3);

[例18]我们有两个物品名单,包含id和name,其中id是准确的,但name可能有变更

DROP TABLE IF EXISTS dec_namea;
CREATE TABLE dec_namea (
  id int PRIMARY KEY,
  name varchar(32)
);
DROP TABLE IF EXISTS dec_nameb;
CREATE TABLE dec_nameb (
  id int PRIMARY KEY,
  name varchar(32)
);
INSERT INTO dec_namea VALUES(1, '大众途观');
INSERT INTO dec_namea VALUES(2, '东风大货');
INSERT INTO dec_namea VALUES(3, '丰田皮卡');
INSERT INTO dec_nameb VALUES(1, '大众途观L');
INSERT INTO dec_nameb VALUES(2, '东风大货');
INSERT INTO dec_nameb VALUES(4, '特斯拉 Model Y');

List 1:

id name
1 大众途观
2 东风大货
3 丰田皮卡

List 2:

id name
1 大众途观L
2 东风大货
4 特斯拉 Model Y

要求:查找name有变更的物品,以及单独只在list 1或list 2中的物品。这里可以使用全外连接(FULL OUTER JOIN)加id相同条件即可,由于MySQL并不支持全外连接,所以我们使用左连接UNION右连接来实现全连接:

SELECT * FROM  (
  SELECT a.id AS id, a.name AS a_name, b.name AS b_name
  FROM dec_namea a LEFT OUTER JOIN dec_nameb b
  ON a.id = b.id
  UNION
  SELECT b.id AS id, a.name AS a_name, b.name AS b_name
  FROM dec_namea a right outer join dec_nameb b
  ON a.id = b.id
) ALT1
WHERE a_name IS NULL OR b_name IS NULL OR a_name <> b_name;

要求:使用表B的数据更新表A数据,即如果数据存在B而不存在A则insert,如果同时存在但name不同,则使用B的name:

INSERT INTO dec_namea
SELECT id, name FROM dec_nameb
  ON DUPLICATE KEY UPDATE dec_namea.name = dec_nameb.name;

[例19]有下面员工技能表以及项目技能需求表

DROP TABLE IF EXISTS dec_emp_skills;
CREATE TABLE dec_emp_skills (
  emp varchar(32), 
  skill varchar(32)
);

INSERT INTO dec_emp_skills VALUES('王田', 'Oracle');
INSERT INTO dec_emp_skills VALUES('王田', 'UNIX');
INSERT INTO dec_emp_skills VALUES('王田', 'Java');
INSERT INTO dec_emp_skills VALUES('王田', 'MySQL');
INSERT INTO dec_emp_skills VALUES('李神奇', 'Oracle');
INSERT INTO dec_emp_skills VALUES('李神奇', 'UNIX');
INSERT INTO dec_emp_skills VALUES('李神奇', 'Java');
INSERT INTO dec_emp_skills VALUES('李神奇', 'C++');
INSERT INTO dec_emp_skills VALUES('张平井', 'UNIX');
INSERT INTO dec_emp_skills VALUES('张平井', 'Java');
INSERT INTO dec_emp_skills VALUES('张平井', 'C++');
INSERT INTO dec_emp_skills VALUES('张平井', 'Spark');
INSERT INTO dec_emp_skills VALUES('张平井', 'Mainframe');
INSERT INTO dec_emp_skills VALUES('张平井', 'NET');
INSERT INTO dec_emp_skills VALUES('若填部', 'Perl');
INSERT INTO dec_emp_skills VALUES('若填部', 'Mainframe');
INSERT INTO dec_emp_skills VALUES('韩都来', 'Oracle');
INSERT INTO dec_emp_skills VALUES('韩都来', 'Java');
INSERT INTO dec_emp_skills VALUES('韩都来', 'UNIX');
INSERT INTO dec_emp_skills VALUES('周经为', 'Oracle');
INSERT INTO dec_emp_skills VALUES('周经为', 'UNIX');
INSERT INTO dec_emp_skills VALUES('周经为', 'UNIX');

DROP TABLE IF EXISTS dec_skill_requirements;
CREATE TABLE dec_skill_requirements (
  skill varchar(32)
);

INSERT INTO dec_skill_requirements VALUES ('Oracle');
INSERT INTO dec_skill_requirements VALUES ('UNIX');
INSERT INTO dec_skill_requirements VALUES ('Java');
emp skill
王田 Oracle
王田 UNIX
王田 Java
王田 MySQL
李神奇 Oracle
李神奇 UNIX
李神奇 Java
李神奇 C++
张平井 UNIX
张平井 Java
张平井 C++
张平井 Spark
张平井 Mainframe
张平井 NET
若填部 Perl
若填部 Mainframe
韩都来 Oracle
韩都来 Java
韩都来 UNIX
周经为 Oracle
周经为 UNIX
周经为 UNIX

要求:哪位员工有需求表所需所有技能?

SELECT DISTINCT emp FROM dec_emp_skills emp1
WHERE NOT EXISTS (
  SELECT skill FROM dec_skill_requirements
  EXCEPT
  SELECT skill FROM dec_emp_skills emp2
  WHERE emp1.emp = emp2.emp
);

--Another way:
SELECT emp FROM (
  SELECT DISTINCT e.emp, r.skill 
  FROM dec_emp_skills e
  LEFT JOIN dec_skill_requirements r
    ON e.skill = r.skill
) ALT1
GROUP by emp HAVING COUNT(skill) = 
  (SELECT COUNT(*) FROM dec_skill_requirements);

要求:哪位员工有且仅有需求中的技能,即他的技能刚好符合要求,不少也不多。这里有三种方式:

SELECT emp FROM dec_emp_skills t1
WHERE NOT EXISTS (
  SELECT skill FROM dec_skill_requirements
  EXCEPT
  SELECT t2.skill FROM dec_emp_skills t2
  WHERE t1.emp = t2.emp
)
GROUP BY emp
HAVING COUNT(*) = (SELECT COUNT(*) FROM dec_skill_requirements);

--Another way:
SELECT emp, 
  COUNT(has_skill) as has_skill_count, 
  COUNT(requirement_skill) as requirement_skill_count 
FROM (
  SELECT DISTINCT e.emp, 
    e.skill as has_skill, 
    r.skill as requirement_skill
  FROM dec_emp_skills e
  LEFT JOIN dec_skill_requirements r
    ON e.skill = r.skill
) TMP1
GROUP by emp
HAVING requirement_skill_count = has_skill_count
  AND requirement_skill_count = (
    SELECT COUNT(*) FROM dec_skill_requirements
);

第三种方式原理是:两个集合,如果A - B 是空并且 B - A 也是空,那么这两个集合就是相等的;而表达集合是空的方式就用NOT EXISTS:

SELECT DISTINCT emp FROM dec_emp_skills t1
WHERE NOT EXISTS (
  SELECT skill FROM dec_skill_requirements
  EXCEPT
  SELECT t2.skill FROM dec_emp_skills t2
  WHERE t1.emp = t2.emp
) 
AND NOT EXISTS (
  SELECT t2.skill FROM dec_emp_skills t2
  WHERE t1.emp = t2.emp
  EXCEPT
  SELECT skill FROM dec_skill_requirements
);

[例20]下面是一个座位预订表

DROP TABLE IF EXISTS dec_seats;
CREATE TABLE dec_seats (
  seat INT, 
  status varchar(16)
);

INSERT INTO dec_seats VALUES(1, 'Reserved');
INSERT INTO dec_seats VALUES(2, 'Reserved');
INSERT INTO dec_seats VALUES(3, '');
INSERT INTO dec_seats VALUES(4, '');
INSERT INTO dec_seats VALUES(5, '');
INSERT INTO dec_seats VALUES(6, 'Reserved');
INSERT INTO dec_seats VALUES(7, '');
INSERT INTO dec_seats VALUES(8, '');
INSERT INTO dec_seats VALUES(9, '');
INSERT INTO dec_seats VALUES(10, '');
INSERT INTO dec_seats VALUES(11, '');
INSERT INTO dec_seats VALUES(12, 'Reserved');
INSERT INTO dec_seats VALUES(13, 'Reserved');
INSERT INTO dec_seats VALUES(14, '');
INSERT INTO dec_seats VALUES(15, '');
seat status
1 Reserved
2 Reserved
3  
4  
5  
6 Reserved
7  
8  
9  
10  
11  
12 Reserved
13 Reserved
14  
15  

要求:有一个人想要预订3个连续的座位,找出表中所有的可选项。

SELECT a.seat, '~', b.seat
FROM dec_seats a, dec_seats b
WHERE b.seat = a.seat + 2
AND NOT EXISTS (
  SELECT * FROM dec_seats c
  WHERE c.seat between a.seat AND b.seat
    AND c.status <> ''
);
  
--Another way:
SELECT seat, '~', end_seat
FROM (SELECT seat, MAX(seat) OVER (
     ORDER BY seat rows between 2 following AND 2 following) AS end_seat
  FROM dec_seats 
  WHERE status = ''
)t1
WHERE end_seat = seat + 2;

[例21]使用维度表(Dimension)的信息来更新事实表(Fact)中关联信息,这种需求非常常见

DROP TABLE IF EXISTS dec_product_sale;
CREATE TABLE dec_product_sale (
  sale_id INT PRIMARY KEY AUTO_INCREMENT,
  product_id char(4),
  product_name varchar(32),
  product_type varchar(16), 
  quantity decimal
);

INSERT INTO dec_product_sale(product_id, quantity) values('0001', 30);
INSERT INTO dec_product_sale(product_id, quantity) values('0001', 35);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 10);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 12);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 11);
INSERT INTO dec_product_sale(product_id, quantity) values('0005', 10);
INSERT INTO dec_product_sale(product_id, quantity) values('0008', 510);
INSERT INTO dec_product_sale(product_id, quantity) values('0008', 1020);
sale_id product_id product_name product_type quantity
1 0001 null null 30
2 0001 null null 35
3 0003 null null 10
4 0003 null null 12
5 0003 null null 11
6 0005 null null 10
7 0008 null null 510
8 0008 null null 1020

要求:这是我们定义的又一个产品销售表,结合之前定义的dec_products表,更新dec_product_sale表中的product_name和product_type字段。

下面的SQL在某些数据库(比如redshift)上是work的,但MySQL 目前不支持:

UPDATE dec_product_sale s 
  SET s.product_name = p.product_name, s.product_type = p.product_type
FROM dec_products p
WHERE s.product_id = p.product_id;

目前在MySQL上可以使用JOIN来联合2个表,并更新:

UPDATE dec_product_sale s JOIN dec_products p
  ON s.product_id = p.product_id
SET s.product_name = p.product_name, s.product_type = p.product_type;

[例22]查询随机数和UNID

UNID在数据库中非常重要,并且如果使用并行方式处理数据,UNID方式几乎就是唯一可以使用的标记主ID的方式了。这也是在文档型数据库经常使用UNID而不是自增ID的原因。

Mysql语法:

SELECT UUID() as uuid, RAND() as rand, SUBSTRING(CAST(RAND() AS CHAR), 3) AS rand_num;

SELECT UUID() as uuid from information_schema.tables limit 100; 
uuid rand rand_num
85f98c2f-7e15-11ee-ab6a-9c2dcd25c19e 0.4829847072092674 8978415288942322

RedShift居然没有UUID函数,但是有Random(),可以这样(这是一个伪随机的,在大规模应用的时候不能保证不重复):

SELECT LEFT(md5(random()::text), 32) AS uuid,
    RANDOM() AS rand
uuid rand
0cc175b9c0f1b6a831c399e269772661 0.7800906673978432

[例23]有一个职员表t_employee, 有字段employee_id和employee_name等

要求:查找这些人的姓中,只有一个的那些人,可以把name字段的第一个字认为是这个人的姓。

SELECT employee_id, employee_name 
FROM t_employee
INNER JOIN (
  SELECT SUBSTRING(employee_name, 1, 1) AS family_name
  FROM t_employee
  GROUP BY SUBSTRING(employee_name, 1, 1) 
  HAVING COUNT(*) = 1
) alt1
ON alt1.family_name = SUBSTRING(t_employee.employee_name, 1, 1)

[例24]下面是一个员工表和部门表

DROP TABLE IF EXISTS dec_employee;
CREATE TABLE dec_employee (
  account_key INT,
  report_line char(8),
  project varchar(16)
);
INSERT into dec_employee VALUES(1, 'ta', 'NorthEast1');
INSERT into dec_employee VALUES(2, 'ta', '');
INSERT into dec_employee VALUES(3, 'ta', '');
INSERT into dec_employee VALUES(4, 'ce', 'SouthEast1');
INSERT into dec_employee VALUES(5, 'ce', '');
INSERT into dec_employee VALUES(6, 'ce', 'South');
INSERT into dec_employee VALUES(7, 'ta', 'North');
INSERT into dec_employee VALUES(8, 'dt', '');
INSERT into dec_employee VALUES(9, 'dt', 'South');

DROP TABLE IF EXISTS dec_department;
CREATE TABLE dec_department (
  account_key INT,
  department_key INT  
);
INSERT INTO dec_department VALUES(1, 777);
INSERT INTO dec_department VALUES(2, 777);
INSERT INTO dec_department VALUES(3, 655);
INSERT INTO dec_department VALUES(4, 777);
INSERT INTO dec_department VALUES(5, 705);
INSERT INTO dec_department VALUES(6, 655);
INSERT INTO dec_department VALUES(7, 777);

表dec_employee

account_key report_line project
1 ta NorthEast1
2 ta  
3 ta  
4 ce SouthEast1
5 ce  
6 ce South
7 ta North
8 dt  
9 dt South

表dec_department

account_key department_key
1 777
2 777
3 655
4 777
5 705
6 655
7 777

要求:对于员工表中project为空的,使用如下规则填充:如果employee表中有跟他一样report_line的,以及在department表中跟他一样department的,并且有project的,就用这个project;如果找到多条,那就可以使用任意一个。

对于上面的数据,期望结果是:

2 填充为 NorthEast1 或 North
3 保持为空
5 保持为空
8 保持为空

这是一个在ChatGPT帮助下完成的答案,比较复杂;就让它作为这篇漫长博客的结尾吧。

UPDATE dec_employee main 
SET project = COALESCE((
  SELECT emp.project FROM (
    SELECT account_key, report_line, project
    FROM dec_employee
  ) emp
  INNER JOIN dec_department dept0 ON dept0.account_key = emp.account_key
  INNER JOIN dec_department dept1 ON dept1.account_key = main.account_key
  WHERE emp.project <> ''
    AND emp.report_line = main.report_line 
    AND dept0.department_key = dept1.department_key
    AND emp.account_key <> main.account_key
  LIMIT 1), ''
)
WHERE project = '';