神奇的SQL
SQL绝对被人低估了,至少被我低估了。SQL是那种你可以投入少量时间却能获得较大收益的技术,也就是说边际效用极大。SQL具有很强的稳定性,尽管随着时间的推移有不少新的版本发布,但是基本功能几乎没怎么变化,两个主要版本SQL92和SQL99距今都二三十年了。SQL的稳定性能够让你一次性投入而长时间获得收益,相比某些前端技术或某些短命的框架而言,简直是一本万利的买卖。SQL又具有一定的难度,对于某些复杂操作,或者某些常见的坑(比如三值逻辑运算),不经过一定时间的学习,很难轻松掌握和面对。
如果把编程范式粗略地划分成命令式和函数式的话,SQL应该归于函数式这一类;但SQL处理的对象是集合,其数学基础是关系代数和集合论。这一点让SQL跟其他语言非常不同,你没法从其他编程经验中直接获得SQL的编写经验,也就是说,SQL是编程世界中比较特立独行的一个分支,需要花费时间认真对待。
下面以例子解释SQL的编写,主要参考日本作者Mick的《SQL进阶教程(第2版)》,一个容易被人看不上的非常土的书名,但其实内容极好,豆瓣9.1评分。下面的所有例子和SQL均在MySQL8上执行验证过。
[例1]有一城市人口表如下
DROP TABLE IF EXISTS dec_popu;
CREATE TABLE dec_popu (
id INT PRIMARY KEY AUTO_INCREMENT,
city VARCHAR (64) NOT NULL,
gender VARCHAR(1) NOT NULL DEFAULT 'M',
population INT NOT NULL DEFAULT 0
);
INSERT INTO dec_popu(city, gender, population) VALUES('贵阳市', 'M' , 26714);
INSERT INTO dec_popu(city, gender, population) VALUES('贵阳市', 'F' , 23134);
INSERT INTO dec_popu(city, gender, population) VALUES('六枝特区', 'M' , 3026);
INSERT INTO dec_popu(city, gender, population) VALUES('六枝特区', 'F' , 2926);
INSERT INTO dec_popu(city, gender, population) VALUES('德州市', 'M' , 23256);
INSERT INTO dec_popu(city, gender, population) VALUES('青岛市', 'M' , 24770);
INSERT INTO dec_popu(city, gender, population) VALUES('潍坊市', 'M' , 32005);
INSERT INTO dec_popu(city, gender, population) VALUES('潍坊市', 'F' , 12000);
INSERT INTO dec_popu(city, gender, population) VALUES('乐山市', 'M' , 19200);
INSERT INTO dec_popu(city, gender, population) VALUES('乐山市', 'F' , 10800);
id | city | gender | population |
---|---|---|---|
1 | 贵阳市 | M | 26714 |
2 | 贵阳市 | F | 23134 |
3 | 六枝特区 | M | 3026 |
4 | 六枝特区 | F | 2926 |
5 | 德州市 | M | 23256 |
6 | 青岛市 | M | 24770 |
7 | 潍坊市 | M | 32005 |
8 | 潍坊市 | F | 12000 |
9 | 乐山市 | M | 19200 |
10 | 乐山市 | F | 10800 |
11 | 长春市 | F | 5230 |
要求:虽然表格中并没有省份信息,但是需要按照省份统计总人口。省份作为一种常识存在。这里练习使用CASE子句。
SELECT CASE
WHEN city IN ('贵阳市','六枝特区') THEN '贵州'
WHEN city IN ('德州市','青岛市','潍坊市' ) THEN '山东'
WHEN city IN ('乐山市') THEN '四川'
ELSE '未知'
END AS province,
SUM(population) AS Total
FROM dec_popu
GROUP BY province;
要求:以城市/男性人口/女性人口的格式统计数据。这里练习把CASE放到SUM函数中。
SELECT city,
SUM(CASE WHEN gender = 'M' THEN population ELSE 0 END) AS population_male,
SUM(CASE WHEN gender = 'F' THEN population ELSE 0 END) AS population_female
FROM dec_popu
GROUP BY city;
要求:按性别统计,表格中各性别总人数,山东省各性别总人数,山东省下面各市各性别总人数。
SELECT gender
, SUM(population) AS Total
, SUM(CASE WHEN city IN ('德州市') THEN population ELSE 0 END) AS DeZhouShi
, SUM(CASE WHEN city IN ('青岛市') THEN population ELSE 0 END) AS QingDaoShi
, SUM(CASE WHEN city IN ('潍坊市') THEN population ELSE 0 END) AS WeiFangShi
, SUM(CASE WHEN city IN ('德州市','青岛市','潍坊市')
THEN population ELSE 0 END) AS ShanDongSheng
FROM dec_popu
GROUP BY gender;
要求:如果城市中只有1种性别的人口记录,那么选它,如果有2种性别的人口记录,那么就选男性的人口数据。这里可以看出CASE子句极大的灵活性。
SELECT city,
CASE WHEN COUNT(*) = 1 THEN MAX(population)
ELSE MAX(CASE WHEN gender = 'M' THEN population ELSE -1 END) END AS PopuMain
FROM dec_popu
GROUP BY city;
[例2]下面是课程表以及课程时间表
DROP TABLE IF EXISTS dec_course;
CREATE TABLE dec_course (
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_name VARCHAR (32) NOT NULL
);
INSERT INTO dec_course(course_name) VALUES('计算机基础');
INSERT INTO dec_course(course_name) VALUES('会计学');
INSERT INTO dec_course(course_name) VALUES('财务管理学');
INSERT INTO dec_course(course_name) VALUES('英语');
DROP TABLE IF EXISTS dec_course_schedule;
CREATE TABLE dec_course_schedule (
id INT PRIMARY KEY AUTO_INCREMENT,
course_id INT NOT NULL,
schedule_month VARCHAR(10)
);
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-05');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-06');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(1, '2022-07');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(2, '2022-06');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(2, '2022-07');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(3, '2022-05');
INSERT INTO dec_course_schedule(course_id, schedule_month) VALUES(4, '2022-07');
id | course_id | schedule_month |
---|---|---|
1 | 1 | 2022-05 |
2 | 1 | 2022-06 |
3 | 1 | 2022-07 |
4 | 2 | 2022-06 |
5 | 2 | 2022-07 |
6 | 3 | 2022-05 |
7 | 4 | 2022-07 |
要求:把每项课程在每个月是否开课表示出来。这里使用case把课程时间的行信息,转为列信息。
SELECT course_id, course_name,
CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-05') THEN 'Yes' ELSE '-' END AS 'May',
CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-06') THEN 'Yes' ELSE '-' END AS 'June',
CASE WHEN course_id IN (SELECT course_id FROM dec_course_schedule WHERE schedule_month = '2022-07') THEN 'Yes' ELSE '-' END AS 'July'
FROM dec_course;
另一种实现方式:
SELECT c.course_name,
CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-05') THEN 'Yes' ELSE '-' END AS May,
CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-06') THEN 'Yes' ELSE '-' END AS June,
CASE WHEN EXISTS (SELECT course_id FROM dec_course_schedule cs WHERE c.course_id = cs.course_id AND cs.schedule_month = '2022-07') THEN 'Yes' ELSE '-' END AS July
FROM dec_course c;
第三种方法:
SELECT schedule.course_id, course.course_name,
CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-05' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'May',
CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-06' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'JUNE',
CASE WHEN SUM(CASE WHEN schedule.schedule_month = '2022-07' THEN 1 ELSE NULL END) >=1 THEN 'Yes' ELSE '-' END AS 'JULY'
FROM dec_course_schedule schedule
LEFT OUTER JOIN dec_course course
ON schedule.course_id = course.course_id
GROUP BY course_id;
[例3]下面是学生与他们多次模拟考试成绩的一个记录表,成绩记录在列中
DROP TABLE IF EXISTS dec_student_scores;
CREATE TABLE dec_student_scores (
name VARCHAR (16) NOT NULL,
score1 int,
score2 int,
score3 int
);
INSERT INTO dec_student_scores VALUES('sam', 123, 202, 153);
INSERT INTO dec_student_scores VALUES('andy', 251, 185, 132);
INSERT INTO dec_student_scores VALUES('tom', 214, 207, 111);
INSERT INTO dec_student_scores VALUES('judy', 203, 223, 198);
name | score1 | score2 | score3 |
---|---|---|---|
sam | 123 | 202 | 153 |
andy | 251 | 185 | 132 |
tomy | 214 | 207 | 111 |
judy | 203 | 223 | 198 |
要求:找出各个同学的最高分数。这个可以直接使用MySQL支持的GREATEST(score1, score2, score3)函数实现,但我们尝试使用CASE的方式。
SELECT name,
CASE WHEN score_bigger > score3 THEN score_bigger ELSE score3 END AS max_score
FROM (
SELECT name, score3,
CASE WHEN score1 > score2 THEN score1 ELSE score2 END AS score_bigger
FROM dec_student_scores
) tmp;
另一种实现方式:
SELECT name,
CASE WHEN score1 > (CASE WHEN score2 > score3 THEN score2 ELSE score3 END)
THEN score1
ELSE (CASE WHEN score2 > score3 THEN score2 ELSE score3 END) END AS max_score
FROM dec_student_scores;
要求:将各位同学的成绩以sam, judy, tom, andy的顺序排序。这里练习将CASE子句放到ORDER BY之中。
SELECT name, score1, score2, score3
FROM dec_student_scores
ORDER BY (
CASE name
WHEN 'sam' THEN 1
WHEN 'judy' THEN 2
WHEN 'tom' THEN 3
WHEN 'andy' THEN 4
ELSE NULL
END
);
要求:求表中name的笛卡尔积,去除(a, a)这样的元组。使用CROSS JOIN来求笛卡尔积。
SELECT A.name AS name1, B.name AS name2
FROM dec_student_scores A
CROSS JOIN dec_student_scores B
ON A.name <> B.name
ORDER BY A.name;
更进一步,要求(a, b)和(b, a) 这样的数据,只保留一条。这其实是在找两个元素的组合。
SELECT A.name AS name1, B.name AS name2
FROM dec_student_scores A
INNER JOIN dec_student_scores B
ON A.name < B.name
ORDER BY A.name;
[例4]下面是各个服务器在某些日期的工作负载
DROP TABLE IF EXISTS dec_server_loads;
CREATE TABLE dec_server_loads (
server VARCHAR (4) NOT NULL,
sample_date date NOT NULL,
loads int NOT NULL default 0
);
insert into dec_server_loads VALUES ('A', '2022-05-01', 1000);
insert into dec_server_loads VALUES ('A', '2022-05-02', 1200);
insert into dec_server_loads VALUES ('A', '2022-05-03', 1300);
insert into dec_server_loads VALUES ('A', '2022-05-04', 900);
insert into dec_server_loads VALUES ('A', '2022-05-05', 1220);
insert into dec_server_loads VALUES ('B', '2022-05-01', 1100);
insert into dec_server_loads VALUES ('B', '2022-05-02', 1200);
insert into dec_server_loads VALUES ('B', '2022-05-06', 1500);
insert into dec_server_loads VALUES ('B', '2022-05-07', 2000);
insert into dec_server_loads VALUES ('B', '2022-05-12', 1000);
insert into dec_server_loads VALUES ('C', '2022-05-01', 2000);
insert into dec_server_loads VALUES ('C', '2022-05-03', 1300);
insert into dec_server_loads VALUES ('C', '2022-05-04', 1200);
insert into dec_server_loads VALUES ('C', '2022-05-05', 1600);
server | sample_date | loads |
---|---|---|
A | 2022-05-01 | 1000 |
A | 2022-05-02 | 1200 |
A | 2022-05-03 | 1300 |
A | 2022-05-04 | 900 |
A | 2022-05-05 | 1220 |
B | 2022-05-01 | 1100 |
B | 2022-05-02 | 1200 |
B | 2022-05-06 | 1500 |
B | 2022-05-07 | 2000 |
B | 2022-05-12 | 1000 |
C | 2022-05-01 | 2000 |
C | 2022-05-03 | 1300 |
C | 2022-05-04 | 1200 |
C | 2022-05-05 | 1600 |
要求:每条记录都有一个日期,如何把这个日期的前一天找出来?这里需要用到Window函数。
SELECT server, sample_date,
MIN(sample_date) OVER (
PARTITION BY server ORDER BY sample_date ASC
ROWS between 1 preceding AND 1 preceding
) AS last_date
FROM dec_server_loads;
更进一步,把这个日期找出来,并且把这个日期对应的负载数据查出来。这里可以把Window单独定义出来,这样可以在SQL查询中直接使用,保持SQL的简洁。
SELECT server, sample_date, loads,
MIN(sample_date) OVER W AS last_date,
MIN(loads) OVER W AS last_loads
FROM dec_server_loads
Window W AS (PARTITION BY server ORDER BY sample_date ASC
ROWS between 1 preceding AND 1 preceding);
[例5]下面是一个产品表,里面包含产品信息和价格信息
DROP TABLE IF EXISTS dec_fruits;
CREATE TABLE dec_fruits (
id INT PRIMARY KEY AUTO_INCREMENT,
fruit_name VARCHAR(32) NOT NULL,
price DECIMAL(10,2)
);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Watermelon', 2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Banana', 5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Banana', 5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 8);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Apple', 8.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Watermelon', 2.2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Grape', 7.5);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Orange', 12);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Carrot', 2);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Tomato', 1);
INSERT INTO dec_fruits(fruit_name, price) VALUES('Cherry', 20);
id | fruit_name | price |
---|---|---|
1 | Apple | 7.50 |
2 | Watermelon | 2.00 |
3 | Banana | 5.00 |
4 | Apple | 7.50 |
5 | Apple | 7.50 |
6 | Banana | 5.00 |
7 | Apple | 8.00 |
8 | Apple | 8.50 |
9 | Watermelon | 2.20 |
10 | Grape | 7.50 |
11 | Orange | 12.00 |
12 | Carrot | 2.00 |
13 | Tomato | 1.00 |
14 | Cherry | 20.00 |
要求:上面表格中有多个重复数据(fruit_name & price),选出那些重复的数据,选出id较大的作为重复数据。
SELECT * FROM dec_fruits p1
WHERE id > (
SELECT MIN(p2.id)
FROM dec_fruits p2
WHERE p1.fruit_name = p2.fruit_name
AND p1.price = p2.price
);
要求:选出重复的数据,选出id较小的作为重复数据。练习使用EXCEPT来做集合的减法。
SELECT id FROM dec_fruits
EXCEPT
SELECT MAX(id) FROM dec_fruits
GROUP BY fruit_name, price;
要求:只要是重复了就选出来,比如有2条数据重复了,把这2条都选出来,而不是像上面那样只选出一条。这里注意EXISTS的用法,EXISTS非常神奇。
SELECT * FROM dec_fruits p1
WHERE EXISTS (
SELECT 1 FROM (
SELECT fruit_name, price, COUNT(*) AS cnt FROM dec_fruits
GROUP BY fruit_name, price
) p2
WHERE p2.cnt > 1
AND p1.fruit_name = p2.fruit_name
AND p1.price = p2.price
);
要求:对所有产品按价格从高到低排序,注意这里有2种排序方式,如果第一名有2个,下一名是第三名的叫一般rank,下一名是第二的叫dense_rank.
SELECT fruit_name, price,
rank() over (ORDER BY price DESC) AS rank_1,
dense_rank() over (ORDER BY price DESC) AS rank_2_dense
FROM dec_fruits;
另外一种实现方式,这里巧妙的使用了Count和Distinct Count作为排序序号。
SELECT p1.fruit_name, p1.price,
(SELECT COUNT(p2.price) FROM dec_fruits p2 WHERE p2.price > p1.price) + 1
AS rank_1,
(SELECT COUNT(DISTINCT p2.price) FROM dec_fruits p2 WHERE p2.price > p1.price) + 1
AS rank_2_dense
FROM dec_fruits p1
ORDER BY rank_1;
[例6]下面定义了一个用户表
DROP TABLE IF EXISTS dec_user;
CREATE TABLE dec_user (
id INT PRIMARY KEY AUTO_INCREMENT,
nickname VARCHAR(32) NOT NULL,
age int,
address VARCHAR(200) DEFAULT ''
);
INSERT INTO dec_user(nickname, age, address) VALUES ('Liubei', 34, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Guanyu', 32, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Zhangfei', 31, 'Shu');
INSERT INTO dec_user(nickname, age, address) VALUES ('Caocao', 35, 'Wei');
INSERT INTO dec_user(nickname, age, address) VALUES ('Zhangliao', 32, 'Wei');
INSERT INTO dec_user(nickname, address) VALUES ('Guojia', 'Wei');
id | nickname | age | address |
---|---|---|---|
1 | Liubei | 34 | Shu |
2 | Guanyu | 32 | Shu |
3 | Zhangfei | 31 | Shu |
4 | Caocao | 35 | Wei |
5 | Zhangliao | 32 | Wei |
6 | Guojia | null | Wei |
要求:每个用户都有一个Age,查询蜀国中有某个年龄的人而在魏国中没有这个年龄的人。正常应该查出id 是 1和3的蜀国用户,年龄分别为31和34. 由于魏国有年龄字段是NULL的用户,所以下面的NOT IN 方式并不能得到正确结果。
--incorrect due to NULL age in 'Wei'
SELECT * FROM dec_user
WHERE address = 'Shu'
AND age NOT IN (
SELECT age FROM dec_user
WHERE address = 'Wei'
);
正确的做法应该使用NOT EXISTS而不是NOT IN:
SELECT * FROM dec_user u1
WHERE address = 'Shu'
AND NOT EXISTS (
SELECT * FROM dec_user u2
WHERE u1.age = u2.age AND u2.address = 'Wei'
);
注意:使用ORDER BY Age,NULL的数据将会排在前面。使用COALESCE函数将NULL数据转换为其他数据。
SELECT id, nickname, age FROM dec_user
ORDER BY age;
SELECT id, nickname, COALESCE(age, -1) FROM dec_user;
注意:使用concat函数连接一个字符串和NULL,将会得到NULL
SELECT concat('Hello ', NULL) AS result FROM dual;
[例7]下面是一个会议出席人员表
DROP TABLE IF EXISTS dec_meeting;
CREATE TABLE dec_meeting (
meeting_id CHAR(4) not null,
person VARCHAR(32) not null
);
INSERT INTO dec_meeting VALUES('M1', 'ZhangFei');
INSERT INTO dec_meeting VALUES('M1', 'LuSu');
INSERT INTO dec_meeting VALUES('M1', 'Hu Dong');
INSERT INTO dec_meeting VALUES('M2', 'ZhangFei');
INSERT INTO dec_meeting VALUES('M2', 'XiaHouYuan');
INSERT INTO dec_meeting VALUES('M3', 'LuSu');
INSERT INTO dec_meeting VALUES('M3', 'Hu Dong');
INSERT INTO dec_meeting VALUES('M3', 'XiaHouYuan');
meeting_id | person |
---|---|
M1 | ZhangFei |
M1 | LuSu |
M1 | Hu Dong |
M2 | ZhangFei |
M2 | XiaHouYuan |
M3 | LuSu |
M3 | Hu Dong |
M3 | XiaHouYuan |
要求:选出每次会议缺席的人。以这个表的Distinct person作为所有人员的集合。这里使用这个表自身的CROSS JOIN得到所有会议与所有人员的一个笛卡尔积,里面当然有重复的。使用这个集合去减原始的参会人员表,就会得到未参会人员表。这里可以看出来,NOT EXISTS与集合减法(EXCEPT)概念相近。
SELECT DISTINCT T1.meeting_id, T2.person
FROM dec_meeting T1 CROSS JOIN dec_meeting T2
WHERE NOT EXISTS (
SELECT 1 FROM dec_meeting T3
WHERE T1.meeting_id = T3.meeting_id
AND T2.person = T3.person
);
于是,你直接使用EXCEPT也可以:
SELECT DISTINCT T1.meeting_id, T2.person
FROM dec_meeting T1 CROSS JOIN dec_meeting T2
EXCEPT SELECT meeting_id, person FROM dec_meeting;
[例8]下面是又一个学生科目与成绩的表格
DROP TABLE IF EXISTS dec_test_scores;
CREATE TABLE dec_test_scores (
student_id INT NOT NULL,
gender CHAR(1) NOT NULL,
subject VARCHAR (16) NOT NULL,
score int
);
INSERT INTO dec_test_scores VALUES (1, 'F', '数学', 100);
INSERT INTO dec_test_scores VALUES (1, 'F', '语文', 80);
INSERT INTO dec_test_scores VALUES (1, 'F', '理化', 80);
INSERT INTO dec_test_scores VALUES (2, 'F', '数学', 80);
INSERT INTO dec_test_scores VALUES (2, 'F', '语文', 95);
INSERT INTO dec_test_scores VALUES (3, 'M', '数学', 40);
INSERT INTO dec_test_scores VALUES (3, 'M', '语文', 90);
INSERT INTO dec_test_scores VALUES (3, 'M', '社会', 55);
INSERT INTO dec_test_scores VALUES (4, 'M', '数学', 80);
INSERT INTO dec_test_scores VALUES (5, 'F', '数学', 56);
INSERT INTO dec_test_scores VALUES (5, 'F', '语文', 68);
INSERT INTO dec_test_scores VALUES (5, 'F', '社会', 88);
INSERT INTO dec_test_scores VALUES (6, 'M', '数学', 99);
INSERT INTO dec_test_scores VALUES (6, 'M', '社会', 55);
INSERT INTO dec_test_scores VALUES (7, 'M', '数学', 92);
INSERT INTO dec_test_scores VALUES (7, 'M', '语文', 85);
student_id | gender | subject | score |
---|---|---|---|
1 | F | 数学 | 100 |
1 | F | 语文 | 80 |
1 | F | 理化 | 80 |
2 | F | 数学 | 80 |
2 | F | 语文 | 95 |
3 | M | 数学 | 40 |
3 | M | 语文 | 90 |
3 | M | 社会 | 55 |
4 | M | 数学 | 80 |
5 | F | 数学 | 56 |
5 | F | 语文 | 68 |
5 | F | 社会 | 88 |
6 | M | 数学 | 99 |
6 | M | 社会 | 55 |
7 | M | 数学 | 92 |
7 | M | 语文 | 85 |
要求:查找学生,学生的全部科目成绩均大于等于60. 此处依然练习使用NOT EXISTS.
SELECT DISTINCT student_id
FROM dec_test_scores t1
WHERE NOT EXISTS (
SELECT 1 FROM dec_test_scores t2
WHERE t1.student_id = t2.student_id
AND t2.score < 60
);
要求:查找学生,学生拥有语文和数学成绩,并且语文成绩>=60, 数学成绩>=80.
SELECT student_id FROM dec_test_scores t1
WHERE subject IN ('数学', '语文')
AND NOT EXISTS (
SELECT 1 FROM dec_test_scores t2
WHERE t1.student_id = t2.student_id
AND 1 = CASE WHEN subject = '数学' AND score < 80 THEN 1
WHEN subject = '语文' AND score < 60 THEN 1
ELSE 0 END
)
GROUP BY student_id
HAVING COUNT(1) > 1;
ChatGPT给出的答案:
SELECT student_id
FROM dec_test_scores
WHERE subject IN ('语文', '数学')
GROUP BY student_id HAVING COUNT(DISTINCT subject) = 2
AND MIN(CASE WHEN subject = '数学' THEN score END) >= 80
AND MIN(CASE WHEN subject = '语文' THEN score END) >= 60;
要求:哪个分数是出现次数最多的,出现几次? 深入体验GROUP BY的法.
SELECT score, COUNT(*) AS max_cnt
FROM dec_test_scores
GROUP BY score
HAVING max_cnt >= (
SELECT MAX(cnt) FROM (
SELECT COUNT(*) AS cnt FROM dec_test_scores
GROUP BY score
) T1
);
要求:哪门课程,其中至少75%的学生成绩大于等于80分?
SELECT subject FROM dec_test_scores
GROUP by subject
HAVING SUM(CASE WHEN score >= 80 then 1 else 0 END) >= COUNT(*) * 3 / 4;
要求:哪门课程,男生及格的人数大于女生及格人数?
SELECT subject FROM dec_test_scores
GROUP BY subject
HAVING SUM(CASE WHEN score >=60 AND gender = 'M' THEN 1 ELSE 0 END)
> SUM(CASE WHEN score >=60 AND gender = 'F' THEN 1 ELSE 0 END) ;
要求:哪门课程,女生的平均分大于男生的平均分?注意:此处else部分用NULL而不是0,用NULL可以确保在计算avg时候不包含在内;即求avg时,对于NULL的记录,不会包含在分母里。
SELECT subject FROM dec_test_scores
GROUP BY subject
HAVING AVG(CASE WHEN gender = 'F' THEN score ELSE NULL END)
> AVG(CASE WHEN gender = 'M' THEN score ELSE NULL END);
[例9]下面是一个seq表
DROP TABLE IF EXISTS dec_seq;
CREATE TABLE dec_seq (
seq int NOT NULL,
name VARCHAR(16) NOT NULL default ''
);
要求:快速插入1~100这100个数字.
INSERT INTO dec_seq(seq)
SELECT 1 + ones.id + 10 * tens.id AS number
FROM (
SELECT 0 AS id
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
) ones CROSS JOIN (
SELECT 0 AS id
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
) tens;
[例10]下面是城市仓库储存原材料的表,列是城市,入库日期和材料
DROP TABLE IF EXISTS dec_materials;
CREATE TABLE dec_materials(
center VARCHAR(16) not null,
in_date date,
material VARCHAR(4)
);
INSERT INTO dec_materials VALUES ('Beijing', '2018-04-01', '锡');
INSERT INTO dec_materials VALUES ('Beijing', '2018-04-12', '锌');
INSERT INTO dec_materials VALUES ('Beijing', '2018-05-17', '铝');
INSERT INTO dec_materials VALUES ('Beijing', '2018-05-20', '锌');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-20', '铜');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-22', '镍');
INSERT INTO dec_materials VALUES ('Shanghai', '2018-04-29', '铅');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-03-15', '钛');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-04-01', '钢');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-04-24', '钢');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-05-02', '镁');
INSERT INTO dec_materials VALUES ('Tianjin', '2018-05-10', '钛');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-10', '锌');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-28', '锡');
INSERT INTO dec_materials VALUES ('Dalian', '2018-05-29', '铝');
center | in_date | material |
---|---|---|
Beijing | 2018-04-01 | 锡 |
Beijing | 2018-04-12 | 锌 |
Beijing | 2018-05-17 | 铝 |
Beijing | 2018-05-20 | 锌 |
Shanghai | 2018-04-20 | 铜 |
Shanghai | 2018-04-22 | 镍 |
Shanghai | 2018-04-29 | 铅 |
Tianjin | 2018-03-15 | 钛 |
Tianjin | 2018-04-01 | 钢 |
Tianjin | 2018-04-24 | 钢 |
Tianjin | 2018-05-02 | 镁 |
Tianjin | 2018-05-10 | 钛 |
Dalian | 2018-05-10 | 锌 |
Dalian | 2018-05-28 | 锡 |
Dalian | 2018-05-29 | 铝 |
要求:哪些城市对某种原料有多于1次的入库?观察此处Count(Distinct AAA)的巧妙用法:
SELECT center FROM DEC_MATERIALS
GROUP BY center
HAVING COUNT(material) > COUNT(DISTINCT material);
--Another way
SELECT center,
CASE WHEN COUNT(material) > COUNT(DISTINCT material)
THEN 'Duplicated Material' ELSE 'No' END AS duplicated
FROM DEC_MATERIALS
GROUP BY center;
要求:哪些中心储存的原材料完全一样?
WITH citym AS (
SELECT DISTINCT center, material FROM dec_materials
)
SELECT citymt1.center AS city1, '-' , citymt2.center AS city2
FROM citym citymt1, citym citymt2
WHERE citymt1.center < citymt2.center
AND citymt1.material = citymt2.material
GROUP BY citymt1.center, citymt2.center
HAVING COUNT(*) = (SELECT COUNT(*) FROM citym citymt3 WHERE citymt1.center = citymt3.center)
AND COUNT(*) = (SELECT COUNT(*) FROM citym citymt4 WHERE citymt2.center = citymt4.center);
[例11]下面是各个商店以及商店所售卖商品的表
DROP TABLE IF EXISTS dec_items;
CREATE TABLE dec_items(
item varchar(16) not null
);
INSERT INTO dec_items VALUES ('啤酒');
INSERT INTO dec_items VALUES ('纸尿裤');
INSERT INTO dec_items VALUES ('自行车');
DROP TABLE IF EXISTS dec_shop_items;
CREATE TABLE dec_shop_items(
shop varchar(8) not null,
item varchar(16)
);
INSERT INTO dec_shop_items VALUES ('A', '啤酒');
INSERT INTO dec_shop_items VALUES ('A', '纸尿裤');
INSERT INTO dec_shop_items VALUES ('A', '自行车');
INSERT INTO dec_shop_items VALUES ('A', '漫画书');
INSERT INTO dec_shop_items VALUES ('B', '啤酒');
INSERT INTO dec_shop_items VALUES ('B', '纸尿裤');
INSERT INTO dec_shop_items VALUES ('B', '自行车');
INSERT INTO dec_shop_items VALUES ('C', '电视');
INSERT INTO dec_shop_items VALUES ('C', '纸尿裤');
INSERT INTO dec_shop_items VALUES ('C', '自行车');
INSERT INTO dec_shop_items VALUES ('D', '台灯');
INSERT INTO dec_shop_items VALUES ('D', '牙膏');
shop | item |
---|---|
A | 啤酒 |
A | 纸尿裤 |
A | 自行车 |
A | 漫画书 |
B | 啤酒 |
B | 纸尿裤 |
B | 自行车 |
C | 电视 |
C | 纸尿裤 |
C | 自行车 |
D | 台灯 |
D | 牙膏 |
要求:哪所商店售卖items表中的所有商品?
SELECT s.shop FROM dec_shop_items s
INNER JOIN dec_items i
ON s.item = i.item
GROUP BY s.shop
HAVING COUNT(s.item) = (SELECT COUNT(item) FROM dec_items);
要求:哪所商店不多不少刚好拥有items表中的所有商品?
SELECT s.shop FROM dec_shop_items s
LEFT OUTER JOIN dec_items i
ON s.item = i.item
GROUP BY s.shop
HAVING COUNT(s.item) = (SELECT COUNT(item) FROM dec_items)
AND COUNT(i.item) = (SELECT COUNT(item) FROM dec_items);
要求:列出所有商店的已经拥有的在items表中的商品数量,以及还差的数量?这里还是使用left join,在计算COUNT时,NULL值不算。
SELECT shop,
COUNT(item) AS shop_having_count,
(SELECT COUNT(*) FROM dec_items) - COUNT(item) AS diff_count
FROM (
SELECT s.shop, i.item FROM dec_shop_items s
LEFT OUTER JOIN dec_items i
ON s.item = i.item
) t1
GROUP BY shop ;
[例12]下面是来自不同学院的学生他们提交论文的日期的表
DROP TABLE IF EXISTS dec_submission;
CREATE TABLE dec_submission (
student_id char(4) PRIMARY KEY,
dpt varchar(32) not null,
sbmt_date date
);
INSERT INTO dec_submission VALUES('100', '理学院', '2018-10-01');
INSERT INTO dec_submission VALUES('101', '理学院', '2018-09-22');
INSERT INTO dec_submission(student_id, dpt) VALUES('102', '文学院');
INSERT INTO dec_submission VALUES('103', '文学院', '2018-09-10');
INSERT INTO dec_submission VALUES('200', '文学院', '2018-09-22');
INSERT INTO dec_submission(student_id, dpt) VALUES('201', '工学院');
INSERT INTO dec_submission VALUES('202', '经济学院', '2018-09-25');
student_id | dpt | sbmt_date |
---|---|---|
100 | 理学院 | 2018-10-01 |
101 | 理学院 | 2018-09-22 |
102 | 文学院 | null |
103 | 文学院 | 2018-09-10 |
200 | 文学院 | 2018-09-22 |
201 | 工学院 | null |
202 | 经济学院 | 2018-09-25 |
要求:哪个学院的学生全部在9月份提交?
SELECT dpt FROM dec_submission
GROUP BY dpt
HAVING COUNT(*) = SUM(
CASE WHEN sbmt_date between '2018-09-01' and '2018-09-30' THEN 1
ELSE 0 END
);
[例13]下表是销售表,包含销售年份和销售数量
DROP TABLE IF EXISTS dec_sale;
CREATE TABLE dec_sale (
sale_year int,
sale int
);
INSERT INTO dec_sale VALUES(2010, 50);
INSERT INTO dec_sale VALUES(2011, 51);
INSERT INTO dec_sale VALUES(2012, 52);
INSERT INTO dec_sale VALUES(2013, 52);
INSERT INTO dec_sale VALUES(2014, 50);
INSERT INTO dec_sale VALUES(2015, 50);
INSERT INTO dec_sale VALUES(2016, 49);
INSERT INTO dec_sale VALUES(2017, 55);
要求:统计本年度与上年度的销售记录之差,因为表中的年份是连续的,所以这里使用了按年份与年份+1的左连接:
select s1.sale_year,
s1.sale as current_sale,
s2.sale as last_year_sale,
s1.sale - s2.sale as diff
FROM dec_sale s1 LEFT OUTER JOIN dec_sale s2
ON s1.sale_year = s2.sale_year + 1;
另外,很自然地,我们可以使用Window函数的方式:
SELECT sale_year,
sale AS current_sale,
SUM(sale) OVER (ORDER BY sale_year rows between 1 preceding AND 1 preceding)
AS last_year_sale,
sale - SUM(sale) OVER (ORDER BY sale_year rows between 1 preceding AND 1 preceding)
AS diff
FROM dec_sale;
上面SQL的窗口定义使用的是 rows between…and…, 使用这种方式可以不必在乎年份是否连续,它只考虑它物理上临近的上一行即可;还可以使用 range between…and…, 使用range 则表明使用值来限定,这就要求sale_year 必须连续,如果不连续,则会出现找不到上年的情况。比如我们update 一下该表:
UPDATE dec_sale SET sale_year = 2019 WHERE sale_year = 2015;
SELECT * FROM dec_sale ORDER BY sale_year;
sale_year | sale |
---|---|
2010 | 50 |
2011 | 51 |
2012 | 52 |
2013 | 52 |
2014 | 50 |
2016 | 49 |
2017 | 55 |
2019 | 50 |
这种情况下,使用上面的rows between…and… 没有问题,但如果使用 range between…and…则会出现空的行。仔细体会两种不同窗口定义的区别。
sale_year | sale | last_year_sale | diff |
---|---|---|---|
2010 | 50 | null | null |
2011 | 51 | 50 | 1 |
2012 | 52 | 51 | 1 |
2013 | 52 | 52 | 0 |
2014 | 50 | 52 | -2 |
2016 | 49 | null | null |
2017 | 55 | 49 | 6 |
2019 | 50 | null | null |
[例14]下面是一个产品表,包括产品类别以及产品价格
DROP TABLE IF EXISTS dec_products;
CREATE TABLE dec_products (
product_id char(4),
product_name varchar(32),
product_type varchar(16),
price DECIMAL(10,2)
);
INSERT INTO dec_products VALUES('0001', 'T-shirt', 'Clothes', 50);
INSERT INTO dec_products VALUES('0002', 'Printer', 'Office', 800);
INSERT INTO dec_products VALUES('0003', 'Sports shirt', 'Clothes', 120);
INSERT INTO dec_products VALUES('0004', 'Knife', 'Kitchen', 69);
INSERT INTO dec_products VALUES('0005', 'Pot', 'Kitchen', 339);
INSERT INTO dec_products VALUES('0006', 'Chopsticks', 'Kitchen', 16);
INSERT INTO dec_products VALUES('0007', 'Board', 'Kitchen', 88);
INSERT INTO dec_products VALUES('0008', 'Pen', 'Office', 2);
product_id | product_name | product_type | price |
---|---|---|---|
0001 | T-shirt | Clothes | 50.00 |
0002 | Printer | Office | 800.00 |
0003 | Sports shirt | Clothes | 120.00 |
0004 | Knife | Kitchen | 69.00 |
0005 | Pot | Kitchen | 339.00 |
0006 | Chopsticks | Kitchen | 16.00 |
0007 | Board | Kitchen | 88.00 |
0008 | Pen | Office | 2.00 |
要求:对于每个产品类别,找出其中大于该类别平均价格的产品:
SELECT * FROM dec_products A
WHERE price > (SELECT avg(price)
FROM dec_products B
WHERE A.product_type = B.product_type
GROUP BY product_type
);
同样,可以使用窗口函数:
SELECT product_name, product_type, price, avg_price
FROM (
SELECT product_name, product_type, price,
avg(price) OVER (partition by product_type) AS avg_price
FROM dec_products
) ALT1
WHERE price > avg_price;
要求:对所有产品按类别和名称排序后,取出第2~6条,这通常是一个分页取数据的需求:
SELECT product_name, product_type, price, row_num, total_count
FROM (
SELECT product_name, product_type, price,
ROW_NUMBER() OVER(ORDER BY product_type, product_name) AS row_num,
COUNT(*) OVER () AS total_count
FROM dec_products ) T1
WHERE row_num between 2 AND 6;
[例15]下面是一个旅馆预定表,里面包含预定开始和结束日期
DROP TABLE IF EXISTS dec_reservations;
CREATE TABLE dec_reservations (
reserver char(16) not null,
start_date DATE,
end_date DATE
);
INSERT INTO dec_reservations VALUES ('Zhang', '2018-10-26', '2018-10-27');
INSERT INTO dec_reservations VALUES ('Wang', '2018-10-28', '2018-10-31');
INSERT INTO dec_reservations VALUES ('Li', '2018-10-31', '2018-11-01');
INSERT INTO dec_reservations VALUES ('Zhao', '2018-11-03', '2018-11-04');
INSERT INTO dec_reservations VALUES ('Qian', '2018-11-03', '2018-11-05');
INSERT INTO dec_reservations VALUES ('Sun', '2018-11-04', '2018-11-06');
reserver | start_date | end_date |
---|---|---|
Zhang | 2018-10-26 | 2018-10-27 |
Wang | 2018-10-28 | 2018-10-31 |
Li | 2018-10-31 | 2018-11-01 |
Zhao | 2018-11-03 | 2018-11-04 |
Qian | 2018-11-03 | 2018-11-05 |
Sun | 2018-11-04 | 2018-11-06 |
要求:哪些是重复预订的,重复预订是指预订日期重叠。
SELECT * FROM dec_reservations A
WHERE EXISTS(
SELECT * FROM dec_reservations B
WHERE A.reserver <> B.reserver
AND (A.start_date between B.start_date AND B.end_date
OR A.end_date between B.start_date AND B.end_date)
);
另一种解决方案,这种方式会显示预订冲突的两个人:
SELECT * FROM (
SELECT reserver, start_date, end_date,
MAX(start_date) OVER (ORDER BY start_date rows between 1 following AND 1 following)
AS next_start_date,
MAX(reserver) OVER (ORDER BY start_date rows between 1 following AND 1 following)
AS next_reserver
FROM dec_reservations) T1
WHERE next_start_date between start_date AND end_date;
[例16]这是一个按日期记录的账户余额表
DROP TABLE IF EXISTS dec_accounts;
CREATE TABLE dec_accounts (
prc_date DATE,
prc_amt Decimal(10,2)
);
INSERT INTO dec_accounts VALUES ('2018-10-26', 100);
INSERT INTO dec_accounts VALUES ('2018-10-28', 200);
INSERT INTO dec_accounts VALUES ('2018-10-31', 300);
INSERT INTO dec_accounts VALUES ('2018-11-03', 400);
INSERT INTO dec_accounts VALUES ('2018-11-04', -500);
INSERT INTO dec_accounts VALUES ('2018-11-06', 600);
INSERT INTO dec_accounts VALUES ('2018-11-11', 700);
prc_date | prc_amt |
---|---|
2018-10-26 | 100.00 |
2018-10-28 | 200.00 |
2018-10-31 | 300.00 |
2018-11-03 | 400.00 |
2018-11-04 | -500.00 |
2018-11-06 | 600.00 |
2018-11-11 | 700.00 |
要求:查询当前行以及前2行这总共3行的平均值。
SELECT prc_date, prc_amt, AVG(prc_amt)
OVER (ORDER BY prc_date ROWS BETWEEN 2 preceding AND current ROW)
AS avg_amt
FROM dec_accounts;
可以看出,使用window函数可以很轻松地实现这个目标,如果不使用window函数该怎么做呢?
----define a temporary table acct with ROW_NUM
WITH acct AS (
SELECT (@row_number:=@row_number + 1) AS seq, t.prc_date, t.prc_amt
FROM dec_accounts t, (SELECT @row_number:=0) AS r
ORDER BY prc_date
)
SELECT alt.*, CASE
WHEN alt.amt2 is NULL THEN alt.amt1
WHEN alt.amt3 is NULL THEN (alt.amt1 + alt.amt2)/2
ELSE (alt.amt1 + alt.amt2 + alt.amt3)/3 END AS avg_amt
FROM (SELECT acct.seq,
acct.prc_date,
acct.prc_amt AS amt1,
acct2.prc_amt AS amt2,
acct3.prc_amt AS amt3
FROM acct LEFT OUTER JOIN acct acct2
ON acct.seq = acct2.seq + 1
LEFT OUTER JOIN acct acct3
ON acct.seq = acct3.seq +2
) alt;
上面,我们利用with首先定义一个tmp表,我们需要给每行加上行号,然后通过行号来做左外连接,这个方式比之前window方式麻烦地多。还有一种写法稍微简单一些、但逻辑比较复杂的:
SELECT prc_date, A1.prc_amt, (
SELECT avg(prc_amt) FROM dec_accounts A2
WHERE A1.prc_date >= A2.prc_date
AND(
SELECT COUNT(*) FROM dec_accounts A3
WHERE A3.prc_date between A2.prc_date AND A1.prc_date) <=3) AS mvg_SUM
FROM dec_accounts A1
ORDER BY prc_date;
[例17]有如下的家庭信息表,这个表中的员工子女信息存放在各个列中;为了方便,我们把子女信息存为多个行,所以建立了一个view
DROP TABLE IF EXISTS dec_family_info;
CREATE TABLE dec_family_info (
name varchar(32),
child_1 varchar(32),
child_2 varchar(32),
child_3 varchar(32)
);
INSERT INTO dec_family_info VALUES ('嘉鸿', '小阳', '小卡', '小光');
INSERT INTO dec_family_info VALUES ('赤加', '阿春', '阿夏', NULL);
INSERT INTO dec_family_info VALUES ('谢非', '海海', NULl , NULL);
INSERT INTO dec_family_info VALUES ('龙田', NULL, NULL, NULL);
DROP VIEW IF EXISTS dec_v_family_info;
CREATE VIEW dec_v_family_info(child) as
SELECT chd from(
SELECT child_1 as chd from dec_family_info
UNION
SELECT child_2 as chd from dec_family_info
UNION
SELECT child_3 as chd from dec_family_info
) tmp
WHERE tmp.chd IS NOT NULL;
name | child_1 | child_2 | child_3 |
---|---|---|---|
嘉鸿 | 小阳 | 小卡 | 小光 |
赤加红 | 阿春 | 阿夏 | null |
谢非 | 海海 | null | null |
龙田 | null | null | null |
所以,如何把各个子女转换成行信息?下面演示 列 => 行:
SELECT name, child_1 as chd from dec_family_info WHERE child_1 IS NOT NULL
UNION
SELECT name, child_2 as chd from dec_family_info WHERE child_2 IS NOT NULL
UNION
SELECT name, child_3 as chd from dec_family_info WHERE child_3 IS NOT NULL;
或者,通过之前创建的VIEW来:
SELECT p.name, vc.child
FROM dec_family_info p LEFT OUTER JOIN dec_v_family_info vc
ON vc.child IN (p.child_1, p.child_2, p.child_3);
[例18]我们有两个物品名单,包含id和name,其中id是准确的,但name可能有变更
DROP TABLE IF EXISTS dec_namea;
CREATE TABLE dec_namea (
id int PRIMARY KEY,
name varchar(32)
);
DROP TABLE IF EXISTS dec_nameb;
CREATE TABLE dec_nameb (
id int PRIMARY KEY,
name varchar(32)
);
INSERT INTO dec_namea VALUES(1, '大众途观');
INSERT INTO dec_namea VALUES(2, '东风大货');
INSERT INTO dec_namea VALUES(3, '丰田皮卡');
INSERT INTO dec_nameb VALUES(1, '大众途观L');
INSERT INTO dec_nameb VALUES(2, '东风大货');
INSERT INTO dec_nameb VALUES(4, '特斯拉 Model Y');
List 1:
id | name |
---|---|
1 | 大众途观 |
2 | 东风大货 |
3 | 丰田皮卡 |
List 2:
id | name |
---|---|
1 | 大众途观L |
2 | 东风大货 |
4 | 特斯拉 Model Y |
要求:查找name有变更的物品,以及单独只在list 1或list 2中的物品。这里可以使用全外连接(FULL OUTER JOIN)加id相同条件即可,由于MySQL并不支持全外连接,所以我们使用左连接UNION右连接来实现全连接:
SELECT * FROM (
SELECT a.id AS id, a.name AS a_name, b.name AS b_name
FROM dec_namea a LEFT OUTER JOIN dec_nameb b
ON a.id = b.id
UNION
SELECT b.id AS id, a.name AS a_name, b.name AS b_name
FROM dec_namea a right outer join dec_nameb b
ON a.id = b.id
) ALT1
WHERE a_name IS NULL OR b_name IS NULL OR a_name <> b_name;
要求:使用表B的数据更新表A数据,即如果数据存在B而不存在A则insert,如果同时存在但name不同,则使用B的name:
INSERT INTO dec_namea
SELECT id, name FROM dec_nameb
ON DUPLICATE KEY UPDATE dec_namea.name = dec_nameb.name;
[例19]有下面员工技能表以及项目技能需求表
DROP TABLE IF EXISTS dec_emp_skills;
CREATE TABLE dec_emp_skills (
emp varchar(32),
skill varchar(32)
);
INSERT INTO dec_emp_skills VALUES('王田', 'Oracle');
INSERT INTO dec_emp_skills VALUES('王田', 'UNIX');
INSERT INTO dec_emp_skills VALUES('王田', 'Java');
INSERT INTO dec_emp_skills VALUES('王田', 'MySQL');
INSERT INTO dec_emp_skills VALUES('李神奇', 'Oracle');
INSERT INTO dec_emp_skills VALUES('李神奇', 'UNIX');
INSERT INTO dec_emp_skills VALUES('李神奇', 'Java');
INSERT INTO dec_emp_skills VALUES('李神奇', 'C++');
INSERT INTO dec_emp_skills VALUES('张平井', 'UNIX');
INSERT INTO dec_emp_skills VALUES('张平井', 'Java');
INSERT INTO dec_emp_skills VALUES('张平井', 'C++');
INSERT INTO dec_emp_skills VALUES('张平井', 'Spark');
INSERT INTO dec_emp_skills VALUES('张平井', 'Mainframe');
INSERT INTO dec_emp_skills VALUES('张平井', 'NET');
INSERT INTO dec_emp_skills VALUES('若填部', 'Perl');
INSERT INTO dec_emp_skills VALUES('若填部', 'Mainframe');
INSERT INTO dec_emp_skills VALUES('韩都来', 'Oracle');
INSERT INTO dec_emp_skills VALUES('韩都来', 'Java');
INSERT INTO dec_emp_skills VALUES('韩都来', 'UNIX');
INSERT INTO dec_emp_skills VALUES('周经为', 'Oracle');
INSERT INTO dec_emp_skills VALUES('周经为', 'UNIX');
INSERT INTO dec_emp_skills VALUES('周经为', 'UNIX');
DROP TABLE IF EXISTS dec_skill_requirements;
CREATE TABLE dec_skill_requirements (
skill varchar(32)
);
INSERT INTO dec_skill_requirements VALUES ('Oracle');
INSERT INTO dec_skill_requirements VALUES ('UNIX');
INSERT INTO dec_skill_requirements VALUES ('Java');
emp | skill |
---|---|
王田 | Oracle |
王田 | UNIX |
王田 | Java |
王田 | MySQL |
李神奇 | Oracle |
李神奇 | UNIX |
李神奇 | Java |
李神奇 | C++ |
张平井 | UNIX |
张平井 | Java |
张平井 | C++ |
张平井 | Spark |
张平井 | Mainframe |
张平井 | NET |
若填部 | Perl |
若填部 | Mainframe |
韩都来 | Oracle |
韩都来 | Java |
韩都来 | UNIX |
周经为 | Oracle |
周经为 | UNIX |
周经为 | UNIX |
要求:哪位员工有需求表所需所有技能?
SELECT DISTINCT emp FROM dec_emp_skills emp1
WHERE NOT EXISTS (
SELECT skill FROM dec_skill_requirements
EXCEPT
SELECT skill FROM dec_emp_skills emp2
WHERE emp1.emp = emp2.emp
);
--Another way:
SELECT emp FROM (
SELECT DISTINCT e.emp, r.skill
FROM dec_emp_skills e
LEFT JOIN dec_skill_requirements r
ON e.skill = r.skill
) ALT1
GROUP by emp HAVING COUNT(skill) =
(SELECT COUNT(*) FROM dec_skill_requirements);
要求:哪位员工有且仅有需求中的技能,即他的技能刚好符合要求,不少也不多。这里有三种方式:
SELECT emp FROM dec_emp_skills t1
WHERE NOT EXISTS (
SELECT skill FROM dec_skill_requirements
EXCEPT
SELECT t2.skill FROM dec_emp_skills t2
WHERE t1.emp = t2.emp
)
GROUP BY emp
HAVING COUNT(*) = (SELECT COUNT(*) FROM dec_skill_requirements);
--Another way:
SELECT emp,
COUNT(has_skill) as has_skill_count,
COUNT(requirement_skill) as requirement_skill_count
FROM (
SELECT DISTINCT e.emp,
e.skill as has_skill,
r.skill as requirement_skill
FROM dec_emp_skills e
LEFT JOIN dec_skill_requirements r
ON e.skill = r.skill
) TMP1
GROUP by emp
HAVING requirement_skill_count = has_skill_count
AND requirement_skill_count = (
SELECT COUNT(*) FROM dec_skill_requirements
);
第三种方式原理是:两个集合,如果A - B 是空并且 B - A 也是空,那么这两个集合就是相等的;而表达集合是空的方式就用NOT EXISTS:
SELECT DISTINCT emp FROM dec_emp_skills t1
WHERE NOT EXISTS (
SELECT skill FROM dec_skill_requirements
EXCEPT
SELECT t2.skill FROM dec_emp_skills t2
WHERE t1.emp = t2.emp
)
AND NOT EXISTS (
SELECT t2.skill FROM dec_emp_skills t2
WHERE t1.emp = t2.emp
EXCEPT
SELECT skill FROM dec_skill_requirements
);
[例20]下面是一个座位预订表
DROP TABLE IF EXISTS dec_seats;
CREATE TABLE dec_seats (
seat INT,
status varchar(16)
);
INSERT INTO dec_seats VALUES(1, 'Reserved');
INSERT INTO dec_seats VALUES(2, 'Reserved');
INSERT INTO dec_seats VALUES(3, '');
INSERT INTO dec_seats VALUES(4, '');
INSERT INTO dec_seats VALUES(5, '');
INSERT INTO dec_seats VALUES(6, 'Reserved');
INSERT INTO dec_seats VALUES(7, '');
INSERT INTO dec_seats VALUES(8, '');
INSERT INTO dec_seats VALUES(9, '');
INSERT INTO dec_seats VALUES(10, '');
INSERT INTO dec_seats VALUES(11, '');
INSERT INTO dec_seats VALUES(12, 'Reserved');
INSERT INTO dec_seats VALUES(13, 'Reserved');
INSERT INTO dec_seats VALUES(14, '');
INSERT INTO dec_seats VALUES(15, '');
seat | status |
---|---|
1 | Reserved |
2 | Reserved |
3 | |
4 | |
5 | |
6 | Reserved |
7 | |
8 | |
9 | |
10 | |
11 | |
12 | Reserved |
13 | Reserved |
14 | |
15 |
要求:有一个人想要预订3个连续的座位,找出表中所有的可选项。
SELECT a.seat, '~', b.seat
FROM dec_seats a, dec_seats b
WHERE b.seat = a.seat + 2
AND NOT EXISTS (
SELECT * FROM dec_seats c
WHERE c.seat between a.seat AND b.seat
AND c.status <> ''
);
--Another way:
SELECT seat, '~', end_seat
FROM (SELECT seat, MAX(seat) OVER (
ORDER BY seat rows between 2 following AND 2 following) AS end_seat
FROM dec_seats
WHERE status = ''
)t1
WHERE end_seat = seat + 2;
[例21]使用维度表(Dimension)的信息来更新事实表(Fact)中关联信息,这种需求非常常见
DROP TABLE IF EXISTS dec_product_sale;
CREATE TABLE dec_product_sale (
sale_id INT PRIMARY KEY AUTO_INCREMENT,
product_id char(4),
product_name varchar(32),
product_type varchar(16),
quantity decimal
);
INSERT INTO dec_product_sale(product_id, quantity) values('0001', 30);
INSERT INTO dec_product_sale(product_id, quantity) values('0001', 35);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 10);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 12);
INSERT INTO dec_product_sale(product_id, quantity) values('0003', 11);
INSERT INTO dec_product_sale(product_id, quantity) values('0005', 10);
INSERT INTO dec_product_sale(product_id, quantity) values('0008', 510);
INSERT INTO dec_product_sale(product_id, quantity) values('0008', 1020);
sale_id | product_id | product_name | product_type | quantity |
---|---|---|---|---|
1 | 0001 | null | null | 30 |
2 | 0001 | null | null | 35 |
3 | 0003 | null | null | 10 |
4 | 0003 | null | null | 12 |
5 | 0003 | null | null | 11 |
6 | 0005 | null | null | 10 |
7 | 0008 | null | null | 510 |
8 | 0008 | null | null | 1020 |
要求:这是我们定义的又一个产品销售表,结合之前定义的dec_products表,更新dec_product_sale表中的product_name和product_type字段。
下面的SQL在某些数据库(比如redshift)上是work的,但MySQL 目前不支持:
UPDATE dec_product_sale s
SET s.product_name = p.product_name, s.product_type = p.product_type
FROM dec_products p
WHERE s.product_id = p.product_id;
目前在MySQL上可以使用JOIN来联合2个表,并更新:
UPDATE dec_product_sale s JOIN dec_products p
ON s.product_id = p.product_id
SET s.product_name = p.product_name, s.product_type = p.product_type;
[例22]查询随机数和UNID
UNID在数据库中非常重要,并且如果使用并行方式处理数据,UNID方式几乎就是唯一可以使用的标记主ID的方式了。这也是在文档型数据库经常使用UNID而不是自增ID的原因。
Mysql语法:
SELECT UUID() as uuid, RAND() as rand, SUBSTRING(CAST(RAND() AS CHAR), 3) AS rand_num;
SELECT UUID() as uuid from information_schema.tables limit 100;
uuid | rand | rand_num |
---|---|---|
85f98c2f-7e15-11ee-ab6a-9c2dcd25c19e | 0.4829847072092674 | 8978415288942322 |
RedShift居然没有UUID函数,但是有Random(),可以这样(这是一个伪随机的,在大规模应用的时候不能保证不重复):
SELECT LEFT(md5(random()::text), 32) AS uuid,
RANDOM() AS rand
uuid | rand |
---|---|
0cc175b9c0f1b6a831c399e269772661 | 0.7800906673978432 |
[例23]有一个职员表t_employee, 有字段employee_id和employee_name等
要求:查找这些人的姓中,只有一个的那些人,可以把name字段的第一个字认为是这个人的姓。
SELECT employee_id, employee_name
FROM t_employee
INNER JOIN (
SELECT SUBSTRING(employee_name, 1, 1) AS family_name
FROM t_employee
GROUP BY SUBSTRING(employee_name, 1, 1)
HAVING COUNT(*) = 1
) alt1
ON alt1.family_name = SUBSTRING(t_employee.employee_name, 1, 1)
[例24]下面是一个员工表和部门表
DROP TABLE IF EXISTS dec_employee;
CREATE TABLE dec_employee (
account_key INT,
report_line char(8),
project varchar(16)
);
INSERT into dec_employee VALUES(1, 'ta', 'NorthEast1');
INSERT into dec_employee VALUES(2, 'ta', '');
INSERT into dec_employee VALUES(3, 'ta', '');
INSERT into dec_employee VALUES(4, 'ce', 'SouthEast1');
INSERT into dec_employee VALUES(5, 'ce', '');
INSERT into dec_employee VALUES(6, 'ce', 'South');
INSERT into dec_employee VALUES(7, 'ta', 'North');
INSERT into dec_employee VALUES(8, 'dt', '');
INSERT into dec_employee VALUES(9, 'dt', 'South');
DROP TABLE IF EXISTS dec_department;
CREATE TABLE dec_department (
account_key INT,
department_key INT
);
INSERT INTO dec_department VALUES(1, 777);
INSERT INTO dec_department VALUES(2, 777);
INSERT INTO dec_department VALUES(3, 655);
INSERT INTO dec_department VALUES(4, 777);
INSERT INTO dec_department VALUES(5, 705);
INSERT INTO dec_department VALUES(6, 655);
INSERT INTO dec_department VALUES(7, 777);
表dec_employee
account_key | report_line | project |
---|---|---|
1 | ta | NorthEast1 |
2 | ta | |
3 | ta | |
4 | ce | SouthEast1 |
5 | ce | |
6 | ce | South |
7 | ta | North |
8 | dt | |
9 | dt | South |
表dec_department
account_key | department_key |
---|---|
1 | 777 |
2 | 777 |
3 | 655 |
4 | 777 |
5 | 705 |
6 | 655 |
7 | 777 |
要求:对于员工表中project为空的,使用如下规则填充:如果employee表中有跟他一样report_line的,以及在department表中跟他一样department的,并且有project的,就用这个project;如果找到多条,那就可以使用任意一个。
对于上面的数据,期望结果是:
2 填充为 NorthEast1 或 North
3 保持为空
5 保持为空
8 保持为空
这是一个在ChatGPT帮助下完成的答案,比较复杂;就让它作为这篇漫长博客的结尾吧。
UPDATE dec_employee main
SET project = COALESCE((
SELECT emp.project FROM (
SELECT account_key, report_line, project
FROM dec_employee
) emp
INNER JOIN dec_department dept0 ON dept0.account_key = emp.account_key
INNER JOIN dec_department dept1 ON dept1.account_key = main.account_key
WHERE emp.project <> ''
AND emp.report_line = main.report_line
AND dept0.department_key = dept1.department_key
AND emp.account_key <> main.account_key
LIMIT 1), ''
)
WHERE project = '';