每隔一段时间,我都会遇到一个神秘的NATURAL JOIN SQL操作符的用例,当我能把它变成NATURAL FULL JOIN ,我就更高兴了。曩昔的几篇关于这个主题的博文包含:

  • 在SQL中运用NATURAL FULL JOIN来比较两个表
  • 用不可思议的NATURAL FULL OUTER JOIN打动你的搭档!

最近,我在Reddit上偶然发现了一个问题:

有没有这样一种东西,能够连接两个彻底不同的表,没有任何关系,但却像一个联盟一样?

起初我想到了UNION CORRESPONDING 语法,这在大多数SQL方言中并不存在,即便它是一个规范功用。但后来,我想起这又是一个关于NATURAL FULL JOIN 的完美用例,这次与上面的比如稍有不同,即两个表的内容要进行比较。这一次,咱们要保证两个连接的表永远没有匹配的行,以获得类似UNION 的行为。

考虑一下Sakila数据库,在该数据库中,咱们有3个包含人的表,包含:

  • ACTOR
  • CUSTOMER
  • STAFF

这些表的界说如下:

CREATE TABLE actor (
  actor_id integer NOT NULL PRIMARY KEY,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  last_update timestamp NOT NULL
);
CREATE TABLE customer (
  customer_id integer NOT NULL PRIMARY KEY,
  store_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  email varchar(50),
  address_id integer NOT NULL,
  active boolean NOT NULL,
  create_date date NOT NULL,
  last_update timestamp
);
CREATE TABLE staff (
  staff_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  address_id integer NOT NULL,
  email varchar(50),
  store_id integer NOT NULL,
  active boolean NOT NULL,
  username varchar(16) NOT NULL,
  password varchar(40),
  last_update timestamp NOT NULL,
  picture bytea
);

能够看出,常见的列实际上只要(FIRST_NAME, LAST_NAME, LAST_UPDATE) ,一切其他的列都是针对表的。运用下面的查询,咱们能够将一切的数据连接起来:

SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;

结果看起来是这样的:

|source  |first_name|last_name|last_update            |actor_id|...|customer_id|...|staff_id|...|
|--------|----------|---------|-----------------------|--------|---|-----------|---|--------|---|
|actor   |PENELOPE  |GUINESS  |2006-02-15 04:34:33.000|1       |...|           |   |        |   |
|actor   |NICK      |WAHLBERG |2006-02-15 04:34:33.000|2       |...|           |   |        |   |
|actor   |ED        |CHASE    |2006-02-15 04:34:33.000|3       |...|           |   |        |   |
|customer|MARY      |SMITH    |2006-02-15 04:57:20.000|        |   |1          |...|        |   |
|customer|PATRICIA  |JOHNSON  |2006-02-15 04:57:20.000|        |   |2          |...|        |   |
|customer|LINDA     |WILLIAMS |2006-02-15 04:57:20.000|        |   |3          |...|        |   |
|staff   |Mike      |Hillyer  |2006-02-15 04:57:16.000|        |   |           |   |1       |...|
|staff   |Jon       |Stephens |2006-02-15 04:57:16.000|        |   |           |   |2       |...|

一些观察结果:

  • NATURAL JOIN 的匹配列(即具有相同称号的列)位于开头。它们包含组成的SOURCE 列,这对每个连接源来说都是不同的,所以咱们从来没有匹配过,这是咱们想要的。咱们想要UNION 语义(即连接3个表),而不是匹配它们。
  • 之后列出了每个表所特有的列。它们只包含属于相关的数据。SOURCE

这种技能显然不适合每天运用,但偶然也会有用。所以,不要轻视了 “数据 “的力气。NATURAL FULL JOIN

注意事项

我作了点小弊,实际的Sakila数据库形式在CUSTOMERSTAFF 表之间有一个冲突:

CREATE TABLE customer (
  customer_id integer NOT NULL PRIMARY KEY,
  store_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  email varchar(50),
  address_id integer NOT NULL,
  activebool boolean NOT NULL,
  create_date date NOT NULL,
  last_update timestamp
  active integer NOT NULL, -- This is an integer
);

有了这个表的界说,我的查询就出现了这个过错:

SQL过错[42804]。ERROR:JOIN/USING类型整数和布尔值不能匹配

因此,为了处理这个问题,我修补了CUSTOMER 表的界说:

-- Patch
WITH customer AS (
  SELECT 
    customer_id, 
    store_id, 
    first_name, 
    last_name, 
    email, 
    address_id, 
    activebool as active, 
    create_date, 
    last_update 
  FROM customer
)
-- Original query
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;

期望BigQuery的有用 [* REPLACE (...)](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace)语法能够更广泛地运用。