When viewing data, we often run into duplicate entries. For example, if we query addresses, we should expect to see multiple addresses with the same country. If we would like to see a unique list of countries, states, etc., we can use the DISTINCT clause. In this article, we will learn how to use DISTINCT in PostgreSQL.
The basic syntax of DISTINCT is as follows:
SELECT DISTINCT
[column_names]
FROM
[table_name]
Here we can list a single column or multiple columns, separated by commas.
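As a quick illustration of the syntax, the following sketch runs without any table at all. The inline addresses list and its country values are made up for this example; they are not part of the setup used in this article.

```sql
-- Hypothetical inline data: three addresses, two of them in the same country.
SELECT DISTINCT country
FROM (VALUES ('Canada'), ('USA'), ('Canada')) AS addresses(country)
ORDER BY country;
-- Returns two rows: Canada and USA.
```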
For our setup, we will use Docker Compose to create a Postgres database and to connect Adminer, a web-based database client that supports PostgreSQL. Start by copying the following into a Docker Compose file called docker-compose.yml:
version: '3'
services:
  db:
    image: 'postgres:latest'
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: password
      POSTGRES_DB: default_database
    volumes:
      - psqldata:/var/lib/postgresql
  adminer:
    image: adminer
    links:
      - db
    environment:
      ADMINER_DEFAULT_SERVER: db
    restart: always
    ports:
      - 8081:8080
volumes:
  psqldata:
To run this file, we can use docker-compose up. Once this is done, open Adminer by going to http://localhost:8081.
You can then log in by selecting PostgreSQL as the system, keeping db as the server, and using the following credentials.
POSTGRES_USER: username
POSTGRES_PASSWORD: password
POSTGRES_DB: default_database
In this article, we will need some data to work with. If you don't understand these commands, don't worry; we will cover them in later articles.
We will be using the Sakila sample database provided here: https://dev.mysql.com/doc/sakila/en/. However, we will only enter what we need rather than import the whole database.
Next, let's create an actor table.
CREATE TABLE actor (
    actor_id SMALLINT,
    first_name VARCHAR(45) NOT NULL,
    last_name VARCHAR(45) NOT NULL,
    last_update TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (actor_id)
);
And finally, let's enter a few rows.
INSERT INTO actor VALUES
(1,'PENELOPE','GUINESS','2006-02-15 04:34:33'),
(2,'NICK','WAHLBERG','2006-02-15 04:34:33'),
(3,'ED','CHASE','2006-02-15 04:34:33'),
(4,'JENNIFER','DAVIS','2006-02-15 04:34:33'),
(5,'JOHNNY','LOLLOBRIGIDA','2006-02-15 04:34:33'),
(6,'BETTE','NICHOLSON','2006-02-15 04:34:33'),
(7,'GRACE','MOSTEL','2006-02-15 04:34:33'),
(8,'MATTHEW','JOHANSSON','2006-02-15 04:34:33'),
(64,'RAY','JOHANSSON','2006-02-15 04:34:33');
Let's start by selecting one column. Here we can see a duplicate: JOHANSSON appears twice.
SELECT last_name FROM actor ORDER BY last_name;
last_name |
---|
CHASE |
DAVIS |
GUINESS |
JOHANSSON |
JOHANSSON |
LOLLOBRIGIDA |
MOSTEL |
NICHOLSON |
WAHLBERG |
Now, let’s do the same but use the DISTINCT clause.
SELECT DISTINCT last_name FROM actor ORDER BY last_name;
last_name |
---|
CHASE |
DAVIS |
GUINESS |
JOHANSSON |
LOLLOBRIGIDA |
MOSTEL |
NICHOLSON |
WAHLBERG |
Now we have a unique list.
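If we only want to know how many duplicates were removed, one possible check is to compare the total row count with the number of distinct values. This is a small sketch against the actor table created above; the column aliases total_rows and unique_last_names are names chosen for this example.

```sql
-- Compare the number of rows with the number of distinct last names.
-- A difference between the two counts means some last names repeat.
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT last_name) AS unique_last_names
FROM actor;
```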
So far we have found the unique values of a single column. However, we can also list multiple columns to select unique combinations of values.
For example, let's say we have a customers table (not part of the setup above) with state and city columns, and we want the unique state and city combinations. We will still see repeated states, but each state and city combination will appear only once.
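The customers table is not created in the setup above. To follow along, a minimal sketch such as the following could be used. The customer_id column, the column types, and the choice of rows are assumptions made for this example; the state and city values are borrowed from the output shown in this section.

```sql
-- Hypothetical table: only state and city appear in the article's output.
-- customer_id and the column types are assumptions.
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50)
);

-- A few rows with deliberate duplicates of the same state and city.
INSERT INTO customers (city, state) VALUES
    ('Tsawassen', 'BC'),
    ('Vancouver', 'BC'),
    ('San Francisco', 'CA'),
    ('San Francisco', 'CA'),
    ('Boston', 'MA'),
    ('Boston', 'MA'),
    ('NYC', 'NY'),
    ('NYC', 'NY');
```

With only these rows, the queries in this section return a much shorter result than the output shown, but the duplicates behave the same way.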
Without DISTINCT, here is what we get.
SELECT
state, city
FROM
customers
ORDER BY
state,
city;
+---------------+----------------+
| state         | city           |
+---------------+----------------+
| BC            | Tsawassen      |
| BC            | Vancouver      |
| CA            | Brisbane       |
| CA            | Burbank        |
...
| CA            | San Francisco  |
| CA            | San Francisco  |
...
| MA            | Boston         |
| MA            | Boston         |
| MA            | Brickhaven     |
| MA            | Brickhaven     |
| MA            | Brickhaven     |
...
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |
| NY            | NYC            |
...
Now, with DISTINCT.
SELECT DISTINCT
state, city
FROM
customers
ORDER BY
state,
city;
+---------------+----------------+
| state         | city           |
+---------------+----------------+
| BC            | Tsawassen      |
| BC            | Vancouver      |
| CA            | Brisbane       |
| CA            | Burbank        |
| CA            | Burlingame     |
| CA            | Glendale       |
| CA            | Los Angeles    |
| CA            | Pasadena       |
| CA            | San Diego      |
...
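DISTINCT treats the listed columns as one unit, so a row is dropped only when both state and city match an earlier row. One way to see how many rows each combination collapsed is to group by the same columns. This is a sketch that assumes a customers table with state and city columns, as in the queries above; row_count is an alias chosen for this example.

```sql
-- Returns the same state and city combinations as SELECT DISTINCT state, city,
-- plus the number of original rows behind each combination.
SELECT
    state,
    city,
    COUNT(*) AS row_count
FROM customers
GROUP BY state, city
ORDER BY state, city;
```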