Working with GROUP BY in Postgres

Intro

The GROUP BY clause lets us summarize multiple rows into fewer rows, or even a single row. For example, if we want to count the number of people with the same last name or total the orders placed each day, we can use GROUP BY. In this article, we will learn how to use the GROUP BY clause in PostgreSQL.

The Syntax

The basic syntax of a GROUP BY query is as follows (the WHERE clause is optional):

SELECT 
  [columns]
FROM
  [table]
WHERE
  [conditions]
GROUP BY [columns];
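To make the template concrete, here is a sketch of the orders-per-day case mentioned in the introduction. The orders table, its columns, and its rows are hypothetical and exist only for this example. It is created as a temporary table, so you can run it against the database set up in the next section without leaving anything behind.

```sql
-- Hypothetical table for this sketch only; TEMP means PostgreSQL drops it
-- automatically when the session ends.
CREATE TEMP TABLE orders (
  order_id   INTEGER PRIMARY KEY,
  order_date DATE    NOT NULL,
  quantity   INTEGER NOT NULL,
  status     TEXT    NOT NULL
);

INSERT INTO orders VALUES
(1, '2024-01-01', 2, 'shipped'),
(2, '2024-01-01', 3, 'shipped'),
(3, '2024-01-02', 1, 'cancelled'),
(4, '2024-01-02', 4, 'shipped'),
(5, '2024-01-03', 5, 'shipped');

-- WHERE drops cancelled orders first; GROUP BY then collapses the
-- remaining rows into one row per day.
SELECT
  order_date,
  COUNT(*)      AS order_count,
  SUM(quantity) AS total_quantity
FROM
  orders
WHERE
  status <> 'cancelled'
GROUP BY order_date
ORDER BY order_date;

-- order_date | order_count | total_quantity
-- 2024-01-01 |           2 |              5
-- 2024-01-02 |           1 |              4
-- 2024-01-03 |           1 |              5
```

COUNT(*) answers "how many orders were placed that day", while SUM(quantity) answers "how many items were ordered that day"; both are computed over the same daily groups.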

Getting Set Up

For our setup, we will use Docker Compose to create a Postgres database along with Adminer, a web-based database client that supports PostgreSQL. Start by copying the following into a Docker Compose file called docker-compose.yml:

services:
  db:
    image: 'postgres:latest'
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: password
      POSTGRES_DB: default_database
    volumes:
      - psqldata:/var/lib/postgresql

  adminer:
    image: adminer
    depends_on:
      - db
    restart: always
    ports:
      - 8081:8080

volumes:
  psqldata:

To start both containers, run docker compose up in the same directory. Once this is done, open Adminer by going to http://localhost:8081.

On the login page, choose PostgreSQL as the System, enter db (the service name from the compose file) as the Server, and use the following credentials:

POSTGRES_USER: username
POSTGRES_PASSWORD: password
POSTGRES_DB: default_database

Creating Sample Data

In this article, we will need some data to work with. If you don't understand these commands yet, don't worry; we will cover them in later articles.

We will use part of the Sakila sample database, which was originally published for MySQL: https://dev.mysql.com/doc/sakila/en/. Rather than importing the whole database, we will enter only what we need.

First, let's create the actor table.

CREATE TABLE actor (
  actor_id SMALLINT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  last_update TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (actor_id)
);

And finally, let's enter a few rows.

INSERT INTO actor VALUES 
(1,'PENELOPE','GUINESS','2006-02-15 04:34:33'),
(2,'NICK','WAHLBERG','2006-02-15 04:34:33'),
(3,'ED','CHASE','2006-02-15 04:34:33'),
(4,'JENNIFER','DAVIS','2006-02-15 04:34:33'),
(5,'JOHNNY','LOLLOBRIGIDA','2006-02-15 04:34:33'),
(6,'BETTE','NICHOLSON','2006-02-15 04:34:33'),
(7,'GRACE','MOSTEL','2006-02-15 04:34:33'),
(8,'MATTHEW','JOHANSSON','2006-02-15 04:34:33'),
(9,'JOAN','JOHANSSON','2006-02-15 04:34:33');

Group By Example

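In our first example, we will use GROUP BY to group all actors by their last name. We won't use an aggregate function such as COUNT yet, so each group simply collapses into one row. This gives us a distinct list of last names, much like SELECT DISTINCT. Because GROUP BY on its own does not guarantee any particular row order, we also add ORDER BY.

SELECT
    last_name AS LastName
FROM
    actor
GROUP BY LastName
ORDER BY LastName;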

PostgreSQL folds unquoted identifiers to lowercase, so the LastName alias is returned as lastname:

lastname
CHASE
DAVIS
GUINESS
JOHANSSON
LOLLOBRIGIDA
MOSTEL
NICHOLSON
WAHLBERG
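The comparison with DISTINCT is easy to check. This sketch assumes the actor table and rows created above; on that data it should return the same eight last names, in the same order, as the GROUP BY query.

```sql
-- Same result as the GROUP BY query, written with SELECT DISTINCT.
SELECT DISTINCT
    last_name AS LastName
FROM
    actor
ORDER BY LastName;
```

As a rule of thumb, SELECT DISTINCT reads more clearly when you only want unique values, while GROUP BY is the tool once you also need an aggregate for each group.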

Using Aggregate Functions

Next, we will run the same query but add COUNT(*) to the column list. COUNT(*) is an aggregate function that counts the rows in each group, so this returns each last name together with the number of actors who have it.

SELECT
    last_name AS LastName,
    COUNT(*)
FROM
    actor
GROUP BY LastName
ORDER BY LastName;

lastname	count
CHASE	1
DAVIS	1
GUINESS	1
JOHANSSON	2
LOLLOBRIGIDA	1
MOSTEL	1
NICHOLSON	1
WAHLBERG	1
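COUNT is only one aggregate function, and any aggregate in the column list is evaluated once per group. The sketch below, again assuming the actor table above, adds PostgreSQL's string_agg to list the first names in each group and sorts the largest groups first, so JOHANSSON (with JOAN, MATTHEW) should appear at the top.

```sql
-- Every aggregate in the SELECT list is computed once per last_name group.
SELECT
    last_name,
    COUNT(*) AS actor_count,
    string_agg(first_name, ', ' ORDER BY first_name) AS first_names
FROM
    actor
GROUP BY last_name
-- Largest groups first, then alphabetically among equal counts.
ORDER BY actor_count DESC, last_name;
```

The ORDER BY inside string_agg controls the order of names within each group, independently of the ORDER BY that sorts the groups themselves.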

Using the Having Clause

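Often, we will want to filter the groups themselves. WHERE cannot do this, because it filters individual rows before they are grouped, so aggregate results such as COUNT(*) do not exist yet at that point. Instead, we use the HAVING clause, which filters groups after aggregation. PostgreSQL does not allow HAVING to refer to an output alias such as LastNameCount, so we repeat the aggregate expression itself.

SELECT
    last_name AS LastName,
    COUNT(*) AS LastNameCount
FROM
    actor
GROUP BY LastName
HAVING COUNT(*) > 1;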

lastname	lastnamecount
JOHANSSON	2
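WHERE and HAVING can appear in the same query, and the point at which each one runs changes the result. In this sketch, which assumes the actor table above, WHERE removes JOAN JOHANSSON (actor_id 9) before grouping, so the JOHANSSON group is left with a single row and HAVING filters it out. On the sample data the query should return no rows.

```sql
-- WHERE filters rows before grouping; HAVING filters groups afterwards.
SELECT
    last_name AS LastName,
    COUNT(*) AS LastNameCount
FROM
    actor
WHERE
    actor_id <> 9
GROUP BY LastName
HAVING COUNT(*) > 1;
```

Dropping the WHERE clause brings JOHANSSON back with a count of 2, matching the previous result.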