Working with Aggregate Functions in Postgres

04.07.2022

Intro

Postgresql aggregate functions allow us to summarize groups of rows. For example, we can count the number of sales, get the average revenue, and more. In this article we will learn how to use aggregate functions in Postgresql.

The Syntax

The basic syntax of a Upper is as follows:

SELECT upper(string|column);

Getting Setup

We will be using docker in this article, but feel free to install your database locally instead. Once you have docker installed, create a new file called docker-compose.yml and add the following.

version: '3'
 
services:
  db:
    image: 'postgres:latest'
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: password
      POSTGRES_DB: default_database
    volumes:
      - psqldata:/var/lib/postgresql

  phpmyadmin:
    image: phpmyadmin/phpmyadmin
    links:
      - db
    environment:
      PMA_HOST: db
      PMA_PORT: 3306
      PMA_ARBITRARY: 1
    restart: always
    ports:
      - 8081:80

volumes:
  psqldata:

Next, run docker-compose up.

Now, navigate to http://localhost:8081/ to access phpMyAdmin. Then log in with the username root and pass root_pass.

Click the SQL tab and you are ready to go.

Creating a DB

In this article, we will need some data to work with. If you don't understand these commands, don't worry, we will cover them in later articles.

We will be using the sample db provided here: https://dev.mysql.com/doc/sakila/en/. However, we will only enter what we need rather than import the whole db.

Next, let's create an film table. This is a slightly simplified version of the sakila database.

CREATE TABLE employees (
    emp_no      INT             NOT NULL,
    birth_date  DATE            NOT NULL,
    first_name  VARCHAR(14)     NOT NULL,
    last_name   VARCHAR(16)     NOT NULL,
    gender      VARCHAR(1),
    hire_date   DATE            NOT NULL,
    PRIMARY KEY (emp_no)
);
CREATE TABLE salaries (
    emp_no      INT             NOT NULL,
    salary      INT             NOT NULL,
    from_date   DATE            NOT NULL,
    to_date     DATE            NOT NULL,
    FOREIGN KEY (emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE,
    PRIMARY KEY (emp_no, from_date)
);

Now, let's enter a few rows

INSERT INTO employees VALUES (10001,'1953-09-02','Georgi','Facello','M','1986-06-26'),
(10002,'1964-06-02','Bezalel','Simmel','F','1985-11-21'),
(10003,'1959-12-03','Parto','Bamford','M','1986-08-28'),
(10004,'1954-05-01','Chirstian','Koblick','M','1986-12-01'),
(10005,'1955-01-21','Kyoichi','Maliniak','M','1989-09-12');
INSERT INTO salaries VALUES 
(10001,60117,'1986-06-26','1987-06-26'),
(10002,62102,'1987-06-26','1988-06-25'),
(10003,66074,'1988-06-25','1989-06-25'),
(10004,66596,'1989-06-25','1990-06-25'),
(10005,66961,'1990-06-25','1991-06-25');

The Basics

Here is a quick overview of the basic aggregate functions that are used often in postgresql.

  • AVG(): This function returns the average value.
  • COUNT(): This function returns the number of values.
  • MAX(): This function returns the maximum value.
  • MIN(): This function returns the minimum value.
  • SUM(): This function returns the sum of values.

Count

Our first example will count the number of rows in our employee table. We can use COUNT(*) in our select to count all rows.

select 
	count(*)
from employees e;
count
5

Now, let's say we want to see how many employees share the same birthday. We can use our count function with the birth_date in the select. Then, we can group our rows by birth date.

select 
	count(*), birth_date 
from employees e
group by birth_date;

We see in our data set, none of the employees share the same birthday as each row of our result table has a count of 1.

count birth_date
1 1953-09-02
1 1964-06-02
1 1955-01-21
1 1959-12-03
1 1954-05-01

Average and Sum

Let's see more examples by joining the salary table. We start with the basic join, no aggregation.

select
	*
from employees e 
join salaries s on s.emp_no = e.emp_no;
emp_no birth_date first_name last_name gender hire_date emp_no salary from_date to_date
10001 1953-09-02 Georgi Facello M 1986-06-26 10001 60117 1986-06-26 1987-06-26
10002 1964-06-02 Bezalel Simmel F 1985-11-21 10002 62102 1987-06-26 1988-06-25
10003 1959-12-03 Parto Bamford M 1986-08-28 10003 66074 1988-06-25 1989-06-25
10004 1954-05-01 Chirstian Koblick M 1986-12-01 10004 66596 1989-06-25 1990-06-25
10005 1955-01-21 Kyoichi Maliniak M 1989-09-12 10005 66961 1990-06-25 1991-06-25

Let's now take the average of al the employees salary. To do this, we will use the avg function and pass in the salary column.

select
	avg(salary)
from employees e 
join salaries s on s.emp_no = e.emp_no;

The result is that our average salary is $64,370.

avg
64370.000000000000

Let's also sum the salaries to see the total amount paid to our employees.

select
	sum(salary)
from employees e 
join salaries s on s.emp_no = e.emp_no;
sum
321850

Min and Max

In our final example, let's see that we don't have to use only one aggregate function per a select. We can combine them. Let's select the min and the max salary to see what we pay on the high and low end.

select
	min(salary), max(salary)
from employees e 
join salaries s on s.emp_no = e.emp_no;
min max
60117 66961