Tracking unique page visits or unique users is a common requirement for business applications. Doing this exactly at large volumes is expensive, because every distinct member has to be stored. The HyperLogLog data structure solves this problem by trading exactness for a small, fixed memory footprint: it returns an approximate count, and in Redis that estimate has a standard error of about 0.81% while each key uses at most roughly 12 KB. This approximation is usually good enough in practice. In this article, we will learn how to use HyperLogLog in Redis with Node.js.
For setting up Redis in production, I would recommend using a managed service. Azure, for example, has a great Redis service that scales easily. However, it is still worth learning Redis and, eventually, how to scale it yourself. That knowledge helps when debugging cloud services and, down the line, can save money if you decide to run Redis on your own.
We will start our intro to Redis using Docker Compose. Create a docker-compose.yml file and add the following.
version: "3.2"
services:
  redis:
    image: "redis:alpine"
    command: redis-server
    ports:
      - "6379:6379"
    volumes:
      - $PWD/redis-data:/var/lib/redis
      - $PWD/redis.conf:/usr/local/etc/redis/redis.conf
    environment:
      - REDIS_REPLICATION_MODE=master
Ensure you have Docker installed, then run:
docker-compose up
There are two modules I often see used with Redis in Node.js. I tend towards ioredis, as it has built-in support for promises and many other Redis features.
npm install ioredis
Let's open up a new file, index.js, and go through the common commands you will use with HyperLogLog in Redis.
We can add items to a HyperLogLog using the pfadd function. We specify the name of the key, in this case "users", then pass in the members.
// PF Add
await redis.pfadd("users", "user1", "user2")
For each of the examples below, I will use the following template to run the commands. Here is my full index.js file; we will just replace the commands inside main() each time.
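const Redis = require("ioredis")
// With no options, ioredis connects to localhost:6379
const redis = new Redis()

async function main() {
  // PF Add
  await redis.pfadd("users", "user1", "user2")
}

main()
  .catch(console.error)
  .finally(() => redis.quit()) // close the connection so the script can exit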
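To tie this back to tracking page visits, here is a minimal sketch of how pfadd could be wrapped for per-day counting. The key scheme (visits:<page>:<day>) and the function names dailyVisitKey and trackVisit are my own assumptions for illustration, not anything Redis or ioredis prescribes. The helper takes the client as a parameter, so any object with a pfadd method works.

```javascript
// Hypothetical key scheme: one HyperLogLog per page per UTC day.
function dailyVisitKey(page, date = new Date()) {
  const day = date.toISOString().slice(0, 10) // "YYYY-MM-DD" in UTC
  return `visits:${page}:${day}`
}

// `client` is any object exposing pfadd(key, ...members), such as an ioredis instance.
// PFADD replies 1 when the HyperLogLog was altered and 0 otherwise, so `true`
// suggests (approximately) that this visitor had not been counted yet.
async function trackVisit(client, page, userId, date = new Date()) {
  const changed = await client.pfadd(dailyVisitKey(page, date), userId)
  return changed === 1
}

// Usage inside main() from the template:
//   await trackVisit(redis, "home", "user1")
```

Keeping one key per day means old days can be expired or merged later without touching the current counter.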
Once we have a HyperLogLog, we can get its count. Because HyperLogLog approximates in order to handle high volumes, the count won't always be exact. To get the count, pfcount is the method to use.
// PF Count
const result = await redis.pfcount("users")
console.log(result) // 2
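If you are curious why the count is only approximate, the toy version below may help. It is a rough sketch of the general HyperLogLog idea and not Redis's implementation: the class name ToyHyperLogLog, the use of the first 32 bits of a SHA-1 digest as the hash, and the default of 4096 registers are all my own choices for illustration. Each member is hashed, a few bits pick a register, and the register remembers the longest run of leading zero bits it has seen. Memory stays fixed no matter how many members are added.

```javascript
const { createHash } = require("crypto")

// Toy HyperLogLog with 2^p registers (p between 7 and 16). Illustrative only.
class ToyHyperLogLog {
  constructor(p = 12) {
    this.p = p
    this.m = 1 << p
    this.registers = new Uint8Array(this.m)
  }

  add(member) {
    // Assumption for this sketch: the first 32 bits of SHA-1 serve as the hash.
    const hash = createHash("sha1").update(String(member)).digest().readUInt32BE(0)
    const index = hash >>> (32 - this.p) // top p bits pick a register
    const rest = (hash << this.p) >>> 0 // remaining bits, left-aligned
    // Rank = position of the first 1 bit within the remaining (32 - p) bits.
    const rank = Math.min(Math.clz32(rest) + 1, 32 - this.p + 1)
    if (rank > this.registers[index]) this.registers[index] = rank
  }

  count() {
    const m = this.m
    let sum = 0
    let zeros = 0
    for (const r of this.registers) {
      sum += 2 ** -r
      if (r === 0) zeros++
    }
    const alpha = 0.7213 / (1 + 1.079 / m) // bias constant, valid for m >= 128
    const raw = (alpha * m * m) / sum
    // Small-range correction: use linear counting while some registers are empty.
    if (raw <= 2.5 * m && zeros > 0) return Math.round(m * Math.log(m / zeros))
    return Math.round(raw)
  }
}

// Usage: duplicates never change the registers, so they are not counted twice.
const hll = new ToyHyperLogLog()
for (let i = 0; i < 50000; i++) hll.add("user" + i)
console.log(hll.count()) // close to 50000, but rarely exactly 50000
```

With 4096 registers the expected error is around 1.6%. Redis uses more registers, which is where its lower error rate comes from.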
The last HyperLogLog method is the merge command. We can use pfmerge, passing the destination key first and then the names of the HyperLogLogs to combine. In the example below, we have two sets with one user in common, and after the merge only the unique members are counted.
// PF Merge
await redis.pfadd("users-app1", "user1", "user2")
await redis.pfadd("users-app2", "user1", "user3")
await redis.pfmerge("users-new", "users-app1", "users-app2")
const result = await redis.pfcount("users-new")
console.log(result) // 3
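If you only need the combined count and do not want to store a merged key, PFCOUNT also accepts several keys and returns the approximate size of their union. Here is a small sketch of that; the helper name countUniqueAcross is hypothetical, and the client is passed in so that any object with a pfcount method works.

```javascript
// `client` is any object exposing pfcount(...keys), such as an ioredis instance.
// PFCOUNT with several keys returns the approximate cardinality of their union
// without writing a merged key.
async function countUniqueAcross(client, keys) {
  if (keys.length === 0) return 0 // PFCOUNT needs at least one key
  return client.pfcount(...keys)
}

// Usage inside main() from the template:
//   const total = await countUniqueAcross(redis, ["users-app1", "users-app2"])
```

pfmerge is the better fit when the combined set will be queried repeatedly, since the merge work is then done once.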