SQL window functions are calculation functions based on individual rows that create a new column that contains the new calculation. They are similar to aggregate functions but are different in that normal aggregate functions like “group by,” will provide a single number as the output. As mentioned earlier, with a window function, the results are placed in a new column for each row.
Window functions allow users to perform many analytical tasks, from ranking and partitioning data to calculating moving averages, cumulative sums, and differences between consecutive rows. Again, what is important here is that widow functions can be used to calculate values between rows.
Basic Example
In this first example, we will figure out how many customers we have from Texas and California and put this in a new column. Below is the code followed by the output.
SELECT first_name, last_name,state ,
COUNT(*) OVER () as total_customers
FROM customers
WHERE state in ('CA','TX')
ORDER BY customer_id;
In the select statement, we pull first_name, last_name, and state, and we then have our window function. In this window function, we are counting the number of customers in the customer table who are from CA and TX. The OVER() function is used to define the window. Since it is blank it is telling SQL that the entire table is the window. This will not make sense right now but is critical information for future examples. Lastly, we are ordering the data by customer_id.
The output indicates that 9,903 customers are from CA or TX. We can confirm this by running a separate line of code.
SELECT COUNT(*)
FROM customers
where state in('CA','TX')
The output from the window function is repeated in every row called total_customers. Repeating this information doesn’t make sense but it shows us what the window function does. For example, in row 1, the function sees that this person is from TX and then outputs that the total number of customers is 9,903 for somebody from TX or CA. This is a basic example of what window functions can do. Of course, there are things much more insightful than this that can be calculated.
Intermediate Example
We are now going to run mostly the same code with one slight difference. We want to know not just how many total customers there are but how many are from TX and how many are from CA. To do this we will have to use PARTITION BY which is the group by clause for window functions.
SELECT customer_id, first_name, last_name, state,
COUNT(*) OVER (PARTITION BY state) as total_customers
FROM customers
where state in ('CA','TX')
ORDER BY customer_id;
In the output, we have all of the same rows from the previous SELECT clause and the total_customers columns are different. When a person is from TX this column shows how many people are from TX but when a person is from CA it shows how many people are from CA. If you add up the number of people from TX and CA you get the following
4,865 + 5,038 = 9,903
This is the same amount as the total number of customers in our previous example. The PARTITION BY clause breaks the number of customers into two groups, those from TX and those from CA, and assigns the appropriate value based on where the customer in that row is from.
Advanced Examples
The next example will involve using the SUM aggregation function. We are going to add customer_id in a new column. This will be a running total. In other words, SQL will keep adding the customer_id until they get through all of the data. Below is the code followed by the output and explanation.
SELECT customer_id, title, first_name, last_name, state,
Sum(customer_id) OVER (ORDER BY customer_id) as total_customers
FROM customers
where state in('CA','TX')
ORDER BY customer_id;
Here is what is happening. In the total_customer column, a running total of customer_id is being created. For example, row 1 has the value 10 because that is the first customer id of someone from TX. Row 2 has a customer ID of 13. When you add 10 + 13 = 23 which is the value in row 2 of total_customers. This continues for the rest of the table.
Here is another example this time with the RANK() function. The RANK() function allows you to create a new column that ranks the data based on a criteria. For our example, we will rank the data based on their customer_id with the lower the id number the higher the ranking. To make this even more interesting, we will partition the data by state so that the lowest value customer_id will be number 1 for TX and the lowest ranking customer_id will be number 1 for CA. Below is the code
SELECT customer_id, title, first_name, last_name, state,
rank () OVER (PARTITION BY state ORDER BY customer_id) as total_customers
FROM customers
where state in ('CA','TX')
ORDER BY customer_id;
Row number 1 is rank 1 because it is the lowest value customer_id of all TX. Row 2 is also ranked 1 because it is the lowest value customer_id of all CA.
Conclusion
The possibilities are almost endless with window functions. These tools allow you to get into the data and find insightful answers to complex questions. The examples here are only scratching the surface of what can be achieved.
