This post will focus on several SQL commands that are often used for aggregation purposes. These commands are
- GROUP BY
The data that will be used as examples is the Seasons_Stats data from Kaggle which is a database of NBA statistics.
The DISTINCT command allows you to find unique items in a column. As an example, we want to find all of the unique positions in our database. Below is the code and output to do this.
SELECT DISTINCT Pos from Seasons_Stats
In this dataset we can see that there are more than 5 positions in basketball because some of the players were coded as playing more than one position
The GROUP BY command allows use to aggregate data by a group or a category in a categorical variable. In the example below we sum the points scored by position and we sort them by descending order. The code and output is below.
SELECT SUM(PTS), Pos from Seasons_Stats Group BY Pos Order by SUM(PTS) DESC
We can see that shooting guards are the most prolific scorers followed by small forwards. By aggregating the points and then grouping them by position we are able to achieve a more nuance insight into our data.
The HAVING command is a filter command similar to the WHERE command but the HAIVNG command is used in the context of aggregation. In the example below, we want to filter our previous code so that we only see positions that scored more than 2 million points. If we use a WHERE command we will get an error message as shown below.
SELECT SUM(PTS), Pos from Seasons_Stats WHERE SUM(PTS) >= 2000000 Group BY Pos Order by SUM(PTS) DESC
The WHERE command cannot deal with aggregate data. Below is the same code but with the using of the HAVING command.
SELECT SUM(PTS), Pos FROM Seasons_Stats GROUP BY Pos HAVING SUM(PTS) >= 2000000 ORDER BY SUM(PTS) DESC
By adding the HAVING command and stating we only wanted positions that scored more than 2 million points everything now works.
Hopefully, after learning about this commands, aggregating data is a little easier in SQL. There is always more than one way to do anything when using code. However, at least one way has been explained to deal with the problems that were addressed here.