Filtering commands in SQL
This post will explore the use and application of the WHERE, LIKE, and IN commands in SQL.
The WHERE command is generally used as a way to filter data in SQL. The database we are using in this post contains data on basketball players from 1950-2017. What we want to do is filter the data so that we only see data from players who played in 2017. In the example below, we will filter our basketball players by year be set to 2017
SELECT * FROM Seasons_Stats WHERE year = 2017
If you look at the year column you can clearly see that the value is set to 2017 just as we wanted it.
The WHERE command is not limited to numbers as it can also be used with text. In the example below, we filter our data with the WHERE command so that we only see players who played for the Golden State Warriors (GSW).
SELECT * FROM Seasons_Stats WHERE Tm = 'GSW'
If you look closely at the “Tm” column you will only see the initials for the Golden State Warriors.
The WHERE command is often teamed with the LIKE command when you are looking for a text but are not sure of an exact match. You can specify a pattern you are looking for similar to regular expressions. In the code below we use the LIKE command with the WHERE command searching for any player whose name begins with L.
SELECT * FROM Seasons_Stats WHERE Player LIKE 'L%'
The percentage sign (%) after the letter L tells SQL that anything can be after the letter L in the search and meet the criteria. We can also put the % at the end of or text or put one on both sides are some other combination as shown in the examples below.
In the example below, the last letter must be an L. After this example, is one in which an L must appear anywhere in the name.
SELECT * FROM Seasons_Stats WHERE Player LIKE '%L'
SELECT * FROM Seasons_Stats WHERE Player LIKE '%L%'
The IN command allows you to filter your data based on several values. This command is also combined with the WHERE command. In the example below, we filter our data so that we only see players whose position is small forward or shooting guard.
SELECT * FROM Seasons_Stats WHERE Pos in ('SG', 'SF')
You could also insert numerical values in the parentheses when using the IN command.
The command shared in this post provide more information on basic tools you can use and apply in SQL. Filtering using the WHERE, LIKE, and IN commands get helpo you to focus the breath of your research to find answers to your questions.
Intro to SQL
SQL (structured query language) is a programming language used with databases. A database is essentially a collection of tables that all contain information in a text or numerical form. SQL can be used for developing and maintaining databases for various purposes. Our goal is not to learn how to become database administrators. Instead, we are going to focus on using SQL for data analysis.
The data used in this example is NBA Salary and Statistics 2016-17 (SQLite version), a database file available at Kaggle. This database contains information on basketball statistics from 1950-2017. There is more that could be said, but for simplicity, we will learn as we go along.
The software used to explore the database in these examples is DBeaver, a free, open-source software database manager. You can run SQL from a terminal, but it is a little bit easier for beginners to have a GUI interface most of the time. If you want to use this database, you can download it and connect this database to whatever database manager you are using.
In this post, we will focus on using the following commands.
- ORDER BY
SELECT and FROM
The SELECT and FROM commands almost always work together at the beginning of an SQL query. The SELECT command tables which column(s) you want SQL to pull from a table. The FROM command indicates which table you want to pull data from. Below is code and a screenshot of using these two commands together to pull data from the database.
This is what the output looks like in DBeaver. What you see is different. The asterisk after SELECT tells SQL to pull all columns in the table. You can also see a combination of text and numeric data in the columns we can see.
The next command that we will learn is the LIMIT command. This command limits how many rows of data are returned for an output. The code and output are below.
By using the LIMIT command, only five rows of data were pulled. How many rows you want is up to you, but this example was set to 5.
The ORDER BY command allows you to sort the data from highest to lowest or vice versa based on one or more columns in the table. In the example below, we will sort the data by the age column descending to determine who are some of the oldest players to play in the NBA since 1950, which is the first year data is available in this data set. Also, note that only the first five rows are listed because of the LIMIT command. The code and output is below.
order by Age DESC
From the output, it appears that Kevin Willis is the oldest player to play in the NBA since 1950. The DESC argument next to ORDER BY sorts the results from highest to lowest.
There is much more to learn about SQL than what was shared here. However, the most important thing to know whenever doing a data analyst is to make sure you know what you want to know. This involves developing straightforward questions that can be answered with available tools.