Intro to SQL

SQL (structured query language) is a programming language used with databases. A database is essentially a collection of tables that all contain information in a text or numerical form. SQL can be used for developing and maintaining databases for various purposes. Our goal is not to learn how to become database administrators. Instead, we are going to focus on using SQL for data analysis.

The data used in this example is NBA Salary and Statistics 2016-17 (SQLite version), a database file available at Kaggle. This database contains information on basketball statistics from 1950-2017. There is more that could be said, but for simplicity, we will learn as we go along.

The software used to explore the database in these examples is DBeaver, a free, open-source software database manager. You can run SQL from a terminal, but it is a little bit easier for beginners to have a GUI interface most of the time. If you want to use this database, you can download it and connect this database to whatever database manager you are using.

In this post, we will focus on using the following commands.

  • SELECT
  • FROM
  • LIMIT
  • ORDER BY

SELECT and FROM

The SELECT and FROM commands almost always work together at the beginning of an SQL query. The SELECT command tables which column(s) you want SQL to pull from a table. The FROM command indicates which table you want to pull data from. Below is code and a screenshot of using these two commands together to pull data from the database.

SELECT * 

FROM Seasons_Stats

This is what the output looks like in DBeaver. What you see is different. The asterisk after SELECT tells SQL to pull all columns in the table. You can also see a combination of text and numeric data in the columns we can see.

LIMIT

The next command that we will learn is the LIMIT command. This command limits how many rows of data are returned for an output. The code and output are below.

SELECT * 

FROM Seasons_Stats

LIMIT 5

By using the LIMIT command, only five rows of data were pulled. How many rows you want is up to you, but this example was set to 5.

ORDER BY

The ORDER BY command allows you to sort the data from highest to lowest or vice versa based on one or more columns in the table. In the example below, we will sort the data by the age column descending to determine who are some of the oldest players to play in the NBA since 1950, which is the first year data is available in this data set. Also, note that only the first five rows are listed because of the LIMIT command. The code and output is below.

SELECT * 

FROM Seasons_Stats 

order by Age DESC 

LIMIT 5

From the output, it appears that Kevin Willis is the oldest player to play in the NBA since 1950. The DESC argument next to ORDER BY sorts the results from highest to lowest.

Conclusion

There is much more to learn about SQL than what was shared here. However, the most important thing to know whenever doing a data analyst is to make sure you know what you want to know. This involves developing straightforward questions that can be answered with available tools.

Leave a Reply