Monthly Archives: April 2025

Subqueries in SQL VIDEO

Subqueries are used in SQL to accomplish specific goals. In the video below we will discuss the use of subqueries.

ad

Windows Functions in SQL VIDEO

The video below explains the use of window functions in SQL. These functions allow you to calculate various values based on the criteria in that specific row. This can lead to some interesting insights depending on the situation.

ad
dollar-currency-money-us-dollar-47344.jpeg

Cost-Effectiveness Analysis

The purpose of a cost-effectiveness analysis is to determine the relationship between the benefits and expenses of a program. Naturally, there are many different ways to do this but there are some common steps for approaching this as shown below

  1. Define the program and outcome indicators
  2. Determine what you want to know
  3. Compute cost
  4. Determine the scope of program outcome data
  5. Compute outcome data
  6. Compute cost-effectiveness ratio

Define the Program and Outcome Indicators

Defining the programs means to know all the components and features of the program. For example, a reading lab program might have the following components.

  • Online reading in a computer lab that develops reading comprehension and pronunciation skills
  • Participation 30 minutes a day twice a week
  • The program lasts one semester
  • Participants are 30 fifth graders who are reading 2 levels below grade level

The example above is highly simplified but serves our purpose. Once the program is defined it is necessary to determine the outcome indicators. Outcomes are measurable changes in behavior. For our reading lab example below is the the outcome we want to measure.

  • Number of fifth-grade students who are reading at or above grade level at the end of the semester of reading lab participation.

With the information above we can move to step 2.

Developing Questions

Once you know what the program is about and the outcome you want to measure it is now time to shape questions for the study. This might seem like a wasted step because obviously we want to gather data about the outcome indicator. However, there might be more than one thing we want to know about the outcome. For example, we might want to know if there are differences by gender, race, or socioeconomic factors. Since we can nuance and complicate the study it is important to state explicitly what we want to know. Below are the questions for our reading lab example.

  • How many fifth-grade students reach grade level for reading comprehension through the use of the reading lab twice a week for 30 minutes?
  • How many fifth-grade students are unable to reach grade level for reading comprehension through the use of the reading lab twice a week for 30 minutes?
  • What is the cost per fifth-grade student for the use of the reading lab over the semester?

The next step is to determine the cost of the program

Determine Cost

It is now time to find out how much money was spent. This is a straightforward process that includes calculating the expenses for personnel, facilities, equipment, and other expenses. For our reading lab example, the costs are simple to compute and our shown below.

  • Personal: The total cost is 0 zero dollars because the teachers are already paid by the school and no additional staff was necessary
  • Facilities: Again, the total cost was zero because an existing computer lab was used.
  • Equipment: The expense for the license for the reading software is $30,000 dollars for the length of the program
  • Other expenses: Zero dollars for other expenses

For our example, only $30,000 is used for this program.

Determine Scope of Data Collection

The amount of data to collect depends on the questions to answer and the maturity of the program. If the program has been around for several years you have to decide if you want to collect data from all years or a subset. In our example, this is a new program so we will take all data from the the fifth graders who participated in the reading lab program.

Compute Outcome

Once the program has run its course it is time to determine outcomes. For our example, after the reading lab program was completed, each student took a reading comprehension test to assess what grade level they were at. For our purposes, students at or above grade level are considered successful. Below are the results

SuccessUnsuccess
2010

The information in the table above has already answered our first two questions for this study. We can now use this information to determine the cost-effectiveness ratio to answer the last question.

Cost-Effectiveness Ratio

The cost-effectiveness ratio can be calculated by dividing the cost of the program by the outcome. For our example, this would mean dividing $30,000 by 20 (number of success). In the table below we have several important calculations

Reading Lab Program
Program cost$30,000
Success rate20 / 30 * 100 = 66%
Number of students at grade level20
Cost per successful student$30,000 / 20 = $1,500

The table above provides all the information we need to assess this program within the scope that we defined. Right now it is hard to tell if this program is good or not because there is no standard or another program to compare it to. However, having external standards or another program for comparison is often expected with real examples.

Sometimes an additional step that is taken is a sensitivity analysis. A sensitivity analysis is especially important when there are a lot of estimations in the model. When it is necessary to estimate it is important to adjust these values high and low to see how they affect outcomes. For our example, this is not applicable.

Conclusion

Cost-effectiveness analysis is an important tool in determining the value of a program. The goals of a program are normally to help people while keeping in mind cost and effectiveness. The analysis presented here allows an evaluation to assess programs so that services can be rendered efficiently.

pexels-photo-950054.jpeg

Window Function in SQL

ad

SQL window functions are calculation functions based on individual rows that create a new column that contains the new calculation. They are similar to aggregate functions but are different in that normal aggregate functions like “group by,” will provide a single number as the output. As mentioned earlier, with a window function, the results are placed in a new column for each row.

Window functions allow users to perform many analytical tasks, from ranking and partitioning data to calculating moving averages, cumulative sums, and differences between consecutive rows. Again, what is important here is that widow functions can be used to calculate values between rows.

Basic Example

In this first example, we will figure out how many customers we have from Texas and California and put this in a new column. Below is the code followed by the output.

SELECT first_name, last_name,state ,
COUNT(*) OVER () as total_customers
FROM customers
WHERE state in ('CA','TX')
ORDER BY customer_id;

In the select statement, we pull first_name, last_name, and state, and we then have our window function. In this window function, we are counting the number of customers in the customer table who are from CA and TX. The OVER() function is used to define the window. Since it is blank it is telling SQL that the entire table is the window. This will not make sense right now but is critical information for future examples. Lastly, we are ordering the data by customer_id.

The output indicates that 9,903 customers are from CA or TX. We can confirm this by running a separate line of code.

SELECT COUNT(*)
FROM customers
where state in('CA','TX')

The output from the window function is repeated in every row called total_customers. Repeating this information doesn’t make sense but it shows us what the window function does. For example, in row 1, the function sees that this person is from TX and then outputs that the total number of customers is 9,903 for somebody from TX or CA. This is a basic example of what window functions can do. Of course, there are things much more insightful than this that can be calculated.

Intermediate Example

We are now going to run mostly the same code with one slight difference. We want to know not just how many total customers there are but how many are from TX and how many are from CA. To do this we will have to use PARTITION BY which is the group by clause for window functions.

SELECT customer_id, first_name, last_name, state,
COUNT(*) OVER (PARTITION BY state) as total_customers
FROM customers
where state in ('CA','TX')
ORDER BY customer_id;

In the output, we have all of the same rows from the previous SELECT clause and the total_customers columns are different. When a person is from TX this column shows how many people are from TX but when a person is from CA it shows how many people are from CA. If you add up the number of people from TX and CA you get the following

4,865 + 5,038 = 9,903

This is the same amount as the total number of customers in our previous example. The PARTITION BY clause breaks the number of customers into two groups, those from TX and those from CA, and assigns the appropriate value based on where the customer in that row is from.

Advanced Examples

The next example will involve using the SUM aggregation function. We are going to add customer_id in a new column. This will be a running total. In other words, SQL will keep adding the customer_id until they get through all of the data. Below is the code followed by the output and explanation.

SELECT customer_id, title, first_name, last_name, state,
Sum(customer_id) OVER (ORDER BY customer_id) as total_customers
FROM customers
where state in('CA','TX')
ORDER BY customer_id;

Here is what is happening. In the total_customer column, a running total of customer_id is being created. For example, row 1 has the value 10 because that is the first customer id of someone from TX. Row 2 has a customer ID of 13. When you add 10 + 13 = 23 which is the value in row 2 of total_customers. This continues for the rest of the table.

Here is another example this time with the RANK() function. The RANK() function allows you to create a new column that ranks the data based on a criteria. For our example, we will rank the data based on their customer_id with the lower the id number the higher the ranking. To make this even more interesting, we will partition the data by state so that the lowest value customer_id will be number 1 for TX and the lowest ranking customer_id will be number 1 for CA. Below is the code

SELECT customer_id, title, first_name, last_name, state,
rank () OVER (PARTITION BY state ORDER BY customer_id) as total_customers
FROM customers
where state in ('CA','TX')
ORDER BY customer_id;

Row number 1 is rank 1 because it is the lowest value customer_id of all TX. Row 2 is also ranked 1 because it is the lowest value customer_id of all CA.

Conclusion

The possibilities are almost endless with window functions. These tools allow you to get into the data and find insightful answers to complex questions. The examples here are only scratching the surface of what can be achieved.

orange and purple merging in water

UNION and INTERSECT in SQL

UNION and INTERSECT are two useful statements used in SQL for specific purposes. In this post, we will define, provide examples of each statement, and compare and contrast UNION, INTERSECT, with the JOIN command.

UNION

The UNION statement is used to append rows together from different select statements. Below is the code followed by the output and explanation.

(
SELECT street_address, city, state, postal_code
FROM customers
WHERE street_address IS NOT null
)
union
(
SELECT street_address, city, state, postal_code
FROM dealerships
WHERE street_address IS NOT null 
)

Notice how the select statements are both in their own set of parentheses with the union statement between them. In addition, you can see that both select statements used the same columns. Remember we are trying to combine data from different places that have the same columns. You can ignore this but the output will be hard to comprehend. For example, if you have different columns in each select statement you will get an output but it will be hard to interpret.

ad

Essentially, in the example above, we took the same columns from two different tables and created one table. UNION removes duplicates. If you want duplicates you must use UNION ALL

In contrast, joins are used to append columns together based on criteria. If we were to join two or more tables the joined table would increase in its number of columns and possibly its rows. A UNION will have a specified number of columns while growing in terms of the number of rows that are present in the output.

INTERSECT

INTERSECT finds common rows between select statements. It is highly similar to JOIN with the main difference being INTERSECT removes duplicates while JOIN will not. Below is an example.

SELECT state,postal_code FROM customers  
INTERSECT 
SELECT state,postal_code FROM dealerships;

The code is simple. You make your select statements and place INTERSECT in between them. The results above show us what data these two select statements share. When state and postal_code are the criteria for these two tables only these three rows are in common. If we did a JOIN we would get every instance of these three state and postal codes rather than just the unique ones.

Conclusion

UNION and INTERSECT have their place in the data analyst toolbelt. UNION is for appending rows together. INTERSECT is for finding commonalities between different select statements. Both UNION and INTERESECT remove duplicates which is not done when using a JOIN.