SQL Alchemy at Your Local Video Store

Jimmy Gardner
7 min readJul 13, 2021

This is a follow up article to Querying an SQL Database with SQL Alchemy

Introduction

Scene from E.T. ‘The Extra Terrestrial’

Pictured above is Elliot saying an emotional goodbye to E.T. as he returns to his homeland. The intense empathetic connection they formed during the movie culminated in this tearful, heartfelt final scene. E.T.’s finger glows and reaches out to touch Elliott's forehead. Elliot, emotional and nervous, stands still as E.T. reassures him that their connection will maintain.

Much like the connection E.T and Elliot formed throughout the movie, and maintained across vast expanses of outer space. We will be maintaining our connection between SQL Alchemy and our SQL database (formatted to represent a fictional movie rental store).

Table of Contents

Motivation

Identifying Our Top Customers

Movie Rating Preference

Customer Churn

Enhancing our Customer Retrieval Strategy

Conclusion

Motivation

In the last article, we walked through the basic steps for connecting to and querying from an SQL database using SQL Alchemy. We lightly motivated some of the SQL Alchemy operations with potential use cases. For example, we used the JOIN operation to try and settle a customer dispute regarding rental returns. We are going to take it a step further and assume that we are tasked with coming up with a marketing strategy for this rental store. Since we will be referencing this store a lot, I’d like to give it a name. Moving forward we will be calling this fictional movie rental store, Transformative Film Rentals.

Identifying our Top Customers

Lets return to the JOIN operation, and see if we can identify our top customers. We want to JOIN the table ‘rental’ on the table ‘customer’ through a common column ‘customer_id’. For our objective, we don’t necessarily have to use the JOIN operation, but it’s important because we want to see the customer name not just their customer_id. And the customer name only exists in the ‘customer’ table, so we need to use JOIN. Here is the operation,

JOIN = rental.join(customer, rental.c.customer_id==customer.c.customer_id)SELECT = select([rental.c.customer_id,customer.c.first_name,customer.c.last_name,func.count(rental.c.rental_date).label("NumRentals")])select_join = SELECT.select_from(JOIN)query = select_join.group_by(rental.c.customer_id).order_by(db.desc('NumRentals'))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
top_customers = pd.DataFrame(ResultSet, columns = ['CustomerID','FirstName','LastName','NumRentals'])
top_customers

Lets plot the top 20 customers, using the dataframe we created above,

Top 20 Customers, by last name

Movie Ratings Preference

Ok, so now we see that the customer with last name Hunt is our most frequent shopper having rented 46 films from Transformative Film Rentals. Rounding out the top 5 we have Seal, Shaw, Dean, and Sanders. Lets try and figure out what type of movies Eleanor Hunt is renting by looking at the distribution of the ratings in her film rental history. We start with a multi-join,

JOIN_A = rental.join(inventory,rental.c.inventory_id==inventory.c.inventory_id)
JOIN_B = JOIN_A.join(film, inventory.c.film_id==film.c.film_id)

So we want to find film rating . But the table ‘Film’ has no customer information on it, so we are going to use multiple join operations to collect customer id and film rating and put them in the same place. We need to pass through the ‘inventory’ table first in order to set up the join the takes us to the ‘film’ table. In JOIN_A we join the table ‘rental’ with the table ‘inventory’ on ‘inventory_id’, this allows us to have all the columns of ‘inventory’ with all the columns of ‘rental’, from there we JOIN the table ‘film’ on film_id which allows us to have all the relevant information for our final query. Here are the results of the multi JOIN,

Multi Join

Now, we have customer_id and rating in the same place. We can now use this join to group by customer_id and COUNT the rating types for each customer. But lets just concern ourselves with our top customer, Eleanor Hunt. Here is the query using the above JOIN,

SELECT = select([film.c.rating,func.count(film.c.rating).label('RatingCount')])# .select_from(JOIN_B) allows us to filter from the table 
# we created above using the multi join
collect_joins = SELECT.select_from(JOIN_B)query = collect_joins.where(rental.c.customer_id == 148).group_by(film.c.rating).order_by(db.desc('RatingCount'))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
top_customer_film_types = pd.DataFrame(ResultSet, columns = ['Rating','RatingCount'])
top_customer_film_types

Here is a distribution of the Eleanor’s film ratings,

Eleanor’s Movie Types

Lets try and glean some information from this plot. Eleanor’s top rating is ‘R’ and close behind is ‘PG’. So we could make a guess that 1. Eleanor rents movies for herself and her husband and 2. She rents movies for her children.

Next lets take a look at the overall distribution of movies,

Overall Distribution of Movies

So, maybe Transformative Film Rentals can offer a special where you buy any 2 R or PG-13 rated movies and get one PG rated film free. We want to incentivize buyers to consume more PG rated movies because there is a lot of paraphernalia money that could be made from Pixar or Disney, selling products related to PG movies. So we use the fact that buyers already like PG13 movies to drive exposure of PG movies. This strategy not only makes sense for the shoppers at large but also for Eleanor who we assumed to have PG aged children.

Customer Churn

Ok, so we cant exactly calculate the rate at which customers churn, but we can explore which customers have an inactive account in our store. Lets explore some of the inactive customers and in doing so try to develop a strategy for getting them to activate their account again.

Lets count the former activity level of our inactive customers. Or more precisely lets see how many films each inactive customer rented before they decided to freeze their account. First lets do another multi join, but this time with three joins,

join_A = rental.join(customer, rental.c.customer_id==customer.c.customer_id)
join_B = join_A.join(inventory, rental.c.inventory_id==inventory.c.inventory_id)
join_C = join_B.join(film , inventory.c.film_id==film.c.film_id)

Next, lets filter on ‘join_C’,

SELECT = select([customer.c.customer_id,func.count(rental.c.rental_date).label('NumRentals')])# here we use select_from to operate on the joins from abovejoin_select = SELECT.select_from(join_C)query = join_select.where(customer.c.active==0).group_by(rental.c.customer_id).order_by(db.desc('NumRentals'))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
top_inactive_customers = pd.DataFrame(ResultSet, columns = ['CustomerID','NumRentals'])
top_inactive_customers
Most ‘active’ Inactive customers

So here is our strategy, lets take a look at the top inactive customer’s movie rating distribution. Then we send them an email advertising some of our new releases in this category. The join is the same as above, except this time we filter by customer ‘368’ seeing that they had the most rentals. But moving forward we can apply this strategy to all our inactive customers. Here is the Alchemy,

SELECT = select([film.c.rating,func.count(film.c.rating).label('CountRatings')])join_select = SELECTq.select_from(join_C)query = join_select.where(customer.c.customer_id==368).group_by(film.c.rating).order_by(db.desc('CountRatings'))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
ratings_distribution = pd.DataFrame(ResultSet, columns = ['Rating','CountRatings'])
ratings_distribution
Rating Distribution Customer ‘368’

Enhancing our Customer Retrieval Strategy

Lets continue to evaluate customer 368. Customer 368 is actually ‘Harry Arce’. Interesting name. I did not create this dataset. Those who did obviously had some fun with it. Anyways, lets take a look at all of Harry’s film rentals, specifically those with PG-13 rating tags. The JOIN is the same as above, but lets add some different filters,

SELECT = select([rental.c.customer_id,film.c.rating,film.c.description,film.c.film_id])SELECT_JOIN = SELECT.select_from(join_C)query = SELECT_JOIN.where(db.and_(rental.c.customer_id==368,film.c.rating == 'PG-13'))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
Harry’s PG-13 Rentals

It looks like Harry is one for drama and possibly has an affinity for dramas involving feminists. So lets look at all the PG-13 films in our dataset and see if we can find a few films Harry hasn't seen, Here is the SQL alchemy,

base_query = select([film.c.film_id,film.c.rating,film.c.title,film.c.description])query = base_query.where(db.and_(film.c.rating=='PG-13',film.c.description.like('%drama%'),film.c.description.like('%feminist%')))ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
harrys_new = pd.DataFrame(ResultSet, columns = ['FilmID','Rating','Title','Description'])
harrys_new
Recommendations for Harry

If we cross reference the film_id from the recommendations for Harry query with the film_id from Harry’s PG-13 rentals query we can see all unique film_id’s. So we will only recommend those. It looks like he has already seen POND SETTLE, so we will recommend the other two. Lets send an email to Harry with a couple titles he might like,

  1. ALTER VICTORY: A Thoughtful Drama of a Composer And a Feminist who must Meet a Secret Agent in The Canadian Rockies
  2. RAIDERS ANTITRUST: A Amazing Drama of a Teacher And a Feminist who must Meet a Woman in The First Manned Space Station

Conclusion

If it wasn't clear before, it is now. If we put in a little work upfront we can write all the query’s we’d need for our data analysis in SQL Alchemy. This is exciting because it allows for complex queries to take place in a python script that can easily be turned into data frames for further visualization and analysis.

Thank you for reading ! Hope you enjoyed :)

If you’d like to check out the code notebook associated with this article you can take a look at my Github.

--

--