More Tools For Your SQL Alchemy

Anything you can do, I can do…. Well the same

Jimmy Gardner
6 min readJul 17, 2021
Alchemy Tools

I’m not going to pretend that I know the function of any of the tools pictured above. It’s unclear wether the unique shapes are a product of aesthetic preference or functional necessity. Possibly a mixture of both. Although at a glance it looks like the rows are grouped by function or possibly size. Regardless of their purpose any craftsman or woman can appreciate the need for versatility and flexibility in their tool set. This is what we will explore in this article, except we will concern ourselves with SQL Alchemy only, and all the ways it can make your life easier.

This is a follow up article to, two articles I wrote on SQL Alchemy. I’ll recap a bit, but im going to build on the tools we developed in those previous articles. You can find both articles here(1) and here(2).

Using Timestamps

Lets continue with our fictional rental store Transformative Film Rentals. Lets explore creating a strategy for scheduling employee work hours and when we’d might need to increase our coverage. We can you use the timestamp field ‘rental_date’ to explore this. Lets look at the number of rentals per hour in our film store,

base_query = select([func.strftime('%H',rental.c.rental_date).label('HR'),func.count(rental.c.customer_id)])query = base_query.group_by('HR')ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()

Again, its important to drive home how similar the structure of SQL Alchemy is to actual SQL syntax. The first thing to note is that, like SQL syntax, functions like COUNT, MIN, MAX, SUM, etc are referenced in the SELECT statement. In the SELECT statement above we use the functions STRFTIME and COUNT as part of our query. The STRFTIME function allows us to select only the hour from our timestamp field. Then we can group by hour and COUNT the number of rentals per hour. Here is a plot that depicts this,

Film Rentals Per Hour

The film rentals per hour calculation is over the lifetime of the store. This is why FPH is so large. We are trying to get a general idea of which times are most busy. Also the times are in military hours. So we see that at 1500 hours or 3pm the store is the most busy. Ok, so we know that we’ll probably need to staff up around 3pm. This timestamp plot is with respect to the ‘rental_date’. Lets try and write a query and visualize a plot for the activity of the store with respect to ‘return_date’. Here is the Alchemy,

base_query = select([func.strftime('%H',rental.c.return_date).label('HR'),func.count(rental.c.customer_id)])query = base_query.group_by('HR')ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
rental_dates = pd.DataFrame(ResultSet, columns = ['HR','FPH'])
rental_dates

This is the same query that is represented in the plot above, the only difference is that we are changing columns. Instead of the ‘rental_date’ column, we are using the ‘return_date’ column. Here is a plot of this query,

Film Returns Per hour

A quick note about the above queries. In both the film rentals and film returns queries we used the COUNT function to count the entries in ‘customer_id’ for each hour. The reason we did this is because there is a customer_id for every rental and return, so it makes it an easy way to track activity.

Taking a quick glance at the film returns plot above, we see that the highest amount of returns take place at midnight. Midnight is rather late, but Transformative Film Rentals offers drop box return so this makes sense and lines up with my personal experience. So assuming the store is closed at this hour we can schedule an extra shift member for the shift starting at 10am allowing the video clerks to put the returned films back on the shelf and update the inventory database in time for the 3pm rush.

We also see that the 3pm hour has a fair amount of activity for the return films plot. So if both rentals and returns are high during this time period, we can assign one staffer to the checkout station and one staffer to the return drop box. Timestamp fields are a good way to see the data on a granular level. So lets dive a little deeper and see which days and months are the most active for Transformative Film Rentals.

Timing is Everything

Lets take a look at the shopping patterns of our top 10 customers. Specifically lets take a look at which time of day our top 10 customers like to shop. We are looking to start a membership program that offers special deals and information for customers who join the program. So we’d like to advertise it to those who it would most likely benefit. Sure we could just send them all emails, and we will, but its nice to have some face time and maybe offer them something on the spot.

One of the nice advantages of using SQL Alchemy to run SQL queries is that you can use python data structures and control statements to make your queries more efficient. Here we will pull the the top 10 customers shopping patterns and see which hours of the day they frequent the store the most. We’ll use a list to store our top 10 customer id’s and then pass this list through the IN operator that will be a part of our Alchemy query,

Using Alchemy query inside a function

Here is a plot of our top 10 customers activity,

Count of our Top 10 customers activity per hour

Just taking a quick look at our plot, we see that the highest activity levels are at 4pm and 10pm (16 and 22 on the plot). The purple squares indicate total film rentals during that hour. It will probably be in our best interest to put the most engaging clerks on shift during these times. In the above query we looked at a new operator IN, and saw how seamless Alchemy queries can be embedded in python functions!

Time is Money

Lets see what this means in terms of dollar amounts. We’d expect that our highest dollar amounts come when there are the most film rentals. Lets see if this holds true. If all films cost the same amount, we’d expect the shape of the plot to be a copy of the one above. Lets see how close it is. But first here is the alchemy,

join_payment = rental.join(payment,rental.c.customer_id==payment.c.customer_id)SELECT = select([func.strftime("%H",rental.c.rental_date).label('HR'),func.sum(payment.c.amount).label('TotalSpent')])SELECT_join = SELECT.select_from(join_payment)query = SELECT_join.where(rental.c.customer_id.in_(customer_ids)).group_by('HR').order_by('HR')

ResultProxy = cnx.execute(query)
ResultSet = ResultProxy.fetchall()
payments = pd.DataFrame(ResultSet,columns = ['HR','TotalSpent'])
payments
Money Spent Per Hour by Top 10 customers

Virtually the same. Although if we reference the film rentals per hour plot, we see that at 06(6am) ,19(7pm), 21(9pm), and 23(11pm) timestamps there were 17 purchases each. But if we take a look at each of those timestamps on our money spent plot, we see that there is a max variation of $191. This occurs when taking the difference between the payments at the 9pm timestamp and the 11pm timestamp. Maybe someone can investigate why these payouts are so different? We can alter the query above with some additional filters and an extra JOIN to take a look at the type of films being rented at these hours.

Conclusion

Again SQL Alchemy provides a seamless way to do analysis in Python. We looked at the STRFTIME function as well as the IN operator in this article. You can access the database that was used to write these queries from my Github. Reach out to me on LinkedIn if anybody finds this database useful and can finish investigating the question I posed above.

Thanks for reading ! Down to talk SQL Alchemy whenever

--

--