Hi guys, hope y’all doing good. Now that everything has started to become normal and the way how it was actually, there are still a few areas that are not sorted out and remain a mess.
Yes, I am talking about your SQL database. Do you have no idea how to maintain your data & make it more readable? Do not worry, I am here for the rescue!
In this blog, I am going to share some interesting thoughts on CTE (Common Table Expression) in SQL like how CTE helps in querying hierarchical data and keeps our query more readable.
A common Table Expression is a temporary result set that exists for one query. CTE is defined/ initiated using the ‘WITH’ keyword followed by a name for the table expression and an actual base query called the anchor.
WITH
cte_name AS (
SELECT columns FROM table WHERE conditions
)
Let’s take a simple company and dealer table in which the company has many relations with dealers as an example. Table company holds the list of car companies and dealers with the list of dealer names and respective sales count.
Company Table
id | name | web_url |
---|---|---|
24 | Tata | www.tatacars.co.in |
25 | Toyota | www.toyota.co.in |
28 | Honda | www.honda.co.in |
30 | Hyundai | www.hyndai.co.in |
Dealer Table
id | cmpy_id | name | sales_count |
---|---|---|---|
221 | 24 | Dealer A | 24 |
235 | 25 | Dealer B | 51 |
249 | 28 | Dealer C | 41 |
256 | 24 | Dealer A1 | 62 |
263 | 25 | Dealer B1 | 55 |
275 | 30 | Dealer D | 56 |
With the above setup, we will try to fetch a list of company names and dealers with the highest sales count accordingly.
In this scenario, ranking dealers with the highest sales count for a particular company would be our base result set, with that we can easily form our main query as shown below.
WITH
ranked_dealers AS (
SELECT *, RANK( ) OVER (PARTITION BY cmpy_id ORDER BY sales_count desc) as rank
FROM dealers
)
SELECT cmpy.name AS company_name , rd.name AS dealer_name, rd.sales_count
FROM company AS cmpy
LEFT JOIN ranked_dealers AS rd ON cmpy.id = rd.cmpy_id
WHERE rd.rank = 1;
Query Result
company_name | dealer_name | sales_count |
---|---|---|
Tata | Dealer A1 | 62 |
Toyota | Dealer B1 | 55 |
Honda | Dealer C | 41 |
Hyundai | Dealer D | 56 |
Now we have the best dealers for the corresponding companies. By this simple example, we get a basic understanding of how CTE works. The same can be achieved using a subquery.
But for complex queries, we can have multiple table expressions with different names to use in the main query which helps improve readability and easy debugging.
Materialize
CTE WITH clause runs once and generates a temporary copy table, this is referred to as MATERIALIZE. It helps in query performance. However, this comes with disadvantages such as all columns are computed even if they are not used in the main query and mainly indexes won’t work well when used with WHERE or JOIN clauses.
Prior to POSTGRES 12, all CTEs are materialized. By default, Later versions materialize only if CTE is referenced more than once, and use inline if referenced only once which has a huge impact on query execution time. We can change the default behavior via code as shown below.
WITH cte_name AS MATERIALIZED
(or)
WITH cte_name AS NOT MATERIALIZED
Using CTE with Rails
CTE can be implemented on Rails using the ActiveRecordExtended gem. The above query can be translated into ActiveRecord as,
Company.with( ranked_dealers: “ SELECT *, RANK( ) OVER ( PARTITION BY cmpy_id ORDER BY sales_count desc ) as rank FROM dealers ” )
.joins( “ LEFT JOIN ranked_dealers ON ranked_dealers.cmpy_id = company.id ” )
.where( “ ranked_dealers.rank = 1 ” )
.select( “ company.name AS company_name , ranked_dealers.name AS dealer_name, ranked_dealers.sales_count” )
Does this blog on CTE sound captivating!!! Hope you have got an idea of how to have your data sorted out. Do let me know your thoughts in the comments.
Let us all soon meet in my next blog. Until then, it’s a wrap!