Thursday, June 09, 2011

SQL 2005 - Table Joins: Inner Join, Self Join and Outer Join with execution sequence and join order


1. Table Joins


"Table Joins" are useful for bringing data together from different tables based on their database relationships. First, we will see how a join operates between tables. Then we will explore the order of execution when a join and a WHERE condition both exist. Finally, we will look at the importance of the join order.

2. Run the attached script


Before you begin, download the attached script, CreateObject.zip. It contains the T-SQL that creates the three tables and the data used in this article. Some examples here use the NorthWnd database, so you should run the Northwnd script as well. Once you have downloaded CreateObject.zip, run the script in the NorthWnd database.

Below is the content of three tables created by the Script:

Fig 1. Table required for this article


We are going to use these tables to perform the joins. They are for demo purposes only, so I have not set up proper table relationships in terms of primary keys and foreign keys. OK, let us move on.

3. Cartesian Product of Table


Usually, a join is performed between two tables based on the key columns that together constitute the relationship between them. For example, DeptId in the Employee table and DeptId in the Department table make up the relationship between these two tables.

The example below joins two tables without using any key columns. Here, Table_A and Table_B are combined into one result set based on the "Cartesian product". The Cartesian product takes the first record in the first table and attaches it to every record in the second table. It then takes the second record in the first table and attaches it to every record in the second table, and this continues until the last record of the first table.

The result of the Cartesian product is shown below:

Fig 2. Cartesian Product of two tables
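Since the figure is not reproduced here, the following is a minimal sketch of a Cartesian product. The sample rows and the Name column are assumptions for illustration and may differ from the data in the attached script. The tables are written as inline CTEs so the query runs without creating anything.

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_B (Id, Name) AS (
    SELECT 1, 'B-One' UNION ALL
    SELECT 2, 'B-Two' UNION ALL
    SELECT 3, 'B-Three'
)
SELECT A.Id AS A_Id, A.Name AS A_Name,
       B.Id AS B_Id, B.Name AS B_Name
FROM Table_A AS A
CROSS JOIN Table_B AS B;   -- no join condition: 2 x 3 = 6 rows
```

Writing the FROM clause as `FROM Table_A AS A, Table_B AS B` with no WHERE condition gives the same six rows.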


4. Joining Two tables


When joining two tables, we should choose a join column from both tables to avoid the bulk of records that resulted in the previous example. The example given below joins Table_A and Table_B based on the column called Id. Because a column mapping is established between the two tables, the join returns far fewer records than the Cartesian product.

Below is the result of the join:

Fig 3. Mapping column for the table join


Note that row number 1 and row number 5 are returned as the join result because they satisfy the mapping condition A.Id = B.Id. In the query, the condition is shown in the red box. As you can see, the mapping produces a subset of the Cartesian product.
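A sketch of a join on the Id column could look like this. The sample rows and the Name column are assumed for illustration, not taken from the attached script.

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_B (Id, Name) AS (
    SELECT 1, 'B-One' UNION ALL
    SELECT 2, 'B-Two' UNION ALL
    SELECT 3, 'B-Three'
)
SELECT A.Id AS A_Id, A.Name AS A_Name,
       B.Id AS B_Id, B.Name AS B_Name
FROM Table_A AS A
INNER JOIN Table_B AS B
    ON A.Id = B.Id;   -- keeps only the pairs (1, 1) and (2, 2)
```

With these assumed rows, two of the six possible combinations survive the mapping condition.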

5. Joining multiple tables


In the above example, two tables participate in the join. To join more tables, we take the result of the previous join (Table 1 join Table 2), pick a column from that result, pick a column in the third table, and then specify the join condition as in the previous example. This way we can join any number of tables. Treat whatever has been joined so far as a single table and join it with the next one.



Fig 4. Joining more than two tables - Example



First, Table_A joins with Table_B, which is nothing but the previous example. The joined result of A and B is then considered a single table, say AB. This AB is then joined with Table_C, forming the join of three tables. This is shown in the picture below:

Fig 5. How multiple join works
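The idea of treating the A and B result as one table AB and joining it with Table_C can be sketched as follows. The sample rows are assumed, and joining the third table on B.Id = C.Id is just one possible choice of column.

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_B (Id, Name) AS (
    SELECT 1, 'B-One' UNION ALL
    SELECT 2, 'B-Two' UNION ALL
    SELECT 3, 'B-Three'
),
Table_C (Id, Name) AS (
    SELECT 1, 'C-One' UNION ALL
    SELECT 3, 'C-Three'
)
SELECT A.Id, A.Name AS A_Name, B.Name AS B_Name, C.Name AS C_Name
FROM Table_A AS A
INNER JOIN Table_B AS B ON A.Id = B.Id   -- "AB": Ids 1 and 2
INNER JOIN Table_C AS C ON B.Id = C.Id;  -- AB joined with C: Id 1 only
```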




6. SQL Join Types


There are four types of join, based on the way we join columns on two different tables.

  • Full Join
  • Inner Join
  • Left outer Join
  • Right outer Join


What we saw in the previous two sections were inner joins. If we join a table with itself, we call it a self join. It is a special case, so do not confuse it with the join types. Let us see an example of each join type in the coming sections.

Before we go into those examples, remember that the result computed so far is considered LEFT and the new table coming in to join the existing result is RIGHT. This is useful when we join multiple tables with different types of joins.

7. Full Join Example


A full join is different from the Cartesian product. The Cartesian product returns every possible row combination between the two joined tables. A full join takes the matching rows, plus all rows from the left table that do not match the right and all rows from the right table that do not match the left. For an unmatched row, it fills the columns of the other side with null. The example below shows the "full join" between Table_A and Table_C.


Fig 6. Content of Table_C


Fig 7. Full Join - Example



  1. In the above picture, the blue row is the matching row in both tables.
  2. The second row (green first, red next) is unmatched. The row exists in the left table, and null is substituted for all the columns of the right.
  3. The third row (red first, green next) is also unmatched. The row exists in the right table, and null is returned for the left one.

Look at the FROM clause.

Table_A is taken first and joined with Table_C. Here, the result set computed so far is always treated as the left side of the join (Table_A here), and the new table to be joined (Table_C) is treated as the right side of the join.
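A sketch of such a full join, with assumed sample rows chosen so that each of the three cases described above appears once:

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_C (Id, Name) AS (
    SELECT 1, 'C-One' UNION ALL
    SELECT 3, 'C-Three'
)
SELECT A.Id AS A_Id, A.Name AS A_Name,
       C.Id AS C_Id, C.Name AS C_Name
FROM Table_A AS A            -- left side
FULL JOIN Table_C AS C       -- right side
    ON A.Id = C.Id;
-- Three rows with this data:
--   1,    AAA,  1,    C-One     (matched)
--   2,    BBB,  NULL, NULL      (left only)
--   NULL, NULL, 3,    C-Three   (right only)
```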

8. Left Join Example


A left join takes all the rows of the left table. When a left row has no match on the right side, it places null entries in the columns of the table joining on the right.

Fig 8. Left Join - Example


In the above example, the Id value 2 in the left table does not exist in the right table's Table_C.Id column. But we still got the 2, BBB row from Table_A, with null entries for the right side table. This is shown in the green and red boxes above.

Also note that when SQL processes the query, it takes the rows of Table_A first (so Table_A is the left) and then joins them with Table_C (the right side). It does not matter whether we write the join condition as A.Id = C.Id or C.Id = A.Id.
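A sketch of this left join follows. Apart from the 2, BBB row mentioned in the text, the sample rows are assumed. The condition is deliberately written as C.Id = A.Id to show that the operand order does not decide which table is left.

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_C (Id, Name) AS (
    SELECT 1, 'C-One' UNION ALL
    SELECT 3, 'C-Three'
)
SELECT A.Id AS A_Id, A.Name AS A_Name,
       C.Id AS C_Id, C.Name AS C_Name
FROM Table_A AS A            -- left: appears first in FROM
LEFT JOIN Table_C AS C       -- right
    ON C.Id = A.Id;          -- same result as A.Id = C.Id
-- Two rows with this data:
--   1, AAA, 1,    C-One
--   2, BBB, NULL, NULL
```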

9. Right Join Example


It is the reverse of the left join. It takes all the rows of the right table and places null in the left table's columns for unmatched rows. Below is an example:

Fig 9. Right join example


Blue box: Matched rows.
Green: Row exists in the right side table Table_B, and no match (based on the Id column) is available on the left.
Red: Null placement for the columns of Table_A.
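A sketch of a right join between Table_A and Table_B, again with assumed sample rows:

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_B (Id, Name) AS (
    SELECT 1, 'B-One' UNION ALL
    SELECT 2, 'B-Two' UNION ALL
    SELECT 3, 'B-Three'
)
SELECT A.Id AS A_Id, A.Name AS A_Name,
       B.Id AS B_Id, B.Name AS B_Name
FROM Table_A AS A            -- left
RIGHT JOIN Table_B AS B      -- right: every row of Table_B is kept
    ON A.Id = B.Id;
-- Three rows with this data:
--   1,    AAA,  1, B-One
--   2,    BBB,  2, B-Two
--   NULL, NULL, 3, B-Three   (no match in Table_A)
```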

10. Inner Join Example


In an inner join, only the matched rows are retrieved. Please refer to section 4. An inner join returns the same result either way, so there is no need to worry about whether a table is placed on the left or the right.

11. Self Join Example


Joining a table with itself is called a "self join". To explain this, let us go to a table in the Northwnd database (attached with this article). Have a look at the columns in the Employees table. The EmployeeId column is the primary key column, and each row belongs to a single employee. The ReportsTo column refers to some other row in the same table, stating that the referred row is the manager of the referring row (employee). But the referred row (manager) is also an employee, possibly with a valid entry in its own ReportsTo column. So in the NorthWnd database this relationship gives a hierarchical reporting structure.

Fig 10. Primary Key and Foreign key On Sample Table


Now have a look at the example below:

Fig 11. Self Join Example


Here, the row pointed to by the ReportsTo column is the manager. So the table on the left-hand side is the employee and the table on the right-hand side is the manager. When FirstName is picked from the left table of the join result, it is the employee name, and when the same FirstName is picked from the right table of the join result, it is the manager name.
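A self join along these lines might be written as below. It runs against the Northwind Employees table. The aliases Emp and Mgr and the output column names are my own, and the original figure may have used a different join type.

```sql
-- Run in the NorthWnd database.
SELECT Emp.FirstName AS EmployeeName,   -- picked from the left copy
       Mgr.FirstName AS ManagerName     -- picked from the right copy
FROM Employees AS Emp                   -- left: the employee
INNER JOIN Employees AS Mgr             -- right: the same table, as manager
    ON Emp.ReportsTo = Mgr.EmployeeID;
```

Because this is an inner join, an employee whose ReportsTo is null does not appear. Switching to LEFT JOIN would keep that row with a null ManagerName.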

12. Execution Sequence of Table Joins


When a query involves a "combination of outer and inner joins", the execution sequence is important. If you have only inner joins, the execution sequence is not important, as they are going to give the same result. Well, what am I talking about?

Let us say you have a query that has both an inner join and an outer join (left or right). Also, let us assume that you have a WHERE clause that filters the records and that the mapping column does not participate in the WHERE clause. Now, which operation is performed first? We have two options:

  1. Apply the WHERE clause record filter first, then perform the table join
  2. Apply the table join first, then perform the WHERE clause filter

These two options return the same result when all the joins involved are inner joins. But the result may differ when we have at least one outer join. SQL chooses the second option. Let us examine and prove this.

Given below are an example and its result:

Fig 12. Table join condition versus Where Clause



How the sequence differs is shown below:

Option 1:

Fig 13. Execution Sequence - Option 1


Option 2:

Fig 14. Execution Sequence - Option 2

  
So keep in mind the operation sequence: when the query has one or more outer joins, SQL completes the join first and then applies the WHERE clause.
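The difference can be seen in a small sketch with assumed sample rows. The first branch filters in the WHERE clause, which is applied after the join. The second branch moves the same filter into the ON clause, which behaves like filtering the right table before joining (Option 1). The Variant label column is only there to tell the two results apart.

```sql
-- Assumed sample rows, not the article's real data.
WITH Table_A (Id, Name) AS (
    SELECT 1, 'AAA' UNION ALL
    SELECT 2, 'BBB'
),
Table_C (Id, Name) AS (
    SELECT 1, 'C-One' UNION ALL
    SELECT 3, 'C-Three'
)
-- Join first, then filter: the null-extended row for Id 2 is removed.
SELECT 'filter in WHERE' AS Variant, A.Id, A.Name AS A_Name, C.Name AS C_Name
FROM Table_A AS A
LEFT JOIN Table_C AS C ON A.Id = C.Id
WHERE C.Name = 'C-One'
UNION ALL
-- Filter as part of the join: the row for Id 2 is kept with nulls.
SELECT 'filter in ON', A.Id, A.Name, C.Name
FROM Table_A AS A
LEFT JOIN Table_C AS C ON A.Id = C.Id AND C.Name = 'C-One';
-- With this data: one 'filter in WHERE' row (1, AAA, C-One)
-- and two 'filter in ON' rows (1, AAA, C-One) and (2, BBB, NULL).
```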

13. Order of the Joins

Like the operation sequence, the "order of the joins" is also important when you mix inner joins with outer (left or right) joins. Again, if all the joins between the tables are inner joins, the join order is not important. But it matters when we mix inner and outer joins.

What is the order of the joins? If my query joins three tables like [X inner Y] Left Z, the order is: the inner join is performed first, and then the left join.

OK. Let us go back to the NorthWnd database again. The result we want is this: get all customer names whether they have ordered or not, and list the quantity ordered by each customer who has placed at least one order.

Look at the query and result below: [Outer Join then Inner Join]

Fig 15. Join Order - Outer join and then inner join


From the above query, you can see the order of the joins as follows:
1) A right join between Orders and Customers. SQL first queries the Orders table (as it appears first) and treats the result as left. It then queries the Customers table and treats that result set as right. Finally, the right join is performed on the two result sets, which means SQL will not lose any rows from the right side, that is, from the Customers table. So you get all customers, including the two who have not placed any orders, and since no matching records exist for those two rows, you get null columns for Orders. This join result is now available for the next join and is treated as left.

2) The result returned above (left side) is joined with the Order Details table. SQL already has the left result set, so it queries the Order Details table to get the right part of the join. Finally, an inner join is performed between left and right based on the order id. But note that the left side result has two rows whose order id is null, one for each customer without an order. A null order id matches nothing in Order Details, so the inner join just skips those records. Thus, we get a total of 2155 rows, without the two customers who have not placed any orders. This is not the result we need. Compare it with the requirement stated at the top of this section.
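A query with this join order might look like the sketch below. The table and key column names are Northwind's, but the selected columns and the aliases are my assumption about what the figure showed.

```sql
-- Run in the NorthWnd database. Outer join first, then inner join.
SELECT C.CompanyName, O.OrderID, OD.Quantity
FROM Orders AS O
RIGHT JOIN Customers AS C            -- step 1: keeps every customer
    ON O.CustomerID = C.CustomerID
INNER JOIN [Order Details] AS OD     -- step 2: drops rows with a null OrderID
    ON O.OrderID = OD.OrderID;
```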

Now look at the Query and Result below: [Inner Join then Outer Join]

Fig 16. Join Order - Inner join and then outer join


Here, the inner join on OrderId between Orders and Order Details is performed first. This result (left side) is then right joined against the Customers table (right).

Now let us analyse how this is giving the result we want.

The inner join between Orders and Order Details brings all the matching records based on the order id. Note that we do not lose any order id to null values here. Then, keeping this result on the left, the Customers table is queried and kept on the right. The right join is then performed between the left side result and Customers based on the customer id. Now we get all the customers, including the two for which there are no matching records in the left side result.
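A sketch of the reordered query follows. As before, the selected columns and aliases are assumptions, while the table and key column names are Northwind's.

```sql
-- Run in the NorthWnd database. Inner join first, then outer join.
SELECT C.CompanyName, O.OrderID, OD.Quantity
FROM Orders AS O
INNER JOIN [Order Details] AS OD     -- step 1: orders with their detail rows
    ON O.OrderID = OD.OrderID
RIGHT JOIN Customers AS C            -- step 2: keeps every customer
    ON O.CustomerID = C.CustomerID;
```

Customers without orders come back with null in OrderID and Quantity.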

So…
Keep in mind that join order is important when you mix the inner join with an outer join.

14. Other way of achieving the same result


When I had a chat with one of my office friends (VaraPrasad), he told me that the result I was expecting could be achieved without using the right join. "How?" was the question I asked him. He said that Crystal Reports does it and that he would show me. So this section was added to the article based on what I learned from him.

Fig 17. Join Order - Change priority using Parenthesis


OK. Now let us see how this works and gives the expected result of not losing any customers. Note that the rule remains the same: whatever has been computed so far is left, and the joining table is on the right.


  1. SQL first queries the Customers table and keeps it as the result on the left.
  2. It reads the open parenthesis, queries the Orders table, and keeps that on the left again. Why? SQL says, “Boss, I know that I should not join this table now, and the right side table is not yet ready because of the open parenthesis. So I kept this on the left side as well. Now I need two right side tables to complete the joins.”
  3. Now the Order Details table is queried and kept as the right side of a join, as a left side is already available.
  4. A join between Orders and Order Details is performed based on the order id. The resulting records are treated as the right side, because the Customers table was already queried and kept on the left. Now the left join between the left and right result sets brings all customers, as the join type is left outer join.

Note: The scripts for creating the demo tables and the NorthWnd database are available as a download.
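The steps above can be sketched as the query below. The parentheses make the inner join run first, and its result becomes the right side of the left join. The selected columns and aliases are assumptions. The table and key column names are Northwind's.

```sql
-- Run in the NorthWnd database.
SELECT C.CompanyName, O.OrderID, OD.Quantity
FROM Customers AS C                       -- left
LEFT JOIN (Orders AS O                    -- evaluated as a unit ...
           INNER JOIN [Order Details] AS OD
               ON O.OrderID = OD.OrderID) -- ... and used as the right side
    ON C.CustomerID = O.CustomerID;
```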

Sample script for Download 
