Sunday, April 24, 2011

Sort Merge Outer Join : Oracle Tuning Tip#24



TOPIC:
Sort Merge Outer Join

DEFINITION:
In this, we are going to discuss the last outer join technique provided by Oracle, “Sort Merge Outer Join”.

To perform sort merge outer join, Oracle follow these steps[assume table-a is outer-joined with table-b]:
1.     Oracle read all the required records from table-a and sort the data by the joining column(s)
2.     Oracle read all the required records from table-b and sort the data by the same joining column(s)
3.     Oracle then compare all these sorted data from both these tables record-by-record based on the joining column(s)
4.     Oracle return the record from table-a if it has a matching record from table-b and returns the value for all the columns mentioned in the query
5.     Oracle even return the record from table-a if it can’t find a matching record from table-b and returns NULL for table-b’s columns(s) mentioned in the query

In this join, there is no concept of driving & driven table like nested loop outer join. Important point to understand is, unlike nested loop outer join, where-in driven(inner) table is read(or accessed) as many number of times as for the each record from the outer table but in sort merge outer join, each of the tables involved are accessed at most once. Actually sort merge outer join has two phases (like sort merge natural join), Sorting & Merging. Step-1 & Step-2 are of Sorting phase and Step-3, Step-4 & Step-5 are of Merging phase. In the sorting phase, all the required records from both the tables are read and then get sorted. In the merging phase, records from both these tables are joined and the matching records as well as the unmatching records from the parent table (table-a) will be returned in the output. So, Sort Merge Natural Join and Sort Merge Outer Join differ only in the merging phase (the phase where the records are going to be merged with each other) but the sorting phase works in the same way for both these joins.

The logical flow diagram for this methodology will be like this, [assume table-a is outer-joined with table-b. So, table-a is the parent table & table-b is the child table]

PHASE-I (Sorting)
   Read the records from table-a
   Sort the retrieved required records of table-a by the join_key 
     (which is nothing but the column(s) used for joining these two tables)
   Dump into temp_a (assume logically that temp_a is just a temporary 
      memory space to hold this sorted data of table-a)

   Read the records from table-b
   Sort the retrieved required records of table-b by the same join_key
   Dump into temp_b

PHASE-II (Merging)
   read 1st record from temp_a
   read 1st record from temp_b
   while [NOT End Of Record] on temp_a and temp_b
   loop
        if ( temp_a.join_key = temp_b.join_key )
                 then output joined record and return the value for all the column(s) mentioned 
                 in both the tables
        elsif ( temp_a.join_key < temp_b.join_key )
                 then also output the joined record but return the value for all table-a’s 
                 column(s) mentioned in the query & NULL for all table-b’s column(s) 
                 mentioned in the query
        elsif ( temp_a.join_key > temp_b.join_key )
                 then don’t output the record
                 (Note: If there is only one record in temp_a [which is a parent table] then 
                            return that record of temp_a & NULL for all table-b’s 
                            column(s) mentioned in the query. This is an exception.)
        end if
        if ( temp_a.join_key = temp_b.join_key )
                 then read the next record in temp_a
                 (Note: If there are multiple records for the same join_key in temp_b 
                            [which is a child table], then all the records in temp_b for a join_key 
                            value should be joined with all the records in temp_a for the same 
                            join_key value. So, consider for the same join_key value, if n records 
                            are in table-a & m records are in table-b, then the total number of 
                            iteration it suppose to do for this join_key value should be equal 
                            to n*m)
        elsif ( temp_a.join_key < temp_b.join_key )
                 then read the next record in temp_a
        elsif ( temp_a.join_key > temp_b.join_key )
                 then read the record(s) in temp_b till temp_b.join_key >= temp_a.join_key
        end if
   end loop

LITTLE-KNOWN FACTS TO BE REMEMBERED:
·         /*+ USE_MERGE(<<table-1>> <<table-2>>) */ is the hint that can be used to impose this sort merge outer join.
·         It is completely untrue that Sort Merge Outer join does only TABLE ACCESS FULL on both the tables because in some cases, it will access the index table also but only if that cost is less.
·         In sort merge outer join, Sort phase will be skipped if the data is read from the index table because it doesn’t require the sorting as the data is being read from the index table is already coming in the sorted order. Sort phase will be imposed only if the data is read from the data table directly(TABLE ACCESS FULL) because the sorting is required here as the data is not coming out in the sorted order.
·         This methodology is especially opted by Oracle when both the tables are joined by inequality operators like <,>,<=,>= or both the tables are big. This is because Hash Outer Joins can’t be imposed when inequality operators are used and Nested Loop Outer Join is definitely not an option if both these tables are too big.
·         This is very successful if both the tables are big and outplays nested loop outer join in this case.

ADVANTAGE:
·         If the columns mentioned in the ORDER BY clause of the sql statement is same as the joining columns & tables are big, Optimizer prefers sort merge outer join over hash outer joins as it is cheaper. Reason is, sorting doesn’t need to be done explicitly again as the data coming out from sort merge outer join(due to the sorting phase) would already be in the sorted order and that is not the case in hash outer joins. In this case[sort merge outer join], you will not find this operation, SORT BY or ORDER BY in the explain plan which will confirm that explicit sorting is not done.

DISADVANTAGE:
·         TEMP tablespace will be used extensively if the data read from both the tables don’t fit into the allocated memory space [which is SORT_AREA_SIZE component of PGA] and query will start to throw errors if even TEMP tablespace is not big enough to hold the data from both the tables
·         If you closely look at the logic of sort merge outer join, records start to get displayed only in the merging phase as none will be displayed till the sorting phase is completed. Due to this reason, this join has to be avoided in some OLTP applications if the initial matching records need to be displayed as quick as possible.
·         This is very resource intense process (especially CPU is utilized a lot) since both the big tables need to be sorted and merged in the memory & even in hard-disk [only if memory is not big enough to hold this data] as TEMP tablespace is created in hard-disk.

HOW TO VERIFY:
How to verify whether Oracle follows sort merge outer join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                   |                                                          |            |
|   1 |    MERGE JOIN (OUTER)                            |                                                          |            |
|   2 |       SORT JOIN                                               |                                                          |            |
|   3 |          TABLE ACCESS (FULL)                       | <<table-a>>                                |            |
|   4 |       SORT JOIN                                               |                                                          |            |
|   5 |          TABLE ACCESS (FULL)                       | <<table-b>>                               |            |
In the explain plan, whenever it chooses this methodology, it displays the keyword [MERGE JOIN(OUTER) and SORT JOIN] in the operation column.

EXAMPLE:
Create 2 tables.
One table is store the information of the sales done by the employees for all the months and inserts 8 records.
Another table is store the information of the bonus received by the employees and inserts 5 records.
The tables will look like this,

DATA TABLE:
SALES:
ROWID
Empid
Empname
Sales_Month
Sales_Amt
AAAAA1
1
CHARLES
Jan-2011
2000
AAAAA2
7
FERGUSON
Feb-2011
1000
AAAAA3
7
FERGUSON
Jan-2011
4000
AAAAA4
7
FERGUSON
Mar-2011
9000
AAAAA5
9
NADAL
Jan-2011
3000
AAAAA6
9
NADAL
Mar-2011
8000
AAAAA7
4
ERIN
Jan-2011
7000
AAAAA8
6
JORDAN
Feb-2011
5000

BONUS:
ROWID
Empid
Empname
Bonus_Amt
AAAAB1
9
NADAL
60
AAAAB2
6
JORDAN
85
AAAAB3
3
JOHN
40
AAAAB4
7
FERGUSON
90
AAAAB5
2
MIKE
75

Fire this query against these tables where the requirement is to display all the employee name(s) along with their sales information for all the months. In addition to this, employee’s bonus amount should also be displayed if that employee received it.
Select a.empname, a.sales_month, a.sales_amt, b.bonus_amt  
from sales a, bonus b
where a.empid = b.empid(+);

As per the design of sort merge join, the following steps will be followed while executing this query [as per the query, SALES is the parent table and BONUS is the child table so former will be retrieved first and then latter],
1.     Oracle reads all the records from SALES table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid
Empname
Sales_Month
Sales_Amt
1
CHARLES
Jan-2011
2000
4
ERIN
Jan-2011
7000
6
JORDAN
Feb-2011
5000
7
FERGUSON
Feb-2011
1000
7
FERGUSON
Jan-2011
4000
7
FERGUSON
Mar-2011
9000
9
NADAL
Jan-2011
3000
9
NADAL
Mar-2011
8000
2.     Oracle reads all the records from BONUS table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid
Empname
Sales_Amt
2
MIKE
75
3
JOHN
40
6
JORDAN
85
7
FERGUSON
90
9
NADAL
60
3.     It reads the first record from both these intermediate result sets (Now, empid:1 from 1st table & empid:2 from 2nd table) [as per initial steps of merging phase in logical flow diagram]
4.     Since empid:1 is less than empid:2, the joined record will be displayed but NULL for “Sales_Amt” column in the output [as per 1st IF clause of merging phase in logical flow diagram]
5.     It reads the next record from 1st table since 1 is less than 2 (Now, empid:4 from 1st & empid:2 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
6.     Since empid:4 is greater than empid:2, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
7.     It reads the record till empid:6 in 2nd table since empid:2 & 3 from 2nd table is less than the existing record, empid:4 in 1st table (Now, empid:4 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
8.     Since empid:4 is less than empid:6, the joined record will be displayed but NULL for “Sales_Amt” column in the output [as per 1st IF clause of merging phase in logical flow diagram]
9.     It reads the next record from 1st table since 4 is less than 6 (Now, empid:6 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
10.  Since empid:6 is equal to empid:6, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
11.  It reads the next record from 1st table since 6 is equal to 6 (Now, empid:7 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
12.  Since empid:7 is greater than empid:6, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
13.  It reads the next record from 2nd table since 7 is greater than 6 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
14.  Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
15.  It reads the next record from 1st table since 7 is equal to 7 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
16.  Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
17.  It reads the next record from 1st table since 7 is equal to 7 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
18.  Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
19.  It reads the next record from 1st table since 7 is equal to 7 (Now, empid:9 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
20.  Since empid:9 is greater than empid:7, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
21.  It reads the next record from 2nd table since 9 is greater than 7 (Now, empid:9 from 1st & empid:9 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
22.  Since empid:9 is equal to empid:9, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
23.  It reads the next record from 1st table since 9 is equal to 9 (Now, empid:9 from 1st & empid:9 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
24.  Since empid:9 is equal to empid:9, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
25.  It exits as all the records from both the tables are processed [as per WHILE condition of merging phase in logical flow diagram]

So, the output will look like this,
Empname
Sales_Month
Sales_Amt
Bonus_Amt
CHARLES
Jan-2011
2000

ERIN
Jan-2011
7000

JORDAN
Feb-2011
5000
85
FERGUSON
Feb-2011
1000
90
FERGUSON
Jan-2011
4000
90
FERGUSON
Mar-2011
9000
90
NADAL
Jan-2011
3000
60
NADAL
Mar-2011
8000
60


If you close look at the output, you can see that the value is displayed in “Bonus_Amt” column for the employees:  JORDAN, FERGUSON & NADAL and nothing is displayed in the same column for the employess: CHARLES & ERIN nonetheless all the records from SALES table are returned in the output because of the outcome of sort merge outer join.

Now, explain table will look like this,
--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   6 | SELECT STATEMENT                                   |                                                          |  8          |
|   5 |    MERGE JOIN (OUTER)                            |                                                          |  8          |
|   2 |       SORT JOIN                                               |                                                          |  8          |
|   1 |          TABLE ACCESS (FULL)                       | SALES                                             |  8          |
|   4 |       SORT JOIN                                               |                                                          |  5          |
|   3 |          TABLE ACCESS (FULL)                       | BONUS                                          |  5          |
From the explain plan, we can say that both the tables are joined by sort merge outer join methodology.