TOPIC:
Sort Merge Outer Join
DEFINITION:
In this, we are going to discuss the last outer join technique provided by Oracle, “Sort Merge Outer Join”.
To perform sort merge outer join, Oracle follow these steps[assume table-a is outer-joined with table-b]:
2. Oracle read all the required records from table-b and sort the data by the same joining column(s)
3. Oracle then compare all these sorted data from both these tables record-by-record based on the joining column(s)
4. Oracle return the record from table-a if it has a matching record from table-b and returns the value for all the columns mentioned in the query
5. Oracle even return the record from table-a if it can’t find a matching record from table-b and returns NULL for table-b’s columns(s) mentioned in the query
In this join, there is no concept of driving & driven table like nested loop outer join. Important point to understand is, unlike nested loop outer join, where-in driven(inner) table is read(or accessed) as many number of times as for the each record from the outer table but in sort merge outer join, each of the tables involved are accessed at most once. Actually sort merge outer join has two phases (like sort merge natural join), Sorting & Merging. Step-1 & Step-2 are of Sorting phase and Step-3, Step-4 & Step-5 are of Merging phase. In the sorting phase, all the required records from both the tables are read and then get sorted. In the merging phase, records from both these tables are joined and the matching records as well as the unmatching records from the parent table (table-a) will be returned in the output. So, Sort Merge Natural Join and Sort Merge Outer Join differ only in the merging phase (the phase where the records are going to be merged with each other) but the sorting phase works in the same way for both these joins.
The logical flow diagram for this methodology will be like this, [assume table-a is outer-joined with table-b. So, table-a is the parent table & table-b is the child table]
PHASE-I (Sorting)
Read the records from table-a
Sort the retrieved required records of table-a by the join_key
(which is nothing but the column(s) used for joining these two tables)
Dump into temp_a (assume logically that temp_a is just a temporary
memory space to hold this sorted data of table-a)
Read the records from table-b
Sort the retrieved required records of table-b by the same join_key
Dump into temp_b
PHASE-II (Merging)
read 1st record from temp_a
read 1st record from temp_b
while [NOT End Of Record] on temp_a and temp_b
loop
if ( temp_a.join_key = temp_b.join_key )
then output joined record and return the value for all the column(s) mentioned
in both the tables
elsif ( temp_a.join_key < temp_b.join_key )
then also output the joined record but return the value for all table-a’s
column(s) mentioned in the query & NULL for all table-b’s column(s)
mentioned in the query
elsif ( temp_a.join_key > temp_b.join_key )
then don’t output the record
(Note: If there is only one record in temp_a [which is a parent table] then
return that record of temp_a & NULL for all table-b’s
column(s) mentioned in the query. This is an exception.)
end if
if ( temp_a.join_key = temp_b.join_key )
then read the next record in temp_a
(Note: If there are multiple records for the same join_key in temp_b
[which is a child table], then all the records in temp_b for a join_key
value should be joined with all the records in temp_a for the same
join_key value. So, consider for the same join_key value, if n records
are in table-a & m records are in table-b, then the total number of
iteration it suppose to do for this join_key value should be equal
to n*m)
elsif ( temp_a.join_key < temp_b.join_key )
then read the next record in temp_a
elsif ( temp_a.join_key > temp_b.join_key )
then read the record(s) in temp_b till temp_b.join_key >= temp_a.join_key
end if
end loop
LITTLE-KNOWN FACTS TO BE REMEMBERED:
· /*+ USE_MERGE(<<table-1>> <<table-2>>) */ is the hint that can be used to impose this sort merge outer join.
· It is completely untrue that Sort Merge Outer join does only TABLE ACCESS FULL on both the tables because in some cases, it will access the index table also but only if that cost is less.
· In sort merge outer join, Sort phase will be skipped if the data is read from the index table because it doesn’t require the sorting as the data is being read from the index table is already coming in the sorted order. Sort phase will be imposed only if the data is read from the data table directly(TABLE ACCESS FULL) because the sorting is required here as the data is not coming out in the sorted order.
· This methodology is especially opted by Oracle when both the tables are joined by inequality operators like <,>,<=,>= or both the tables are big. This is because Hash Outer Joins can’t be imposed when inequality operators are used and Nested Loop Outer Join is definitely not an option if both these tables are too big.
· This is very successful if both the tables are big and outplays nested loop outer join in this case.
ADVANTAGE:
· If the columns mentioned in the ORDER BY clause of the sql statement is same as the joining columns & tables are big, Optimizer prefers sort merge outer join over hash outer joins as it is cheaper. Reason is, sorting doesn’t need to be done explicitly again as the data coming out from sort merge outer join(due to the sorting phase) would already be in the sorted order and that is not the case in hash outer joins. In this case[sort merge outer join], you will not find this operation, SORT BY or ORDER BY in the explain plan which will confirm that explicit sorting is not done.
DISADVANTAGE:
· TEMP tablespace will be used extensively if the data read from both the tables don’t fit into the allocated memory space [which is SORT_AREA_SIZE component of PGA] and query will start to throw errors if even TEMP tablespace is not big enough to hold the data from both the tables
· If you closely look at the logic of sort merge outer join, records start to get displayed only in the merging phase as none will be displayed till the sorting phase is completed. Due to this reason, this join has to be avoided in some OLTP applications if the initial matching records need to be displayed as quick as possible.
· This is very resource intense process (especially CPU is utilized a lot) since both the big tables need to be sorted and merged in the memory & even in hard-disk [only if memory is not big enough to hold this data] as TEMP tablespace is created in hard-disk.
HOW TO VERIFY:
How to verify whether Oracle follows sort merge outer join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,
-------------------------------------------------------------------------------------------------------- |
| Id | Operation | Name | Rows | |
-------------------------------------------------------------------------------------------------------- |
| 0 | SELECT STATEMENT | | | | 1 | MERGE JOIN (OUTER) | | | | 2 | SORT JOIN | | | |
| 3 | TABLE ACCESS (FULL) | <<table-a>> | | | 4 | SORT JOIN | | | | 5 | TABLE ACCESS (FULL) | <<table-b>> | | |
In the explain plan, whenever it chooses this methodology, it displays the keyword [MERGE JOIN(OUTER) and SORT JOIN] in the operation column.
EXAMPLE:
Create 2 tables.
One table is store the information of the sales done by the employees for all the months and inserts 8 records.
Another table is store the information of the bonus received by the employees and inserts 5 records.
The tables will look like this,
DATA TABLE:
SALES:
ROWID | Empid | Empname | Sales_Month | Sales_Amt |
AAAAA1 | 1 | CHARLES | Jan-2011 | 2000 |
AAAAA2 | 7 | FERGUSON | Feb-2011 | 1000 |
AAAAA3 | 7 | FERGUSON | Jan-2011 | 4000 |
AAAAA4 | 7 | FERGUSON | Mar-2011 | 9000 |
AAAAA5 | 9 | NADAL | Jan-2011 | 3000 |
AAAAA6 | 9 | NADAL | Mar-2011 | 8000 |
AAAAA7 | 4 | ERIN | Jan-2011 | 7000 |
AAAAA8 | 6 | JORDAN | Feb-2011 | 5000 |
BONUS:
ROWID | Empid | Empname | Bonus_Amt |
AAAAB1 | 9 | NADAL | 60 |
AAAAB2 | 6 | JORDAN | 85 |
AAAAB3 | 3 | JOHN | 40 |
AAAAB4 | 7 | FERGUSON | 90 |
AAAAB5 | 2 | MIKE | 75 |
Fire this query against these tables where the requirement is to display all the employee name(s) along with their sales information for all the months. In addition to this, employee’s bonus amount should also be displayed if that employee received it.
Select a.empname, a.sales_month, a.sales_amt, b.bonus_amt
from sales a, bonus b
where a.empid = b.empid(+);
As per the design of sort merge join, the following steps will be followed while executing this query [as per the query, SALES is the parent table and BONUS is the child table so former will be retrieved first and then latter],
1. Oracle reads all the records from SALES table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid | Empname | Sales_Month | Sales_Amt |
1 | CHARLES | Jan-2011 | 2000 |
4 | ERIN | Jan-2011 | 7000 |
6 | JORDAN | Feb-2011 | 5000 |
7 | FERGUSON | Feb-2011 | 1000 |
7 | FERGUSON | Jan-2011 | 4000 |
7 | FERGUSON | Mar-2011 | 9000 |
9 | NADAL | Jan-2011 | 3000 |
9 | NADAL | Mar-2011 | 8000 |
2. Oracle reads all the records from BONUS table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid | Empname | Sales_Amt |
2 | MIKE | 75 |
3 | JOHN | 40 |
6 | JORDAN | 85 |
7 | FERGUSON | 90 |
9 | NADAL | 60 |
3. It reads the first record from both these intermediate result sets (Now, empid:1 from 1st table & empid:2 from 2nd table) [as per initial steps of merging phase in logical flow diagram]
4. Since empid:1 is less than empid:2, the joined record will be displayed but NULL for “Sales_Amt” column in the output [as per 1st IF clause of merging phase in logical flow diagram]
5. It reads the next record from 1st table since 1 is less than 2 (Now, empid:4 from 1st & empid:2 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
6. Since empid:4 is greater than empid:2, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
7. It reads the record till empid:6 in 2nd table since empid:2 & 3 from 2nd table is less than the existing record, empid:4 in 1st table (Now, empid:4 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
8. Since empid:4 is less than empid:6, the joined record will be displayed but NULL for “Sales_Amt” column in the output [as per 1st IF clause of merging phase in logical flow diagram]
9. It reads the next record from 1st table since 4 is less than 6 (Now, empid:6 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
10. Since empid:6 is equal to empid:6, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
11. It reads the next record from 1st table since 6 is equal to 6 (Now, empid:7 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
12. Since empid:7 is greater than empid:6, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
13. It reads the next record from 2nd table since 7 is greater than 6 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
14. Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
15. It reads the next record from 1st table since 7 is equal to 7 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
16. Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
17. It reads the next record from 1st table since 7 is equal to 7 (Now, empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
18. Since empid:7 is equal to empid:7, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
19. It reads the next record from 1st table since 7 is equal to 7 (Now, empid:9 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
20. Since empid:9 is greater than empid:7, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
21. It reads the next record from 2nd table since 9 is greater than 7 (Now, empid:9 from 1st & empid:9 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
22. Since empid:9 is equal to empid:9, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
23. It reads the next record from 1st table since 9 is equal to 9 (Now, empid:9 from 1st & empid:9 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
24. Since empid:9 is equal to empid:9, the joined record will be returned and value for the columns of both the tables will be displayed in the output [as per 1st IF clause of merging phase in logical flow diagram]
25. It exits as all the records from both the tables are processed [as per WHILE condition of merging phase in logical flow diagram]
So, the output will look like this,
Empname | Sales_Month | Sales_Amt | Bonus_Amt |
CHARLES | Jan-2011 | 2000 | |
ERIN | Jan-2011 | 7000 | |
JORDAN | Feb-2011 | 5000 | 85 |
FERGUSON | Feb-2011 | 1000 | 90 |
FERGUSON | Jan-2011 | 4000 | 90 |
FERGUSON | Mar-2011 | 9000 | 90 |
NADAL | Jan-2011 | 3000 | 60 |
NADAL | Mar-2011 | 8000 | 60 |
If you close look at the output, you can see that the value is displayed in “Bonus_Amt” column for the employees: JORDAN, FERGUSON & NADAL and nothing is displayed in the same column for the employess: CHARLES & ERIN nonetheless all the records from SALES table are returned in the output because of the outcome of sort merge outer join.
Now, explain table will look like this,
-------------------------------------------------------------------------------------------------------- |
| Id | Operation | Name | Rows | |
-------------------------------------------------------------------------------------------------------- |
| 6 | SELECT STATEMENT | | 8 | | 5 | MERGE JOIN (OUTER) | | 8 | | 2 | SORT JOIN | | 8 | |
| 1 | TABLE ACCESS (FULL) | SALES | 8 | | 4 | SORT JOIN | | 5 | | 3 | TABLE ACCESS (FULL) | BONUS | 5 | |
From the explain plan, we can say that both the tables are joined by sort merge outer join methodology.