Friday, December 31, 2010

Oracle Tuning Tip#19: Sort Merge Join



TOPIC:
Sort Merge Join

DEFINITION:
To perform sort merge join, Oracle follow these steps[assume table-a & table-b have to be joined]:
  1. 1.     Oracle read all the required records from table-a and sort the data by the joining column(s)
  2. 2.     Oracle read all the required records from table-b and sort the data by the same joining column(s)
  3. 3.     Oracle then compare all these sorted data from both these tables record-by-record based on the joining column(s) and returns the output if it gets matched

In this join, there is no concept of driving & driven table like nested loop join. Important point to understand is, unlike nested loop join where driven(inner) table is read as many number of times as the input from outer table, in sort merge join each of the tables involved are accessed at most once. Actually sort merge join has two phases, Sorting & Merging. Step-1 & Step-2 are of Sorting phase and Step-3 is of Merging phase. In the sorting phase, all the required records from both the tables are read and then get sorted. In the merging phase, records from both these tables are joined and only the matching records will be returned in the output.

The logical flow diagram for this methodology will be like this,

PHASE-I (Sorting)
   Read the records from table-A
   Sort the retrieved records of table-A by the join_key (which is nothing but the column(s) used for joining these two tables)
   Dump into temp_a (assume logically that temp_a is just a temporary memory space to hold this sorted data of table-A)

   Read the records from table-B
   Sort the retrieved records of table-B by the same join_key
   Dump into temp_b

PHASE-II (Merging)
   read 1st record from temp_a
   read 1st record from temp_b
   while [NOT End Of Record] on temp_a and temp_b
   loop
        if ( temp_a.join_key = temp_b.join_key )
                 then output joined record
        else if ( temp_a.join_key <> temp_b.join_key )
                 then don’t output the joined record
        end if
        if ( temp_a.join_key <= temp_b.join_key )
                 then read the next record in temp_a
        else if ( temp_a.join_key > temp_b.join_key )
                 then read the next record in temp_b
        end if
   end loop

LITTLE-KNOWN FACTS TO BE REMEMBERED:
  • ·         /*+ USE_MERGE(<<table-1>> <<table-2>>) */ is the hint that can be used to impose this sort merge join.
  • ·         It is completely untrue that Sort Merge join does only TABLE ACCESS FULL on both the tables because in some cases, it will access the index table also but only if that cost is less.
  • ·         In sort merge join, Sort phase will be skipped if the data is read from the index because it doesn’t require the sorting as the data is being read from the index is already in the sorted order. Sort phase will be imposed only if the data is read from the data table directly(TABLE ACCESS FULL) because the sorting is required here as the data is not coming out in the sorted order.
  • ·         This methodology is especially opted by Oracle when both the tables are joined by inequality operators like <,>,<=,>=. This is because Hash Join can’t be imposed when inequality operators are used and Nested Loop Join is definitely not an option if both these tables are too big.
  • ·         This is very successful if both the tables are bigger and outplays nested loop join in this case.

ADVANTAGE:
  • ·         If the columns mentioned in the ORDER BY clause of the sql statement is same as the joining columns, Optimizer prefers sort merge join over hash join as it is cheaper. Reason is, sorting doesn’t need to be done explicitly again as the data coming out from sort merge join would already be in the sorted order and that is not the case in hash join. In this case, you will not find this operation, SORT BY or ORDER BY in the explain plan which will confirm that explicit sorting is not done.

DISADVANTAGE:
  • ·         TEMP tablespace will be used extensively if the data read from both the tables don’t fit into the allocated memory space [which is exactly SORT_AREA_SIZE component of PGA] and query will start to throw errors if even TEMP tablespace is not big enough to hold the data from both the tables
  • ·         If you closely look at the logic of sort merge join, matching records start to get displayed only in the merging phase as none will be displayed till the sorting phase is completed. Due to this reason, this join has to be avoided in some OLTP applications if the initial matching records need to be displayed as quick as possible.
  • ·         This is very resource intense process (especially CPU is utilized a lot) since both the bigger tables need to be sorted and merged in the memory

HOW TO VERIFY:
How to verify whether Oracle follows sort merge join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                   |                                                          |            |
|   1 |    MERGE JOIN                                              |                                                          |            |
|   2 |       SORT JOIN                                               |                                                          |            |
|   3 |          TABLE ACCESS (FULL)                       | <<table-a>>                                |            |
|   4 |       SORT JOIN                                               |                                                          |            |
|   5 |          TABLE ACCESS (FULL)                       | <<table-b>>                               |            |

In the explain plan, whenever it chooses this methodology, it displays the keyword (MERGE JOIN and SORT JOIN) in the operation column.

EXAMPLE:
Create 2 tables.
One table is store the information of the sales done by the employees for January Month and inserts 5 records.
Another table is store the information of the sales done by the employees for February Month and inserts 5 records.
The tables will look like this,

DATA TABLE:
EMP_JAN:
ROWID
Empid
empname
Sales_Amt
AAAAA1
5
CHARLES
2500
AAAAA2
7
FERGUSON
1000
AAAAA3
9
NADAL
3000
AAAAA4
4
ERIN
7000
AAAAA5
6
ROONEY
5500

EMP_FEB:
ROWID
Empid
empname
Sales_Amt
AAAAA6
9
NADAL
6000
AAAAA7
6
ROONEY
8500
AAAAA8
8
JOHN
4000
AAAAA9
7
FERGUSON
9000
AAAAA10
2
MIKE
4500

Fire this query against these tables where the requirement is to display all the employee name(s) along with their sales information whoever able to do the sales on both the months,
Select a.empname, a.sales_amt january_amt, b.sales_amt february_amt
from emp_jan a, emp_feb b
where a.empid = b.empid;

As per the design of sort merge join, the following steps will be followed while executing this query,
1.     Oracle reads all the records from EMP_JAN table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid
empname
Sales_Amt
4
ERIN
7000
5
CHARLES
2500
6
ROONEY
5500
7
FERGUSON
1000
9
NADAL
3000
2.     Oracle reads all the records from EMP_FEB table and sorts it based on empid column. The intermediate result set will be like this, [as per sorting phase of logical flow diagram]
Empid
Empname
Sales_Amt
2
MIKE
4500
6
ROONEY
8500
7
FERGUSON
9000
8
JOHN
4000
9
NADAL
6000
3.     It reads the first record from both these intermediate result sets (empid:4 from 1st & empid:2 from 2nd) [as per initial steps of merging phase in logical flow diagram]
4.     Since 4 is not equal to 2, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
5.     It reads the next record from 2nd table since 4 is greater than 2 (empid:4 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
6.     Since 4 is not equal to 6, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
7.     It reads the next record from 1st table since 4 is less than 6  (empid:5 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
8.     Since 5 is not equal to 6, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
9.     It reads the next record from 1st table since 5 is less than 6 (empid:6 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
10.  Since 6 is equal to 6, the joined record will be displayed [as per 1st IF clause of merging phase in logical flow diagram]
11.  It reads the next record from 1st table since 6 is equal to 6 (empid:7 from 1st & empid:6 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
12.  Since 7 is not equal to 6, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
13.  It reads the next record from 2nd table since 7 is greater than 6 (empid:7 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
14.  Since 7 is equal to 7, the joined record will be displayed [as per 1st IF clause of merging phase in logical flow diagram]
15.  It reads the next record from 1st table since 7 is equal to 7 (empid:9 from 1st & empid:7 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
16.  Since 9 is greater than 7, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
17.  It reads the next record from 2nd table since 9 is greater than 7 (empid:9 from 1st & empid:8 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
18.  Since 9 is greater than 8, the joined record will not be displayed [as per 1st IF clause of merging phase in logical flow diagram]
19.  It reads the next record from 2nd table since 9 is greater than 8 (empid:9 from 1st & empid:9 from 2nd) [as per 2nd IF clause of merging phase in logical flow diagram]
20.  Since 9 is equal to 9, the joined record will be displayed [as per 1st IF clause of merging phase in logical flow diagram]
21.  It exits as all the records from both the tables are processed [as per WHILE condition of merging phase in logical flow diagram]

So, the output will look like this,
Empname
January_amt
February_amt
ROONEY
5500
8500
FERGUSON
1000
9000
NADAL
3000
6000

If you close look at the output, you can see that the output is displayed in the order of empid column (empids, 6,7 &9 of the employees, ROONEY, FERGUSON & NADAL accordingly). This is because of the outcome of sorting phase of sort merge join.

Now, explain table will look like this,
--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   6 | SELECT STATEMENT                                   |                                                          |  3          |
|   5 |    MERGE JOIN                                              |                                                          |  3          |
|   2 |       SORT JOIN                                               |                                                          |  5          |
|   1 |          TABLE ACCESS (FULL)                       | EMP_JAN                                     |  5          |
|   4 |       SORT JOIN                                               |                                                          |  5          |
|   3 |          TABLE ACCESS (FULL)                       | EMP_FEB                                      |  5          |

From the explain plan, we can say that both the tables are joined by sort merge natural join methodology.

Sunday, October 24, 2010

Oracle Tuning Tip#18: Nested Loop Join

TOPIC:
Nested Loop Join

DEFINITION:
When the optimizer tries to finalize the execution plan of a query, it considers a lot of items. It must take the interrelated decisions based on those items. Most important of those items are,
Ø  Access path
Ø  Join Order
Ø  Join Operation

Access path tells how the required data is going to be retrieved from a table. So, this tells nothing but which index scan is imposed on that table like index range, index skip scan and so on.
Join Order means, to execute a query that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result. It means oracle can join only two tables at most in a time though more than two tables are referred in a query. Oracle always tries to join the small tables first and then joins with the large tables. The reason behind is, it always tries to lower the number of resultant records formed while in the process of joining all the tables mentioned in the query. 
Join operations tells how the two tables are going to be joined. There are a lot of methodology available to join the two tables. They are,
·         Natural join
Ø  Nested loop natural join
Ø  Sort merge natural join
Ø  Hash natural join
·         Outer join
Ø  Nested loop outer join
Ø  Sort merge outer join
Ø  Hash outer join
·         Semi join
Ø  Nested loop semi join
Ø  Sort merge semi join
Ø  Hash semi join
·         Anti join
Ø  Nested loop anti join
Ø  Sort merge anti join
Ø  Hash anti join
·         Cartesian join
·         Star join
·         Star transformation

Natural join is otherwise called as either inner join or equi join.
In this join, a record will be displayed in an output only if it is available in the both the tables.
In order to achieve this, oracle can adopt either one of the three methodologies mentioned above.
In this, we will discuss about nested loop natural join.

To perform a nested loops join, Oracle follows these steps:
1.     The optimizer chooses one of the two tables those are going to be joined as the outer table, or the driving table. The other table is chosen as the inner table (or the driven table).
2.     For each row in the driving table, Oracle finds all rows in the driven table that satisfy the join condition.
3.     Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

So, in this approach, finalizing which table is going to be the driving table is the deciding factor. Generally, Oracle chooses the small table out of two as the driving table. Reason is, only if the small table is chosen as the driving table, it needs to refer the driven table for the least number of times. Because, as per the logic, 2nd & 3rd steps are going to executed in a loop and this number of iteration has to be reduced as much as possible in order to improve the performance. So, this can be achieved only if the small table is chosen as the driving table.

The logical activity diagram for this methodology will be like this,

loop (for all the records in the driving table)
Ø  Join with the records of the driven table
Ø  Output (or display) the matching record
End loop

LITTLE-KNOWN FACTS TO BE REMEMBERED:
·         /*+ USE_NL(<<inner table>>) */ is the hint that can be used to impose this nested loop natural join.
·         This methodology is opted by Oracle when both the tables are joined by equal (=) operator.
·         This is very successful when one table is smaller and another table is bigger.

ADVANTAGE:
·         If your requirement is to see the initial matching records as quick as possible, this is the best methodology to rely on. Reason is, as per the logic, though it is kinda of having the iterative structure, the matching records would be displayed in step-3 of each iteration.

DISADVANTAGE:
·         When this is opted for joining 2 big tables, you may see the initial matching records very fast but it would take a lot of time to display the final matching records.
·         This is very resource intense process (especially CPU is utilized a lot) since it has the iterative structure in the process

HOW TO VERIFY:
How to verify whether Oracle follows nested loop natural join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                   |                                                          |            |
|   1 |    NESTED LOOPS                                         |                                                          |            |
|   2 |       TABLE ACCESS (FULL)                          | <<small table’s name>>         |            |
|   3 |       TABLE ACCESS (FULL)                          | <<big table’s name>>             |            |

In the explain plan, whenever it chooses this methodology, it displays the keyword (NESTED LOOPS) in the operation column. Whichever the table name that is getting displayed immediately after this word is nothing but the driving table and the next one is the driven table.

EXAMPLE:
Create an employee table and inserts 10 records. The table will look like this,

DATA TABLE:
ROWID
empid
column2
column3
column4
empname
AAAAA1
5
..
..
..
CHARLES
AAAAA2
10
..
..
..
JOHN
AAAAA3
9
..
..
..
NADAL
AAAAA4
4
..
..
..
ERIN
AAAAA5
6
..
..
..
MATHEWS
AAAAA6
7
..
..
..
JAMAL
AAAAA7
1
..
..
..
PETER
AAAAA8
8
..
..
..
LUSY
AAAAA9
2
..
..
..
MARTHA
AAAAA10
3
..
..
..
ROSS

Create an unique index on this table for empid column. (create unique index emp_no_indx on emp(empid)).
Index table will logically look like this,

UNIQUE INDEX TABLE:
INDEX
ROWID
1
AAAAA7
2
AAAAA9
3
AAAAA10
4
AAAAA4
5
AAAAA1
6
AAAAA5
7
AAAAA6
8
AAAAA8
9
AAAAA3
10
AAAAA2

First column (INDEX) : it stores all the values of empid column in the ascending order.
Second column (ROWID) : it stores the ROWID of the corresponding record.

Then, Create an another table called, “bonus” which stores only the information about the employees who are getting the bonus and how much they get. The table will look like this,

DATA TABLE:
ROWID
empid
Group
Bonus_amt
AAAAA1
8
Employee
5000
AAAAA2
941
Contract
2500


Fire this query against this table where the requirement is to display the employee name whoever getting the bonus and how much they are getting,
Select emp.empname,bonus.bonus_amt
from emp, bonus
where emp.empid = bonus.empid;

When oracle executes this sql, since it has to join these two tables, oracle comes to know that one table (bonus) is very small and another table(emp) is kinda of big one. So, it imposes nested loop natural join.
As per the design of nested loop natural join, the following steps are followed while executing this query,
1.     Oracle chooses “bonus” table as the driving table and “emp” table as the driven table.
2.     It gets the first record of “bonus” table, that is empid:8.
3.     It hits the index table of “emp” table in order to find out if any index record is available with this value.
4.     It finds that 8th index record’s value is matching with this value. So, it retrieves the corresponding rowid, that is AAAAA8.
5.     It refers that 8th record in the data table of “emp” to get the employee name since it knows the corresponding rowid.
6.     It displays both empname and its bonus in the output.
7.     It gets the second record of “bonus” table, that is empid:941.
8.     It hits the index table of “emp” table in order to find out if any index record is available with this value.
9.     It finds none in the index table (since this record is of a contractor, the entry is not available in the “emp” table).
10.  This record doesn’t get displayed since it doesn’t have a matching entry in “emp” table.
11.  Since there are no records to be processed in “bonus” table, it exits.

So, the output will look like this,
empname
Bonus_amt
LUSY
5000

Now, explain table will look like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                          | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   5 | SELECT STATEMENT                                        |                                                          |     1       |
|   4 |   NESTED LOOPS                                               |                                                          |     1       |
|   1 |     TABLE ACCESS (FULL)                                 | BONUS                                          |     2       |
|   3 |     TABLE ACCESS (BY INDEX ROWID)        | EMP                                                |     1       |
|   2 |        INDEX (UNIQUE SCAN)                          | EMP_NO_INDX (UNIQUE)    |     1       |

From the explain plan, we can say that both the tables are joined by nested loop natural join methodology.”Bonus” table has been considered as the driving table and “Emp” table as the driven table.