Sunday, October 24, 2010

Oracle Tuning Tip#18: Nested Loop Join

TOPIC:
Nested Loop Join

DEFINITION:
When the optimizer tries to finalize the execution plan of a query, it considers a lot of items. It must take the interrelated decisions based on those items. Most important of those items are,
Ø  Access path
Ø  Join Order
Ø  Join Operation

Access path tells how the required data is going to be retrieved from a table. So, this tells nothing but which index scan is imposed on that table like index range, index skip scan and so on.
Join Order means, to execute a query that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result. It means oracle can join only two tables at most in a time though more than two tables are referred in a query. Oracle always tries to join the small tables first and then joins with the large tables. The reason behind is, it always tries to lower the number of resultant records formed while in the process of joining all the tables mentioned in the query. 
Join operations tells how the two tables are going to be joined. There are a lot of methodology available to join the two tables. They are,
·         Natural join
Ø  Nested loop natural join
Ø  Sort merge natural join
Ø  Hash natural join
·         Outer join
Ø  Nested loop outer join
Ø  Sort merge outer join
Ø  Hash outer join
·         Semi join
Ø  Nested loop semi join
Ø  Sort merge semi join
Ø  Hash semi join
·         Anti join
Ø  Nested loop anti join
Ø  Sort merge anti join
Ø  Hash anti join
·         Cartesian join
·         Star join
·         Star transformation

Natural join is otherwise called as either inner join or equi join.
In this join, a record will be displayed in an output only if it is available in the both the tables.
In order to achieve this, oracle can adopt either one of the three methodologies mentioned above.
In this, we will discuss about nested loop natural join.

To perform a nested loops join, Oracle follows these steps:
1.     The optimizer chooses one of the two tables those are going to be joined as the outer table, or the driving table. The other table is chosen as the inner table (or the driven table).
2.     For each row in the driving table, Oracle finds all rows in the driven table that satisfy the join condition.
3.     Oracle combines the data in each pair of rows that satisfy the join condition and returns the resulting rows.

So, in this approach, finalizing which table is going to be the driving table is the deciding factor. Generally, Oracle chooses the small table out of two as the driving table. Reason is, only if the small table is chosen as the driving table, it needs to refer the driven table for the least number of times. Because, as per the logic, 2nd & 3rd steps are going to executed in a loop and this number of iteration has to be reduced as much as possible in order to improve the performance. So, this can be achieved only if the small table is chosen as the driving table.

The logical activity diagram for this methodology will be like this,

loop (for all the records in the driving table)
Ø  Join with the records of the driven table
Ø  Output (or display) the matching record
End loop

LITTLE-KNOWN FACTS TO BE REMEMBERED:
·         /*+ USE_NL(<<inner table>>) */ is the hint that can be used to impose this nested loop natural join.
·         This methodology is opted by Oracle when both the tables are joined by equal (=) operator.
·         This is very successful when one table is smaller and another table is bigger.

ADVANTAGE:
·         If your requirement is to see the initial matching records as quick as possible, this is the best methodology to rely on. Reason is, as per the logic, though it is kinda of having the iterative structure, the matching records would be displayed in step-3 of each iteration.

DISADVANTAGE:
·         When this is opted for joining 2 big tables, you may see the initial matching records very fast but it would take a lot of time to display the final matching records.
·         This is very resource intense process (especially CPU is utilized a lot) since it has the iterative structure in the process

HOW TO VERIFY:
How to verify whether Oracle follows nested loop natural join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                   |                                                          |            |
|   1 |    NESTED LOOPS                                         |                                                          |            |
|   2 |       TABLE ACCESS (FULL)                          | <<small table’s name>>         |            |
|   3 |       TABLE ACCESS (FULL)                          | <<big table’s name>>             |            |

In the explain plan, whenever it chooses this methodology, it displays the keyword (NESTED LOOPS) in the operation column. Whichever the table name that is getting displayed immediately after this word is nothing but the driving table and the next one is the driven table.

EXAMPLE:
Create an employee table and inserts 10 records. The table will look like this,

DATA TABLE:
ROWID
empid
column2
column3
column4
empname
AAAAA1
5
..
..
..
CHARLES
AAAAA2
10
..
..
..
JOHN
AAAAA3
9
..
..
..
NADAL
AAAAA4
4
..
..
..
ERIN
AAAAA5
6
..
..
..
MATHEWS
AAAAA6
7
..
..
..
JAMAL
AAAAA7
1
..
..
..
PETER
AAAAA8
8
..
..
..
LUSY
AAAAA9
2
..
..
..
MARTHA
AAAAA10
3
..
..
..
ROSS

Create an unique index on this table for empid column. (create unique index emp_no_indx on emp(empid)).
Index table will logically look like this,

UNIQUE INDEX TABLE:
INDEX
ROWID
1
AAAAA7
2
AAAAA9
3
AAAAA10
4
AAAAA4
5
AAAAA1
6
AAAAA5
7
AAAAA6
8
AAAAA8
9
AAAAA3
10
AAAAA2

First column (INDEX) : it stores all the values of empid column in the ascending order.
Second column (ROWID) : it stores the ROWID of the corresponding record.

Then, Create an another table called, “bonus” which stores only the information about the employees who are getting the bonus and how much they get. The table will look like this,

DATA TABLE:
ROWID
empid
Group
Bonus_amt
AAAAA1
8
Employee
5000
AAAAA2
941
Contract
2500


Fire this query against this table where the requirement is to display the employee name whoever getting the bonus and how much they are getting,
Select emp.empname,bonus.bonus_amt
from emp, bonus
where emp.empid = bonus.empid;

When oracle executes this sql, since it has to join these two tables, oracle comes to know that one table (bonus) is very small and another table(emp) is kinda of big one. So, it imposes nested loop natural join.
As per the design of nested loop natural join, the following steps are followed while executing this query,
1.     Oracle chooses “bonus” table as the driving table and “emp” table as the driven table.
2.     It gets the first record of “bonus” table, that is empid:8.
3.     It hits the index table of “emp” table in order to find out if any index record is available with this value.
4.     It finds that 8th index record’s value is matching with this value. So, it retrieves the corresponding rowid, that is AAAAA8.
5.     It refers that 8th record in the data table of “emp” to get the employee name since it knows the corresponding rowid.
6.     It displays both empname and its bonus in the output.
7.     It gets the second record of “bonus” table, that is empid:941.
8.     It hits the index table of “emp” table in order to find out if any index record is available with this value.
9.     It finds none in the index table (since this record is of a contractor, the entry is not available in the “emp” table).
10.  This record doesn’t get displayed since it doesn’t have a matching entry in “emp” table.
11.  Since there are no records to be processed in “bonus” table, it exits.

So, the output will look like this,
empname
Bonus_amt
LUSY
5000

Now, explain table will look like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                          | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   5 | SELECT STATEMENT                                        |                                                          |     1       |
|   4 |   NESTED LOOPS                                               |                                                          |     1       |
|   1 |     TABLE ACCESS (FULL)                                 | BONUS                                          |     2       |
|   3 |     TABLE ACCESS (BY INDEX ROWID)        | EMP                                                |     1       |
|   2 |        INDEX (UNIQUE SCAN)                          | EMP_NO_INDX (UNIQUE)    |     1       |

From the explain plan, we can say that both the tables are joined by nested loop natural join methodology.”Bonus” table has been considered as the driving table and “Emp” table as the driven table.