Saturday, February 19, 2011

Nested Loop Outer Join : Oracle Tuning Tip#21



TOPIC:
Nested Loop Outer Join

DEFINITION:

Before we go into this topic in detail, first we will understand what outer join is meant by.
For an example, take 2 tables which have to be joined, say “Parent” table & “Child” table [I have taken these table names, only to discuss this topic in an easy way].

If Child table is outer joined with Parent table, then you can say, the matching records from both these tables will be retrieved and then the records which are available only in Parent table will also be retrieved though the corresponding matching records are not available in Child table. This is called outer join. [in natural or equi-join, only the matching records from both the tables will be retrieved.]

To perform this outer join, 3 different join techniques have been provided by oracle.
  • ·         Nested loop outer join
  • ·         Hash outer join
  • ·         Sort merge outer join

In this, we will discuss about nested loop outer join.

To perform this nested loop outer join, Oracle follows these steps (assume, child table has to be outer-joined with parent table):
  • 1.     The optimizer chooses parent table as the outer table, or the driving table. Child table is chosen as the inner table (or the driven table).
  • 2.     For each row in the driving table, if a matching record exists in the driven table, columns (mentioned in the query) from both the tables will be returned.
  • 3.     If a matching record doesn’t exist in the driven table, columns from parent table alone will be returned and NULL will be returned in the places of child table’s columns mentioned in the query. 
As you aware, in the nested loop equi-join, deciding which table is going to be the driving table is the deciding factor. Generally, the smallest of the two tables will be considered as the driving table so that cost will be low. [cost will be low because only minimal number of looping has to be done]. However, in this nested loop outer join, this decision making process becomes immaterial since Parent table will be chosen as the driving table by default irrespective of whether it is a small or big table. Reason is, joining condition determines which table is going to be the driving table in this nested loop outer join instead of cost which is the metric used to identify the driving table in the nested loop equi-join.

The logical activity diagram for this methodology will be like this,

LOOP (for all the required rows in the driving table)
  • ·         IF (a matching record from the driven table is available) THEN

Ø  output the matching record and display both the tables’ columns mentioned in the query;
  • ·         ELSE (a matching record from the driven table is not available)

Ø  Output this unmatched driving table’s record anyway and display NULL for the driven table’s columns mentioned in the query;
  • ·         END IF;

END LOOP;

LITTLE-KNOWN FACTS TO BE REMEMBERED:
·         /*+ USE_NL(<<inner table>>) */ is the hint that can be used to impose this nested loop outer join.
·         This methodology is opted by Oracle when the tables are joined by outer join “(+)” operator.
·         This is very successful when the parent table is smaller, child table is bigger and child table is outer-joined with parent table

ADVANTAGE:
·         If your requirement is to see the initial matching records (though outer-join is mentioned) as quick as possible, this is the best methodology to rely on. Reason is, as per the logic, though it is kinda having the iterative structure, the matching records would be displayed in each iteration.
·         If parent table is small, then looping has to be done for minimal number of times and this technique will work efficiently.

DISADVANTAGE:
·         When this is opted for joining 2 big tables, you may see the initial matching records very fast but it would take a lot of time to display the final matching records.
·         This is very resource intense process (especially CPU is utilized a lot) since it has the iterative structure in the process

HOW TO VERIFY:
How to verify whether Oracle follows nested loop outer join or not while executing the sql query. If a query follows this, then you will find similar execution plan like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                     | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                   |                                                          |            |
|   1 |    NESTED LOOPS (OUTER)                       |                                                          |            |
|   2 |       TABLE ACCESS (FULL)                          | <<parent table’s name>>      |            |
|   3 |       TABLE ACCESS (FULL)                          | <<child table’s name>>          |            |
In the explain plan, whenever it chooses this methodology, it displays the keyword [NESTED LOOPS (OUTER)] in the operation column. Whichever the table name that is getting displayed immediately after this word is nothing but the driving table and that’s why you will find parent table’s name in that position followed by child table’s name.

EXAMPLE:

In order to understand how the mechanism of nested loop outer join is different from nested loop equi-join, I am taking the same example which I used there (Tip#18) with a few modifications and imposing the outer join.
Create an employee table and inserts 10 records. The table will look like this,

DATA TABLE:
ROWID
Empid
column2
column3
salary
Empname
AAAAA1
5
..
..
1130
CHARLES
AAAAA2
10
..
..
1890
JOHN
AAAAA3
9
..
..
1800
NADAL
AAAAA4
4
..
..
1270
ERIN
AAAAA5
6
..
..
1900
FEGRUSON
AAAAA6
7
..
..
1740
MIKE
AAAAA7
1
..
..
1620
PETER
AAAAA8
8
..
..
1290
SILVA
AAAAA9
2
..
..
1580
MARTHA
AAAAA10
3
..
..
1200
BECKY

Create an unique index on this table for empid column. (create unique index emp_no_indx on emp(empid)).
Index table will logically look like this,

UNIQUE INDEX TABLE:
INDEX
ROWID
1
AAAAA7
2
AAAAA9
3
AAAAA10
4
AAAAA4
5
AAAAA1
6
AAAAA5
7
AAAAA6
8
AAAAA8
9
AAAAA3
10
AAAAA2

First column (INDEX) : it stores all the values of empid column in the ascending order.
Second column (ROWID) : it stores the ROWID of the corresponding record.

Then, Create an another table called, “bonus” which stores only the information about the employees who are getting the bonus and how much they get. The table will look like this,

DATA TABLE:
ROWID
Empid
Group
Bonus_amt
AAAAB1
6
SERVICE
90
AAAAB2
3
SERVICE
80
AAAAB3
9
HR
80


Fire this query against these tables where the requirement is to display the name, salary, bonus (if they are getting) of all the employess, so the query will be like this,
Select emp.empname,emp.salary,bonus.bonus_amt
from emp, bonus
where emp.empid = bonus.empid(+);

As per this query, emp is the parent table and bonus is the child table since the latter[bonus.empid(+)] is outer-joined with the former.
So, oracle will consider emp as the driving able and bonus as the driven table though no indexes are available in the latter. That’s the reason cost is not the deciding factor and join condition is the deciding factor in nested loop outer join.

As per the design of nested loop outer join, the following steps are followed while executing this query,
1.     Oracle chooses “emp” table as the driving table and “bonus” table as the driven table.
2.     It gets the first record of “emp” table, that is empid:5.
3.     It hits “bonus” table and try to find the matching record for empid:5.
4.     Since there is no empid:5 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
5.     It gets the next record of “emp” table, that is empid:10.
6.     It hits “bonus” table and try to find the matching record for empid:10.
7.     Since there is no empid:10 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
8.     It gets the next record of “emp” table, that is empid:9.
9.     It hits “bonus” table and try to find the matching record for empid:9.
10.  Since there is empid:9 in “bonus” table, Empname & salary columns from “emp” table gets displayed and bonus_amt column from “bonus” table gets displayed.
11.  It gets the next record of “emp” table, that is empid:4.
12.  It hits “bonus” table and try to find the matching record for empid:4.
13.  Since there is no empid:4 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
14.  It gets the next record of “emp” table, that is empid:6.
15.  It hits “bonus” table and try to find the matching record for empid:6.
16.  Since there is empid:6 in “bonus” table, Empname & salary columns from “emp” table gets displayed and bonus_amt column from “bonus” table gets displayed.
17.  It gets the next record of “emp” table, that is empid:7.
18.  It hits “bonus” table and try to find the matching record for empid:7.
19.  Since there is no empid:7 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
20.  It gets the next record of “emp” table, that is empid:1.
21.  It hits “bonus” table and try to find the matching record for empid:1.
22.  Since there is no empid:1 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
23.  It gets the next record of “emp” table, that is empid:8.
24.  It hits “bonus” table and try to find the matching record for empid:8.
25.  Since there is no empid:8 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
26.  It gets the next record of “emp” table, that is empid:2.
27.  It hits “bonus” table and try to find the matching record for empid:2.
28.  Since there is no empid:2 in “bonus” table, Empname & salary columns from “emp” table gets displayed and nothing gets displayed for bonus_amt column.
29.  It gets the next record of “emp” table, that is empid:3.
30.  It hits “bonus” table and try to find the matching record for empid:3.
31.  Since there is empid:3 in “bonus” table, Empname & salary columns from “emp” table gets displayed and bonus_amt column from “bonus” table gets displayed.
32.  Since there is no more records to be processed in “emp” table, it exits.

So, the output will look like this,
Empname
salary
Bonus_amt
CHARLES
1130
JOHN
1890
NADAL
1800
80
ERIN
1270
FEGRUSON
1900
90
MIKE
1740
PETER
1620
SILVA
1290
MARTHA
1580
BECKY
1200
80


Now, explain table will look like this,

--------------------------------------------------------------------------------------------------------
| Id  | Operation                                                          | Name                                             | Rows  |
--------------------------------------------------------------------------------------------------------
|   4 | SELECT STATEMENT                                        |                                                          |     10     |
|   3 |   NESTED LOOPS (OUTER)                             |                                                          |     10     |
|   1 |     TABLE ACCESS (FULL)                                 | EMP                                                |     10     |
|   2 |     TABLE ACCESS (FULL)                                 | BONUS                                          |     4       |

From the explain plan, we can say that both the tables have been outer-joined by nested loop outer join methodology.”Emp” table has been considered as the driving table and “Bonus” table as the driven table.

6 comments: