HOW TO: Analyze and Interpret an AWR Report

Hello

Today I want to share my personal approach for getting into the details of an AWR Report: where to look and where our attention should go. Keep in mind just two main focuses: the Application and the Database instance.

1) Application: Most of the time you should go directly to review how the SQL statements performed. That is MANDATORY, and the result of your analysis will be the classic TOP 10 SQL with the worst performance (see the query sketch below). That should be called: Oracle Application Performance Tuning.

2) Instance: read the AWR once, twice or three times looking for issues. What kind of issues or metrics?
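For the application side, a quick way to build that top-10 list is to pull it straight from the AWR history views. Here is a sketch (the snapshot range comes from the report file name discussed below; adjust it to your own, and note the DBA_HIST_* views require the Diagnostics Pack license):

    -- Top 10 SQL by total elapsed time across an AWR snapshot range
    SELECT *
      FROM (SELECT sql_id,
                   SUM(elapsed_time_delta) / 1e6 AS elapsed_secs,
                   SUM(executions_delta)         AS executions
              FROM dba_hist_sqlstat
             WHERE snap_id BETWEEN 12809 AND 12883   -- range from awrrpt_1_12809_12883
             GROUP BY sql_id
             ORDER BY elapsed_secs DESC)
     WHERE ROWNUM <= 10;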

For example, here I have a conclusion and workaround for one database instance performance issue:

1) Tuning recommendations for the PROD-ACME database instance:

  • Based on the statistics of I/O per event within the database, I recommended the following:
    • I saw that, out of a total of roughly 22.4M waits, most are due to a lack of buffer cache (the memory area used to hold disk blocks with the tables' rows). I found that 86% of the wait events for this database instance are due to this one event.
    • Action Item: we need to increase the buffer cache (in MB) within the database, in order to increase the blocks and space available in memory to keep the most-used rows there.
    • How much? To see how much we need to increase it, my recommendation is to set the DB_CACHE_ADVICE initialization parameter to ON. Then, after some weeks, review and query the V$DB_CACHE_ADVICE system view; there you will find the optimal buffer cache size for this instance (sketched just below). (That advisor was a new feature of the 9i release.)
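A minimal sketch of that workflow (the view and its columns are standard; run the query only after the advisor has observed a representative workload):

    -- Turn the advisor on (it adds a small bookkeeping overhead):
    ALTER SYSTEM SET db_cache_advice = ON;

    -- Weeks later: estimated physical reads for each candidate cache size.
    -- The "optimal point" is where ESTD_PHYSICAL_READ_FACTOR stops improving.
    SELECT size_for_estimate      AS cache_mb,
           buffers_for_estimate   AS buffers,
           estd_physical_read_factor,
           estd_physical_reads
      FROM v$db_cache_advice
     WHERE name = 'DEFAULT'
       AND advice_status = 'ON'
     ORDER BY size_for_estimate;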

e.g. the waits and what they represent.

What does it mean?

Function Name      | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Waits: Count | Avg Tm(ms)
Direct Reads       | 1.9T        | 54.28        | 23.4386      | 628M         | 0.06         | .007273      | 0            |
Buffer Cache Reads | 533.8G      | 313.70       | 6.32998      | 0M           | 0.00         | 0M           | 19M          | 2.79
Direct Writes      | 7.3G        | 0.12         | .086315      | 283.4G       | 15.42        | 3.36128      | 0            |
Others             | 17.9G       | 4.78         | .212123      | 45.1G        | 3.30         | .534663      | 445.2K       | 2.05
LGWR               | 6M          | 0.01         | .000069      | 24.7G        | 68.79        | .292555      | 2966.9K      | 2.00
DBWR               | 29M         | 0.02         | .000335      | 12.1G        | 8.12         | .143538      | 1878         | 3.69
Streams AQ         | 3M          | 0.00         | .000034      | 0M           | 0.00         | 0M           | 292          | 33.73
TOTAL:             | 2.5T        | 372.90       | 30.0675      | 365.9G       | 95.68        | 4.33931      | 22.4M        | 2.67


  • So you can see above that the wait count for Buffer Cache Reads (19M out of a 22.4M total) represents about 85% of the waits for this instance. In this case this was one of the bottlenecks.


2) Tuning recommendations for the ERP ACME Suite and custom applications:

Let's see the next section of the same example, as follows:

Based on the "awrrpt_1_12809_12883.html" report that I have here on my desk, I reviewed this section to see the top 5 wait events:

Top 5 Timed Foreground Events

Event                   | Waits      | Time(s) | Avg wait (ms) | % DB time | Wait Class
db file sequential read | 53,182,945 | 124,071 | 2             | 30.56     | User I/O
db file scattered read  | 11,816,482 | 103,231 | 9             | 25.43     | User I/O
DB CPU                  |            | 72,197  |               | 17.79     |
direct path read        | 3,195,308  | 32,314  | 10            | 7.96      | User I/O
log file sync           | 4,246,317  | 17,633  | 4             | 4.34      | Commit

Taking each event's wait count as a share of the total waits across these top events (53.2M + 11.8M + 3.2M + 4.2M ≈ 72.4M), those numbers represent:

Wait Event              | Share of waits
db file sequential read | 73%
db file scattered read  | 16%
TOTAL OF DB WAITS:      | 90%

(Good catch!) That said, let's analyze those DB wait events:

Event 1: DB FILE SEQUENTIAL READ:

These waits usually indicate a single-block read (e.g. an index full scan: an index read, typically driven by an ORDER BY clause).
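As a side note, if you want to see who is waiting on this event right now (live, outside the AWR report), the wait columns of v$session expose the file and block involved:

    -- For 'db file sequential read': P1 = file#, P2 = block#, P3 = blocks read
    SELECT sid, sql_id, event, p1 AS file#, p2 AS block#, p3 AS blocks
      FROM v$session
     WHERE event = 'db file sequential read';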

Causes and solutions:

1. Use of an unselective index. Solution: 1) Check the indexes on the table to ensure that the right index is being used, 2) Check the column order of the index against the WHERE clauses of the top SQL statements (both checks are sketched in the queries after this list).

2. Index fragmentation: If DBA_INDEXES.CLUSTERING_FACTOR for the index approaches the number of blocks in the table, then most of the rows in the table are well ordered; this is desirable. However, if the clustering factor approaches the number of rows in the table, the rows in the table are randomly ordered, and the operation therefore requires more I/Os to complete.

Solution: 1) You can improve the index's clustering factor by rebuilding the table so that its rows are ordered according to the index key, and rebuilding the index thereafter.

3. High I/O on a particular disk or mountpoint. Solution: 1) Use partitioning to reduce the number of blocks visited, 2) Make sure optimizer statistics are up to date, 3) Relocate 'hot' datafiles: place the tables used in the SQL statements on a faster part of the disk, 4) Consider using multiple buffer pools and cache frequently used indexes/tables in the KEEP pool, 5) Inspect the execution plans of the SQL statements that access data through indexes, 6) Tune the I/O subsystem to return data faster.

4. Bad application design (see the 1st section of the performance tuning report). Solution: 1) Optimize the SQL statements that initiated most of the waits; the goal is to minimize the number of physical and logical reads. Inspect the execution plans of the SQL statements that access data through indexes.

1.1) Examine each SQL statement to see whether a) it is doing a full table scan when it should be using an index, b) it is using a wrong index, c) it can be rewritten to reduce the amount of data it retrieves, d) it is appropriate for the statement to access data through index lookups, or whether a full table scan would be more efficient, and e) the statement uses the right driving table, i.e. is the join order proper?

5. Range scans on data spread across many different blocks. Solution: 1) Check that range scans are not using reverse-key indexes, 2) Load the data in sorted order on the columns on which the range scans will run.

6.  Consider increasing the buffer cache to see if the expanded size will accommodate the additional blocks, therefore reducing the I/O and the wait.
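The checks from points 1 and 2 above can be scripted against the data dictionary. Here is a sketch for a hypothetical APP_OWNER.ORDERS table (the views and columns are standard):

    -- Point 1: column order of each index, to compare against the WHERE clauses
    SELECT index_name, column_name, column_position
      FROM dba_ind_columns
     WHERE table_owner = 'APP_OWNER'    -- hypothetical schema
       AND table_name  = 'ORDERS'       -- hypothetical table
     ORDER BY index_name, column_position;

    -- Point 2: a clustering factor near BLOCKS is good; near NUM_ROWS is bad
    SELECT i.index_name, i.clustering_factor, t.blocks, t.num_rows
      FROM dba_indexes i
      JOIN dba_tables  t
        ON t.owner      = i.table_owner
       AND t.table_name = i.table_name
     WHERE i.owner      = 'APP_OWNER'
       AND i.table_name = 'ORDERS';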


Event 2: DB FILE SCATTERED READ:

This is a wait for multiple blocks to be read concurrently from disk.

That is, waits for multiple blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read from disk while performing full table scans (FTS) or index fast full scans (no ORDER BY).
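You can check the current multiblock read setting with a standard v$parameter query:

    SELECT name, value
      FROM v$parameter
     WHERE name = 'db_file_multiblock_read_count';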

Causes:

This wait is encountered because: 1) As full-scan blocks are pulled into memory, they are scattered throughout the buffer cache, since it is highly unlikely that they fall into contiguous buffers. 2) Since a large number of blocks has to be read into the buffer cache, the server process has to search for a large number of free/usable buffers, which leads to waits.

…and solutions:

– You may try to cache frequently used small tables to avoid reading them into memory over and over again (a KEEP-pool sketch closes this section).

– You need to optimize multi-block I/O by setting the DB_FILE_MULTIBLOCK_READ_COUNT parameter appropriately.

– You need to use partition pruning: partition tables/indexes so that only one partition is scanned each time.

– You may need to consider using multiple buffer pools.

– You need to optimize the SQL statements that initiated most of these waits. We need to minimize the number of physical and logical reads.

For instance, the developer needs to ask the following:

Should the SQL statement access the data by a full table scan or an index fast full scan? Would an index range or unique scan be more efficient? Does the query use the right driving table? Are the SQL predicates appropriate for a hash or merge join? And, if full table scans are appropriate, can parallel query improve the response time (RT)?

We need to make sure all statistics are representative of the actual data. Check the LAST_ANALYZED date (I already checked).
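A sketch of that check, assuming the application schema is called APP_OWNER (a hypothetical name):

    -- When was each table last analyzed, and with how many rows?
    SELECT table_name, num_rows, last_analyzed
      FROM dba_tables
     WHERE owner = 'APP_OWNER'
     ORDER BY last_analyzed NULLS FIRST;

    -- Refresh only the stale statistics:
    EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APP_OWNER', options => 'GATHER STALE');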

The GOAL is to reduce the demand for logical and physical I/Os, and this is best achieved through SQL/application tuning.
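To close this event, here is the "cache frequently used small tables" suggestion from the solutions list above as a minimal sketch (the table name and sizes are hypothetical, for illustration only):

    -- A KEEP pool must exist before objects can be assigned to it:
    ALTER SYSTEM SET db_keep_cache_size = 64M;

    -- Route a small, hot lookup table through the KEEP pool:
    ALTER TABLE app_owner.country_codes STORAGE (BUFFER_POOL KEEP);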

3) Tuning recommendations for Data Access (how the information is stored within the Oracle Database):

Typical objections: Your suggestion to use different block sizes for some tables is not good for OLTP databases. We would need to create tablespaces with different block sizes and also a different buffer cache for each block size; to do this, we would have to take cache area away from some tables and give it to others. OLTP transactions are small and of short duration, and for this reason they need an optimal (small) block size; if we increase the block size, we can increase the I/O for these small transactions, because each read operation has to process more data. For OLAP transactions, which use larger block sizes, the situation is different: almost all the SQL processes a lot of data, and for this reason larger block sizes are better.

What can I do? What can I say?

That is IDEAL in a perfect world… and I would say: YES, I AGREE.

…but we are living in a normal world (with imperfect people), and I would add that both the OLTP and OLAP systems have to coexist within the same database. So we need to realize that in this world, with these people, you should analyze CASE by CASE and make a decision.

Most databases today are running OLTP and OLAP in a mixed environment. There is no chance to normalize the OLAP tables and, yes, small blocks used to store that data won't give good performance.

On the other hand, it is also true that big blocks used for OLTP tables will create unnecessary problems.
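When a separate block size is justified for a given set of tables, the mechanics look like this sketch (the datafile path and the sizes are hypothetical):

    -- A buffer cache for the non-default block size must exist first:
    ALTER SYSTEM SET db_16k_cache_size = 256M;

    -- Then a tablespace can be created with its own block size:
    CREATE TABLESPACE olap_data
      DATAFILE '/u02/oradata/PROD/olap_data01.dbf' SIZE 10G
      BLOCKSIZE 16K;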

Good or not, it is a trend in data warehousing today… and it is a technology challenge.

(Those are my 2 cents for today).

Jorge Acevedo