db file sequential read
Possible Causes :
· Use of an unselective index
· Fragmented indexes
· High I/O on a particular disk or mount point
· Bad application design
· Index read performance can be affected by a slow I/O subsystem and/or a poor database file layout, which results in a higher average wait time
Actions :
· Check the indexes on the table to ensure that the right index is being used
· Check the column order of the index against the WHERE clause of the top SQL statements
· For indexes with a high clustering factor, consider reorganizing the table in index key order and then rebuilding the index; rebuilding the index alone does not change its clustering factor
· Use partitioning to reduce the number of blocks being visited
· Make sure optimizer statistics are up to date
· Relocate ‘hot’ datafiles
· Consider using multiple buffer pools and caching frequently used indexes/tables in the KEEP pool
· Inspect the execution plans of the SQL statements that access data through indexes
· Is it appropriate for the SQL statements to access data through index lookups?
· Would full table scans be more efficient?
· Do the statements use the right driving table?
· The optimization goal is to minimize the number of both logical and physical I/Os.
Remarks:
· The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
· Significant db file sequential read wait time is most likely an application issue.
· If the DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered. This is desirable.
· However, if the clustering factor approaches the number of rows in the table, the rows in the table are randomly ordered and more I/Os are required to complete the operation. You can improve the index’s clustering factor by rebuilding the table so that rows are ordered according to the index key and rebuilding the index thereafter.
· The OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING initialization parameters can influence the optimizer to favour the nested loops operation and choose an index access path over a full table scan.
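The clustering factor comparison described in the remarks above can be checked from the data dictionary. This is a minimal sketch, assuming optimizer statistics have been gathered; the SCOTT schema and EMP table are placeholders, not names from these notes.

```sql
-- Compare each index's clustering factor with the table's block and row counts.
-- A value near BLOCKS suggests well-ordered rows; a value near NUM_ROWS
-- suggests rows scattered relative to the index key.
-- 'SCOTT' and 'EMP' are hypothetical placeholders.
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
  FROM dba_indexes i
  JOIN dba_tables t
    ON t.owner = i.table_owner
   AND t.table_name = i.table_name
 WHERE i.table_owner = 'SCOTT'
   AND i.table_name = 'EMP'
 ORDER BY i.clustering_factor DESC;
```

The figures are only as fresh as the last statistics gathering, so a stale LAST_ANALYZED date makes the comparison unreliable.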
db file scattered read
Possible Causes :
· The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read into the SGA from disk.
· Full table scans
· Fast full index scans
Actions :
· Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT appropriately
· Use partition pruning to reduce the number of blocks visited
· Consider using multiple buffer pools and caching frequently used indexes/tables in the KEEP pool
· Optimize the SQL statement that initiated most of the waits. The goal is to minimize the number of physical and logical reads.
· Should the statement access the data by a full table scan or index FFS? Would an index range or unique scan be more efficient? Does the query use the right driving table?
· Are the SQL predicates appropriate for a hash or merge join?
· If full scans are appropriate, can parallel query improve the response time?
· The objective is to reduce the demands for both logical and physical I/Os, and this is best achieved through SQL and application tuning.
· Make sure all statistics are representative of the actual data. Check the LAST_ANALYZED date
Remarks:
· If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there hasn’t been a code change, check whether one or more indexes have been dropped or become unusable.
· Also check whether the statistics have become stale.
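Both checks from the remarks above can be run against the data dictionary. A rough sketch, assuming access to the DBA_ views; the APP_OWNER schema and the 30-day threshold are arbitrary placeholders.

```sql
-- Indexes that have become unusable. Partitioned indexes report N/A here;
-- their partitions are listed in DBA_IND_PARTITIONS.
SELECT owner, index_name, table_name, status
  FROM dba_indexes
 WHERE status = 'UNUSABLE';

-- Tables whose statistics are missing or older than 30 days.
-- 'APP_OWNER' and the 30-day cutoff are hypothetical.
SELECT owner, table_name, num_rows, last_analyzed
  FROM dba_tables
 WHERE owner = 'APP_OWNER'
   AND (last_analyzed IS NULL OR last_analyzed < SYSDATE - 30)
 ORDER BY last_analyzed;
```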
log file parallel write
Possible Causes :
· LGWR waits while writing the contents of the redo log buffer to the online log files on disk
· I/O wait on the subsystem holding the online redo log files
Actions :
· Reduce the amount of redo being generated
· Do not leave tablespaces in hot backup mode for longer than necessary
· Do not use RAID 5 for redo log files
· Use faster disks for redo log files
· Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention
· Consider using the NOLOGGING or UNRECOVERABLE options in SQL statements where the affected data does not need to be recoverable from redo
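Two of the actions above can be verified directly. The sketch below, which assumes access to V$BACKUP, DBA_DATA_FILES and V$SYSTEM_EVENT, lists datafiles still in hot backup mode and shows the average LGWR write time.

```sql
-- Datafiles currently in hot backup mode (STATUS = 'ACTIVE' in V$BACKUP).
SELECT d.tablespace_name, d.file_name, b.time AS backup_started
  FROM v$backup b
  JOIN dba_data_files d
    ON d.file_id = b.file#
 WHERE b.status = 'ACTIVE';

-- LGWR write waits since instance startup; AVERAGE_WAIT is in centiseconds.
SELECT event, total_waits, time_waited, average_wait
  FROM v$system_event
 WHERE event = 'log file parallel write';
```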
log file sync
Possible Causes :
· Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete
Actions :
· Tune LGWR to get good throughput to disk, e.g. do not put redo logs on RAID 5
· Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations
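To judge whether log file sync time comes from slow redo writes or from a high commit rate, the two redo events can be compared side by side with the commit counters. A minimal sketch using standard statistics; how to interpret the gap is a judgment call, not a fixed rule.

```sql
-- If 'log file sync' average wait is close to 'log file parallel write',
-- the time is mostly redo I/O; a much larger gap points elsewhere
-- (for example CPU starvation or a very high commit rate).
SELECT event, total_waits, time_waited, average_wait
  FROM v$system_event
 WHERE event IN ('log file sync', 'log file parallel write');

-- Commit and rollback volume, plus redo generated, since instance startup.
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('user commits', 'user rollbacks', 'redo size');
```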
buffer busy waits
Possible Causes :
· Buffer busy waits are common in an I/O-bound Oracle system.
· The two main cases where this can occur are:
· Another session is reading the block into the buffer
· Another session holds the buffer in a mode incompatible with our request
· These waits indicate read/read, read/write, or write/write contention.
· The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
· This wait can be intensified by a large block size, as more rows can be contained within the block
· This wait happens when a session wants to access a database block in the buffer cache but cannot, because the buffer is “busy”
· It is also often due to several processes repeatedly reading the same blocks (e.g. if lots of people scan the same index or data block)
Actions :
· The main way to reduce buffer busy waits is to reduce the total I/O on the system
· Depending on the block type, the actions will differ
Data Blocks
· Eliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes.
· Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.
· Check for ‘right-hand indexes’ (indexes that get inserted into at the same point by many processes).
· Increase INITRANS and MAXTRANS and reduce PCTUSED. This will make the table less dense.
· Reduce the number of rows per block
Segment Header
· Increase the number of FREELISTs and FREELIST GROUPs
Undo Header
· Increase the number of rollback segments
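Since the action depends on the block type, it helps to see which block class and which segments are involved. A sketch using V$WAITSTAT and V$SEGMENT_STATISTICS; the second query assumes segment-level statistics are being collected (STATISTICS_LEVEL of TYPICAL or ALL).

```sql
-- Which class of block is being waited for
-- (data block, segment header, undo header, and so on).
SELECT class, count, time
  FROM v$waitstat
 WHERE count > 0
 ORDER BY time DESC;

-- Segments that have accumulated the most buffer busy waits.
SELECT owner, object_name, object_type, value AS buffer_busy_waits
  FROM v$segment_statistics
 WHERE statistic_name = 'buffer busy waits'
   AND value > 0
 ORDER BY value DESC;
```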
free buffer waits
Possible Causes :
· This means we are waiting for a free buffer but there are none available in the cache because there are too many dirty buffers in the cache
· Either the buffer cache is too small or DBWR is slow in writing modified buffers to disk
· DBWR is unable to keep up with the write requests
· Checkpoints happening too fast, maybe due to high database activity and under-sized online redo log files
· Large sorts and full table scans are filling the cache with modified blocks faster than DBWR is able to write to disk
· If the number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed
Actions :
· Reduce checkpoint frequency by increasing the size of the online redo log files
· Examine the size of the buffer cache and consider increasing it in the SGA
· Set disk_asynch_io = true
· If not using asynchronous I/O, increase the number of db writer processes or dbwr slaves
· Ensure hot spots do not exist by spreading datafiles over disks and disk controllers
· Pre-sorting or reorganizing data can help
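Whether the online redo logs are under-sized can be estimated from how often the database switches logs. A rough sketch; what counts as "too frequent" depends on the system and is not stated in these notes.

```sql
-- Current online redo log sizes.
SELECT group#, thread#, bytes / 1024 / 1024 AS size_mb, status
  FROM v$log;

-- Log switches per hour over the last 24 hours.
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS switch_hour,
       COUNT(*) AS switches
  FROM v$log_history
 WHERE first_time > SYSDATE - 1
 GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
 ORDER BY 1;
```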
enqueue waits
Possible Causes :
· This wait event indicates a wait for a lock that is held by another session (or sessions) in a mode incompatible with the requested mode.
TX Transaction Lock
· Generally due to table or application setup issues
· This indicates contention for a row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.
· This usually is an application issue.
TM DML enqueue lock
· Generally due to application issues, particularly if foreign key constraints have not been indexed.
ST lock
· Database actions that modify the UET$ (used extent) and FET$ (free extent) tables require the ST lock, which includes actions such as drop, truncate, and coalesce.
· Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary managed tablespaces
Actions :
· Reduce waits and wait times
· The action to take depends on the lock type which is causing the most problems
· Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and whether there are multiple waiters for the same resource
· Waits for TM enqueue in Mode 3 are primarily due to unindexed foreign key columns.
· Create indexes on foreign keys < 10g
· Following are some of the things you can do to minimize ST lock contention in your database:
· Use locally managed tablespaces
· Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE TEMPFILE… command.
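Finding the blocker and its waiters, the first step named above, can be sketched with a self-join on V$LOCK. The second query lists tablespaces that are still dictionary managed and so can be subject to ST lock contention. Both are illustrative sketches for a single instance, not a complete lock-analysis script.

```sql
-- Blockers and waiters: a blocker has BLOCK = 1, a waiter has REQUEST > 0
-- on the same resource (TYPE, ID1, ID2).
SELECT h.sid     AS blocker_sid,
       w.sid     AS waiter_sid,
       w.type    AS lock_type,
       h.lmode   AS held_mode,
       w.request AS requested_mode
  FROM v$lock h
  JOIN v$lock w
    ON w.type = h.type
   AND w.id1 = h.id1
   AND w.id2 = h.id2
 WHERE h.block = 1
   AND w.request > 0;

-- Tablespaces that still use dictionary extent management.
SELECT tablespace_name, contents, extent_management
  FROM dba_tablespaces
 WHERE extent_management = 'DICTIONARY';
```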
Cache buffers lru chain latch
Possible Causes :
· Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache
· The cache buffers lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.
· Competition for the cache buffers lru chain latch is symptomatic of intense buffer cache activity caused by inefficient SQL statements. Statements that repeatedly scan large unselective indexes or perform full table scans are the prime culprits.
· Heavy contention for this latch is generally due to heavy buffer cache activity, which can be caused, for example, by repeatedly scanning large unselective indexes
Actions :
· Contention for this latch can be avoided by implementing multiple buffer pools or by increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (the default value is generally sufficient for most systems).
· It is possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into the buffer cache.
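Before changing pools or cache size, it is worth confirming that this latch really is under pressure. A small sketch against V$LATCH and V$LATCH_CHILDREN; the miss percentage is just a convenience ratio, not an official threshold.

```sql
-- Overall activity on the latch since instance startup.
SELECT name, gets, misses, sleeps,
       ROUND(misses * 100 / NULLIF(gets, 0), 2) AS miss_pct
  FROM v$latch
 WHERE name = 'cache buffers lru chain';

-- Per-child breakdown, to see whether contention is concentrated on a few children.
SELECT child#, gets, misses, sleeps
  FROM v$latch_children
 WHERE name = 'cache buffers lru chain'
 ORDER BY sleeps DESC;
```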
Direct Path Reads
Possible Causes :
· These waits are associated with direct read operations, which read data directly into the session’s PGA, bypassing the SGA
· The “direct path read” and “direct path write” wait events are related to operations that are performed in the PGA, such as sorting, GROUP BY operations and hash joins
· In DSS type systems, or during heavy batch periods, waits on “direct path read” are quite normal. However, for an OLTP system these waits are significant
· These wait events can occur during sorting operations, which is not surprising, as direct path reads and writes usually occur in connection with temporary segments
· SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA
Actions :
· Ensure the OS asynchronous I/O is configured correctly.
· Check for I/O-heavy sessions / SQL and see if the amount of I/O can be reduced.
· Ensure no disks are I/O bound.
· Set PGA_AGGREGATE_TARGET to an appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO), or set the *_area_size parameters manually (such as sort_area_size), in which case you have to set WORKAREA_SIZE_POLICY = MANUAL
· Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.
· Make sure the optimizer selects the right driving table. Check whether the composite index’s columns can be rearranged to match the ORDER BY clause to avoid the sort entirely.
· Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database.
· Query V$SESSTAT to identify sessions with high “physical reads direct”
Remark:
· The default size of HASH_AREA_SIZE is twice that of SORT_AREA_SIZE
· A larger HASH_AREA_SIZE will influence the optimizer to go for hash joins instead of nested loops
· The hidden parameter _DB_FILE_DIRECT_IO_COUNT can impact direct path read performance. It sets the maximum I/O buffer size of direct read and write operations. The default is 1M in 9i
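The V$SESSTAT check mentioned in the actions above could look like the following. It is a minimal sketch that joins to V$STATNAME for the statistic name and to V$SESSION for the session details.

```sql
-- Sessions ranked by direct physical reads since they connected.
SELECT s.sid,
       s.username,
       s.program,
       st.value AS physical_reads_direct
  FROM v$sesstat st
  JOIN v$statname n
    ON n.statistic# = st.statistic#
  JOIN v$session s
    ON s.sid = st.sid
 WHERE n.name = 'physical reads direct'
   AND st.value > 0
 ORDER BY st.value DESC;
```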
Direct Path Writes
Possible Causes :
· These are waits that are associated with direct write operations that write data from users’ PGAs to data files or temporary tablespaces
· Direct load operations (e.g. Create Table as Select (CTAS) may use this)
· Parallel DML operations
· Sort I/O (when a sort does not fit in memory)
Actions :
If the file indicates a temporary tablespace, check for unexpected disk sort operations.
Ensure
is TRUE. This is unlikely to reduce the wait times reported for the wait event, but it may reduce sessions’ elapsed times (as synchronous direct I/O is not accounted for in wait event timings).
Ensure the OS asynchronous I/O is configured correctly.
Ensure no disks are I/O bound
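Unexpected disk sorts, mentioned in the first action above, show up in the system statistics and in current temporary segment usage. A sketch, assuming Oracle9i or later for the workarea statistics.

```sql
-- Memory versus disk sorts, and how often work areas spilled to disk
-- (onepass / multipass) instead of running fully in memory (optimal).
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('sorts (memory)',
                'sorts (disk)',
                'workarea executions - optimal',
                'workarea executions - onepass',
                'workarea executions - multipass');

-- Sessions currently holding temporary segments, largest first.
SELECT username, tablespace, segtype, blocks
  FROM v$sort_usage
 ORDER BY blocks DESC;
```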
Latch Free Waits
Possible Causes :
· This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
· When you see a latch free wait event in the V$SESSION_WAIT view, it means the process failed to obtain the latch in the willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. When processes compete heavily for latches, they will also consume more CPU resources because of spinning. The result is a higher response time
Actions :
· If the TIME spent waiting for latches is significant, then it is best to determine which latches are suffering from contention.
Remark:
· A latch is a kind of low level lock. Latches apply only to memory structures in the SGA. They do not apply to database objects. An Oracle SGA has many latches, and they exist to protect various memory structures from potential corruption by concurrent access.
· The time spent on latch waits is an effect, not a cause; the cause is that you are doing too many block gets, and block gets require cache buffers chains latching
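Determining which latches are suffering from contention can be sketched as below. The second query relies on the P2 parameter of the latch free event holding the latch number, which applies to releases that still report the generic latch free event.

```sql
-- Latches ranked by sleeps since instance startup.
SELECT name, gets, misses, sleeps, immediate_gets, immediate_misses
  FROM v$latch
 WHERE sleeps > 0
 ORDER BY sleeps DESC;

-- Sessions waiting on 'latch free' right now, with the latch name
-- resolved from P2 (the latch number).
SELECT w.sid, l.name AS latch_name
  FROM v$session_wait w
  JOIN v$latch l
    ON l.latch# = w.p2
 WHERE w.event = 'latch free';
```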
Library cache latch
Possible Causes :
· The library cache latches protect the cached SQL statements and object definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache.
· The application is making heavy use of literal SQL; use of bind variables will reduce contention for this latch considerably
Actions :
· The main remedy is to ensure that the application reuses SQL statement representations as much as possible. Use bind variables whenever possible in the application.
· You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter.
· Consider increasing the shared pool.
Remark:
· Larger shared pools tend to have long free lists, and processes that need to allocate space in them must spend extra time scanning the long free lists while holding the shared pool latch
· If your database is not yet on Oracle9i Database, an oversized shared pool can increase the contention for the shared pool latch.
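Heavy use of literal SQL can often be spotted by grouping cached statements that differ only in their literals. The sketch below assumes a release where V$SQL exposes FORCE_MATCHING_SIGNATURE (10g Release 2 or later); the threshold of 20 copies is an arbitrary placeholder.

```sql
-- Statements that would share a cursor if literals were replaced by binds.
-- FORCE_MATCHING_SIGNATURE is identical for statements differing only in literals.
SELECT force_matching_signature,
       COUNT(*)      AS copies,
       MIN(sql_text) AS sample_sql
  FROM v$sql
 WHERE force_matching_signature <> 0
 GROUP BY force_matching_signature
HAVING COUNT(*) > 20   -- hypothetical threshold
 ORDER BY copies DESC;
```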
Shared pool latch
Possible Causes :
· The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool
· Contention for the shared pool and library cache latches is mainly due to intense hard parsing. A hard parse applies to new cursors and to cursors that have been aged out and must be re-executed
· Parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released.
Actions :
· To reduce shared pool latch contention, avoid hard parses when possible: parse once, execute many.
· Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared pool and the use of MTS (shared server option) also greatly influence the shared pool latch.
· A workaround is to set the initialization parameter CURSOR_SHARING to FORCE. This allows statements that differ in literal values but are otherwise identical to share a cursor and therefore reduces latch contention, memory usage, and hard parsing.
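How close the system is to "parse once, execute many" can be read from the parse counters, alongside the current values of the parameters discussed above. A minimal sketch; no particular ratio is implied as a target.

```sql
-- Hard parses versus total parses and executions since instance startup.
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('parse count (total)',
                'parse count (hard)',
                'execute count');

-- Current settings of the parameters that influence parsing and the shared pool.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('cursor_sharing',
                'session_cached_cursors',
                'shared_pool_size');
```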
Row cache objects latch
Possible Causes :
· This latch comes into play when user processes are attempting to access the cached data dictionary values.
Actions :
· It is not common to have contention for this latch; the main way to reduce it is to increase the size of the shared pool (SHARED_POOL_SIZE).
· Use locally managed tablespaces for your application objects, especially indexes
· Review and amend your database logical design; a good example is to merge or decrease the number of indexes on tables with heavy inserts
Remark:
· Configuring the library cache to an acceptable size usually ensures that the data dictionary cache is also properly sized. So tuning the library cache will tune the row cache indirectly.
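The efficiency of the data dictionary cache can be inspected in V$ROWCACHE. A short sketch; the miss percentage is a derived convenience column.

```sql
-- Dictionary cache areas ranked by misses since instance startup.
SELECT parameter,
       gets,
       getmisses,
       ROUND(getmisses * 100 / NULLIF(gets, 0), 2) AS miss_pct
  FROM v$rowcache
 WHERE gets > 0
 ORDER BY getmisses DESC;
```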
What waits are occurring in your database?
In our final example of the power of the V$ tables, we can see what waits are happening in our database. Waits are conditions where a session is waiting for something to happen. Waits can be caused by a number of things, from slow disks, to locking situations (like the one we saw above), to various kinds of internal Oracle contention.
Waits come in two main flavors, system-level and session-level. The system-level waits represent a high-level summary of all session-level waits. Session-level waits are the waits of specific sessions.
System waits come in different wait classes. Classes such as idle waits have no real impact on the database at all in most cases (there are some rare exceptions). You can see the current wait class waits by querying the v$system_wait_class view, as seen in this example:
SQL> Select wait_class, sum(time_waited),
            sum(time_waited)/sum(total_waits) Sum_Waits
     From v$system_wait_class
     Group by wait_class
     Order by 3 desc;
WAIT_CLASS       SUM(TIME_WAITED)   SUM_WAITS
---------------  ----------------  ----------
Idle                   9899040431  151.750403
Application               3147344  77.2183812
Concurrency                491226  26.0846432
Other                      431875  6.65036957
Administrative                718  5.52307692
Configuration               23691  1.85114862
Commit                      89282  .302570846
User I/O                  2826520  .289489185
System I/O                 646700    .1372763
Network                    415446  .007569151
Note that in this output the idle wait class far outweighs the other wait event classes, which is often the case for healthy databases. We do see some other waits of interest; in particular, the application and concurrency waits have some time accumulated. Let's see what waits are causing us problems.
To do this, we drill down to the next level of wait events, using the v$system_event view. This gives us more detailed wait event information, and we can associate the waits with our wait classes as we have done in the following SQL:
Select a.event, a.total_waits, a.time_waited, a.average_wait
From v$system_event a, v$event_name b, v$system_wait_class c
Where a.event_id=b.event_id
And b.wait_class#=c.wait_class#
And c.wait_class in ('Application','Concurrency')
order by average_wait desc;
EVENT                           TOTAL_WAITS  TIME_WAITED  AVERAGE_WAIT
------------------------------  -----------  -----------  ------------
enq: TX - row lock contention         10669      3197011           300
buffer busy waits                     14218       470221            33
library cache pin                       270         4462            17
library cache load lock                 177         1783            10
latch: library cache                   3673        14115             4
latch: cache buffers chains             329          494             2
latch: In memory undo latch              13           26             2
row cache lock                            2            4             2
latch: library cache lock                55           46             1
latch: library cache pin                 95           74             1
enq: RO - fast object reuse             303           49             0
enq: TM - contention                      1            0             0
SQL*Net break/reset to client         29689         1106             0
SQL*Net break/reset to dblink           280            1             0
In this report we find that the top problem appears to be enqueue waits; the enq: TX - row lock contention event is an enqueue. There are a huge number of events in Oracle Database 10g (811 in my 10g database!), so you can't possibly know what each one means. They are all documented in the Oracle reference guide. Also, if you go out to a search engine such as Google and search for the event, you will often find someone who has had problems with it, and you will find lots of help in correcting the problem.
In this case an enqueue wait has to do with the blocking locks we looked at earlier in this chapter. So in Oracle an enqueue is just another word for a lock. If we saw that this was the big problem, I'd start monitoring my sessions to try to figure out what is causing the locking issues. We can drill down even further into the session level if we like. In this case, let's see if anyone is still causing locking:
select a.sid, a.event, a.total_waits, a.time_waited, a.average_wait
from v$session_event a, v$session b
where time_waited > 0
and a.sid=b.sid
and b.username is not NULL
and a.event='enq: TX - row lock contention';
       SID  EVENT                           TOTAL_WAITS  TIME_WAITED  AVERAGE_WAIT
----------  ------------------------------  -----------  -----------  ------------
       110  enq: TX - row lock contention            14         4211           301
Note that the time_waited and average_wait columns in this view are in centiseconds. To see the time in seconds, we have to divide by 100. Hence, this session has spent about 42 seconds in total waiting on this blocking lock so far, roughly 3 seconds per wait.
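The conversion can be done in the query itself. A small sketch that reports the same columns in seconds for the row lock event:

```sql
-- TIME_WAITED and AVERAGE_WAIT are stored in centiseconds; divide by 100 for seconds.
SELECT sid,
       event,
       total_waits,
       time_waited / 100  AS time_waited_secs,
       average_wait / 100 AS average_wait_secs
  FROM v$session_event
 WHERE event = 'enq: TX - row lock contention';
```

For the row shown above, 4211 centiseconds becomes 42.11 seconds and 301 becomes 3.01 seconds.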
As we see, SID 110 is blocked right now! Quickly, let's see who is blocking this session by re-running our earlier query:
SQL> Select blocking_session, sid, serial#, wait_class,
            seconds_in_wait
     From v$session
     where blocking_session is not NULL
     order by blocking_session;
BLOCKING_SESSION         SID     SERIAL#  WAIT_CLASS       SECONDS_IN_WAIT
----------------  ----------  ----------  ---------------  ---------------
             161         110         561  Application                  246
This is clearly a problem. We see that the block continues to be held (now 246 seconds into the blocking event!). We need to go find out who SID 161 is and run them out of town. We do that by returning to the query at the beginning of this section with a slight modification:
SQL> select sid, serial#, username, osuser, machine
     from v$session
     where username is not NULL;
       SID     SERIAL#  USERNAME                   OSUSER      MACHINE
----------  ----------  -------------------------  ----------  -------------------
       161       43123  GRUMPY                     grummy      htmldb.com
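Once the blocker is known, it can help to see what that session is running before deciding what to do about it. The sketch below assumes Oracle Database 10g or later (for the SQL_ID columns) and uses SID 161 and serial number 43123 from the output above; the kill statement is left commented out because whether to use it is a judgment call.

```sql
-- What the blocking session is running (or last ran, if it is now idle).
SELECT s.sid,
       s.serial#,
       s.username,
       s.status,
       q.sql_text
  FROM v$session s
  LEFT JOIN v$sqlarea q
    ON q.sql_id = NVL(s.sql_id, s.prev_sql_id)
 WHERE s.sid = 161;

-- If the session really has to go, it is identified by 'sid,serial#':
-- ALTER SYSTEM KILL SESSION '161,43123';
```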
COLUMN wait_class format a20
COLUMN name format a30
COLUMN time_secs format 999,999,999,999.99
COLUMN pct format 99.99

-- Non-idle wait events plus server CPU time, each shown in seconds
-- and as a percentage of the combined total.
SELECT wait_class,
       name,
       ROUND (time_secs, 2) time_secs,
       ROUND (time_secs * 100 / SUM (time_secs) OVER (), 2) pct
  FROM (SELECT n.wait_class,
               e.event name,
               e.time_waited / 100 time_secs      -- centiseconds to seconds
          FROM v$system_event e,
               v$event_name n
         WHERE n.name = e.event
           AND n.wait_class <> 'Idle'
           AND time_waited > 0
        UNION
        SELECT 'CPU',
               'server CPU',
               SUM (value / 1000000) time_secs    -- microseconds to seconds
          FROM v$sys_time_model
         WHERE stat_name IN ('background cpu time', 'DB CPU'))
 ORDER BY time_secs DESC;
----------------All of the blogs are for my own reference only---------------------