Sunday 19 August 2012

11g R2 – Trace Files and Alert Logs


Hi,

My name is Mark Tiger, creator of this blog.  I am an Oracle Certified Professional (OCP DBA 11g).

Gathering information for some DBA tasks can be time-consuming, even although the commands that you need to issue eventually can be over quite quickly.  I have gone through this process over and over again, and have decided to help other Oracle DBA’s in the community.

In this blog, I will give you the details of how to carry out those tasks; that typically need a lot of research, before you can do them.  I will try to present the information in an easy to understand way.  My hope is that this will save you lots of time in research, and help to make you more productive as an Oracle DBA.  The illustrations are primarily meant for Linux, since this is a Platform; that enjoys preference from Oracle.  However they are easily adaptable for versions of UNIX/AIX and windows etc.

11g R2 – Trace Files and Alert Logs

You use the trace files and alert log to monitor your database on a regular basis.  This way you are familiar with the normal operation of the database, when you start looking at an error condition.  Being familiar with the normal operation will help you to understand an error condition better.  The home page of Enterprise manager contains ways to monitor the Oracle Database.

Each server and background process can write to an associated trace file.  When an error condition is detected by a process, it dumps information about the error condition to its associated trace file.  Some of the information is intended for the Oracle DBA, some of the information is intended for Oracle Support Services.  The trace file information is also useful for tuning applications and instances.
The alert log is a chronological log of messages and errors, and includes:
·         All internal errors (ORA-00600), block corruption errors (ORA-01578), and deadlock errors (ORA-00060)
·         Database admin operations such as: CREATE, ALTER, DROP, STARTUP, SHUTDOWN, and ARCHIVELOG statements
·         Messages and errors relating to the functions of shared server and dispatcher processes
·         Errors occurring during the automatic refresh of a materialized view
·         The values of all initialization parameters that had non-default values when the instance and database were started up and opened

Oracle uses the alert log to record these operations, as an alternative to displaying them on an operators console.  If an operation is successful, then a completed message is written to the alert log, which includes a timestamp.
In Unix/Linux you can monitor the alert log file on a monitor by using the command:
$ tail –f alert_ORACLE_SID.log
You will then see the entries in real time, as they are appended to the alert_ORACLE_SID.log file.

In windows it is a bit more tricky to get it working.  However I have had success with powershell, using  this command:
> get-content alert_ORACLE_SID.log -wait
The only thing is it will first display the entire alert log, and then get to the mode, where it spools the latest changes out.  Of course you can rename your alert log to alert.timestamp.log, when appropriate, to make it more manageable.

The alert log is maintained in test format and XML format.  You can read both forms with a text editor.  You can read the XML version with the ADRCI utility, which strips the tags from the view displayed.

You should check the alert log and trace files of an instance periodically to learn whether the background processes have encountered errors.  For example if the LGWR can’t write to a member of a log group, an error message indicating the nature of the problem is written to the alert log, and to the LGWR trace file.  Such an error indicates media failure or an I/O error, and should be attended to immediately.

You can also monitor the alert log from within SQL*Plus:
SQL> select distinct
   originating_timestamp,
   message_text
from   x$dbgalertext
where  originating_timestamp > sysdate-1
order by 1;

The Oracle database also writes the values of initialization parameters to the alert log, as well as various important performance statistics.  The alert log and all the trace files are written to the “Automatic Diagnostic Repository”.  The location of the Automatic diagnostic repository is documented in the DIAGNOSTIC_DEST initialization parameter.  The names of the trace files are operating system specific, but will usually contain the name of the process wring the file, for example LGWR, RECO  etc.  The name of the alert log would usually be something like “alert_ORACLE_SID.log”.

Controlling trace files
You can limit the maximum size of all trace files by using the initialization parameter: MAX_DUMP_FILE_SIZE.  Fortunately this is a dynamic parameter, so you can set it without a database restart.
SQL> alter system set max_dump_file_size=’20m’ scope=both;

The alert log is a different situation. The information will continue to be appended to the alert log, regardless of the size it grows to.  It is a good idea to delete it from time to time to control the size.  Better to rename it to something like alert.timestamp.log on a regular basis, so that you have a history of what happened in the past, in case you have a future problem; that requires investigating the history of error conditions of the instance.

Background processes always write information to a trace file when appropriate.  With the ARCn background process you can control the amount of trace written, by setting the LOG_ARCHIVE_TRACE initialization parameter:
SQL> alter system set log_archive_trace=12 scope=both;

Trace files are always written on behalf of server processes, whenever critical conditions occur.

You can also control the amount of SQL trace information.  If you set the SQL_TRACE initialization parameter to true, then performance statistics for all SQL statements is generated and saved to the Automatic Diagnostic Repository.

You can also control the SQL_TRACE setting from a session level.  For example:
SQL> alter session set SQL_TRACE=TRUE;
You can use the DBMS_SESSION and DBMS_MONITOR packages to control SQL Tracing for a session.

The SQL Trace facility can carry a significant system overhead, which could result in a substantial performance impact.  So SQL Trace should only be enabled to collect statistics, and not for normal database operation.

If shared server is enabled, then each session using a dispatcher is routed to a shared server process; and trace information is written to the trace file only if the session has tracing enabled or if an error conditions occurs.  In this situation tracing for a specific session may involve looking through several trace files.  Oracle provides a trace utility to assist: TRCSESS, which consolidates all the trace information for a session in one place, and orders it by timestamp.

Monitoring Database operations with Server Generated Alerts
A server generated alert is typically a notification from the Oracle database server of an impending problem.  The notification may also contain suggestions to remedy the situation.  Notifications will also be provided when the error condition has been cleared.

Alerts are automatically generated for error conditions, or when data does not match expected values for metrics, for example.
·         Physical reads per second
·         User commits per second
·         SQL Service response time

Server-generated alerts can be based on threshold levels or be generated because of an error condition. Threshold-based alerts can be triggered at both warning and critical levels.  These levels can be either internal values or customer defined values.  Tablespace usage has a warning generated at 85% usage and at the critical level of 97%.  These can be adjusted to requirements.

Alerts are not always based on threshold levels:
·         Snapshot too old
·         Resumable Session Suspended
·         Recovery Area Space Usage

An alert message is periodically sent to the pre-defined persistent queue ALERT_QUE owned by user SYS.  Oracle Enterprise Manager reads this queue and provides notifications about outstanding Server alerts, and may also provide suggestions for correcting the problem.  The alerts are displayed on the Enterprise Manager Database Home Page.  You can configure notifications for these alerts to be emailed or paged to selected administrators.  If an alert can’t be written to the ALERT_QUE, then a message about the alert is written to the database alert_ORACLE_SID.log. 

The data is periodically flushed to the Automatic Workload Repository, to capture or create a history of metric values.  The alert history table and ALERT_QUE are purged automatically by the system at regular intervals.

You can view and set threshold settings for the server alert metrics using the SET_THRESHOLD and GET_THRESHOLD procedures of the DBMS_SERVER_ALERT package.  You can  do this through Enterprise Manager.

For example setting the CPU threshold time:
DBMS_SERVER_ALERT.SET_THRESHOLD
(
    DBMS_SERVER_ALERT.CPU_TIME_PER_CALL,
    DBMS_SERVER_ALERT.OPERATOR_GE,
    ‘8000’,
    DBMS_SERVER_ALERT.OPERATOR_GE,
    ‘10000’,
    1,
    2,
    ‘inst1’,
    DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE,
    ‘main.regress.rdbms.dev.testcompany.com’
);

In the above example a warning message is issued if the CPU time exceeds 8000 microseconds for a user call.  A critical alert is issued if a user call exceeds 10,000 microseconds of CPU time.
·         DBMS_SERVER_ALERT.CPU_TIME_PER_CALL, specifies the metric to be addressed
·         The observation period is set to 1 minute.  This indicates the number of minutes that the condition must deviate from the threshold value before the alert is issued.  This helps to filter out transient conditions
·         The number of consecutive occurrences is set to 2.  This specifies how many times the metric must violate the metric before the alert is generated
·         The name of the instance is set to ‘inst1’
·         DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE  specifies the object type on which the service is set.  In this case the threshold is set on: ‘main.regress.rdbms.dev.testcompany.com’

To retrieve threshold values, you can use the GET_THRESHOLD procedure.
DECLARE
   warning_operator                        BINARY_INTEGER;
  warning_value                                VARCHAR2(60);
   critical_operator                            BINARY_INTEGER;
   critical_value                                   VARCHAR2(60);
   observation_period                     BINARY_INTEGER;
   consecutive_occurrences         BINARY_INTEGER;
BEGIN
   DBMS_SERVER_ALERT.GET_THRESHOLD
(  
                DBMS_SERVER_ALERT.CPU_TIME_PER_CALL,
warning_operator,
warning_value,
                critical_operator,
critical_value,
observation_period,
                consecutive_occurrences,
'inst1',
                DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE,
‘main.regress.rdbms.dev.testcompany.com’
);
   DBMS_OUTPUT.PUT_LINE('Warning operator: ' || warning_operator);
   DBMS_OUTPUT.PUT_LINE('Warning value: ' || warning_value);
   DBMS_OUTPUT.PUT_LINE('Critical operator: ' || critical_operator);
   DBMS_OUTPUT.PUT_LINE('Critical value: ' || critical_value);
   DBMS_OUTPUT.PUT_LINE('Observation_period: ' || observation_period);
   DBMS_OUTPUT.PUT_LINE('Consecutive occurrences:' || consecutive_occurrences);
END;
/

You can also check the DBA_THRESHOLDS view:
SQL> SELECT metrics_name, warning_value, critical_value, consecutive_occurrences
  2  FROM DBA_THRESHOLDS
  3  WHERE metrics_name LIKE '%CPU Time%';

Viewing Server Generated Alerts
The easiest way to view server-generated alerts is by accessing the Database Home Page of Enterprise Manager.  There are however a number of other ways to view these server generated alerts.

If you are not using enterprise manager:
You must subscribe to the ALERT_QUE, and read the ALERT_QUE.  To create an agent and subscribe the agent to the ALERT_QUE, use the CREATE_AQ_AGENT, and ADD_SUBSCRIBER procedures of the DBMS_AQADM package.

Then you must associate a database user with the subscribing agent, because only a user associated with the subscribing agent can access queued messages in the secure ALERT_QUE.  You must also assign the ENQUEUE privilege to the user.  Use the ENABLE_DB_ACCESS and GRANT_QUEUE_PRIVILEGE procedures of the DBMS_AQADM package.
You also have the option of registering with the DBMS_AQ.REGISTER procedure to receive an asynchronous notification when an alert is enqueued to ALERT_QUE.  The notification can be in the form of an email, an HTTP post, or a PL/SQL Procedure.

To read an alert message you can use the DBMS_AQ.DEQUEUE procedure or OCIAQDeq call.  After the message has been dequeued, use the DBMS_SERVER_ALERT.EXPAND_MESSAGE procedure to expand the text of the message.

The following views provide information about server-generated alerts:
View
Description
DBA_THRESHOLDS
Lists the threshold settings defined for the database
DBA_OUTSTANDING_ALERTS
Describes the outstanding alerts in the database
DBA_ALERT_HISTORY
Lists a history of alerts that have been cleared
V$ALERT_TYPES
Provides information such a group and type for each alert
V$METRICNAME
Contains the names, identifiers, and other information about the system metrics
V$METRIC
Contains system level metric values
V$METRIC_HISTORY
Contains a history of system level metric values
  
Monitoring Performance
Locks are mechanisms that prevent the destructive interaction between transactions accessing the same resources.  The resources can be user objects such as tables and rows or system objects, such as shared data structures in memory and data dictionary rows.  The Oracle database automatically manages locks when executing SQL statements.  You can lock data manually if you wish to.

A deadlock occurs when two or more users are waiting for data that is locked by each other.  Deadlocks will prevent certain transactions from continuing to work.  The Oracle Database automatically detects deadlock situations and resolves them by rolling back one of the statements in involved in the deadlock, and thereby releasing one set of conflicting row locks.

The Oracle database has been designed to avoid deadlocks, so they are not that common.  Most often deadlocks occur, when transactions explicitly overwrite the default locking mechanism of the database.  Deadlocks can be a performance issue, so Oracle provides scripts and views to enable you to monitor locks.

$ORACLE_HOME\rdbms\admin\catblock.sql
This creates the views that is used with this script:
$ORACLE_HOME\rdbms\admin\utllockt.sql

Utllockt.sql displays in a tree fashion, the sessions in the system that are waiting for locks, and the locks that are waiting for them.

Wait events are statistics that are incremented by a server process to indicate that it had to wait for an event to complete, before being able to continue processing.  Sessions may be required to wait for a number of reasons.  The session could need to wait for user input, Operating system, Disk writes, locks or latches etc.

When a session is waiting for a resource it does not do any useful work.  When you get a large number of waits, it could be a problem.  Wait event data reveals various symptoms or problems that may be affecting performance, such as latch contention, buffer contention, and I/O contention.
General performance monitoring data dictionary views
View
Description
V$LOCK
List the locks currently held by Oracle Database and outstanding requests for a lock or latch
DBA_BLOCKERS
Displays a session if it is holding a lock on an object for which another session is waiting
DBA_WAITERS
Displays a session if it is waiting for a locked object
DBA_DDL_LOCKS
Lists all DDL locks held in the database and all outstanding requests for a DDL lock
DBA_DML_LOCKS
Lists all DML locks held in the database and all outstanding requests for a DML lock
DBA_LOCK
Lists all locks or latches held in the database and all outstanding requests for a lock or latch
DBA_LOCK_INTERNAL
Displays a row for each lock or latch that is being held, and one row for each outstanding requests for a lock or latch
V$LOCKED_OBJECT
Lists all locks acquired by every transaction on the system
V$SESSION_WAIT
Lists the objects or resources for which active sessions are waiting
V$SYSSTAT
Contains session statistics
V$RESOURCE_LIMIT
Provides information about current and maximum resource utilization for some system resources
V$SQL_AREA
Contains statistics about shared SQL area and contains one row for each SQL string.  Also provides statistics about SQL statements that are in memory, parsed and ready for execution
V$LATCH
Contains statistics for non parent latches and summary statistics for parent latches





Mark Tiger,
Need a Database Health Check, Remote Monitoring, Support, Maintenance, or a Security Audit?

P.S. I am busy preparing Nine books on preparing for the Oracle 11g R2 OCM DBA exam.  Watch this spot for details on the books, as they start becoming available.

2 comments:

  1. Did you know that that you can make cash by locking special pages of your blog / site?
    All you need to do is to join AdWorkMedia and add their Content Locking tool.

    ReplyDelete
  2. Immediately drawn by writer’s exclusive way of writing.
    best-medical-alert-systems-reviews

    ReplyDelete