Friday, 10 August 2012

11g R2 - Configuring Automatic restart of an Oracle Database


Hi,

My name is Mark Tiger, creator of this blog.  I am an Oracle Certified Professional (OCP DBA 11g).

Gathering information for some DBA tasks can be time-consuming, even although the commands that you need to issue eventually can be over quite quickly.  I have gone through this process over and over again, and have decided to help other Oracle DBA’s in the community.

In this blog, I will give you the details of how to carry out those tasks; that typically need a lot of research, before you can do them.  I will try to present the information in an easy to understand way.  My hope is that this will save you lots of time in research, and help to make you more productive as an Oracle DBA.  The illustrations are primarily meant for Linux, since this is a Platform; that enjoys preference from Oracle.  However they are easily adaptable for versions of UNIX/AIX and windows etc.

11g R2 - Configuring Automatic restart of an Oracle Database

Oracle Restart Overview
With Oracle Restart installed and configured, various oracle components can be automatically restarted after a software or hardware failure, or whenever your database Host server restarts.  These are the components that Oracle Restart can handle:
Database Instance:
·         Oracle restart can handle multiple database instances on a single host machine
Oracle NET Listener:
Database Services:
·         Does not include the default service created when you install, because it is automatically managed by the Oracle Database.  Does not include any other default services created during database creation.
Oracle Automatic Storage Management (ASM) instance:
Oracle ASM disk groups:
·         Restarting a disk group means mounting it
Oracle Notification Services (ONS)
·         In a standalone Server environment, ONS can be used in Oracle Data Guard installations for automating failover of connections between primary and standby database, through Fast Application Notification (FAN).  ONS is a service for sending FAN events integrated clients  upon failover.

Oracle restart will run periodic check  operations to monitor the health of these components.  If a check operations fails for a specific component, then that component will be shutdown and restarted.

Oracle Restart is designed for standalone single instance environments.   For RAC environments the functionality to automatically restart components is already provided by the Oracle Clusterware.

Oracle Restart runs out of the Oracle Grid Infrastructure Home.  This home is installed separately from the Oracle Database homes.

Startup Dependencies
Oracle Restart ensures that the components are started in the correct order.  For example if you are using ASM, then Oracle Restart ensures that the Oracle ASM instance is first started and the relevant diskgroups are mounted, before starting the database up.  Also if a component must be shutdown, then Oracle ensures that the dependent components are shutdown first.

Also the relationship between the listener and the database is managed.  Oracle restart will attempt to restart the listener before starting the database instance.  If the listener fails during the operation of the database, the Oracle Restart does not restart the database instance.

Starting and Stopping Components with Oracle Restart
Oracle Restart automatically starts and stops dependent components in pre-defined orders; when you startup or shutdown your system.

There may be times when you want to start or stop just one component, without the dependent components being affected.  The way to do this is to use srvctl, which is a utility that comes with Oracle Restart.

When you stop a component with srvctl, then Oracle Restart does not automatically stop the dependent components.  When you start that same component again with srvctl, then that component is again available for automatic restart.

Utilities such as SQL*Plus, lsnrctl(listener control), and ASMCMD are integrated with Oracle Restart.  If you shut the database down using SQL*Plus, the Oracle Restart does not try to restart the database.  If you shut the Oracle ASM instance down with SQL*Plus or ASMCMD, then Oracle Restart does not attempt to restart them.

The difference between starting a component with srvctl or another utility is:
·         When you start a component with srvctl, all the dependent components are first started in the correct order.
·         When you start a component with a utility like SQL*Plus, then the other components in the dependency chain are not automatically started.  You must first ensure that all the components in the dependent chain are first started.
·         Oracle Restart enables you to start or stop all of the dependent components in an Oracle Home or Grid infrastructure home with a single command; if you are using srvctl.

CRSCTL
The crsctl utility starts and stops Oracle Restart.  Crsctl is also used to enable and disable the Oracle High Availability services or daemons.  When the high availability services are disabled, then none of the components managed by Oracle Restart are started when a Node is rebooted.

Crsctl is useful, when you need to stop Oracle Restart while installing a patch or carrying out OS maintenance.  When your maintenance is complete, you can just start “Oracle restart” again with crsctl.

Oracle Restart Configuration
Oracle Restart maintains information for each component.  When a component is started, it is started according to the configuration information for that specific component.  The configuration includes the location of the spfile, and the TCP port that is being used for listeners.

If you install Oracle Restart, before you install the database with DBCA, then DBCA will automatically add the database to the Oracle Restart Configuration.

Otherwise you can manually add and remove components from the Oracle Restart configuration using the srvctl tool.  When you manually add a component to Oracle restart using srvctl then Oracle Restart starts to manage the component, restarting it when required.  Adding a component to the Oracle Restart configuration is also known as: “Registering a component with Oracle Restart”.

Other Oracle tools will automatically add the newly created components to the Oracle Restart Configuration:
·         Create a database with OUI or DBCA
·         Create an Oracle ASM instance with OUI, DBCA or ASMCA
·         Create a disk group using any method
·         Create a listener with NETCA
·         Create a database service with srvctl


In these cases the newly created components are not added to the Oracle Restart Configuration.
·         Create a database with the CREATE DATABASE statement in SQL*Plus
·         Create a database service, by modifying the SERVICE_NAMES, initialization parameter
·         Create a database service with the dbms_service.create_service() package
·         Create a standby database

Drop / remove / delete operations, which update the Oracle Restart Configuration
·         Delete a database with DBCA
·         Delete a listener with NETCA
·         Drop an Oracle ASM disk group, using any method
·         Delete a database service with srvctl

Drop / remove / delete operations, which don’t update the Oracle Restart Configuration
·         Delete a database by removing database files with OS commands
·         Delete a database service without using srvctl

Data Guard
Oracle restart is integrated with Data Guard and Data Guard Broker.
Following a Data Guard role transition, all database services configured to run in the new role are started, and all the services that are not configured are stopped.

When you add a database to the Oracle Restart Configuration, you can specify the current Data Guard role for the database:
PRIMARY, PHYSICAL_STANDBY, LOGICAL_STANDBY, or SNAPSHOT_STANDBY
If the role is later changed using the data Guard broker, then the new role is automatically updated on the Oracle Restart Configuration.  If you change the database role without the broker, then you must manually update the database role in the Oracle Restart configuration using srvctl.

When adding a database service to the Oracle Restart Configuration, you can specify one or more Data Guard roles for the service.  If you have this option configured, then Oracle Restart will only restart the database, if one of the service roles matches the current database role.

Oracle Restart, uses Oracle Notification Services(ONS) and Oracle Advanced Queues to publish Fast Application Notification(FAN) high availability events.  Clients integrated with Oracle Restart can use FAN to provide fast notification to clients, when an instance or service goes down.  The client can automate the failover between a primary database and a standby database.

Fast Application Notification(FAN)
Oracle Restart uses FAN to notify other processes about configuration changes, and service status changes that could be UP or DOWN events.  Integrated Oracle clients receive the events and respond. 

Applications can respond by propagating the error to the user, or by resubmitting the transaction and masking the error from the application user.  When a DOWN event occurs, integrated clients immediately clean up the connections.  When an UP event occurs, integrated clients create new connections to the new primary database instance.

Oracle restart publishes FAN events whenever a managed instance or service goes up or down.  After a failover the Oracle Data Guard Broker publishes FAN events:
·         Applications can use FAN with Oracle Restart without programmatic changes, provided that they use one of these Oracle Integrated Database Clients, which can be configured for FCF(Fast Connection Failover), to automatically connect to a new primary database after a failure.
o   Oracle Database JDBC
o   Universal connection pool for Java
o   Oracle Call Interface
o   Oracle Database ODP.NET
o    
·         FAN server side callouts can be configured on the database tier

For DOWN events, such as a failed primary database; FAN provides immediate notification to the clients.  This enables the clients to failover to the new primary database as fast as possible.  The clients don’t have to wait for a timeout, they are notified immediately, and if they are configured to failover, can immediately failover, even before the timeout occurs.

For UP events, when services and instances are started, new connections can immediately be started to take advantage of the of the extra resources.
FAN; using server side callouts can:
·         Log Status information
·         Open support tickets, and page DBA’s when resources fail to start
·         External dependent applications, that need to be co-located with a service can be automatically started

FAN events are published using ONS, Oracle streams , and Advanced Queues.  Queues are configured automatically when you create a service.  ONS must be manually configured using srvctl.

The connection manager (CMAN) and Oracle NET service listeners are integrated with FAN events.  This enables CMAN and the listeners to immediately de-register services that are associated with te failed instance.  This is important, because it will avoid requests being sent to a service or instance that is not available.

Oracle achieves high availability with Oracle Restart and FAN.  When Oracle Restart detects an outage, then it isolates the failed component, and recovers the dependent components.  In the case that the failed component is an instance, then after Data Guard has failed over to the standby database, then
Oracle Restart will start any services that are defined with the current role.

FAN events are published by Oracle Restart and Oracle Data Guard Broker through ONS and Advanced queuing.  Fan callouts is another way that you can perform notifications.  Callouts are run asynchronously, and are subject to scheduling variables.  Therefore with Callouts you can’t guarantee the order of events.

Oracle Restart; restarts and recovers services and instances automatically.  This includes starting and recovering the listener processes and the ASM instance if required.  You can also use FAN   callouts to interface or report faults to your fault management system, and to initiate repair jobs.

Managing Planned Outages with Oracle Restart
For Repairs, upgrades, and maintenance that requires you to shut down the primary database.  Oracle Restart provides interfaces that disable and enable services to minimize service disruption to disruption to application users.  To achieve a coordinated failover of the database service form the primary to the standby database; you should use Oracle Data Guard with Oracle Restart.  Once the maintenance is complete, then you can revert the service back to normal operation.

The important configuration with Oracle Restart is the management policy.
AUTOMATIC – service will start automatically
MANUAL  - you will have to start the service manually

FAN Events, for high availablility
Here is a description of the FAN event record parameters, and a description of their meanings:
Parameter
Description
VERSION
Version of the event record, Used to identify release changes
EVENT_TYPE
Database and instance types provide the database service, like DB_UNIQUE_NAME. DB_DOMAIN
Service types would include:
SERVICE, SERVICE_MEMBER, DATABASE, INSTANCE, NODE , ASM, SRV_PRECONNECT
DATABASE UNIQUE NAME
This is the database that is supporting the service; matches the initialization parameter for DB_UNIQUE_NAME, which defaults to the value of the initialization parameter DB_NAME
INSTANCE
The name of the instance that supports the service, matches the initialization parameter ORACLE_SID
NODE NAME
This matches the name of the node that support the service, or the name of the Node that has stopped.  This matches the node name known to CSS(Cluster Synchronization Services)
SERVICE
This matches the service name in DBA_SERVICES
STATUS
UP, DOWN, NOT_RESTARTING, PRECONN_UP, PRECONN_DOWN, UNKNOWN
REASON
Data_guard_failover, Failure, Dependency, User, Autostart, Restart
CARDINALITY
The current active number of service UP events
TIMESTAMP
Local time zone, this is used when ordering notification events




A FAN record matches the system context area signature per session.
SERVICE
sys_context(‘userenv’,’service_name’)
DATABASE UNIQUE NAME
sys_context(‘userenv’,’db_unique_name’)
INSTANCE
sys_context(‘userenv’,’instance_name’)
NODE NAME
sys_context(‘userenv’,’server_host’)



Using FAN callouts
FAN callouts are server side executables.  Oracle Restart executes a FAN callout immediately, when a high availability event occurs.
FAN callouts can be used to automate the following activities:
·         Opening fault tracking tickets
·         Sending messages to pages
·         Sending email
·         Starting and stopping server-side applications
·         Maintaining an uptime log.  Each event is logged as it occurs.

To make use of FAN callouts, you can place executables or shell scripts in this directory:
$GRID_HOME/racg/usrco  You can have the same scripts on both the primary and standby nodes.
For example:
$GRID_HOME/racg/usrco/Callout.sh callout

#! /bin/ksh
FAN_LOGFILE=[<custom path>]/admin/log/`hostname`_uptime.log
echo $* “reported=”`date` >> $FAN_LOGFILE &

You could get something like the following as output from the previous script:
NODE VERSION=1.3 host=AIX5 status=nodedown reason=
Timestamp=12-AUG-2012 09:10:00 reported=<formatted data time stamp>

Because FAN records match the sys context signature, you can try to determine which session matches the FAN record.

Oracle has integrated FAN with many of the Oracle Client drivers, that are used to connect to an “Oracle Restart” configured database.  You can therefore use FAN, by using one of the integrated clients.
CMAN session pools, Oracle Call Interface, Universal connection pool for Java, JDBC simplefan API, and ODP.NET connection pools. 
The goal should be to enable applications to consistently obtain connections to the current or available Primary database.

Mark Tiger,
Need a Database Health Check, Remote Monitoring, Support, Maintenance, or a Security Audit?

P.S. I am busy preparing Nine books on preparing for the Oracle 11g R2 OCM DBA exam.  Watch this spot for details on the books, as they start becoming available.

1 comment:

  1. Hi,

    I build the single instance database (On file system) manually through "create database ..." command on Windows 2008 Server and having issues with Database restart. Is there any way to configure the database NOW, with the "Oracle Restart" functionality? Please help.

    ReplyDelete