Sunday, 12 August 2012

11g R2 – Managing Redo Logs

Hi,

My name is Mark Tiger, creator of this blog.  I am an Oracle Certified Professional (OCP DBA 11g).

Gathering information for some DBA tasks can be time-consuming, even although the commands that you need to issue eventually can be over quite quickly.  I have gone through this process over and over again, and have decided to help other Oracle DBA’s in the community.

In this blog, I will give you the details of how to carry out those tasks; that typically need a lot of research, before you can do them.  I will try to present the information in an easy to understand way.  My hope is that this will save you lots of time in research, and help to make you more productive as an Oracle DBA.  The illustrations are primarily meant for Linux, since this is a Platform; that enjoys preference from Oracle.  However they are easily adaptable for versions of UNIX/AIX and windows etc.

11g R2 – Managing Redo Logs

The redo logs are very important structures for recovery operations.  They consist of two or more pre-allocated files that store all the changes made to the database, as they occur.  Each instance has at least one redo log to protect the database, in case of an instance failure.

Typically only one instance accesses an Oracle Database.  In a RAC environment, we refer to the redo logs for each instance as “redo threads”.  Each instance has its own redo thread, which consists of a set of redo logs for that instance.  This arrangement eliminates the potential bottleneck, of having contention for a single set of redo logs.  We will focus on one instance, so the thread is assumed to be thread 1, in a typical installation.

Redo logs are filled with redo entries.  A redo entry is made up of a set of change vectors. A change vector is a description of a change made to a single block in the database.  A change to a table will affect: redo entry containing change vectors that describe the changes to the data segment blocks for the table, the undo segment data block is written to, the transaction table for the undo segments.

Redo entries record data that you can use to reconstruct all changes made to the database, this includes the undo segments.  The redo log also protects redo or undo data.  When you recover the database using the redo logs, the Oracle Database reads the change vectors in the redo entries, and applies the changes to the affected data blocks.

Redo records are buffered in a circular fashion in the log buffer of the SGA.  Whenever a transaction is committed, the background process LGWR(Log Writer), writes the redo records from the redo log buffer in the SGA, and writes them to a redo log file, and assigns a SCN(System change number), to identify the records for each committed transaction.  When the redo records from a given transaction are safely written to the redo log files, then the user process is notified that the transaction has been committed.

If the redo log buffer fills, or another transaction commits, then LGWR flushes all the redo log records in the redo log buffer to a redo log file.  Some of the transactions written to the redo log file, may not yet be committed.  However the Oracle database can roll these transactions back if required.

Writing to the Redo Log
There should be two or more redo log files.  Oracle Database requires a minimum of two redo log files, so that one redo log is always available, while the other one is being archived.

LGWR writes to redo log files in a circular fashion.  When one redo log file is full, LGWR moves onto the next one and starts writing.  When LGWR, get back to the first redo log file, it overwrites the first one.

The ability of LGWR, to overwrite already full redo log files, depends on whether archiving is anabled or not.
·         If the database is in NOARCHIVELOG mode then archiving is disabled.  A filled redo log file is available to be overwritten, after the changes recorded in it have been written to the data files.
·         If e database is in ARCHIVELOG mode the archiving is enabled.  A filled redo log file is available to be overwritten, after the changes recorded in it have been written to the data files, and the redo log file has been archived.

Active and inactive redo log files
LGWR writes to one redo log buffer at a time, all the data from the redo log buffer.  The redo log file that LGWR is actively writing to is called the “current” redo log file.

Redo log files that are required for instance recovery are called active redo log files.  Redo log files, that are no longer required for instance recovery are called inactive redo log files.

If the database is in ARCHIVELOG mode, which is the recommended mode of operation; then the LGWR, can’t overwrite a redo log file, until it has been archived by ARCn (Archiver background process). 

Log Switches
A log switch is the point where the database stops writing to one redo log file, and starts writing to the next redo log file.  Under normal conditions, a log switch occurs, when the current redo log file is full, and writing must continue to the next available redo log file.  Log switches can be configured to occur at regular intervals, regardless of whether the redo log file is full of not.  You can also force a log switch manually.

When LGWR begins writing to a new redo log file, the Oracle Database assigns a log sequence number to the new redo log file.  When the Oracle Database archives redo log files, The archived log retains its log sequence number.  Each online or archived redo log file is uniquely identified by its log sequence number.  During crash, instance, or media recovery the Oracle Database applies redo log files in ascending order, by using the log sequence number of the archived and redo log files.

Multiplexing redo log files
To protect against a failure involving the redo log file.  Oracle allows you to multiplex the redo log files.  Best practice requires two copies of each redo log file, on different disks if possible.  This redundancy protects against I/O errors, file corruption, even if the copies on the same disk.  LGWR will write to all the identical copies of the redo logs, concurrently.  This protects against a single point of redo log failure.
Multiplexing is implemented by creating groups of redo log files.  Each group consists of the redo file, and its multiplexed copies.  Each redo log group is defined by a number, such as group 1, group 2 etc.

You can have different members of each group, on different disks.  For example group 1 would have members on disk 1 and disk 2.  Group 2 would have members on disk 1 and disk 2.  Seven if you lost a disk, you would still have a complete set of redo log files.

Whenever LGWR can’t write to member of a group, the database marks that redo log file member as invalid, and writes an error message to the LGWR trace file, and to the alert log.
LGWR’s behaviour will depend on the specific condition when a log member is unavailable:
Condition
LGWR Action
LGWR can write to at least one member in the group
Writing proceeds as per normal.  LGWR writes to the available members of the group, and ignores the unavailable members.
LGWR cannot access the next group at a log switch, because the group must be archived
Database operation temporarily halts until the group becomes available or until the group is archived.
All members of the next group are inaccessible to LGWR at a log switch because of media failure
Oracle Database returns an error, and the database instance shuts down.  In this case you may need to perform media recovery on the database from the loss of a redo log file.

If the database checkpoint has moved beyond the lost redo log, media recovery is not necessary, because the database has saved the data recorded in the redo log file to the data files.  You need only drop the inaccessible redo log group.  If the database did not archive the bad log, use ALTER DATABASE CLEAR UNARCHIVED LOG to disable archiving before the log can be dropped.
All members of a group suddenly become inaccessible to LGWR while it is writing to them
Oracle Database returns an error, and the instance immediately shuts down.  IN this case you may need to perform media recovery.  If the media containing the log is not actually lost: for example if the drive containing the log file was turned off or disconnected, -- meadia recovery may not be needed.  In this case you need only turn the drive back on and let the database perform automatic instance recovery.



The requirement for redo logs, is that there should be at least two groups, preferably three.  All groups of redo logs, should have the same number of members, although this is not actually required.

Always remember to place redo logs on different disks if possible.  This arrangement will protect the database, in case of failure of one of the disks.

Avoid contention between LGWR and ARCn by having the archive log directory on a separate disk to the redo log area.

Block Size of redo log files
Database block sizes vary between 2K and 32K.  However redo log file sizes are different.
Redo log files default to a block size that is equal to the physical sector size of the disk.  Historically this would have been 512B.  Some newer high-capacity disks offer 4K sector size.   The downside of the 4K redo block size, is that there is increased redo wastage.  The amount of redo wastage between 512B and 4K blocks is significant.  You can display the amount of redo wastage by querying v$sesstat and v$sysstat:

SQL> select name, value from v$sysstat where name = ‘redo wastage’;

You can avoid the additional redo wastage if you are using emulation-mode disks.  4K secotr size disks that emulate a 512B secotr size at the disk interface.  You can override the default 4K block size, by specifying a 512B block size, or sometimes a 1K block size. 

However you may incur a significant performance degradation when a redo log write is not aligned with the beginning of a 4K physical sector.  Seven out of eight 512B slots in a 4K physical sector are not aligned.  When planning the redo log block size you must make a trade off between, performance and disk wastage.

From 11gR2 and onwards, you can specify the block size of redo log files with the BLOCKSIZE  keyword, in the CREATE DATABASE, ALTER DATABASE, and CREATE CONTROLFILE statements.  The permissible sizes are: 512, 1024, and 4096

SQL> alter database test add logfile
Group 4 (‘/u01/logs/test/redo04a.log’,’ /u04/logs/test/redo04b.log’)
Size 100m
Blocksize 512
Reuse;

You can verify the size of the redo logs with this query:

SQL> select blocksize from v$log;

Choosing the number of Redo Log Files
Different configurations should be tested out to see which one works the best.  There should always be at least 2 redo groups, but having at least 3 is best practice.  The optimum configuration has the fewest number of redo groups or logs, without hampering LGWR from writing redo log records.

During testing, keep looking at the LGWR trace file, and the alert log file to find evidence that LGWR frequently has to wait for the next available group, because a checkpoint has not completed, or a group has not yet been archived; if you find this evidence then add log groups and continue with the exercise.

Before making any changes, you should examine the parameters limit the number of redo log files that you can add to an Oracle Database.
·         MAXLOGFILES:  This parameter is used in the CREATE DATABASE statement, and determines the maximum of groups of redo log files for each database.  The number of groups defined can range from 1 to MAXLOGFILES.  When the compatibility level is set to earlier than 10.2.0 then the only way to change this parameter is to recreate the database, or to recreate the control file.  From 10.2.0 onwards, you can exceed the value defined in MAXLOGFILES, and the control file expands as needed.  If MAXLOGFILES is not specified in the CREATE DATABASE statement, then an Operating System specific default value is given for this variable.
·         MAXLOGMEMBERS:  This parameter is also used with the CREATE DATABASE statement.  The same restrictions apply as with MAXLOGFILES.  If you specify this parameter when creating the database, then it must be carefully planned.

Controlling Archive Lag
You can force all redo log threads to switch their CURRENT redo logs at regular time intervals.  In a Primary / Standby database configuration, changes are made available to the standby database by archiving redo logs at the primary site, and then shipping them across to the Standby site and applying them to the Standby database.  The standby database must wait for the redo log files to be archived and shipped across to it before it can apply the latest changes.  With the ARCHIVE_LAG_TARGET initialization parameter you can specify in seconds, how long that lag should be.

When ARCHIVE_LAG_TARGET is set, then the database examines the current redo log of the database periodically.  If the following conditions are met, then the instance will switch the logfile.
·         The current redo log file was created prior to n seconds ago, and the estimated archival time for the current log = m seconds.  m seconds is calculated according to the number of redo blocks in the current redo log file.  If n + m exceeds the value of the ARCHIVE_LAG_TARGET initialization parameter, we have a yes for a redo log file switch
·         The current log contains redo records.  A log switch will not occur if there are no records in the current redo log file.

In a RAC(Real Application Clusters) environment, the instance will also cause other threads from other instances to switch and archive their redo log files, if they are falling behind.  This is a useful mechanism if one instance in a RAC environment is more active than the other instances.  In a RAC environment, the value of ARCHIVE_LAG_TARGET must be the same on all instances, otherwise you can expect unpredictable behaviour.

The ARCHIVE_LAG_TARGET is a dynamic parameter, so you can set it with ALTER SYSTEM … SCOPE=BOTH; to make it permanent.  The value of ‘0’ disables it, and is also the default value.  A typical value for this initialization parameter is half an hour:

SQL> alter system set ARCHIVE_LOG_TARGET=1800 scope=both;

Factors to consider when setting ARCHIVE_LOG_TARGET
·         Overhead of switching the redo log files, and archiving the redo log files
·         You need to have some idea of how frequently log switches would occur, as a result of redo log files being full, under normal conditions.  From this value you will be looking at a more frequent interval, if you want to add value to the process.
·         What the impact of redo loss is, and the impact of redo log file lag on the standby system.

If the log switches already occur at a frequency which is greater than the setting you give for ARCHIVE_LOG_TARGET (in seconds), then the new setting will not be of much use.

If you set ARCHIVE_LOG_TARGET to a value that is too low, then you will experience a negative impact on performance, as a result of too frequent log switches.

So in the end, in order to have a positive impact on the Oracle database, you need to get this parameter just right.

Creating Redo Log Groups
The redo log configuration should have been planned properly at database creation time.  However it is possible to add redo log file groups to threads afterwards.  To create or add a new redo log group, you must have the ALTER DATABASE system privilege.  Also you should not add more groups that the number specified in MAXLOGFILES.

SQL> alter database
Add logfile group 10 (‘/uo1/test/redo/log10_01’, ‘/u10/test/redo/log10_02’)
Size 100M blocksize 512;

The group clause is optional, but it is better to specify in which group you going to place the redo log files.  If you number your groups like this: 10, 20,30 etc, you will tend to consume unnecessary space in your control files.  If it far better to number your groups 1,2,3,…10 for example.

Creating redo log members
You can always add members to an existing group.  For example if you decide to have three log files in each group.  Also if due to media failure you have lost a redo log file, from a group and need to replace it.

SQL> alter database add logfile member ‘/uo1/test/redo/log02_03’ to group 2;
You don’t need to specify the size of the redo log file, because it defaults to the same size as the other members of the group.  Depending on your operating system you will notice that the status of the new log member is INVALID.  This is normal behaviour, and it should change to active, the first time it is used.

Instead of specifying the group you want to add the logfile to, you can optionally specify all of the other members of the group in the TO clause.

SQL> alter database add logfile member ‘/uo3/test/redo/log02_03’
TO (‘/uo1/test/redo/log02_01’, ‘/uo2/test/redo/log02_02’);

Renaming and relocating redo log members
This operation may become necessary if you are going to retire the disk which contains the redo log files, or you want to keep copies of the redo log files on different disk drives, or you want to move the files around to minimize I/O contention.

You will need privileges on the Operating system to copy or move the files, and on the database the ALTER DATABASE system privilege.  You must also have privileges to open and backup the database.  Before you start, you should make a full backup of the database.

Let’s assume that the redo log files are located on two disks, disk1 and disk2.  The Redo log groups are duplexed across the two disks. We will be moving the redo log files from disk1 to disk3.

·         Shut down the database
·         Copy or move the redo log files to the new location
$ mv /disk1/redo/redolog01_01 /disk3/redo/redolog01_01
$ mv /disk1/redo/redolog02_01 /disk3/redo/redolog02_01
·         Startup the database in MOUNT mode
·         SQL> alter database
Rename file  ‘/disk1/redo/redolog01_01’,’ /disk1/redo/redolog02_01’
To ‘/disk3/redo/redolog01_01’,’ /disk3/redo/redolog02_01’;
·         Open the database for normal operation.
SQL> alter database open;

Dropping log groups
The following restrictions apply:
·         An instance requires at least two redo log groups at all times.  Each group consists of one or more members.
·         You can only drop inactive log groups.  You may have to do a log switch and wait until the log file has an inactive status, before you can drop the log group.
·         Make sure that the redo log group is archived before dropping it.
SQL> select group#, archived, status from v$log;

When the status is INACTIVE and not ACTIVE or CURRENT, and archived is equal to YES; then you are ready to drop the redo log group.
SQL> alter database drop logfile group 3;
Check that the logfile group was successful, by querying from v$log again.  Remember to clean up the redo log files in the old location.

If you are using Oracle Managed Files, then the cleanup of Operating System Files is done for you by the Oracle Database.

Dropping log members
The following restrictions apply;
·         If all the groups have two members, you can drop one member in a group, and have one member for a while.  As soon as possible you should duplex the group by adding a log member back to the log group again.  While there is only one member in a group, it represents a single point of failure for the Oracle Database.
·         An instance always requires at least two valid log groups, regardless of the number of members that the group has.  A redo log file may have a status of INVALID just after being created, and if the database can’t access it.  The status becomes stale if the Oracle database suspects, that the file is incomplete or corrupt.  Generally switching to the log group will clear this issue.
·         You can only drop a log member, if it is not part of an ACTIVE or CURRENT log group.
·         Make sure that the redo log is part of a group that has been archived.  Query the v$log view.  If the database is not running in archival mode, then the logs will not be archived. 
SQL> alter database drop logfile member ‘/u01/test/redo/redolog01_01’;

Remember to clean up the operating system file, if you are not running Oracle Managed Files.

Forcing Log Switches
By default a log switch will occur, when the current log file group fills up and LGWR starts writing to the next log file group.  You can also force a log switch, to make the CURRENT log group available for redo log maintenance.  You must have the ALTER SYSTEM system privilege, to force a logswitch.
SQL> alter system switch logfile;  

Verifying blocks in redo log files
The default value for DB_BLOCK_CHECKSUM is TYPICAL.  With this setting the database will computes a checksum for each database block / Redo Log Block when it is written to disk.  The checksum is stored in the header of the data block.

The Oracle Database uses this checksum to check for corrupt blocks, when it writes blocks from the redo log files to the archived redo log files.  If corruption is detected while trying to archive the Redo  Log files, then the Oracle database attempts to read the same block from another member in the group.  If the block is corrupted in all members in the group, then archiving can’t proceed.

The database verifies the redo block when it is read from an archived log file, during recovery.  If a block is corrupted, then an error is raised and written to the alert log.

DB_BLOCK_CHECKSUM is a dynamic parameter, the possible values are:
·         TYPICAL, which is the default value for this parameter.  Typical provides a fair level of protection for the database.  Verifies the blocks as they are written to data blocks, and redo log blocks.
·         FULL, will verify the checksum, as it is transferred from one memory construct to another.  This is useful when you are propagating data blocks to standby databases. 
·         TRUE, behaves the same as TYPICAL.

There is an overhead when DB_BLOCK_CHECKSUM is enabled.  Typically between 1% and 10% of CPU overhead.  In a system that is already tightly pressed, this overhead can make a substantial difference.

DB_BLOCK_CHECKSUM is a dynamic parameter, so it can be set with the ALTER SYSTEM command.

Clearing a Redo Log File
A redo log file, may become corrupted during normal database operation.  If archiving can’t continue because of the Redo Log File corruption, then the database activity will cease, until the issue us resolved.  To re-initialize the log file without restarting the database.
SQL> alter database clear logfile group 2;

Dropping the log file, may not be possible for the following reasons:
·         There are only two log groups
·         The corrupt redo log file, belongs to the current group

If the corrupt redo log file has not been archived, then you can use this syntax to clear the logfile.
SQL> alter database clear unarchived logfile group 2;

Note that the above statement will clear the redo log file without arching it first.  The cleared redo logfiles are available for use, even although they were not archived.  On the downside, you may no longer be able to recover completely from a backup.  The Oracle Database writes a message o the alert log, describing the backups from which you can’t recover.  It makes sense to make a full backup of the database at this point.

Note that if you clear a redo log file, that is needed to bring an offline tablespace online, you will not be able to bring that tablespace online again.  To recover that tablespace you will need to recreate it, or perform an incomplete recover.  Tablespaces that were taken offline normally do not require recovery, so they will not be affected by this scenario.
SQL> alter database clear logfile group 2 unrecoverable datafile;

You can get information on the Redo Logfiles from the following views:
View
Description
V$LOG
This is the Redo Log File information that is stored in the control file
V$LOGFILE
Displays redo log groups, log members and log member status
V$LOG_HISTORY
Displays the log history data.  SEQUENCE# is useful if you want to find the last log sequence




Mark Tiger,
Need a Database Health Check, Remote Monitoring, Support, Maintenance, or a Security Audit?

P.S. I am busy preparing Nine books on preparing for the Oracle 11g R2 OCM DBA exam.  Watch this spot for details on the books, as they start becoming available.

No comments:

Post a Comment