Hi,
My name is Mark Tiger, creator
of this blog. I am an Oracle Certified
Professional (OCP DBA 11g).
Gathering information for some
DBA tasks can be time-consuming, even although the commands that you need to
issue eventually can be over quite quickly.
I have gone through this process over and over again, and have decided
to help other Oracle DBA’s in the community.
In this blog, I will give you
the details of how to carry out those tasks; that typically need a lot of
research, before you can do them. I will
try to present the information in an easy to understand way. My hope is that this will save you lots of
time in research, and help to make you more productive as an Oracle DBA. The illustrations are primarily meant for
Linux, since this is a Platform; that enjoys preference from Oracle. However they are easily adaptable for
versions of UNIX/AIX and windows etc.
11g R2 – Managing Redo Logs
The redo logs are very
important structures for recovery operations.
They consist of two or more pre-allocated files that store all the
changes made to the database, as they occur.
Each instance has at least one redo log to protect the database, in case
of an instance failure.
Typically only one instance
accesses an Oracle Database. In a RAC
environment, we refer to the redo logs for each instance as “redo
threads”. Each instance has its own redo
thread, which consists of a set of redo logs for that instance. This arrangement eliminates the potential
bottleneck, of having contention for a single set of redo logs. We will focus on one instance, so the thread
is assumed to be thread 1, in a typical installation.
Redo logs are filled with redo
entries. A redo entry is made up of a
set of change vectors. A change vector is a description of a change made to a
single block in the database. A change
to a table will affect: redo entry containing change vectors that describe the
changes to the data segment blocks for the table, the undo segment data block
is written to, the transaction table for the undo segments.
Redo entries record data that
you can use to reconstruct all changes made to the database, this includes the
undo segments. The redo log also
protects redo or undo data. When you
recover the database using the redo logs, the Oracle Database reads the change
vectors in the redo entries, and applies the changes to the affected data
blocks.
Redo records are buffered in a
circular fashion in the log buffer of the SGA. Whenever a transaction is committed, the
background process LGWR(Log Writer), writes the redo records from the redo log
buffer in the SGA, and writes them to a redo log file, and assigns a SCN(System
change number), to identify the records for each committed transaction. When the redo records from a given transaction
are safely written to the redo log files, then the user process is notified
that the transaction has been committed.
If the redo log buffer fills,
or another transaction commits, then LGWR flushes all the redo log records in
the redo log buffer to a redo log file.
Some of the transactions written to the redo log file, may not yet be
committed. However the Oracle database
can roll these transactions back if required.
Writing to the Redo Log
There should be two or more
redo log files. Oracle Database requires
a minimum of two redo log files, so that one redo log is always available,
while the other one is being archived.
LGWR writes to redo log files
in a circular fashion. When one redo log
file is full, LGWR moves onto the next one and starts writing. When LGWR, get back to the first redo log
file, it overwrites the first one.
The ability of LGWR, to
overwrite already full redo log files, depends on whether archiving is anabled
or not.
·
If the database is in NOARCHIVELOG mode then
archiving is disabled. A filled redo log
file is available to be overwritten, after the changes recorded in it have been
written to the data files.
·
If e database is in ARCHIVELOG mode the
archiving is enabled. A filled redo log
file is available to be overwritten, after the changes recorded in it have been
written to the data files, and the redo log file has been archived.
Active and inactive redo log files
LGWR writes to one redo log
buffer at a time, all the data from the redo log buffer. The redo log file that LGWR is actively
writing to is called the “current” redo log file.
Redo log files that are
required for instance recovery are called active redo log files. Redo log files, that are no longer required
for instance recovery are called inactive redo log files.
If the database is in
ARCHIVELOG mode, which is the recommended mode of operation; then the LGWR,
can’t overwrite a redo log file, until it has been archived by ARCn (Archiver
background process).
Log Switches
A log switch is the point
where the database stops writing to one redo log file, and starts writing to
the next redo log file. Under normal
conditions, a log switch occurs, when the current redo log file is full, and
writing must continue to the next available redo log file. Log switches can be configured to occur at
regular intervals, regardless of whether the redo log file is full of not. You can also force a log switch manually.
When LGWR begins writing to a
new redo log file, the Oracle Database assigns a log sequence number to the new
redo log file. When the Oracle Database archives
redo log files, The archived log retains its log sequence number. Each online or archived redo log file is
uniquely identified by its log sequence number.
During crash, instance, or media recovery the Oracle Database applies
redo log files in ascending order, by using the log sequence number of the
archived and redo log files.
Multiplexing redo log files
To protect against a failure
involving the redo log file. Oracle
allows you to multiplex the redo log files.
Best practice requires two copies of each redo log file, on different
disks if possible. This redundancy
protects against I/O errors, file corruption, even if the copies on the same
disk. LGWR will write to all the
identical copies of the redo logs, concurrently. This protects against a single point of redo
log failure.
Multiplexing is implemented by
creating groups of redo log files. Each
group consists of the redo file, and its multiplexed copies. Each redo log group is defined by a number, such
as group 1, group 2 etc.
You can have different members
of each group, on different disks. For
example group 1 would have members on disk 1 and disk 2. Group 2 would have members on disk 1 and disk
2. Seven if you lost a disk, you would
still have a complete set of redo log files.
Whenever LGWR can’t write to
member of a group, the database marks that redo log file member as invalid, and
writes an error message to the LGWR trace file, and to the alert log.
LGWR’s behaviour will depend
on the specific condition when a log member is unavailable:
Condition
|
LGWR Action
|
LGWR can write to at least
one member in the group
|
Writing proceeds as per
normal. LGWR writes to the available
members of the group, and ignores the unavailable members.
|
LGWR cannot access the next
group at a log switch, because the group must be archived
|
Database operation
temporarily halts until the group becomes available or until the group is
archived.
|
All members of the next
group are inaccessible to LGWR at a log switch because of media failure
|
Oracle Database returns an
error, and the database instance shuts down.
In this case you may need to perform media recovery on the database
from the loss of a redo log file.
If the database checkpoint
has moved beyond the lost redo log, media recovery is not necessary, because
the database has saved the data recorded in the redo log file to the data
files. You need only drop the
inaccessible redo log group. If the
database did not archive the bad log, use ALTER DATABASE CLEAR UNARCHIVED LOG
to disable archiving before the log can be dropped.
|
All members of a group
suddenly become inaccessible to LGWR while it is writing to them
|
Oracle Database returns an
error, and the instance immediately shuts down. IN this case you may need to perform media
recovery. If the media containing the
log is not actually lost: for example if the drive containing the log file
was turned off or disconnected, -- meadia recovery may not be needed. In this case you need only turn the drive
back on and let the database perform automatic instance recovery.
|
|
|
The requirement for redo logs,
is that there should be at least two groups, preferably three. All groups of redo logs, should have the same
number of members, although this is not actually required.
Always remember to place redo
logs on different disks if possible.
This arrangement will protect the database, in case of failure of one of
the disks.
Avoid contention between LGWR
and ARCn by having the archive log directory on a separate disk to the redo log
area.
Block Size of redo log files
Database block sizes vary
between 2K and 32K. However redo log
file sizes are different.
Redo log files default to a
block size that is equal to the physical sector size of the disk. Historically this would have been 512B. Some newer high-capacity disks offer 4K
sector size. The downside of the 4K redo
block size, is that there is increased redo wastage. The amount of redo wastage between 512B and
4K blocks is significant. You can
display the amount of redo wastage by querying v$sesstat and v$sysstat:
SQL> select name, value
from v$sysstat where name = ‘redo wastage’;
You can avoid the additional
redo wastage if you are using emulation-mode disks. 4K secotr size disks that emulate a 512B
secotr size at the disk interface. You
can override the default 4K block size, by specifying a 512B block size, or
sometimes a 1K block size.
However you may incur a
significant performance degradation when a redo log write is not aligned with
the beginning of a 4K physical sector.
Seven out of eight 512B slots in a 4K physical sector are not
aligned. When planning the redo log
block size you must make a trade off between, performance and disk wastage.
From 11gR2 and onwards, you
can specify the block size of redo log files with the BLOCKSIZE keyword, in the CREATE DATABASE, ALTER
DATABASE, and CREATE CONTROLFILE statements.
The permissible sizes are: 512, 1024, and 4096
SQL> alter database test
add logfile
Group 4
(‘/u01/logs/test/redo04a.log’,’ /u04/logs/test/redo04b.log’)
Size 100m
Blocksize 512
Reuse;
You can verify the size of the
redo logs with this query:
SQL> select blocksize from
v$log;
Choosing the number of Redo Log Files
Different configurations
should be tested out to see which one works the best. There should always be at least 2 redo groups,
but having at least 3 is best practice.
The optimum configuration has the fewest number of redo groups or logs,
without hampering LGWR from writing redo log records.
During testing, keep looking
at the LGWR trace file, and the alert log file to find evidence that LGWR
frequently has to wait for the next available group, because a checkpoint has
not completed, or a group has not yet been archived; if you find this evidence
then add log groups and continue with the exercise.
Before making any changes, you
should examine the parameters limit the number of redo log files that you can
add to an Oracle Database.
·
MAXLOGFILES:
This parameter is used in the CREATE DATABASE statement, and determines
the maximum of groups of redo log files for each database. The number of groups defined can range from 1
to MAXLOGFILES. When the compatibility
level is set to earlier than 10.2.0 then the only way to change this parameter
is to recreate the database, or to recreate the control file. From 10.2.0 onwards, you can exceed the value
defined in MAXLOGFILES, and the control file expands as needed. If MAXLOGFILES is not specified in the CREATE
DATABASE statement, then an Operating System specific default value is given
for this variable.
·
MAXLOGMEMBERS:
This parameter is also used with the CREATE DATABASE statement. The same restrictions apply as with
MAXLOGFILES. If you specify this
parameter when creating the database, then it must be carefully planned.
Controlling Archive Lag
You can force all redo log
threads to switch their CURRENT redo logs at regular time intervals. In a Primary / Standby database
configuration, changes are made available to the standby database by archiving
redo logs at the primary site, and then shipping them across to the Standby
site and applying them to the Standby database.
The standby database must wait for the redo log files to be archived and
shipped across to it before it can apply the latest changes. With the ARCHIVE_LAG_TARGET initialization
parameter you can specify in seconds, how long that lag should be.
When ARCHIVE_LAG_TARGET is
set, then the database examines the current redo log of the database
periodically. If the following
conditions are met, then the instance will switch the logfile.
·
The current redo log file was created prior to n
seconds ago, and the estimated archival time for the current log = m
seconds. m seconds is calculated
according to the number of redo blocks in the current redo log file. If n + m exceeds the value of the
ARCHIVE_LAG_TARGET initialization parameter, we have a yes for a redo log file
switch
·
The current log contains redo records. A log switch will not occur if there are no
records in the current redo log file.
In a RAC(Real Application
Clusters) environment, the instance will also cause other threads from other
instances to switch and archive their redo log files, if they are falling
behind. This is a useful mechanism if
one instance in a RAC environment is more active than the other instances. In a RAC environment, the value of
ARCHIVE_LAG_TARGET must be the same on all instances, otherwise you can expect
unpredictable behaviour.
The ARCHIVE_LAG_TARGET is a
dynamic parameter, so you can set it with ALTER SYSTEM … SCOPE=BOTH; to make it
permanent. The value of ‘0’ disables it,
and is also the default value. A typical
value for this initialization parameter is half an hour:
SQL> alter system set
ARCHIVE_LOG_TARGET=1800 scope=both;
Factors to consider when setting ARCHIVE_LOG_TARGET
·
Overhead of switching the redo log files, and
archiving the redo log files
·
You need to have some idea of how frequently log
switches would occur, as a result of redo log files being full, under normal
conditions. From this value you will be
looking at a more frequent interval, if you want to add value to the process.
·
What the impact of redo loss is, and the impact
of redo log file lag on the standby system.
If the log switches already
occur at a frequency which is greater than the setting you give for
ARCHIVE_LOG_TARGET (in seconds), then the new setting will not be of much use.
If you set ARCHIVE_LOG_TARGET
to a value that is too low, then you will experience a negative impact on
performance, as a result of too frequent log switches.
So in the end, in order to
have a positive impact on the Oracle database, you need to get this parameter
just right.
Creating Redo Log Groups
The redo log configuration
should have been planned properly at database creation time. However it is possible to add redo log file
groups to threads afterwards. To create
or add a new redo log group, you must have the ALTER DATABASE system
privilege. Also you should not add more
groups that the number specified in MAXLOGFILES.
SQL> alter database
Add logfile group 10
(‘/uo1/test/redo/log10_01’, ‘/u10/test/redo/log10_02’)
Size 100M blocksize 512;
The group clause is optional,
but it is better to specify in which group you going to place the redo log
files. If you number your groups like
this: 10, 20,30 etc, you will tend to consume unnecessary space in your control
files. If it far better to number your
groups 1,2,3,…10 for example.
Creating redo log members
You can always add members to
an existing group. For example if you
decide to have three log files in each group.
Also if due to media failure you have lost a redo log file, from a group
and need to replace it.
SQL> alter database add
logfile member ‘/uo1/test/redo/log02_03’ to group 2;
You don’t need to specify the
size of the redo log file, because it defaults to the same size as the other
members of the group. Depending on your
operating system you will notice that the status of the new log member is
INVALID. This is normal behaviour, and
it should change to active, the first time it is used.
Instead of specifying the
group you want to add the logfile to, you can optionally specify all of the
other members of the group in the TO clause.
SQL> alter database add
logfile member ‘/uo3/test/redo/log02_03’
TO (‘/uo1/test/redo/log02_01’,
‘/uo2/test/redo/log02_02’);
Renaming and relocating redo log members
This operation may become
necessary if you are going to retire the disk which contains the redo log
files, or you want to keep copies of the redo log files on different disk
drives, or you want to move the files around to minimize I/O contention.
You will need privileges on
the Operating system to copy or move the files, and on the database the ALTER
DATABASE system privilege. You must also
have privileges to open and backup the database. Before you start, you should make a full
backup of the database.
Let’s assume that the redo log
files are located on two disks, disk1 and disk2. The Redo log groups are duplexed across the
two disks. We will be moving the redo log files from disk1 to disk3.
·
Shut down the database
·
Copy or move the redo log files to the new
location
$ mv /disk1/redo/redolog01_01
/disk3/redo/redolog01_01
$ mv
/disk1/redo/redolog02_01 /disk3/redo/redolog02_01
·
Startup the database in MOUNT mode
·
SQL> alter database
Rename
file ‘/disk1/redo/redolog01_01’,’ /disk1/redo/redolog02_01’
To
‘/disk3/redo/redolog01_01’,’ /disk3/redo/redolog02_01’;
·
Open the database for normal operation.
SQL>
alter database open;
Dropping log groups
The following restrictions
apply:
·
An instance requires at least two redo log
groups at all times. Each group consists
of one or more members.
·
You can only drop inactive log groups. You may have to do a log switch and wait
until the log file has an inactive status, before you can drop the log group.
·
Make sure that the redo log group is archived
before dropping it.
SQL>
select group#, archived, status from v$log;
When the status is INACTIVE
and not ACTIVE or CURRENT, and archived is equal to YES; then you are ready to
drop the redo log group.
SQL> alter database drop
logfile group 3;
Check that the logfile group
was successful, by querying from v$log again.
Remember to clean up the redo log files in the old location.
If you are using Oracle
Managed Files, then the cleanup of Operating System Files is done for you by
the Oracle Database.
Dropping log members
The following restrictions
apply;
·
If all the groups have two members, you can drop
one member in a group, and have one member for a while. As soon as possible you should duplex the
group by adding a log member back to the log group again. While there is only one member in a group, it
represents a single point of failure for the Oracle Database.
·
An instance always requires at least two valid log
groups, regardless of the number of members that the group has. A redo log file may have a status of INVALID
just after being created, and if the database can’t access it. The status becomes stale if the Oracle
database suspects, that the file is incomplete or corrupt. Generally switching to the log group will
clear this issue.
·
You can only drop a log member, if it is not
part of an ACTIVE or CURRENT log group.
·
Make sure that the redo log is part of a group
that has been archived. Query the v$log
view. If the database is not running in
archival mode, then the logs will not be archived.
SQL>
alter database drop logfile member ‘/u01/test/redo/redolog01_01’;
Remember to clean up the
operating system file, if you are not running Oracle Managed Files.
Forcing Log Switches
By default a log switch will
occur, when the current log file group fills up and LGWR starts writing to the
next log file group. You can also force
a log switch, to make the CURRENT log group available for redo log maintenance. You must have the ALTER SYSTEM system
privilege, to force a logswitch.
SQL> alter system switch
logfile;
Verifying blocks in redo log files
The default value for DB_BLOCK_CHECKSUM
is TYPICAL. With this setting the
database will computes a checksum for each database block / Redo Log Block when
it is written to disk. The checksum is
stored in the header of the data block.
The Oracle Database uses this
checksum to check for corrupt blocks, when it writes blocks from the redo log
files to the archived redo log files. If
corruption is detected while trying to archive the Redo Log files, then the Oracle database attempts
to read the same block from another member in the group. If the block is corrupted in all members in
the group, then archiving can’t proceed.
The database verifies the redo
block when it is read from an archived log file, during recovery. If a block is corrupted, then an error is
raised and written to the alert log.
DB_BLOCK_CHECKSUM is a dynamic
parameter, the possible values are:
·
TYPICAL, which is the default value for this
parameter. Typical provides a fair level
of protection for the database. Verifies
the blocks as they are written to data blocks, and redo log blocks.
·
FULL, will verify the checksum, as it is
transferred from one memory construct to another. This is useful when you are propagating data
blocks to standby databases.
·
TRUE, behaves the same as TYPICAL.
There is an overhead when
DB_BLOCK_CHECKSUM is enabled. Typically
between 1% and 10% of CPU overhead. In a
system that is already tightly pressed, this overhead can make a substantial
difference.
DB_BLOCK_CHECKSUM is a dynamic
parameter, so it can be set with the ALTER SYSTEM command.
Clearing a Redo Log File
A redo log file, may become
corrupted during normal database operation.
If archiving can’t continue because of the Redo Log File corruption,
then the database activity will cease, until the issue us resolved. To re-initialize the log file without
restarting the database.
SQL> alter database clear
logfile group 2;
Dropping the log file, may not
be possible for the following reasons:
·
There are only two log groups
·
The corrupt redo log file, belongs to the
current group
If the corrupt redo log file
has not been archived, then you can use this syntax to clear the logfile.
SQL> alter database clear
unarchived logfile group 2;
Note that the above statement
will clear the redo log file without arching it first. The cleared redo logfiles are available for use,
even although they were not archived. On
the downside, you may no longer be able to recover completely from a backup. The Oracle Database writes a message o the
alert log, describing the backups from which you can’t recover. It makes sense to make a full backup of the database
at this point.
Note that if you clear a redo
log file, that is needed to bring an offline tablespace online, you will not be
able to bring that tablespace online again.
To recover that tablespace you will need to recreate it, or perform an
incomplete recover. Tablespaces that
were taken offline normally do not require recovery, so they will not be
affected by this scenario.
SQL> alter database clear
logfile group 2 unrecoverable datafile;
You can get information on the
Redo Logfiles from the following views:
View
|
Description
|
V$LOG
|
This is the Redo Log File
information that is stored in the control file
|
V$LOGFILE
|
Displays redo log groups,
log members and log member status
|
V$LOG_HISTORY
|
Displays the log history
data. SEQUENCE# is useful if you want
to find the last log sequence
|
|
|
Mark Tiger,
Need a Database Health Check, Remote Monitoring, Support,
Maintenance, or a Security Audit?
P.S. I am busy
preparing Nine books on preparing for the Oracle 11g R2 OCM DBA exam.
Watch this spot for details on the books, as they start becoming available.
No comments:
Post a Comment