For example, I’m probably not the only person in the world who has wished, at a time, that plan statistics had been available for some statement (some big, dark, evil statement appearing out of the blue), in the cursor cache on the production database … – but of course they were not, as statistics_level is of course set to typical and no one had of course bothered to make this statement run with /*+ gather_plan_statistics*/ 😉
Well, thinking in the same line, how about the parameters Oracle provides to prevent (if possible – otherwise, at least identify) block corruptions?
If the default is the recommendation, then this will be
TYPICAL
forDB_BLOCK_CHECKSUM
, andFALSE
forDB_BLOCK_CHECKING
.
DB_BLOCK_CHECKSUM
Let’s start with the perhaps less controversial case, DB_BLOCK_CHECKSUM
. Here the default means we do indeed do some checking. To recap, what DB_BLOCK_CHECKSUM
does it exactly what it says, – calculate, store and verify a checksum for the block.
With the default setting of TYPICAL
, when a block is written to disk, DBWR will calculate a checksum and store it in the block header. When the block gets read from disk, the reading server process verifies that checksum, if available. So to summarize, the action happens whenever reading from or writing to disk.
Now, changing the setting to FULL
would mean
- the checksum is also calculated when a block already in the cache is to be modified by updates/deletes. In case in-memory corruption is found, Oracle will attempt to recover the block by reading it from disk (or from another instance) and applying the redo required ( see Note 1302539.1)
- Redo logs are checksummed, too
Sounds good? Then why – confirming our intuition that what is default is recommended – does Oracle recommend setting the parameter to TYPICAL
?
Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.
(from the Reference: http://docs.oracle.com/cd/E16655_01/server.121/e17615/refrn10030.htm#CHDDCEIC)
Must be because of the overhead. Performance overhead is estimated to vary around 1-2% for TYPICAL
and 4-5% for FULL
settings.
DB_BLOCK_CHECKING
For DB_BLOCK_CHECKING
, what is recommended (again, in the Reference, http://docs.oracle.com/cd/E16655_01/server.121/e17615/refrn10029.htm#CHDDCEIC) is not what is the default:
You should set DB_BLOCK_CHECKING to FULL if the performance overhead is acceptable.
Going back one step, what DB_BLOCK_CHECKING
does is perform logical intra-block checking including header information and possibly, block content. Checks are performed whenever a block changes in memory.
A setting of LOW
performs header checks only, whereas a setting of MEDIUM
will do logical checks on block content, too. With FULL
, not only tables, but indexes will be checked as well.
With settings MEDIUM
and FULL
, when detecting logical block corruptions in memory, Oracle will read the block from disk and apply any required redo (again, see 1302539.1). If the block cannot be repaired, the block will be marked (soft) corrupted and ORA-1578 or ORA-600 will be thrown.
Marking a block as soft corrupt prevents corruption from further spreading in memory (see 1496934.1).
Here, again, sounds good right? If MEDIUM
is not the default, then, there must be substantial reasons for it. Quoting the Reference again:
Block checking typically causes 1% to 10% overhead in most applications, depending on workload and the parameter value. Specific DML overhead may be higher. The more updates or inserts in a workload, the more expensive it is to turn on block checking.
This is rather vague! We don’t get any estimations for the different parameter settings – just this “depending on … the parameter value.”
And what is “Specific DML Overhead” supposed to mean?
Evidently, the impact of logical block checking is difficult to predict!
Getting an Idea of the Overhead …
So probably, if you consider changing to non-default (from default) settings the safest way is to use Database Replay for impact estimation. To get an idea of in what kind of range performance overhead might lie, however, a simulation might be helpful, and this is what I did.
I wrote a simple simulation procedure (to be executed against a clone of the database in question) that performs typical DML statements and SELECTs in proportion to what is observed in the real database – namely, 1 DELETE (whole-row), 1 UPDATE (by index), and 2 INSERTs (whole-row) for 30 SELECTs (select * from …) (to be frank, the proportion was guessed from v$sql
using command_type
, so it might not be too accurate, but this was a simulation anyway).
Four conditions were tested
DB_BLOCK_CHECKSUM
=TYPICAL
,DB_BLOCK_CHECKING
=FALSE
(the starting point)DB_BLOCK_CHECKSUM
=TYPICAL
,DB_BLOCK_CHECKING
=LOW
(the „minimal action“ to take in direction of logical consistency checks)DB_BLOCK_CHECKSUM
=TYPICAL
,DB_BLOCK_CHECKING
=MEDIUM
(the action I would like to take regarding logical consistency checks)DB_BLOCK_CHECKSUM
=FULL
,DB_BLOCK_CHECKING
=MEDIUM
(the action I would like to take regarding logical consistency checks, combined with enhanced protection against physical corruption)
(I skipped DB_BLOCK_CHECKING
= FULL
as I’d be happy enough with MEDIUM
, given that indexes could simply be rebuilt in case of corruption).
Now, for the results. I’ll just refer averages, as the results from 5 runs per condition were extremely consistent.
The numbers are elapsed times (in seconds) from a procedure that
- selects all rows from a table, one by one (yielding the time elapsed for SELECTs),
- 1 of 30 times, inserts the row just selected into another table (yielding the time elapsed for INSERTs),
- 1 of 30 times, updates the row just selected (yielding the time elapsed for UPDATEs), and
- 1 of 60 times, deletes the row just selected (yielding the time elapsed for DELETEs).
select | insert | update | delete | |
TYPICAL – FALSE | 23.97 | 14.36 | 2.14 | 5.54 |
TYPICAL – LOW | 23.81 | 14.15 | 2.08 | 5.50 |
TYPICAL – MEDIUM | 23.81 | 19.61 | 3.05 | 7.94 |
FULL – MEDIUM | 23.91 | 20.98 | 3.09 | 8.52 |
So, while, as expected, SELECTs are not impacted by the settings, indeed the overhead of logical block checking on DML (simple, prototypical, totally unsophisticated DML) performance is substantial – far more substantial than the numbers (1-10%) given in the Reference would make you think!
Assuming the respective amounts of selects, inserts, updates and deletes really reflected the workload, summing up, we’d get
Overall elapsed time | |
TYPICAL – FALSE | 46.01 |
TYPICAL – LOW | 45.55 |
TYPICAL – MEDIUM | 54.41 |
FULL- MEDIUM | 56.50 |
This makes for an overhead of logical block checking (DB_BLOCK_CHECKING
= MEDIUM
) of 18.8%, for a workload that is strongly dominated by SELECTs.
What to conclude? Not sure. At least, test carefully before switching this on – and be prepared to revert (which is easy, given that DB_BLOCK_CHECKING
is a dynamic parameter).
And, coming back to that topic of defaults – perhaps it would be better if Oracle had made DB_BLOCK_CHECKING
= MEDIUM
the default, so that a conscious decision was necessary to switch this OFF, not ON …