Performance overhead of db_block_checking and db_block_checksum non-default settings

I guess you could argue that in many cases, init parameter default settings should be seen as recommendations (leaving aside the clearly resource- or usage-dependent parameters here, of course) – as though Oracle were telling us “if you install a new database and want to keep it safe and simple, do this”.
For example, I’m probably not the only person in the world who has wished, at some point, that plan statistics had been available in the cursor cache of the production database for some statement (some big, dark, evil statement appearing out of the blue) – but of course they were not, as statistics_level was set to typical and no one had bothered to make this statement run with /*+ gather_plan_statistics */ 😉

Well, thinking along the same lines, how about the parameters Oracle provides to prevent block corruptions – or, failing that, at least detect them?
If the default is the recommendation, then that would be

  • TYPICAL for DB_BLOCK_CHECKSUM, and
  • FALSE for DB_BLOCK_CHECKING.
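Both can be verified quickly, e.g. in SQL*Plus (output as it looks on a default installation):

  SQL> show parameter db_block_check

  NAME                                 TYPE        VALUE
  ------------------------------------ ----------- ------------------------------
  db_block_checking                    string      FALSE
  db_block_checksum                    string      TYPICAL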

DB_BLOCK_CHECKSUM

Let’s start with the perhaps less controversial case, DB_BLOCK_CHECKSUM. Here the default means we do indeed do some checking. To recap, what DB_BLOCK_CHECKSUM does is exactly what it says: calculate, store and verify a checksum for the block.
With the default setting of TYPICAL, when a block is written to disk, DBWR calculates a checksum and stores it in the block header. When the block is read from disk, the reading server process verifies that checksum, if available. So, to summarize, the action happens whenever a block is read from or written to disk.
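As an aside, checksums on disk can also be validated offline with the DBVERIFY utility – for example (the datafile path is just a placeholder):

  dbv file=/u01/oradata/ORCL/users01.dbf blocksize=8192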

Now, changing the setting to FULL would mean

  • the checksum is also calculated when a block already in the cache is modified by updates or deletes. If in-memory corruption is found, Oracle will attempt to recover the block by reading it from disk (or from another instance) and applying the required redo (see Note 1302539.1)
  • redo log blocks are checksummed, too
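DB_BLOCK_CHECKSUM is a dynamic parameter, so trying out FULL does not even require a restart; assuming an SPFILE is in use, for instance:

  alter system set db_block_checksum = 'FULL' scope = both;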

Sounds good? Then why – confirming our intuition that what is default is recommended – does Oracle recommend setting the parameter to TYPICAL?

Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.
(from the Reference: http://docs.oracle.com/cd/E16655_01/server.121/e17615/refrn10030.htm#CHDDCEIC)

It must be because of the overhead: the performance overhead is estimated at around 1–2% for TYPICAL and 4–5% for FULL.

DB_BLOCK_CHECKING

For DB_BLOCK_CHECKING, what is recommended (again, in the Reference, http://docs.oracle.com/cd/E16655_01/server.121/e17615/refrn10029.htm#CHDDCEIC) is not the default:

You should set DB_BLOCK_CHECKING to FULL if the performance overhead is acceptable.

Going back one step, what DB_BLOCK_CHECKING does is perform logical intra-block checking, covering header information and, depending on the setting, block content. Checks are performed whenever a block changes in memory.
A setting of LOW performs header checks only, whereas MEDIUM does logical checks on block content, too. With FULL, not only tables but also indexes are checked.
With MEDIUM and FULL, when a logical block corruption is detected in memory, Oracle will read the block from disk and apply any required redo (again, see Note 1302539.1). If the block cannot be repaired, it will be marked soft corrupt and an ORA-1578 or ORA-600 will be raised.
Marking a block as soft corrupt prevents the corruption from spreading further in memory (see Note 1496934.1).
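Blocks already marked corrupt can later be listed from v$database_block_corruption (note the view is populated by RMAN operations such as BACKUP or VALIDATE):

  select file#, block#, blocks, corruption_change#, corruption_type
  from   v$database_block_corruption;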
Again, sounds good, right? If MEDIUM is not the default, then there must be substantial reasons for that. Quoting the Reference again:

Block checking typically causes 1% to 10% overhead in most applications, depending on workload and the parameter value. Specific DML overhead may be higher. The more updates or inserts in a workload, the more expensive it is to turn on block checking.

This is rather vague! We don’t get any estimates for the different parameter settings – just this “depending on … the parameter value.”
And what is “Specific DML Overhead” supposed to mean?
Evidently, the impact of logical block checking is difficult to predict!

Getting an Idea of the Overhead …

So if you consider changing from the default to non-default settings, the safest way is probably to use Database Replay for impact estimation. To get an idea of what range the performance overhead might lie in, however, a simulation can be helpful, and this is what I did.
I wrote a simple simulation procedure (to be executed against a clone of the database in question) that performs typical DML statements and SELECTs in proportion to what is observed in the real database – namely, 1 DELETE (whole-row), 1 UPDATE (by index), and 2 INSERTs (whole-row) per 30 SELECTs (select * from …). (To be frank, the proportion was guessed from v$sql using command_type, so it might not be too accurate, but this was a simulation anyway.)

Four conditions were tested:

  • DB_BLOCK_CHECKSUM = TYPICAL, DB_BLOCK_CHECKING = FALSE (the starting point)
  • DB_BLOCK_CHECKSUM = TYPICAL, DB_BLOCK_CHECKING = LOW (the “minimal action” to take in the direction of logical consistency checks)
  • DB_BLOCK_CHECKSUM = TYPICAL, DB_BLOCK_CHECKING = MEDIUM (the action I would like to take regarding logical consistency checks)
  • DB_BLOCK_CHECKSUM = FULL, DB_BLOCK_CHECKING = MEDIUM (the action I would like to take regarding logical consistency checks, combined with enhanced protection against physical corruption)

(I skipped DB_BLOCK_CHECKING = FULL as I’d be happy enough with MEDIUM, given that indexes could simply be rebuilt in case of corruption.)
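Since both parameters are dynamic, switching between conditions is just a matter of two ALTER SYSTEM calls; for example, for the last condition (again assuming an SPFILE):

  alter system set db_block_checksum = 'FULL'   scope = both;
  alter system set db_block_checking = 'MEDIUM' scope = both;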

Now, for the results. I’ll just report averages, as the results from 5 runs per condition were extremely consistent.
The numbers are elapsed times (in seconds) from a procedure that (see the sketch after this list)

  • selects all rows from a table, one by one (yielding the time elapsed for SELECTs),
  • 1 of 30 times, inserts the row just selected into another table (yielding the time elapsed for INSERTs),
  • 1 of 30 times, updates the row just selected (yielding the time elapsed for UPDATEs), and
  • 1 of 60 times, deletes the row just selected (yielding the time elapsed for DELETEs).
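Roughly sketched in PL/SQL, the procedure could look like the following – table names (src_tab, tgt_tab, assumed structurally identical), the key column id and the payload column are hypothetical stand-ins, and timings are taken with dbms_utility.get_time:

  create or replace procedure simulate_workload
  as
    l_row   src_tab%rowtype;
    t0      pls_integer;
    t_sel   pls_integer := 0;   -- accumulated centiseconds per statement type
    t_ins   pls_integer := 0;
    t_upd   pls_integer := 0;
    t_del   pls_integer := 0;
    i       pls_integer := 0;
  begin
    for r in (select id from src_tab) loop
      i := i + 1;

      -- every row: single-row SELECT by primary key
      t0 := dbms_utility.get_time;
      select * into l_row from src_tab where id = r.id;
      t_sel := t_sel + (dbms_utility.get_time - t0);

      -- 1 of 30: whole-row INSERT into another table
      if mod(i, 30) = 0 then
        t0 := dbms_utility.get_time;
        insert into tgt_tab values l_row;
        t_ins := t_ins + (dbms_utility.get_time - t0);
      end if;

      -- 1 of 30: UPDATE by index
      if mod(i, 30) = 15 then
        t0 := dbms_utility.get_time;
        update src_tab set payload = payload where id = r.id;
        t_upd := t_upd + (dbms_utility.get_time - t0);
      end if;

      -- 1 of 60: whole-row DELETE
      if mod(i, 60) = 45 then
        t0 := dbms_utility.get_time;
        delete from src_tab where id = r.id;
        t_del := t_del + (dbms_utility.get_time - t0);
      end if;
    end loop;
    commit;

    -- dbms_utility.get_time ticks in 1/100 s
    dbms_output.put_line('select: ' || t_sel / 100 || ' s');
    dbms_output.put_line('insert: ' || t_ins / 100 || ' s');
    dbms_output.put_line('update: ' || t_upd / 100 || ' s');
    dbms_output.put_line('delete: ' || t_del / 100 || ' s');
  end simulate_workload;
  /

With that in mind, here are the averaged timings per condition: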
                      SELECT    INSERT    UPDATE    DELETE
  TYPICAL – FALSE      23.97     14.36      2.14      5.54
  TYPICAL – LOW        23.81     14.15      2.08      5.50
  TYPICAL – MEDIUM     23.81     19.61      3.05      7.94
  FULL – MEDIUM        23.91     20.98      3.09      8.52

So while, as expected, SELECTs are not impacted by the settings, the overhead of logical block checking on DML (simple, prototypical, totally unsophisticated DML) performance is indeed substantial: with MEDIUM, the INSERT, UPDATE and DELETE elapsed times each increase by roughly 35–45% – far more than the numbers (1–10%) given in the Reference would make you think!

Assuming the respective proportions of SELECTs, INSERTs, UPDATEs and DELETEs really reflect the workload, summing up we’d get:

                      Overall elapsed time (s)
  TYPICAL – FALSE      46.01
  TYPICAL – LOW        45.55
  TYPICAL – MEDIUM     54.41
  FULL – MEDIUM        56.50

This makes for an overhead of logical block checking (DB_BLOCK_CHECKING = MEDIUM) of roughly 18% ((54.41 − 46.01) / 46.01 ≈ 0.18) – and that for a workload that is strongly dominated by SELECTs.
What to conclude? Not sure. At least, test carefully before switching this on – and be prepared to revert (which is easy, given that DB_BLOCK_CHECKING is a dynamic parameter).
And, coming back to that topic of defaults – perhaps it would have been better if Oracle had made DB_BLOCK_CHECKING = MEDIUM the default, so that a conscious decision would be necessary to switch it OFF, not ON.