Better call Thomas – Bayes in the Database

This is my third and last post covering my presentations from last week’s Tech Event, and it’s no doubt the craziest one :-).
Again I had the honour to be invited by my colleague Ludovico Caldara to participate in the cool “Oracle Database Lightning Talks” session. Now, there is a connection to Oracle here, but mainly, as you’ll see, it’s about a Reverend in a Casino.

So, imagine one of your customers calls you. They are going to use Oracle Transparent Data Encryption (TDE) in their main production OLTP database. They are in a bit of a panic. Will performance get worse? Much worse?

Well, you can help. You perform tests using the Swingbench Order Entry benchmark, and soon you can assure them: No problem. The chart below shows average response times for one of the Order Entry transaction types, using either no encryption or TDE with AES-128 or AES-256.
Wait a moment – doesn’t it look as though response times are even lower with TDE!?

Swingbench Order Entry response times (Customer Registration)

Well, they might be, or they might not be … the Swingbench results include how often a transaction was performed, and the standard deviation … so how about we ask somebody who knows something about statistics?

Cool, but … as you can see, for example, in this xkcd comic, there is no such thing as an average statistician … In short, there’s frequentist statistics and there’s Bayesian, and while there is a whole lot you can say about the respective philosophical backgrounds, it mainly boils down to one thing: Bayesians take into account not just the actual data – the sample – but also the prior probability.

This is where the Reverend enters the game. The foundation of Bayesian statistics is Bayes theorem, named after the Reverend Thomas Bayes.

Bayes theorem

In short, Bayes theorem says that the probability of a hypothesis, given my measured data (POSTERIOR), is equal to the probability of the data, given the hypothesis is correct (LIKELIHOOD), times the prior probability of the hypothesis (PRIOR), divided by the overall probability of the data (EVIDENCE). (The evidence may be neglected if we’re interested in proportionality only, not equality.)
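
Written out, using H for the hypothesis and D for the data (purely as shorthand labels):

    P(H | D) = P(D | H) × P(H) / P(D)

or, if we only care about proportionality: P(H | D) ∝ P(D | H) × P(H).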

So … why should we care? Well, from what we know about how TDE works, it cannot possibly make things faster! In the best case, we’re servicing all requests from the buffer cache and so do not have to decrypt the data at all; then we shouldn’t incur any performance loss. But I cannot imagine how TDE could cause a performance boost.

So we go to our colleague the Bayesian statistician, give him our data and tell him about our prior beliefs. (Prior belief sounds subjective? It is, but prior assumptions are out in the open, to be questioned, discussed and justified.)

Now, harmless as Bayes theorem looks, in practice it may be difficult to compute the posterior probability. Fortunately, there is a solution: Markov Chain Monte Carlo (MCMC). MCMC is a method to obtain the parameters of the posterior distribution not by calculating them directly, but by sampling, performing a random walk along the posterior distribution.

We assume our data is gamma distributed, the gamma distribution generally being adequate for response time data (for motivation and background, see chapter 3.5 of Analyzing Computer System Performance with Perl::PDQ by Neil Gunther).
Using R, JAGS (Just Another Gibbs Sampler), and rjags, we (our colleague the statistician, I mean, of course) go to work and create a model for the likelihood and the prior.
We have two groups, TDE and “no encryption”. For both, we define a gamma likelihood function, each having its own shape and rate parameters. (Shape and rate parameters can easily be calculated from mean and standard deviation.)
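
For reference, the conversion is straightforward: a gamma distribution with shape sh and rate ra has mean sh/ra and variance sh/ra², so

    sh = mean² / sd²
    ra = mean / sd²

which is exactly what the model code below does.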

model {
    for ( i in 1:Ntotal ) {
      y[i] ~ dgamma( sh[x[i]] , ra[x[i]] )
    }
    sh[1] <- pow(m[1],2) / pow(sd[1],2)
    sh[2] <- pow(m[2],2) / pow(sd[2],2)
    ra[1]  <- m[1] / pow(sd[1],2)
    ra[2]  <- m[2] / pow(sd[2],2)
    [...]

Second, we define prior distributions on the means and standard deviations of the likelihood functions:

    [...]
    m[1] ~ dgamma(priorSha1, priorRa1)
    m[2] ~ dgamma(priorSha2, priorRa2)
    sd[1] ~ dunif(0,100)
    sd[2] ~ dunif(0,100)
  }

As we are convinced that TDE cannot possibly make it run faster, we assume that the means of both likelihood functions are distributed according to prior distributions with the same mean. This mean is calculated from the data, averaging over all response times from both groups. As a consequence, in the above code, priorSha1 = priorSha2 and priorRa1 = priorRa2. On the other hand, we have no prior assumptions about the likelihood functions’ standard deviations, so we model them as uniformly distributed, choosing noninformative priors.

Now we start our sampler. Faites vos jeux … rien ne va plus … (place your bets … no more bets …) and what do we get?

Posterior distribution of means

Here we see the outcome of our random walks, the posterior distributions of means for the two conditions. Clearly, the modes are different. But what to conclude? The means of the two data sets were different too, right?

What we need to look at is the distribution of the difference between the two parameters. What we see here is that the 95% highest density interval (HDI) [3] of the posterior distribution of the difference ranges from -0.69 to +1.89, and thus includes 0. (The HDI is a concept similar to, but superior to, the classical frequentist confidence interval, as it is composed of the 95% of values with the highest probability.)

Posterior distribution of difference between means

From this we conclude that, statistically, our observed difference of means is not significant – what a shame. Wouldn’t it have been nice if we could speed up our app using TDE? 😉

Overhead of TDE tablespace encryption, is there?

Recently, for a customer, I conducted tests comparing performance under TDE tablespace encryption to an unencrypted baseline. While I won’t present the results of those tests here, I will describe two test series I ran in my lab environment.
So you’re an application owner – or a cautious DBA who wants to avoid trouble 😉 – and you want to know: is there an overhead if you use TDE tablespace encryption? (There is some material available on this, and I think it rather goes in the “no difference” direction. But it’s always nice to test for yourself, and you really need an answer – yes or no? ;-)).

For both test series, the design was the same: 2*3-factorial with factors

  • encryption: none, AES-128, AES-256, and
  • absence/presence of AES-NI kernel module (on suitable hardware)

So let’s see…

No there’s not!

The first test series consisted of runs of the widely used Swingbench Order Entry benchmark.
Measurement duration (excluding warm-up) per condition was 20 minutes, which was enough – in this isolated environment – to get reproducible results (as confirmed by sample re-runs).
The setup and environmental characteristics were the following:

  • Oracle 12.1.0.2 on OEL 7
  • Order Entry schema size 6G
  • SGA 3G
  • 2 (v)CPUs
  • Data files on ASM
  • 4 concurrent users

Let’s look at the results:

Swingbench OE TPS

At first, it even looks like test runs using encryption had higher throughput. However, with standard deviations ranging between 25 and 30 for the individual Order Entry transaction types, this must be attributed to chance. (Also, it would be quite difficult to argue why throughput would actually increase with TDE… let alone be highest when using an algorithm that requires more rounds of encryption…)
So from this test series, we would conclude that there is no performance impact of TDE tablespace encryption. But then, at this ratio of schema size to buffer cache, blocks are mostly found in the cache:

  • Logical read (blocks) per transaction: 105.9
  • Physical read (blocks) per transaction: 2.6
  • Buffer Hit %: 97.53

Now with TDE tablespace encryption, blocks are stored in the SGA unencrypted, which means only a process that has to read a block from disk has to perform decryption.
Encryption, on the other hand, is taken care of by the database writer, and so happens in the background anyway. So in this setup, it’s not astonishing at all not to find throughput differences due to the absence/presence of encryption.

Yes there is…

A second type of test was run against the same database, on the same host: SQL*Loader direct path loads of a 14G file, done serially. In this case, the server process has to read every row from the input file and encrypt every block before writing it to disk. This is what we see:

SQL*Loader Direct Path Load times

Now suddenly we DO have a substantial impact of TDE. Loading the file and storing the data to disk now takes ~1.5 times as long as without encryption. And we see – which was to be expected, too – that this increase in elapsed time is mostly due to increased CPU consumption.

By the way, now that we have such a clear effect, do we see usage of AES-NI speeding up the load? No:

SQL*Loader Direct Path Load, with and without AES-NI

This is surprising and probably something deserving further investigation, but not the topic of this post…

Conclusion

So what’s the point? Whether or not you will experience a negative impact on performance depends on your workload. No simple yes-or-no here… (but of course, no mystique either – it just means we have to phrase our questions a little bit more precisely.)

Raising the Fetch Size, Good or Bad? Memory Management in Oracle JDBC 12c

Yesterday at DOAG I presented on “Raising the Fetch Size, Good or Bad? Memory Management in Oracle JDBC 12c”.
Again I’m too lazy to write a separate post (okay, let’s phrase this positively: I don’t like to be redundant ;-)) so I’ll just say what’s not in the slides …

Firstly, it all started with a story, a story about an Oracle DBA for whom, like for most any Oracle DBA I guess, fetch size was a well-known tuning knob. Most any Oracle DBA, I think, will have read some blog post on this topic, or seen a demo showing that as we raise the fetch size, throughput goes up. In fact, sometimes you can see it go up dramatically!
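
For anyone who hasn’t turned that knob themselves: in JDBC it is a single call on the statement (the Oracle driver’s default is 10 rows per round trip). The little sketch below just shows where the setting goes – connection URL, credentials and table name are made-up placeholders, of course.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FetchSizeDemo {

  public static void main(String[] args) throws SQLException {
    // URL, credentials and table are placeholders for illustration only
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/mydb", "someuser", "somepassword");
         PreparedStatement stmt = conn.prepareStatement("select * from some_big_table")) {

      // the knob in question: number of rows fetched per database round trip
      stmt.setFetchSize(80);

      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          // process the row ...
        }
      }
    }
  }
}

(The same effect can also be set connection-wide via the Oracle connection property defaultRowPrefetch.)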

And so in one company, there was a DBA who decided to do systematic experiments in order to find the optimal fetch size for the in-house application. As it turned out, the optimal value for that application lay around 80. So the DBA told the developers, the value was integrated into the code, and the code was released to pre-prod. All was fine, or at least nobody said otherwise.

Then, inevitably, came the day of the production release, and all wasn’t fine any more … The application server kept crashing with out-of-memory errors, they had to do an emergency release, and no one ever wanted to talk about it again.
What had happened? How could a value of 80 (we’re not speaking 10000 here) have such a catastrophic effect?

So much for the teaser … Now, there is an Oracle whitepaper describing how memory management in the JDBC driver was fundamentally changed between 11g and 12c. But will reading this paper be enough for people who have experienced those OOM errors to be willing to try again?
My idea was to perform an experiment and see for myself. And that’s what the talk was about – how I set up the experiment and what the outcome was ;-). Here are the slides.

Write your own Swingbench transaction

Are you using – or planning to use – Swingbench?
Swingbench currently provides four different benchmarks, the most well-known being Order Entry, a TPC-C-like benchmark based on the Oracle-shipped oe schema.
In its default configuration, Order Entry’s SELECT-INSERT-DELETE ratio is 50% – 30% – 20% (as documented in http://dominicgiles.com/swingbench/swingbench22.pdf).
Order Entry is highly customizable: you can modify the composing transactions’ weights and activate up to three more data-warehouse-like transactions that are not “on” by default. Still, depending on what you need to test, you might find that even by carefully choosing your transactions you cannot generate the kind of activity required.
In that case, it’s possible to write your own transactions.

Adapting the build is easy and well documented in the FAQs. What I’d like to do here is give a very basic code example, stripped down to the requirements of integration into the Swingbench framework and performing a basic JDBC action.

I’ll break the code up into pieces first; the complete listing can be found at the end of the post.

A basic Swingbench transaction

Swingbench transactions are written in Java. There are two types of transactions, found in two different subdirectories: those which simply use Java to call, and retrieve the results from, stored procedures, and those which use JDBC to perform the distinct steps of parsing, executing and fetching.

Here I assume we need more control over single parts of this process (which is why we’re writing our own code in the first place, instead of varying, e.g., the size of the data), so we’re going to write a JDBC-based transaction.

First, the package declaration (unimaginatively called mytransactions here) and the imports. Importing OracleConnection is only needed if you’d like to instrument the code for tracing. (Aside: using EndToEndMetrics is deprecated as of 12c, but to use DMS metrics – the recommended alternative – a special jar file, ojdbc_dms.jar, is required).

package com.dom.benchmarking.swingbench.mytransactions;

// the Swingbench framework
import com.dom.benchmarking.swingbench.kernel.DatabaseTransaction;
import com.dom.benchmarking.swingbench.event.JdbcTaskEvent;
import com.dom.benchmarking.swingbench.kernel.SwingBenchException;
import com.dom.benchmarking.swingbench.kernel.SwingBenchTask;

// java classes needed by either Swingbench or our own code
import java.util.Map;
import java.util.Random;
import java.util.logging.Logger;

// JDBC
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Oracle JDBC (optional, needed by setEndToEndMetrics)
import oracle.jdbc.OracleConnection;

Next, the class declaration and some instance variables:

// Extend DatabaseTransaction to use the Swingbench framework
public class FetchLotsOfData extends DatabaseTransaction {
  
  // add our own logging output to Swingbench's debug log
  private static final Logger logger = Logger.getLogger(FetchLotsOfData.class.getName());
  // Just one query here, I can put it outside of any methods
  private String query = "select * from mining_info where customer_id between ? and ?";
  
  private Random random = new Random();

All classes you write have to extend DatabaseTransaction to use the Swingbench framework. The JDBC transactions that come with Swingbench all extend OrderEntryProcess, a common superclass that provides much shared functionality. So if your test is going to run against the Swingbench-provided tables, possibly using similar logic, check whether you can make use of the functionality provided there. (In that case, it might make sense to put your class in the same package instead of creating a separate one, or else you might have to change some instance variables to protected in OrderEntryProcess.java.)
Here we don’t need any of the common helper methods, so it is sufficient to extend DatabaseTransaction.

Now, every transaction has to implement three methods, init, execute, and close. Of these, you might just need execute, which is where you place the queries you want to be benchmarked.

The init method can be used for setup activities, e.g., getting data from the database (as in BrowseProducts.java) or reading initialisation files (as in NewCustomerProcess.java). The close method is not used in any of the provided transactions.
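
For completeness, here is what the two remaining methods can look like when you don’t actually need them. The exact signatures are dictated by DatabaseTransaction, so check them against the transactions shipped with your Swingbench version (e.g. NewCustomerProcess.java); the sketch below assumes init takes the same params Map as execute and close takes no arguments.

  // nothing to set up for this transaction; here you could, e.g., load reference
  // data from the database once per run (signature assumed to mirror execute)
  public void init(Map params) throws SwingBenchException {
  }

  // we hold no resources across executions, so there is nothing to clean up
  public void close() {
  }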

So here is our execute method, with comments. The JDBC action itself is kept deliberately simple here, as we’re focusing on the framework integration.

 
  public void execute(Map params) throws SwingBenchException {
    
    // get a connection; how Swingbench does this, is transparent for us
    Connection conn = (Connection)params.get(SwingBenchTask.JDBC_CONNECTION);
    // initialise the counters/metrics
    initJdbcTask();

    // setting the module so we can trace this on the server (optional)
    String metrics[] = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];
    metrics[OracleConnection.END_TO_END_MODULE_INDEX] = "FetchLotsOfData";
    
    // start measuring
    long executeStart = System.nanoTime();
    
    PreparedStatement stmt = null;
    ResultSet rset = null;
    
    try {
    
      ((OracleConnection) conn).setEndToEndMetrics(metrics, (short) 0);
      logger.fine("Preparing query: " + query);
      stmt = conn.prepareStatement(query);
      int low = random.nextInt(900);
      int high = low + 100;
      stmt.setInt(1, low);
      stmt.setInt(2, high);
      rset = stmt.executeQuery(); 
      
      String s;
      while (rset.next ()) {
	s = "Customer " + rset.getInt(1) + "(run " + rset.getInt(2) + "): " + "";
	// do something with s if you want
      }
      rset.close();
      
      // tell Swingbench we performed one select
      addSelectStatements(1);
      
      logger.fine("Result set closed.");
      stmt.close();
      logger.fine("Statement closed.");
      
      // tell Swingbench our execution time, and that execution was successful
      processTransactionEvent(new JdbcTaskEvent(this, getId(), (System.nanoTime() - executeStart), true, getInfoArray()));
      
    } catch (SQLException se) {
    
      se.printStackTrace();
      // tell Swingbench our execution time, and that we failed
      processTransactionEvent(new JdbcTaskEvent(this, getId(), (System.nanoTime() - executeStart), false, getInfoArray()));
      // re-throw the exception, Swingbench will display it - and the nested SQLException - on standard out
      throw new SwingBenchException(se.getMessage());
    
    } finally {
    
      // these two convenience methods are provided by Swingbench, they just catch the exceptions possibly thrown by closing
      // the statement / result set
      hardClose(rset);
      hardClose(stmt);
      
    }
  }

What’s important here is the exception handling. JDBC methods throw SQLException; this has to be handled while still keeping track of the stats.
In case the transaction was successful, we just call processTransactionEvent and proceed with the benchmark; otherwise we catch the exception, do the bookkeeping for a failed transaction, and rethrow the exception for Swingbench to handle as it sees fit (it will be displayed on the command line).
In both cases, we make sure the resources (ResultSet and Statement) are closed.

So that’s it – for you to implement more complex transactions :-). Here again is the complete class:

package com.dom.benchmarking.swingbench.mytransactions;

// the Swingbench framework
import com.dom.benchmarking.swingbench.kernel.DatabaseTransaction;
import com.dom.benchmarking.swingbench.event.JdbcTaskEvent;
import com.dom.benchmarking.swingbench.kernel.SwingBenchException;
import com.dom.benchmarking.swingbench.kernel.SwingBenchTask;

// java classes needed by either Swingbench or our own code
import java.util.Map;
import java.util.Random;
import java.util.logging.Logger;

// JDBC
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Oracle JDBC (optional, needed by setEndToEndMetrics)
import oracle.jdbc.OracleConnection;

// Extend DatabaseTransaction to use the Swingbench framework
public class FetchLotsOfData extends DatabaseTransaction {
  
  // add our own logging output to Swingbench's debug log
  private static final Logger logger = Logger.getLogger(FetchLotsOfData.class.getName());
  // Just one query here, I can put it outside of any methods
  private String query = "select * from mining_info where customer_id between ? and ?";
  
  private Random random = new Random();

  public void execute(Map params) throws SwingBenchException {
    
    // get a connection; how Swingbench does this, is transparent for us
    Connection conn = (Connection)params.get(SwingBenchTask.JDBC_CONNECTION);
    // initialise the counters/metrics
    initJdbcTask();

    // setting the module so we can trace this on the server (optional)
    String metrics[] = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];
    metrics[OracleConnection.END_TO_END_MODULE_INDEX] = "FetchLotsOfData";
    
    // start measuring
    long executeStart = System.nanoTime();
    
    PreparedStatement stmt = null;
    ResultSet rset = null;
    
    try {
    
      ((OracleConnection) conn).setEndToEndMetrics(metrics, (short) 0);
      logger.fine("Preparing query: " + query);
      stmt = conn.prepareStatement(query);
      int low = random.nextInt(900);
      int high = low + 100;
      stmt.setInt(1, low);
      stmt.setInt(2, high);
      rset = stmt.executeQuery(); 
      
      String s;
      while (rset.next ()) {
	s = "Customer " + rset.getInt(1) + "(run " + rset.getInt(2) + "): " + "";
	// do something with s if you want
      }
      rset.close();
      
      // tell Swingbench we performed one select
      addSelectStatements(1);
      
      logger.fine("Result set closed.");
      stmt.close();
      logger.fine("Statement closed.");
      
      // tell Swingbench our execution time, and that execution was successful
      processTransactionEvent(new JdbcTaskEvent(this, getId(), (System.nanoTime() - executeStart), true, getInfoArray()));
      
    } catch (SQLException se) {
    
      se.printStackTrace();
      // tell Swingbench our execution time, and that we failed
      processTransactionEvent(new JdbcTaskEvent(this, getId(), (System.nanoTime() - executeStart), false, getInfoArray()));
      // re-throw the exception, Swingbench will display it - and the nested SQLException - on standard out
      throw new SwingBenchException(se.getMessage());
    
    } finally {
    
      // these two convenience methods are provided by Swingbench, they just catch the exceptions possibly thrown by closing
      // the statement / result set
      hardClose(rset);
      hardClose(stmt);
      
    }
  }
}