expdp on the command line

Of course, data pump is just great (think of all the functionality it provides, for data masking, sampling, and what not) – but mastering its syntax and semantics can be, well, a bit tedious 😉
Here, I’ll restrict myself to syntax (although semantics might be the more tricky part, think ANDED includes, combining data and metadata filters, etc.), – or to be precise, it’s not about expdp syntax but what arrives there after command processing by the shell.
Actually, perhaps we should not stress ourselves overly to get the command line right, – we’ve been told in advance …

Depending on your operating system, the use of quotation marks when you specify a value for this parameter may also require that you use escape characters. Oracle recommends that you place this parameter in a parameter file, which can reduce the number of escape characters that might otherwise be needed on the command line.

(this is actually all over the place, but see e.g. http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_export.htm#SUTIL853)

Still, sometimes it’s just nicer to put it in one line instead of creating a parfile (plus, you don’t litter the filesystem…). So, let’s try the command line. But first, as to my “sources” – there must be more, but I’ve found two excellent texts online on shell quoting and escaping:

Although I will be quoting the first one more, I really recommend the second text for its excellent coverage of the concepts, starting out with the order of evaluation (the terminal, the shell, the command) as pertains to quoting.

One other thing before we begin: My OS is OEL 6.5, and I’m using bash version 4.1.2.

Ok, let’s start. My example export contains a metadata filter (include) and a data filter (query), and in a parfile it would look like this:

[oracle@node5 ~]$ cat parfile
include=table:"like 'OBJECT%'"
query=objects1:"where object_id < 99"
dumpfile=test.dmp
reuse_dumpfiles=true
directory=dpdump_fs

A straightforward try

Now, let’s just simply try what happens when we do the same on the command line. Just to be sure all those quotes in the include clause really are needed, this is straight from the docs (http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_export.htm#SUTIL853):

INCLUDE = object_type[:name_clause] [, …]
The name_clause is optional. … It must be separated from the object type with a colon and enclosed in double quotation marks, because single quotation marks are required to delimit the name strings.

[oracle@node5 ~]$ expdp user/pw include=table:"like 'OBJECT%'" query=objects1:"where object_id < 99" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

Export: Release 11.2.0.4.0 - Production on Tue Jul 22 22:42:02 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORA-39001: invalid argument value
ORA-39071: Value for INCLUDE is badly formed.
ORA-00936: missing expression

The error message does not tell us too much, although for datapump, it is not too bad ;-))

An easy way to see the arguments expdp was passed is “stracing” the command:

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:"like 'OBJECT%'" query=objects1:"where object_id < 99" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

and looking at the first line:

[oracle@node5 ~]$ head -1 strace.out
execve("/oracle/base/product/11.2.0.4/dbhome_1/bin/expdp", ["expdp", "user/pw", "include=table:like 'OBJECT%'", "query=objects1:where object_id <"..., "dumpfile=test.dmp", "reuse_dumpfiles=true", "directory=dpdump_fs"], [/* 37 vars */]) = 0

Or you could write a simple shell script to just spell out the arguments:

got 6 arguments
arg 1 = user/pw
arg 2 = include=table:like 'OBJECT%'
arg 3 = query=objects1:where object_id < 99
arg 4 = dumpfile=test.dmp
arg 5 = reuse_dumpfiles=true
arg 6 = directory=dpdump_fs

So, what is missing here – for now, we’re just focusing on the include part – are the double quotes surrounding the where clause (we have table:like … instead of the required table:”like …”
Why? The double quotes prevented the space between like and ‘OBJECT%’ from being interpreted as a shell metacharacter, which means, as a delimiter of arguments. The space being quoted, like and ‘OBJECT%’ become part of the same argument. The quotes that accomplished this are removed and not passed to the command.
Citing from the aforementioned Unix/Linux Shell Command Line Quoting Mechanisms:

“The most common shell meta-character is the blank or space character that shells use to separate arguments. The shell does not pass any blanks and spaces to the command; the shell uses the blanks and spaces to separate and identify individual command line arguments.
[…]
Quoting is used to prevent the shell from acting on and expanding the meta-characters. The quoting causes the shell to ignore the special meaning of the character, so that the character gets passed unchanged to a command as part of an argument.
[…]
A “quoted” blank does not separate arguments.
[…]
The quoting delimits (identifies) a string of characters; the quoting mechanism is removed and is not part of the string passed to the command. Only the characters being quoted are passed to the command.”

So, even though what we visually perceive is the “grouping” quality of the double quotes, what really is happening is the blank being quoted and “neutralized”. Which means we should get the same result if we remove the double quotes but quote the space. Do we?

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:like\ 'OBJECT%' query=objects1:"where object_id < 99" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

[oracle@node5 ~]$ head -1 strace.out
execve("/oracle/base/product/11.2.0.4/dbhome_1/bin/expdp", ["expdp", "user/pw", "include=table:like OBJECT%", "query=objects1:where object_id <"..., "dumpfile=test.dmp", "reuse_dumpfiles=true", "directory=dpdump_fs"], [/* 37 vars */]) = 0

Not really. The space is “neutralized” but now, the single quotes get stripped out, for the same reason as above: the quoting mechanism served its purpose and gets removed.

Keeping the quotes

Ok, so let’s try escaping the double quotes to make sure we keep them.Also, let’s temporarily remove the query part so we clearly see where the problem occurs (should one occur).

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:\"like 'OBJECT%'\" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

Export: Release 11.2.0.4.0 - Production on Tue Jul 22 23:08:12 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORA-39001: invalid argument value
ORA-39071: Value for INCLUDE is badly formed.
ORA-00911: invalid character

oracle@node5 ~]$ head -1 strace.out

execve("/oracle/base/product/11.2.0.4/dbhome_1/bin/expdp", ["expdp", "user/pw", "include=table:\"like", "OBJECT%\"", "query=objects1:where object_id <"..., "dumpfile=test.dmp", "reuse_dumpfiles=true", "directory=dpdump_fs"], [/* 37 vars */]) = 0

What happened? One, the include argument gets split, because not being quoted, the space acts as an argument delimiter again.
Second, now the single quotes got split out, because now they are delimiters only. This is perhaps easier to see in an argument listing:

got 7 arguments
arg 1 = user/pw
arg 2 = include=table:"like
arg 3 = OBJECT%"
arg 4 = query=objects1:where object_id < 99
arg 5 = dumpfile=test.dmp
arg 6 = reuse_dumpfiles=true
arg 7 = directory=dpdump_fs

Keeping the single quotes

OK – to keep the single quotes we have to quote them, too.

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:\"like \'OBJECT%\'\"  dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "USER"."SYS_EXPORT_SCHEMA_05":  user/******** include=table:"like 'OBJECT%'" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs 
Estimate in progress using BLOCKS method...

oracle@node5 ~]$ head -1 strace.out
execve("/oracle/base/product/11.2.0.4/dbhome_1/bin/expdp", ["expdp", "user/pw", "include=table:\"like", "'OBJECT%'\"", "dumpfile=test.dmp", "reuse_dumpfiles=true", "directory=dpdump_fs"], [/* 37 vars */]) = 0

Now, this actually worked! But why? Quote-wise, all seems fine, expdp receives all the quotes it needs, but … the include argument is split! So evidently, expdp reassembles the argument, probably making use of the fact that the first part – include=table:\”like – misses its closing double quote.
Unfortunately, strace and/or ltrace do not help our understanding here, and while fascinating things can be achieved using gdb on oracle – just see Frits Hooglands blog and whitepapers – I don’t have sufficient experience with gdb yet to continue the analysis further here.
All I could say is that when the c processing is finished, the reassembled command is stored in the master table:

QL> select name, value_t from SYS_EXPORT_SCHEMA_05 where name in ('CLIENT_COMMAND', 'NAME_EXPR', 'INCLUDE_PATH_LIST', 'INCLUDE_PATH_EXPR');

NAME 			VALUE_T
---------- ------------------------------ ------------------------ 
CLIENT_COMMAND	 	user/******** include=table:"like 'OBJECT%'" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs keep_master=yes
NAME_EXPR		like 'OBJECT%'
INCLUDE_PATH_LIST	'TABLE'
INCLUDE_PATH_EXPR	IN ('TABLE')

A “cleaner” 😉 alternative that works, too

Of course, don’t take me seriously on the “cleaner” – but this might just be nicer because there is no “secret reassembling” here – just a solution that must work and does work 😉
We are not escaping the double quotes at all, instead using single quotes twice (i.e., for two blocks each delimited by single quotes) to protect the contained metacharacters (spaces and double quotes). The two single-quoted blocks are adjacent and so, get concatenated into one single argument:

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:'"like '\'OBJECT%\''"'  dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

Export: Release 11.2.0.4.0 - Production on Wed Jul 23 10:09:56 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "USER"."SYS_EXPORT_SCHEMA_05":  user/******** include=table:"like 'OBJECT%'" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs 
Estimate in progress using BLOCKS method...

oracle@node5 ~]$ head -1 strace.out

execve("/oracle/base/product/11.2.0.4/dbhome_1/bin/expdp", ["expdp", "user/pw", "include=table:\"like 'OBJECT%'\"", "dumpfile=test.dmp", "reuse_dumpfiles=true", "directory=dpdump_fs"], [/* 37 vars */]) = 0

Completing the command

Finally, let’s add back the query part. We know we have to escape the double quotes…

[oracle@node5 ~]$ strace -o strace.out user/pw include=table:\"like \'OBJECT%\'\" query=objects1:\"where object_id < 99\" dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs
-bash: 99": No such file or directory

This is easy: the < character is not quoted any more, so it gets evaluated as the redirection operator. There’s two things (at least) we could do: surround the double quotes by single quotes, effectively quoting both double quotes and the redirection operator, or just escaping the <:

[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:\"like \'OBJECT%\'\" query=objects1:'"where object_id < 99"' dumpfile=test.dmp reuse_dumpfiles=true directory=dpdump_fs

# alternatively...
[oracle@node5 ~]$ strace -o strace.out expdp user/pw include=table:\"like \'OBJECT%\'\" query=objects1:\"where object_id \< 99\" dumpfile=test.dmp 
reuse_dumpfiles=true directory=dpdump_fs

And that’s it – expdp working fine on the command line, and some insights gained (hopefully ;-)) on how the shell works … Of course, there’s always more questions than answers, and so, always more tools to be explored to see if they can help you (gdb might just be the next one for me).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s