Archives

Tagged ‘Oracle Data Integrator and hsqldb‘

Under the hood of the Sunopsis Memory Engine. Part 2

In the first part of this series we’ve had a high level look at the ODI HSQLDB in memory databasase aka the Sunopsis Memory Engine (SME).

As promised we’ll today have a look at one of the use cases for the SME, in-memory low data volume transformations. This could be particularly useful for temp (yellow) interfaces where you don’t want to physically set down your data.

Before we do this, however, I wanted to introduce another use case for the SME that came to my mind the other day. As HSQLDB allows you to use Java methods as function calls in SQL transformations you could make use of this in ODI interfaces to extend functionality not readily available in your target RDBMS. Data encryption and other security related use cases, e.g. generating MD5 hashes come to my mind.

Below is a screenshot of how this can look like in ODI. You need to make sure to either execute the call to the Java method on the Staging Area, or if you don’t reference a source column on the Target.

sme1

For more information on how to achieve this refer to the HSQLDB documentation. I believe it is possible to use  your own Java methods this way. You need to compile them in class/jar files and reference them in the odiparams.bat classpath. I haven’t tried this out myself but I believe that’s the way it should work.

Let’s move on to our other use case now though. Below I have created a simple temp interface that uses the Oracle sh.sales table to load data into a table in the Sunopsis Memory Engine.

sme2

We then execute this interface a couple of times.

On first execution the memory allocated to the ODI agent JVM jumps to 81.6 MB.

sme3

The next execution brings this up to 123.7 MB

sme4

Yet another execution brings memory usage to 178.6 MB. From the Private Bytes Memory graph you can see that the memory allocation grows linearly as we insert into the s_sales table in the SME.

sme5

We execute the interface a few more times until we finally hit the memory limit specified in odiparams.bat.

sme7

Subsequent executions of the interface will throw an error along the lines of the figure below.

sme6

As you can see we need to clean up after ourselves when using the in memory engine. A simple DROP TABLE in a procedure will drop the in memory table and release memory.

sme9

As you can see from the figure below memory is allocated when the table is loaded and then de-allocated when it is dropped

sme8

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)

Under the hood of the Sunopsis Memory Engine. Part 1.

Oracle Data Integrator not only uses hsqldb for its demo environment. It is also used as the in memory database for the so called Sunopsis Memory Engine.

“It is possible to run HSQLDB in a way that the database is not persistent and exists entirely in random access memory. As no information is written to disk, this mode should be used only for internal processing of application data, in applets or certain special applications. This mode is specified by the mem: protocol”. (hsqldb documentation)

ODI uses version 1.7.3 of hsqldb. You can download this version together with the documentation from here.

When you use the Sunopsis Memory Engine in an ODI interface or procedure an hsqldb in-memory database is created inside the JVM of the agent. The memory that is allocated to the hsqldb is taken from the JVM. As a result the maximum memory size that can be allocated to the in memory database is determined by the heap size parameter in the odiparams.bat file.

The JDBC Url syntax that is used in ODI is actually the old syntax as can be seen from the figure below.

1

The correct syntax for the above JDBC Url should be jdbc:hsqldb:mem:<name_of_db>. As you can see you can specify the name of the database. This allows you create multiple hsqldb databases inside the same JVM.

“Version 1.7.2 introduces a uniform method of distinguishing between different types of connection, alongside new capabilities to provide access to multiple databases. The common driver identifier is jdbc:hsqldb: followed by a protocol identifier (mem: file: res: hsql: http: hsqls: https:) then followed by host and port identifiers in the case of servers, then followed by database identifier”  (hsqldb docs).

When you create an hsqldb in-memory database it persists over the lifetime of the JVM, until the agent is stopped, or if you use the local agent until you close down Designer. Alternatively you can issue a Shutdown command against the hsqldb database. Any of the above will destroy objects that were created and will release the allocated memory. It is important to understand that memory is not released upon execution of an interface or when an ODI session ends. Tables or other objetcs that were created in an ODI package persist across the lifetime of this package. This is quite different to the lifecycle of say a Jython/Java BeanShell variable. You need to make sure that you clean up after yourself by e.g. dropping any objects you create.

As far as I know you can’t connect from an hsqldb mem database inside a JVM to one in another JVM.

“In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients” (hsqldb docs).

This means that you can’t use a client such as Squirrel SQL to connect to tables that you create in an interface. What you can do however, to debug is to manually create datastores in a Sunopsis Memory Engine model and use the local agent to execute interfaces that create and populate this table. You can then use the View Data functionality to view the content of the datastore. This is also documented in note 424597.1.

2

When does it actually make sense to use the hsqldb in mem database? I can see two scenarios where this feature may be useful.

First, it can be used for simple low volume transformations (less than 100K records; exact number really depends on the parameters of your own environment such as JVM heap size, concurrency etc.). This can be particularly useful for temporary interfaces if you don’t want to physically materialize the data on disk. Remember, however, that it is not the purpose of the ODI agent that runs the in mem database to do a lot of heavy lifting. This would not make any sense in an ELT tool where all the heavy duty stuff is done by the target RDBMS.

Note: You need to be careful not to run out of memory when you use the in mem database for this purpose. I will show you in the second part of this series what best practices you can use to achieve this. I am not sure how efficient hsqldb is in executing complex multi-table joins with subqueries. Currently it also does not support analytic functions, which in my opinion is a big disadvantage.

Secondly, the Sunopsis Memory Engine can be used to persist variables across sessions and scenarios or to populate ODI variables from Jython or the Java BeanShell. This is handy as all your logic is executed inside the JVM of the agent and you don’t have to switch back and forth between different technologies. As a result you will keep overhead to a minimum. We will have a look at how this can be achieved in the third part of this series.

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite-Offsite. Online. Get all the details from our ODI training site