
Under the hood of the Sunopsis Memory Engine. Part 3.

This is the final part in our series on the HSQLDB in-memory engine. In the first part we looked at the Sunopsis Memory Engine (SME) from a high level. In the second part we looked at two use cases: calling Java methods directly in SQL transformations, and using the in-memory database for simple, low-volume transformations.

Craig Stewart commented on that last post and mentioned it would be useful to set a JDBC property that cleans up memory after a disconnect, similar to the drop_on_disconnect JDBC property of the ODI XML driver. I had a look in the HSQLDB documentation and came across the SHUTDOWN property, which shuts down the database after the last session has disconnected. The problem is that it is not yet available in version 1.7.3 (the one that ODI uses). As a result we need to clean up objects ourselves or upgrade HSQLDB to a higher version. If I find some spare time I will blog on how to do this. The other day I also came across a corrupted HSQLDB database, another subject I may blog about some time soon.

OK. Let’s have a look at today’s topic. The SME can be very useful as a temporary data store: you can store a Jython variable in the SME, assign it from there to an ODI variable, and then pass it on to another scenario. In today’s post I will show you how to achieve this.

We will first create a procedure using the Jython technology. In this procedure we take a Jython variable and store it in a table in the Sunopsis Memory Engine. In this example we grab the OS username of the logged-on user and store it in a table named TTEMP in the in-memory database. Thanks to Maciej Kocon for the code.

#matshyeq:20100207
import os
import java.sql as sql
import java.lang as lang

# OS username of the logged-on user (USERNAME is the Windows variable)
currentuser = os.environ['USERNAME']

# Connect to the in-memory HSQLDB inside the agent JVM
lang.Class.forName("org.hsqldb.jdbcDriver")
url = "jdbc:hsqldb:."
myCon = sql.DriverManager.getConnection(url,"sa","")

# Create the table; ignore the error raised when it already exists
mySQL="create table TTEMP (osuser VARCHAR(30), val TINYINT, ts VARCHAR(30))"
stmt = myCon.createStatement()
try:
        stmt.execute(mySQL)
except:
        pass
stmt.close()

# Insert with a bind parameter and executeUpdate (an INSERT is not a query)
prepStmt = myCon.prepareStatement("INSERT INTO TTEMP VALUES(?,1,NOW())")
prepStmt.setString(1, currentuser)
prepStmt.executeUpdate()
prepStmt.close()
myCon.close()

Next we create an ODI variable V_OSUSER that gets its value from TTEMP.OSUSER.

SELECT osuser FROM ttemp

sme3_1

We take this variable and pass it into another scenario.

sme3_2

In this scenario we print the variable to the Operator log to verify that the value of the variable was passed on.

sme3_3

From the screenshot we can see that the OS user uli was printed out, which means that we successfully passed on our Jython variable value to another scenario.

In order to master scripting in ODI I recommend the following books.

Java BeanShell

Scripting in Java: Languages, Frameworks, and Patterns

Jython

The Definitive Guide to Jython: Python for the Java Platform.

Jython Essentials (O’Reilly Scripting)

Under the hood of the Sunopsis Memory Engine. Part 2.

In the first part of this series we had a high-level look at the ODI HSQLDB in-memory database, aka the Sunopsis Memory Engine (SME).

As promised, today we’ll have a look at one of the use cases for the SME: in-memory transformations on low data volumes. This could be particularly useful for temp (yellow) interfaces where you don’t want to physically persist your data.

Before we do this, however, I want to introduce another use case for the SME that came to mind the other day. As HSQLDB allows you to use Java methods as function calls in SQL transformations, you could make use of this in ODI interfaces to add functionality that is not readily available in your target RDBMS. Data encryption and other security-related use cases, e.g. generating MD5 hashes, come to mind.

Below is a screenshot of what this can look like in ODI. You need to make sure to execute the call to the Java method either on the Staging Area or, if you don’t reference a source column, on the Target.

sme1

For more information on how to achieve this, refer to the HSQLDB documentation. I believe it is possible to use your own Java methods this way. You need to compile them into class/jar files and reference them in the odiparams.bat classpath. I haven’t tried this out myself, but I believe that’s the way it should work.
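As a rough illustration of the Java-method use case, here is a minimal sketch that registers a static Java method under a short SQL name. It assumes the CREATE ALIAS ... FOR "..." statement described in the HSQLDB documentation and has not been run against the 1.7.3 build that ships with ODI. The function name register_java_function and the alias JSQRT are made up for this example; con is expected to behave like a java.sql.Connection.

```python
def register_java_function(con, alias, java_method):
    """Expose a static Java method under a short SQL name.

    con must offer createStatement(), as a java.sql.Connection does.
    Returns the SQL text that was sent, which is handy for logging.
    """
    # HSQLDB expects the fully qualified method name in double quotes.
    sql = 'CREATE ALIAS %s FOR "%s"' % (alias, java_method)
    stmt = con.createStatement()
    try:
        stmt.execute(sql)
    finally:
        # Always release the statement, even if the DDL fails.
        stmt.close()
    return sql

# In a Jython procedure, with myCon opened via DriverManager (hypothetical use):
#   register_java_function(myCon, "JSQRT", "java.lang.Math.sqrt")
# after which JSQRT(<numeric column>) could be used in a mapping expression.
```

For your own classes the same call would apply once the jar is on the agent classpath, which is the part I have not verified.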

Let’s move on to our other use case now though. Below I have created a simple temp interface that uses the Oracle sh.sales table to load data into a table in the Sunopsis Memory Engine.

sme2

We then execute this interface a couple of times.

On first execution the memory allocated to the ODI agent JVM jumps to 81.6 MB.

sme3

The next execution brings this up to 123.7 MB

sme4

Yet another execution brings memory usage to 178.6 MB. From the Private Bytes Memory graph you can see that the memory allocation grows linearly as we insert into the s_sales table in the SME.

sme5
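To put a number on that linear growth, here is a small back-of-the-envelope sketch. It takes the three readings above and estimates how many more executions would fit under a memory limit. The 256 MB limit is purely hypothetical (the actual value in odiparams.bat is not shown here), and private bytes of the process are not exactly the same thing as Java heap, so treat the result as a rough guide only.

```python
def estimate_remaining_runs(samples_mb, limit_mb):
    """Estimate how many more executions fit below limit_mb.

    samples_mb holds the memory reading after each execution so far.
    Growth is assumed to stay linear with the same average step.
    """
    if len(samples_mb) < 2:
        raise ValueError("need at least two readings")
    # Average growth per execution between the first and last reading.
    avg_step = (samples_mb[-1] - samples_mb[0]) / float(len(samples_mb) - 1)
    if avg_step <= 0:
        raise ValueError("readings do not grow")
    headroom = limit_mb - samples_mb[-1]
    if headroom <= 0:
        return 0
    return int(headroom // avg_step)

readings = [81.6, 123.7, 178.6]   # MB, from the three executions above
HYPOTHETICAL_LIMIT_MB = 256.0     # assumed value, not taken from odiparams.bat
remaining = estimate_remaining_runs(readings, HYPOTHETICAL_LIMIT_MB)
```

With these readings the average step is about 48.5 MB per execution, so under the assumed 256 MB limit only one more execution would fit.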

We execute the interface a few more times until we finally hit the memory limit specified in odiparams.bat.

sme7

Subsequent executions of the interface will throw an error along the lines of the figure below.

sme6

This shows that we need to clean up after ourselves when using the in-memory engine. A simple DROP TABLE in a procedure will drop the in-memory table and release the memory.

sme9

As you can see from the figure below, memory is allocated when the table is loaded and then de-allocated when it is dropped.

sme8
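If you would rather do the clean-up from a Jython step than from a SQL procedure, the following is a minimal sketch of the same DROP TABLE. The helper name drop_table is made up for this example; con is expected to behave like a java.sql.Connection, for instance one opened with DriverManager against the SME URL.

```python
import re

# Only plain identifiers are accepted, so the name can be safely
# concatenated into the DDL statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def drop_table(con, table_name):
    """Drop table_name through a JDBC-style connection."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError("not a plain table name: " + table_name)
    stmt = con.createStatement()
    try:
        # DDL goes through executeUpdate; any error propagates to the caller.
        stmt.executeUpdate("DROP TABLE " + table_name)
    finally:
        # Close the statement whether or not the drop succeeded.
        stmt.close()

# In a Jython procedure (hypothetical use, after loading S_SALES):
#   drop_table(myCon, "S_SALES")
```

Placing this as the last step of the package means the table goes away even though the in-memory database itself lives on in the agent JVM.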


Under the hood of the Sunopsis Memory Engine. Part 1.

Oracle Data Integrator uses hsqldb not only for its demo environment but also as the in-memory database behind the so-called Sunopsis Memory Engine.

“It is possible to run HSQLDB in a way that the database is not persistent and exists entirely in random access memory. As no information is written to disk, this mode should be used only for internal processing of application data, in applets or certain special applications. This mode is specified by the mem: protocol”. (hsqldb documentation)

ODI uses version 1.7.3 of hsqldb. You can download this version together with the documentation from here.

When you use the Sunopsis Memory Engine in an ODI interface or procedure an hsqldb in-memory database is created inside the JVM of the agent. The memory that is allocated to the hsqldb is taken from the JVM. As a result the maximum memory size that can be allocated to the in memory database is determined by the heap size parameter in the odiparams.bat file.

The JDBC URL syntax that is used in ODI is actually the old syntax, as can be seen from the figure below.

1

The correct syntax for the above JDBC URL should be jdbc:hsqldb:mem:<name_of_db>. As you can see, you can specify the name of the database. This allows you to create multiple hsqldb databases inside the same JVM.

“Version 1.7.2 introduces a uniform method of distinguishing between different types of connection, alongside new capabilities to provide access to multiple databases. The common driver identifier is jdbc:hsqldb: followed by a protocol identifier (mem: file: res: hsql: http: hsqls: https:) then followed by host and port identifiers in the case of servers, then followed by database identifier”  (hsqldb docs).
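The URL scheme described in the quote can be sketched in a few lines. The helper below splits an HSQLDB JDBC URL into its protocol identifier and the remainder, and reports a URL without a protocol identifier (such as the jdbc:hsqldb:. used by ODI) as "legacy". The function names and the database name "scratch" are made up for this example.

```python
# Protocol identifiers listed in the HSQLDB documentation quoted above.
PROTOCOLS = ("mem", "file", "res", "hsql", "http", "hsqls", "https")
PREFIX = "jdbc:hsqldb:"

def describe_hsqldb_url(url):
    """Return (protocol, remainder) for an HSQLDB JDBC URL.

    URLs without a protocol identifier are reported as "legacy".
    """
    if not url.startswith(PREFIX):
        raise ValueError("not an HSQLDB JDBC URL: " + url)
    rest = url[len(PREFIX):]
    for protocol in PROTOCOLS:
        marker = protocol + ":"
        if rest.startswith(marker):
            return (protocol, rest[len(marker):])
    return ("legacy", rest)

def mem_url(db_name):
    """Build a new-style in-memory URL for the given database name."""
    return PREFIX + "mem:" + db_name

old_style = describe_hsqldb_url("jdbc:hsqldb:.")
new_style = describe_hsqldb_url(mem_url("scratch"))
```

Using a distinct name per purpose in mem_url is what would give you several independent in-memory databases in the same JVM.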

When you create an hsqldb in-memory database it persists over the lifetime of the JVM: until the agent is stopped or, if you use the local agent, until you close down Designer. Alternatively you can issue a SHUTDOWN command against the hsqldb database. Any of the above will destroy the objects that were created and release the allocated memory. It is important to understand that memory is not released after an interface has executed or when an ODI session ends. Tables or other objects that were created in an ODI package outlive that package. This is quite different to the lifecycle of, say, a Jython/Java BeanShell variable. You need to make sure that you clean up after yourself, e.g. by dropping any objects you create.

As far as I know you can’t connect from an hsqldb mem database inside a JVM to one in another JVM.

“In-process (standalone) mode supports connections from the client in the same Java Virtual Machine, while server modes support connections over the network from several different clients” (hsqldb docs).

This means that you can’t use a client such as Squirrel SQL to connect to tables that you create in an interface. What you can do to debug, however, is to manually create datastores in a Sunopsis Memory Engine model and use the local agent to execute interfaces that create and populate these tables. You can then use the View Data functionality to view the content of the datastore. This is also documented in note 424597.1.

2

When does it actually make sense to use the hsqldb in mem database? I can see two scenarios where this feature may be useful.

First, it can be used for simple low volume transformations (less than 100K records; exact number really depends on the parameters of your own environment such as JVM heap size, concurrency etc.). This can be particularly useful for temporary interfaces if you don’t want to physically materialize the data on disk. Remember, however, that it is not the purpose of the ODI agent that runs the in mem database to do a lot of heavy lifting. This would not make any sense in an ELT tool where all the heavy duty stuff is done by the target RDBMS.

Note: You need to be careful not to run out of memory when you use the in-memory database for this purpose. In the second part of this series I will show you what you can do to avoid this. I am not sure how efficient hsqldb is at executing complex multi-table joins with subqueries. It also does not currently support analytic functions, which in my opinion is a big disadvantage.

Secondly, the Sunopsis Memory Engine can be used to persist variables across sessions and scenarios or to populate ODI variables from Jython or the Java BeanShell. This is handy as all your logic is executed inside the JVM of the agent and you don’t have to switch back and forth between different technologies. As a result you will keep overhead to a minimum. We will have a look at how this can be achieved in the third part of this series.
