Saturday, May 31, 2014

Stored Procedures vs UDX


Calling a stored procedure


The standard way to call a stored procedure is by using the CALL statement.
call myprocedure(arguments);

You can also invoke a stored procedure using a SELECT statement, but you can't include a FROM clause. So this works:
select myprocedure('foo', 'bar', 123);

This will throw an error:
select myprocedure(x,y,z) from foo;

It is also important to note that a stored procedure cannot be called as part of an expression, even from within NZPLSQL code. So the code below is also illegal and will result in an error.
select myprocedure(arguments) + 100;
-- or in NZPLSQL code
result := myprocedure(arguments) + 100;

Stored Procedure Performance

It is important to understand that a stored procedure runs as a single process on the NPS host, it does not run in parallel.  Any SQL statement issued by a stored procedure behaves just as SQL does when issued by any other source (the SQL runs in parallel on the NPS SPUs).  The primary use case for stored procedures is to control the execution of a multiple step process where all of the work done by each step is performed by a SQL statement.  A stored procedure can be used to dynamically generate a SQL statement based on the inputs to the procedure or the values in a table, but it should never be used to actually process large amounts of data returned from a SQL statement.  Any logic which needs to be applied to a large number of individual rows should be done in a SQL statement or a UDX, never by using IF statements in the procedure itself.  

Procedures vs. Functions

Procedures in Netezza are different than functions. A procedure can only be used as shown above (not with a FROM clause), but a procedure can run SQL. A function can be used with a FROM clause and can process many rows of data instead of only one, but a function cannot issue SQL. In Netezza we usually call them UDF (user defined function), UDA (user defined aggregate), UDTF (user defined table function), or UDAP (user defined analytic process). The term UDX is often used to mean any type of user defined program. A UDX can only process data using the columns of data which are passed into the UDX, it has no capability to issue SQL to get additional lookup information etc. All of the logic must be based only on the arguments passed into the function.

Why SQL isn't allowed in a UDX

When SQL is used inside of a user defined function it limits the ability of the database to run the function in parallel. It is common in non parallel databases to allow functions to issue SQL statements, but in an massively parallel database that would limit scalability. In a MPP database a SQL statement is always executed across many nodes in parallel. Taking a TwinFin-12 as an example, when a SQL statement runs it is actually executing 92 parallel processes across 12 or 14 s-blades. If each of those parallel processes were to run a SELECT statement, each SQL statement would also run in parallel (creating 92 parallel processes for each statement), or a total of 92*92 = 8464 parallel processes. This of course wouldn't work as just one SQL statement would result in more processes than the database is capable of running. Databases which allow SQL to run inside of a function must therefore limit the functions from running in parallel. For that reason Netezza provides two distinct types of user defined programs. Stored procedures run as a single process on the host and are allowed to issue SQL. User defined functions run in parallel but cannot issue SQL.


1 comment: