Thursday, January 27, 2011

More elegant forms of returning records.

Like the blog subtitle says, I learn PL/pgSQL as I go, and share the lessons with you.  A couple of weeks ago, I posted on returning a few records using OUT parameters.  I used that method because it is what I knew.  A reader, Tzvi R., kindly commented with a couple of ways to return records more gracefully.  In this post, I am giving those methods more prominence than the comment provided.

SETOF RECORD
Here the record is defined within the function, keeping the parameter list a clean list of inputs.  A limitation of this approach is that since the record definition is hidden from the caller, the column list must be provided as part of the retrieving SELECT.
CREATE OR REPLACE FUNCTION squares(ct INT)
   RETURNS SETOF RECORD
AS $$
DECLARE
  v_rec RECORD;
BEGIN
  FOR i IN 0..ct-1 LOOP
    SELECT i, POWER(i,2)::INT INTO v_rec;
    RETURN NEXT v_rec;
  END LOOP;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;

SELECT * FROM squares(5) AS (A INT, B INT);

Line 5 creates a variable, of indeterminate shape, for the output record.  Line 8 creates the record using a SELECT .. INTO, and inserts it into the variable, giving the variable its shape.

Line 9 is a RETURN NEXT.., which adds the record to the list of records to eventually return.  This form of RETURN does not terminate the function, but rather augments the return value, which gets returned whenever the function returns.  In this example, we just passively fall out the bottom of the function, but a bare RETURN can be used to explicitly exit the function.  The bare RETURN does not interfere with the returning of the built-up result set.
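To make the bare RETURN concrete, here is a minimal sketch of a variant that stops early. The function name squares_capped and the cutoff of 100 are my own inventions for illustration; the rest follows the example above.

```sql
-- Hypothetical variant of squares(); the name and the cutoff are illustrative.
CREATE OR REPLACE FUNCTION squares_capped(ct INT)
   RETURNS SETOF RECORD
AS $$
DECLARE
  v_rec RECORD;
BEGIN
  FOR i IN 0..ct-1 LOOP
    IF i * i > 100 THEN
      RETURN;              -- exit now; rows already queued are still returned
    END IF;
    SELECT i, i * i INTO v_rec;
    RETURN NEXT v_rec;     -- queue this row and keep going
  END LOOP;
END;
$$
LANGUAGE plpgsql IMMUTABLE;

SELECT * FROM squares_capped(50) AS (a INT, b INT);
```

Even though the caller asked for 50 rows, the function returns only the rows queued before the bare RETURN was reached.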

Predefined Types
If you define the record type outside of the function, and use it in the function definition, the caller no longer needs to supply a column list; the columns are determined from the record type.
CREATE TYPE square_type AS (a INT, b INT);

CREATE OR REPLACE FUNCTION squares(ct INT)
   RETURNS SETOF square_type
AS $$
DECLARE
  v_rec square_type;
BEGIN
  FOR i IN 0..ct-1 LOOP
    SELECT i, POWER(i,2)::INT INTO v_rec;
    RETURN NEXT v_rec;
  END LOOP;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;

SELECT * FROM squares(5);

Line 1 is the type definition, defining 'square_type' as a pair of INT fields.  Lines 4 and 7 define the return type and the variable type as 'square_type'.  Because the caller can see the return type of the function from the function definition, it knows to treat the return values as pairs of INT columns, and Line 17 can be appealingly spare.

In my opinion, this is the most elegant of the three forms; it does require an additional line to define the type, and adds one more name to the namespace, but it reads easily.  The next best is the OUT parameter method. Having to specify the row shape in the calling SELECT, as in the first example above, is just too awkward.

One caution is that there is no CREATE OR REPLACE TYPE <typename>, so you need to run a DROP TYPE <typename> first if you wish to re-run the whole code block quoted above.   In another week or two, I plan to show how to use Exception handling to gracefully absorb the errors produced by redundantly creating a type.
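Until then, one way to make the block repeatable is to drop the type (and anything left over from earlier runs) before creating it. This is a sketch rather than the only approach. Note that CASCADE also drops the function that depends on the type, and that the DROP FUNCTION line is there because CREATE OR REPLACE cannot change the return type of an existing squares(INT), such as the SETOF RECORD version from the first example.

```sql
-- Clear out earlier definitions so the whole block can be re-run.
DROP FUNCTION IF EXISTS squares(INT);
DROP TYPE IF EXISTS square_type CASCADE;

CREATE TYPE square_type AS (a INT, b INT);

CREATE OR REPLACE FUNCTION squares(ct INT)
   RETURNS SETOF square_type
AS $$
DECLARE
  v_rec square_type;
BEGIN
  FOR i IN 0..ct-1 LOOP
    SELECT i, POWER(i,2)::INT INTO v_rec;
    RETURN NEXT v_rec;
  END LOOP;
END;
$$
LANGUAGE plpgsql IMMUTABLE;

SELECT * FROM squares(5);
```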

Thank you, Tzvi, for your assistance.

2 Feb changed function types to IMMUTABLE

Thursday, January 20, 2011

Beyond Little Bobby Tables

The xkcd comic about 'little bobby tables' is a classic, widely known and often quoted:



The strip title is 'Exploits of a Mom', and the punch-line is 'I hope you've learned to sanitize your database inputs'.

The comic gets quoted and linked often, generally to illustrate the point about sanitizing database inputs.  While the verb sanitize is a terse comic-scale expression to represent avoiding SQL injection attacks, it is only part of securing the database.   To sanitize is to remove the dangerous parts of the input.  Sanitizing the input in the xkcd example would presumably involve escaping the single quote, and would have protected the database.

However, there are other ways to accomplish the desired safety.

  • If you use parameterized queries, the inputs are quoted for you.  
  • If you embed the query in a procedure with typed parameters, the parameters are first forced to match the parameter type, and are then safely interpolated into the embedded query.
  • If you run the query under a limited role, one without permission to do anything more hazardous than the query requires, the damage an attack can do is contained.  For example, a role sufficient to the xkcd example would require INSERT privilege, but not DROP nor DELETE nor CREATE.
These three can all be used together, providing multi-layered security.  Using procedures with typed parameters will neutralize SQL injections which are matched to parameter types other than char, varchar, text, and blob.   The SQL injections which map to valid textual types reach the query, but are implicitly quoted and escaped.  Should someone defeat the escaping mechanism somehow (there have been a few exploits in PostgreSQL over the years, not necessarily in the escaping), then the role privilege limits would prevent the most severe of data losses.  
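As a rough sketch of the second and third layers together, the block below sets up a table, a limited role, and a function with a typed parameter. The names students, enroller, and add_student are hypothetical, chosen only to echo the comic.

```sql
-- Hypothetical table, role, and function names, for illustration only.
CREATE TABLE students (name TEXT);

-- A role that may add students, but has no DELETE privilege and does not own the table.
CREATE ROLE enroller NOLOGIN;
GRANT INSERT ON students TO enroller;

-- The typed parameter reaches the INSERT as a value; it is never spliced into the SQL text.
CREATE OR REPLACE FUNCTION add_student(p_name TEXT)
   RETURNS VOID
AS $$
BEGIN
  INSERT INTO students (name) VALUES (p_name);
END;
$$ LANGUAGE plpgsql;

-- The hostile input is stored as an ordinary string.
SELECT add_student('Robert''); DROP TABLE students;--');
SELECT name FROM students;
```

The students table survives the call, and the whole hostile string ends up as a single value in the name column.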


Thursday, January 13, 2011

Returning a few rows

This week, we will look at returning a few rows from a function. Database programming typically involves getting sets of records, so returning sets is a natural ability to look for in a programming language that runs inside a database engine.

Here's an example:

CREATE OR REPLACE FUNCTION squares(IN ct INTEGER,
                                   OUT INTEGER, OUT INTEGER)
    RETURNS SETOF RECORD
AS $$
    BEGIN
         FOR i IN 0..ct-1 LOOP
             RETURN QUERY SELECT i, POWER(i,2)::INTEGER; 
        END LOOP;
    END;
$$ LANGUAGE 'plpgsql';

SELECT * FROM squares(3);

Record Type
The third line declares the return type to be SETOF RECORD, indicating that it will return a variable number of elements whose type is RECORD, to be specified elsewhere.  The 'elsewhere' is the set of OUT parameters, which indicate that the RECORD is two integers.  The OUT variables are shown in line 2.  Line 1 names the function, and declares one input parameter to take a count limit.

Lines 6 through 8 define a loop, which iterates over values from 0 to the count limit.  Each iteration runs line 7, which we will discuss in the next paragraph.  Lines 4,5,9,10 are just boilerplate that is found in every PL/pgSQL function. Line 12 provides a context in which to run the function and display the results.

Get Record from SELECT
Line 7 is:  RETURN QUERY SELECT i, POWER(i,2)::INTEGER.  The RETURN tells us this line is part of the results.   The QUERY catches the results of the subsequent SELECT, as a bare SELECT whose results go nowhere is not permitted in PL/pgSQL.  The SELECTs in PL/pgSQL must be qualified somehow, with an INTO clause or a QUERY as above.  The rest of the line just creates a two part record of the counter index and its square.  The ::INTEGER cast avoids a type error, as POWER returns a floating-point (or NUMERIC) value rather than an INTEGER.
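To illustrate the 'qualified somehow' rule, here is a small sketch showing SELECT with an INTO clause, alongside PERFORM, which is the PL/pgSQL statement for running a query and discarding its result. The function name square_of is made up for this example.

```sql
-- Hypothetical helper; the name square_of is illustrative.
CREATE OR REPLACE FUNCTION square_of(n INTEGER)
    RETURNS INTEGER
AS $$
DECLARE
    result INTEGER;
BEGIN
    -- SELECT ... INTO stores the query result in a variable.
    SELECT POWER(n, 2)::INTEGER INTO result;

    -- PERFORM evaluates a query and throws the result away.
    PERFORM POWER(n, 3);

    RETURN result;
END;
$$ LANGUAGE plpgsql;

SELECT square_of(7);
```

Replacing the PERFORM with a plain SELECT POWER(n, 3); makes the function fail when it is called, with an error saying the query has no destination for its result data.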

Starting in version 8.4, there is a nice TABLE syntax, for returning records and I will discuss that in a future post.  The Rdbhost server does not support 8.4 at present, but is scheduled for an upgrade.
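For readers already on 8.4 or later, a minimal sketch of that TABLE form might look like the following. The function name squares_table and the column names are my own choices, not something from a later post.

```sql
-- Requires PostgreSQL 8.4 or later; names are illustrative.
CREATE OR REPLACE FUNCTION squares_table(ct INTEGER)
    RETURNS TABLE (n INTEGER, n_squared INTEGER)
AS $$
BEGIN
    FOR i IN 0..ct-1 LOOP
        RETURN QUERY SELECT i, POWER(i,2)::INTEGER;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM squares_table(3);
```

The column names given in the TABLE clause become the column names of the result, so no OUT parameters or column list are needed.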

Multiple Records
If the embedded query returns multiple records, that works too, as all such records get included; witness this variation to the example, which provides both squares and cubes:

BEGIN
  FOR i IN 0..ct-1 LOOP
    RETURN QUERY       SELECT i, POWER(i,2)::INTEGER
                 UNION SELECT i, POWER(i,3)::INTEGER; 
  END LOOP;
END;
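Here is that variation as a complete function, as a sketch under one change of my own: it uses UNION ALL rather than UNION. Plain UNION removes duplicate rows, and for i = 0 and i = 1 the square and the cube are the same, so those iterations would each contribute one row instead of two. The function name squares_and_cubes is made up for this example.

```sql
-- Hypothetical name; UNION ALL keeps both rows even when square and cube are equal.
CREATE OR REPLACE FUNCTION squares_and_cubes(IN ct INTEGER,
                                             OUT INTEGER, OUT INTEGER)
    RETURNS SETOF RECORD
AS $$
BEGIN
  FOR i IN 0..ct-1 LOOP
    RETURN QUERY           SELECT i, POWER(i,2)::INTEGER
                 UNION ALL SELECT i, POWER(i,3)::INTEGER;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM squares_and_cubes(3);
```

With UNION ALL, each iteration adds exactly two rows to the result set; use plain UNION if you would rather have the duplicates collapsed.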

RETURN NEXT
An alternate way to return records is with the RETURN NEXT syntax.  This example names the two output parameters, assigns to them, and then adds their aggregate to the return set.

DROP FUNCTION squares(INTEGER);
CREATE OR REPLACE FUNCTION squares(IN ct INTEGER,
                                   OUT a INTEGER, OUT b INTEGER)
    RETURNS SETOF RECORD
AS $$
    BEGIN
         FOR i IN 0..ct-1 LOOP
             a := i;
             b := POWER(i,2)::INTEGER; 
             RETURN NEXT;
        END LOOP;
    END;
$$ LANGUAGE 'plpgsql';

SELECT * FROM squares(3);
Line 3 names the output parameters.  Lines 8 and 9 put values into those named output parameters.  Line 10 RETURN NEXT creates a record from the above parameters, and adds it to the result set.  The columns created are named after the parameters, so you might want names longer and more informative than 'a' and 'b'.

Later PostgreSQL versions support more elegant syntax for returning records, but the above patterns do the job, and can still be used in current versions.

That's enough for this week; we looked at getting multiple records out of the function.  Next week, we will look at getting multiple rows into the function to work with.

Thursday, January 6, 2011

What's next..

PL/pgSQL is basically a language for writing functions. You cannot just inline a PL/pgSQL statement into a query wherever.  You define functions, and then use those functions to enhance queries, or to perform automated actions as triggers.

I will not be covering PL/pgSQL comprehensively, but rather the parts I would find useful to know.  Here is a short list of stuff I hope to cover:

  1. How to structure a function; what is the minimum set of parts.
  2. How to update or insert a record with a PL function call.
  3. How to return a simple calculated value
  4. How to return a few selected records, with and without enhancing calculations
  5. How to return generated data as if they were a set of records
  6. How to return summaries of selected data
  7. How to write adaptive functions, that change their return sets based on what tables they are used on.
  8. How to create a trigger from a function.

I hope to write a post a week, published on Thursday, but I'm only 4 or 5 weeks in, and am already struggling to keep up, using this outline as a stop-gap post for this week.   I hope to have something more substantial for next week, and stay a post or two ahead; we will see how it goes.



Thursday, December 30, 2010

A row at a time.

This week, we will be discussing functions that use exactly one row.  Functions can take individual rows as parameters, and can generate individual rows.  We will be discussing various syntax forms used for these purposes.

But first, an aside:

As I am trying to think up examples of good PL/pgSQL functions, I keep asking myself the question: why use a function at all, rather than plain old SQL?   One reason is that a function can execute as the creator role, even when called by another role.  So it functions as a 'sudo', allowing narrow functionality to be used by a limited privilege role that would otherwise require a higher privilege.  The 'SECURITY DEFINER' phrase in the function definition gives the function that quality.  The default would be 'SECURITY INVOKER', where the function runs at the privilege of the caller.
CREATE OR REPLACE FUNCTION sudo( INT, INT )
  RETURNS INT
AS $$
BEGIN
  -- some operation requiring elevated privilege
END;
$$ LANGUAGE plpgsql
  SECURITY DEFINER;
If this function was created by a role with greater privilege (such as the Super role at an Rdbhost database), then it can be executed by another role (say a Reader or Preauth role at Rdbhost), and it can affect tables and other resources as if it were executed by the greater role.
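To put something concrete in place of the placeholder body, here is a sketch of a 'sudo' style function. The table audit_log, the function log_event, and the role reader_role are all hypothetical names. The SET search_path line is an extra precaution I have added, and it assumes the table lives in the public schema.

```sql
-- Hypothetical table, function, and role names, for illustration only.
CREATE TABLE audit_log (logged_at TIMESTAMP DEFAULT now(), message TEXT);
CREATE ROLE reader_role NOLOGIN;

CREATE OR REPLACE FUNCTION log_event(msg TEXT)
  RETURNS VOID
AS $$
BEGIN
  -- Runs with the privileges of the function's owner, not those of the caller.
  INSERT INTO audit_log (message) VALUES (msg);
END;
$$ LANGUAGE plpgsql
  SECURITY DEFINER
  SET search_path = public;

-- Only the limited role may call it; that role needs no rights on audit_log itself.
REVOKE ALL ON FUNCTION log_event(TEXT) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION log_event(TEXT) TO reader_role;

SELECT log_event('first entry');
```

The limited role can add log entries through the function, but it still cannot read, change, or delete rows in audit_log directly.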

Back to rows:

Table-name as Type
If your function is to accept a row of a particular table, you can just use the table name as the type in the parameter list.  FWIW, you are actually specifying a composite type which happens to share a name with the table.
CREATE OR REPLACE FUNCTION ctysize( cty capitols )
 RETURNS TEXT
AS $$
  BEGIN
    IF    cty.population > 10000000 THEN
         RETURN 'LARGE';
    ELSIF cty.population > 1000000 THEN
         RETURN 'MEDIUM';
    ELSE
         RETURN 'SMALL'; 
    END IF;
  END;
$$ LANGUAGE plpgsql;

SELECT ctysize(ROW(city,country,population)) FROM capitols LIMIT 3;
This could have been done in plain SQL using a CASE, but this does illustrate how to create a ROW from a set of fields.
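For comparison, the plain SQL version with CASE might look like this. It assumes the same capitols table, with city and population columns; the output column name size is my own.

```sql
-- Assumes capitols has at least the columns city and population.
SELECT city,
       CASE WHEN population > 10000000 THEN 'LARGE'
            WHEN population > 1000000  THEN 'MEDIUM'
            ELSE 'SMALL'
       END AS size
FROM capitols
LIMIT 3;
```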

There does not seem to be a way to pass a generic table-row as a parameter, so the above could not be written to categorize the populations of just any table, generically, that has a population field.

ROWTYPE
Before we move on to outputting rows, let's look at a syntax feature that can facilitate creating rows.  We can declare a variable, in the DECLARE section of the function, to be of a table's ROWTYPE.  It has attributes for each column of that table, which can then be assigned to.  They can be read, as well, though they will initially be NULL.
DECLARE
    ctyrec capitols%%ROWTYPE;
  BEGIN 


RETURNS tablename
This example illustrates using a table-name as the return type, meaning that the return value will match that row type.  If you are testing these on Rdbhost, you need to double up the '%' (writing capitols%%ROWTYPE, as in the fragment above), because the '%R' gets confused with a substitution token and errors about 'too few arguments'.
CREATE OR REPLACE FUNCTION newcty(name TEXT, nation TEXT, size INT)
 RETURNS capitols
AS $$
  DECLARE
    ctyrec capitols%ROWTYPE;
  BEGIN
    ctyrec.city := name;
    ctyrec.country := nation;
    ctyrec.population := size;
    RETURN ctyrec;
  END;
$$ LANGUAGE plpgsql;

SELECT * FROM newcty('caracas','venezuela',10000);
The example shows how to create a new record using values provided as arguments.  It could be used as input to an SQL insert, like:
INSERT INTO capitols SELECT * FROM newcty('caracas','venezuela',10000);
Again, this example is so lame that it could be replaced with a straight SQL query of similar complexity.  Good simple examples are hard to find.  I was disappointed to discover that the ROWTYPE declaration does not imply inclusion of constraints.  You can put values into the compound-variable that are not permissible in the table, and you won't learn until you attempt to insert it into the table.
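A small sketch of that surprise follows. The table capitols_checked and the function badcty are hypothetical, made up so that there is a constraint to violate.

```sql
-- Hypothetical table with a constraint the row variable will ignore.
CREATE TABLE capitols_checked (
    city       TEXT,
    country    TEXT,
    population INT CHECK (population >= 0)
);

CREATE OR REPLACE FUNCTION badcty()
 RETURNS capitols_checked
AS $$
  DECLARE
    ctyrec capitols_checked%ROWTYPE;
  BEGIN
    ctyrec.city := 'nowhere';
    ctyrec.population := -1;     -- accepted here, despite the CHECK on the table
    RETURN ctyrec;
  END;
$$ LANGUAGE plpgsql;

SELECT * FROM badcty();          -- succeeds, and shows the negative population

-- The constraint only applies when the row reaches the table;
-- uncommenting the next line produces a check constraint violation.
-- INSERT INTO capitols_checked SELECT * FROM badcty();
```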

RETURNS record
In the last code sample, we saw how a function can return a record by declaring the return type as the tablename, and using a record of that row-type as the return value.

An alternative method of returning a row value is to use OUT parameters.  The parameters represent columns of the resulting row.  This example uses only OUT parameters for its output, though you can use IN (and INOUT) parameters in the same parameter list.  In this example, a row of 3 values is returned, with the same shape as the row in the above example.
CREATE OR REPLACE FUNCTION newctyrec( name TEXT, nation TEXT, pop INT,
                                      OUT TEXT, OUT TEXT, OUT INT )
 RETURNS record
AS $$
  BEGIN
    $4 = name;
    $5 = nation;
    $6 = pop;
  END;
$$ LANGUAGE plpgsql;

SELECT newctyrec('caracas','venezuela',1000);
The IN parameters are not declared as IN, since that is the default mode.  The OUT parameters are referenced by their number, as I did not give them names.  The 'record' return type represents whatever row shape is indicated by the OUT parameters.  Since the OUT parameters are not, in fact, parameters at all, the confusion factor in this is high.  I recommend formatting your function definitions so that the OUT parameters are on their own line.

Forget any notion of reference variables from other languages, which they may sorta look like.  They are just a list of columns to return, confusingly appended to the parameter list.

The code sample above returns the following record.
(caracas,venezuela,1000)
There it is; not the prettiest syntax, but it works.
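The syntax gets a little prettier if the OUT parameters are given names, and the function is called in the FROM clause so the record is expanded into columns. This is a sketch; the function name newctyrec_named and the OUT parameter names are my own choices.

```sql
-- Hypothetical variant with named OUT parameters on their own line.
CREATE OR REPLACE FUNCTION newctyrec_named( name TEXT, nation TEXT, pop INT,
                                            OUT city TEXT, OUT country TEXT, OUT population INT )
 RETURNS record
AS $$
  BEGIN
    city := name;
    country := nation;
    population := pop;
  END;
$$ LANGUAGE plpgsql;

-- Calling it in FROM yields three columns rather than one parenthesized record.
SELECT * FROM newctyrec_named('caracas','venezuela',1000);
```

The result columns take their names from the OUT parameters, so the caller sees city, country, and population.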

Thursday, December 23, 2010

PostgresOnline: Quick Guide to Writing PLPGSQL

Just this morning, I discovered a good short series on writing PL/pgSQL functions.  These guides are written as part of the Postgres Online Journal, by Regina Obe and Leo Hsu.  Aside from this short series (from back in 2008, but still very relevant), there is a lot of interesting stuff there.

The Quick Guide to Writing PL/pgSQL.
Part-1
Part-2
Part-3

They also produce a nice Cheat-sheet for PL/pgSQL.
Cheat-Sheet

Thursday, December 16, 2010

PL/pgSQL dabbling for free



If you do not have a PostgreSQL server available to dabble with, and you wish to do some learning by experimentation, free database accounts are available at Rdbhost.com.

These databases are accessible through a small variety of APIs, but the straightforward way for our purposes is to use the on-site admin tool, RdbAdmin.

Let's walk through the process of creating a new database on Rdbhost, and trying some simple PL/pgSQL functions there.

Visit http://www.rdbhost.com.  Provide your email address in the 'Make Me a Database' form, skim the terms of service, click the box acknowledging them, and submit.




Rdbhost will email you a registration letter; read it, and copy the password it contains onto the clipboard.  Return to the site, and log in:


Upon your first login, the site will create your database, and show you your profile page, resembling this next image.  I recommend that, at this point, you change the password to something memorable to you, so that you can easily log in later:


Click the 'Rdbadmin' link to open the RdbAdmin app:



The 'SQL Command' button will open an edit box for the entry of SQL and PL/pgSQL commands:


Enter an SQL or PL/pgSQL query and click the 'Execute' button.  Results of your query will appear above the edit box:



If you enter a syntax error into your PL/pgSQL, PostgreSQL will issue an error message, and RdbAdmin will display it like this:


You will likely need data tables to experiment with before you go too far, and such tables can be created a couple of ways.  First, you can just write the raw CREATE TABLE SQL into the SQL edit field, or you can use the Create Table feature:


There are features to assist in creating views, schemata, and indexes as well, but none yet for creating functions or triggers.  Use the SQL edit box to create functions using the raw SQL; after all, we are here to exercise our PL/pgSQL writing abilities, are we not?
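For the raw SQL route to a table, something like the following can be pasted into the SQL edit box. The table shape is an assumption, modelled on the capitols table used in other posts on this blog, and the rows are placeholder values, not real population figures.

```sql
-- Assumed table shape; the rows are placeholders for experimentation only.
CREATE TABLE capitols (
    city       TEXT,
    country    TEXT,
    population INTEGER
);

INSERT INTO capitols (city, country, population) VALUES ('city_a', 'country_a', 12000000);
INSERT INTO capitols (city, country, population) VALUES ('city_b', 'country_b', 3000000);
INSERT INTO capitols (city, country, population) VALUES ('city_c', 'country_c', 40000);

SELECT * FROM capitols;
```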

Next week, I will show how to create and return a table record with a PL/pgSQL function.