Thursday, December 30, 2010

A row at a time.

This week, we will be discussing functions that use exactly one row.  Functions can take individual rows as parameters, and can generate individual rows.  We will be discussing various syntax forms used for these purposes.

But first, an aside:

As I am trying to think up examples of good PL/pgSQL functions, I keep asking myself the question: why use a function at all, rather than plain old SQL?   One reason is that a function can execute as the creator role, even when called by another role.  So it functions as a 'sudo', allowing narrow functionality to be used by a limited privilege role that would otherwise require a higher privilege.  The 'SECURITY DEFINER' phrase in the function definition gives the function that quality.  The default would be 'SECURITY INVOKER', where the function runs at the privilege of the caller.
CREATE OR REPLACE FUNCTION sudo( INT, INT )
  RETURNS INT
AS $$
BEGIN
  -- some operation requiring elevated privilege goes here
  RETURN 0;  -- placeholder; a function declared RETURNS INT must return a value
END;
$$ LANGUAGE plpgsql
  SECURITY DEFINER;
If this function was created by a role with greater privilege (such as the Super role at an Rdbhost database), then it can be executed by another role (say a Reader or Preauth role at Rdbhost), and it can affect tables and other resources as if it were executed by the greater role.
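As a slightly more concrete sketch of the 'sudo' idea, the function below lets a low-privilege role learn how many rows a table holds without being able to read the table itself.  The function name capitol_count and the role name reader are made up for illustration, and I am assuming the capitols table lives in the public schema.  Pinning search_path in the definition is a common precaution for SECURITY DEFINER functions, so that the caller cannot redirect the function to a look-alike table.

```sql
-- Hypothetical names: capitol_count (function) and reader (low-privilege role).
-- Assumes the capitols table is in the public schema.
CREATE OR REPLACE FUNCTION capitol_count()
  RETURNS BIGINT
AS $$
BEGIN
  -- runs with the privileges of the role that created the function
  RETURN (SELECT count(*) FROM capitols);
END;
$$ LANGUAGE plpgsql
  SECURITY DEFINER
  SET search_path = public, pg_temp;

-- functions are executable by everyone by default; narrow that down
REVOKE ALL ON FUNCTION capitol_count() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION capitol_count() TO reader;
```

Connected as reader, SELECT capitol_count(); should succeed even though SELECT * FROM capitols; is refused.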

Back to rows:

Table-name as Type
If your function is to accept a row of a particular table, you can just use the table name as the type in the parameter list.  FWIW, you are actually specifying a composite type which happens to share a name with the table.
CREATE OR REPLACE FUNCTION ctysize( cty capitols )
 RETURNS TEXT
AS $$
  BEGIN
    IF    cty.population > 10000000 THEN
         RETURN 'LARGE';
    ELSIF cty.population > 1000000 THEN
         RETURN 'MEDIUM';
    ELSE
         RETURN 'SMALL'; 
    END IF;
  END;
$$ LANGUAGE plpgsql;

SELECT ctysize(ROW(city,country,population)) FROM capitols LIMIT 3;
This could have been done in plain SQL using a CASE, but this does illustrate how to create a ROW from a set of fields.
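For comparison, here is roughly what the plain SQL version with CASE looks like, followed by a call that passes each whole row to ctysize through a table alias instead of building it with ROW().  The alias c and the output column name size are my own choices.

```sql
-- plain SQL equivalent of ctysize, using CASE
SELECT city,
       CASE WHEN population > 10000000 THEN 'LARGE'
            WHEN population > 1000000  THEN 'MEDIUM'
            ELSE 'SMALL'
       END AS size
  FROM capitols
 LIMIT 3;

-- passing the whole row by naming the table alias
SELECT c.city, ctysize(c) AS size
  FROM capitols c
 LIMIT 3;
```

The alias form has the advantage that it does not depend on listing the columns in the same order as the table definition.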

There does not seem to be a way to pass a generic table-row as a parameter, so the above could not be written to categorize the populations of just any table, generically, that has a population field.

ROWTYPE
Before we move on to outputting rows, let's look at a syntax feature that can facilitate creating rows.  We can declare a variable, in the DECLARE section of the function, to be of a table's ROWTYPE.  It has attributes for each column of that table, which can then be assigned to.  They can be read, as well, though they will initially be NULL.
DECLARE
    ctyrec capitols%ROWTYPE;
  BEGIN 
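To make the fragment above concrete, here is a minimal sketch of a complete function that declares a ROWTYPE variable, fills it from a query, and reads its attributes.  The function name biggestcty is invented for this illustration; the table and column names are the ones used elsewhere in this post.

```sql
-- Hypothetical helper: returns 'city, country' for the most populous row.
CREATE OR REPLACE FUNCTION biggestcty()
 RETURNS TEXT
AS $$
  DECLARE
    ctyrec capitols%ROWTYPE;
  BEGIN
    -- every attribute of ctyrec is NULL at this point
    SELECT * INTO ctyrec
      FROM capitols
     ORDER BY population DESC NULLS LAST
     LIMIT 1;
    -- attributes are read with dot notation
    RETURN ctyrec.city || ', ' || ctyrec.country;
  END;
$$ LANGUAGE plpgsql;

SELECT biggestcty();
```

If the table is empty, the attributes stay NULL and the function returns NULL.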


RETURNS tablename
This example illustrates using a table-name as the return type, meaning that the return value will match that row type.  If you are testing these on Rdbhost, you need to double up the '%' (writing capitols%%ROWTYPE), because the '%R' otherwise gets mistaken for a substitution token and you get an error about 'too few arguments'.
CREATE OR REPLACE FUNCTION newcty(name TEXT, nation TEXT, size INT)
 RETURNS capitols
AS $$
  DECLARE
    ctyrec capitols%ROWTYPE;
  BEGIN
    ctyrec.city := name;
    ctyrec.country := nation;
    ctyrec.population := size;
    RETURN ctyrec;
  END;
$$ LANGUAGE plpgsql;

SELECT * FROM newcty('caracas','venezuela',10000);
The example shows how to create a new record using values provided as arguments.  It could be used as input to an SQL insert, like:
INSERT INTO capitols SELECT * FROM newcty('caracas','venezuela',10000);
Again, this example is so lame that it could be replaced with a straight SQL query of similar complexity.  Good simple examples are hard to find.  I was disappointed to discover that the ROWTYPE declaration does not carry the table's constraints with it.  You can put values into the compound-variable that are not permissible in the table, and you won't find out until you attempt to insert the row into the table.

RETURNS record
In the last code sample, we saw how a function can return a record by declaring the return type as the tablename, and using a record of that row-type as the return value.

An alternative method of returning a row value is to use OUT parameters.  The OUT parameters represent columns of the resulting row.  This example combines three IN parameters with three OUT parameters; INOUT parameters can appear in the same parameter list as well.  A row of 3 values is returned, with the same shape as the row in the above example.
CREATE OR REPLACE FUNCTION newctyrec( name TEXT, nation TEXT, pop INT,
                                      OUT TEXT, OUT TEXT, OUT INT )
 RETURNS record
AS $$
  BEGIN
    $4 = name;
    $5 = nation;
    $6 = pop;
  END;
$$ LANGUAGE plpgsql;

SELECT newctyrec('caracas','venezuela',1000);
The IN parameters are not declared as IN, since that is the default mode.  The OUT parameters are referenced by their position ($4, $5, $6), as I did not give them names.  The 'record' return type represents whatever row shape is indicated by the OUT parameters.  Since the OUT parameters are not, in fact, parameters that the caller supplies, the confusion factor in this is high.  I recommend formatting your function definitions so that the OUT parameters are on their own line.

Forget any notion of reference variables from other languages, which they may sorta look like.  They are just a list of columns to return, confusingly appended to the parameter list.

The code sample above returns the following record.
(caracas,venezuela,1000)
There it is; not the prettiest syntax, but it works.
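A variation that may read more easily: give the OUT parameters names, and call the function in the FROM clause so the result comes back as separate columns rather than one parenthesized record.  This is a sketch only; the function name newctyrec2 and the OUT names are my own, chosen to match the capitols columns.

```sql
-- Hypothetical variant of newctyrec with named OUT parameters.
-- RETURNS record may be omitted when OUT parameters are present.
CREATE OR REPLACE FUNCTION newctyrec2( name TEXT, nation TEXT, pop INT,
                                       OUT city TEXT, OUT country TEXT, OUT population INT )
AS $$
  BEGIN
    -- assign to the OUT parameters by name instead of $4, $5, $6
    city := name;
    country := nation;
    population := pop;
  END;
$$ LANGUAGE plpgsql;

-- in the FROM clause, the OUT parameter names become column names
SELECT * FROM newctyrec2('caracas','venezuela',1000);
```

Called this way, the result has three columns named city, country, and population.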

Thursday, December 23, 2010

PostgresOnline: Quick Guide to Writing PLPGSQL

Just this morning, I discovered a good short series on writing PL/pgSQL functions.  These guides are written as part of the Postgres Online Journal, by Regina Obe and Leo Hsu.  Aside from this short series (from back in 2008, but still very relevant), there is a lot of interesting stuff there.

The Quick Guide to Writing PL/pgSQL.
Part-1
Part-2
Part-3

They also produce a nice Cheat-sheet for PL/pgSQL.
Cheat-Sheet

Thursday, December 16, 2010

PL/pgSQL dabbling for free



If you do not have a PostgreSQL server available to dabble with, and you wish to do some learning by experimentation, free database accounts are available at Rdbhost.com.

These databases are accessible through a small variety of APIs, but the straightforward way for our purposes is to use the on-site admin tool, RdbAdmin.

Let's walk through the process of creating a new database on Rdbhost, and trying some simple PL/pgSQL functions there.

Visit http://www.rdbhost.com.  Provide your email address in the 'Make Me a Database' form, skim the terms of service, click the box acknowledging them, and submit.




Rdbhost will email you a registration letter.  Read it, and copy the password it contains onto the clipboard.  Return to the site, and log in:


Upon your first login, the site will create your database, and show you your profile page, resembling this next image.  I recommend that, at this point, you change the password to something memorable to you, so that you can easily log in later:


Click the 'Rdbadmin' link to open the RdbAdmin app:



The 'SQL Command' button will open an edit box for the entry of SQL and PL/pgSQL commands:


Enter an SQL or PL/pgSQL query and click the 'Execute' button.  Results of your query will appear above the edit box:



If you enter a syntax error into your PL/pgSQL, PostgreSQL will issue an error message, and RdbAdmin will display it like this:


You will likely need data tables to experiment with before you go too far, and such tables can be created a couple of ways.  First, you can just write the raw CREATE TABLE SQL into the SQL edit field, or you can use the Create Table feature:


There are features to assist in creating views, schemata, and indexes as well, but none yet for creating functions or triggers.  Use the SQL edit box to create functions using the raw SQL; after all, we are here to exercise our PL/pgSQL writing abilities, are we not?
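For the raw CREATE TABLE route mentioned above, something like the following would set up the capitols table used in the other posts on this blog.  The column names come from those posts; the column types and the sample rows are my assumptions, and the population figures are made-up round numbers, not real data.

```sql
-- Column types are an assumption; adjust to taste.
CREATE TABLE capitols (
  city       TEXT,
  country    TEXT,
  population INTEGER
);

-- Sample rows with placeholder population values (not real figures).
INSERT INTO capitols (city, country, population) VALUES
  ('paris',  'france', 2000000),
  ('tokyo',  'japan',  12000000),
  ('vaduz',  'liechtenstein', 5000);

SELECT * FROM capitols;
```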

Next week, I will show how to create and return a table record with a PL/pgSQL function.

Thursday, December 9, 2010

Our first Function: Simple Record Insertion

This initial post will discuss a simple function to insert a record into a table.  The table is a simple list of capital cities, with a city name field, and a country name field.

Here is the function definition, and line by line explanation will follow:

CREATE OR REPLACE FUNCTION  addcity
  ( cty VARCHAR, cntry VARCHAR ) 
  RETURNS void
AS $$
BEGIN
  INSERT INTO capitols (city,country)
                VALUES (cty,cntry);
END;
$$ LANGUAGE plpgsql;

SELECT * FROM addcity('paris','france');

Lines 1, 4, 5, 8, and 9 are pretty much boilerplate, and you will see similar elements in every PL/pgSQL function you ever read or write.  The others are specific to the purpose of the function.  Now line by line:
  1. The CREATE OR REPLACE FUNCTION statement creates the function, overwriting any previous function with the same signature.  The name and the parameter types combine to form the signature.  addcity is the name of our function, as it adds a city to our table.
  2. Line 2 is the parameter list, giving the names and types of the parameters passed to the function.  The type is required; the name is not.  The parameters are also available as automatic variables named $1, $2, etc.  Note that we name the parameters *differently* than the fields.  Inside the function, 'cty' refers to the first parameter; had it been named 'city', it would collide with the field of the same name wherever either could apply.  Older releases silently substitute the parameter, making the field unreferenceable, while PostgreSQL 9.0 and later report an ambiguous reference by default.  So use unique names for parameters.
  3. The return type must be specified; since we do not have a meaningful return value, we indicate so with the 'RETURNS void' clause.  Return values may be indicated with OUT parameters, but that is outside the scope of this discussion.
  4. The '$$' symbol is known as a dollar quote, and opens a quoted string.  An optional tag may appear between the '$' signs (for example '$body$'); the tag follows the rules for an unquoted identifier, and must match at the beginning and end of the string.
  5. Each PL/pgSQL block is bounded by the 'BEGIN' and 'END' keywords.
  6. Lines 6 and 7 are just SQL with parameters interpolated into it by name.  See note 2 for why parameters are named differently than the fields.
  7. Line 8 ends the body of the function.
  8. On line 9, the '$$' closes the definition string, and the remainder of the line tells PostgreSQL that the function is in the PL/pgSQL language.
  9. The SELECT statement after the function definition provides a context for calling the function.  Just calling the function by itself is a syntax error, so we provide a no-results SELECT to call from.
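Regarding note 2: if you are stuck with parameter names that match the column names, one way around the collision is to qualify the parameter with the function name.  This is a sketch that assumes PostgreSQL 9.0 or later (older releases substitute parameter names more aggressively and may choke on it); the function name addcity_q is made up for the illustration.

```sql
-- Hypothetical variant of addcity whose parameters share the column names.
-- Assumes PostgreSQL 9.0 or later.
CREATE OR REPLACE FUNCTION addcity_q
  ( city VARCHAR, country VARCHAR )
  RETURNS void
AS $$
BEGIN
  -- addcity_q.city is unambiguously the parameter, not the column
  INSERT INTO capitols (city, country)
                VALUES (addcity_q.city, addcity_q.country);
END;
$$ LANGUAGE plpgsql;

SELECT * FROM addcity_q('lima','peru');
```

Distinct parameter names remain the simpler habit; the qualified form is mostly useful when the names are dictated by something else.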
The SQL in lines 6 and 7 uses variable names only where substitution parameters would be acceptable, that is, in place of values.  If you were to, for example, try to use a variable for the table name, it would fail with an error.  Say for some reason you had multiple tables all with city and country fields, and you wanted this function to work on a table selected at the time of the call.  You would pass the table name as a parameter, and you would have to use dynamic commands, created on the fly, like:

CREATE OR REPLACE FUNCTION  addcity
  ( tblnm VARCHAR, cty VARCHAR, cntry VARCHAR ) 
  RETURNS void
AS $$
BEGIN
  -- a quote inside the command string is written doubled ('')
  EXECUTE 'INSERT INTO ' || tblnm || ' (city,country) VALUES (''' 
                         || cty || ''',''' || cntry || ''' ) ';
END;
$$ LANGUAGE plpgsql;

SELECT * FROM addcity('capitols','paris','france');

  1. The SQL is constructed of parts concatenated together with the '||' concatenation operator to create an executable query; otherwise this example is like the preceding.
This dynamic execution is clunky and fragile and insecure. For example, escaping of values is not done for the query dynamically created,  so SQL injection attacks might be a threat.  Consider carefully how you use such an approach.
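A less fragile way to write the same thing, sketched here under the assumption of PostgreSQL 8.4 or later (where EXECUTE accepts a USING clause): quote the table name with quote_ident, and pass the values as parameters rather than pasting them into the command string.  The function name addcity_safe is my own.

```sql
-- Hypothetical safer variant of the dynamic addcity.
-- Assumes PostgreSQL 8.4 or later for EXECUTE ... USING.
CREATE OR REPLACE FUNCTION addcity_safe
  ( tblnm VARCHAR, cty VARCHAR, cntry VARCHAR )
  RETURNS void
AS $$
BEGIN
  -- quote_ident() makes the table name safe to splice in as an identifier;
  -- $1 and $2 here refer to the USING list, not the function parameters
  EXECUTE 'INSERT INTO ' || quote_ident(tblnm)
       || ' (city,country) VALUES ($1, $2)'
    USING cty, cntry;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM addcity_safe('capitols','paris','france');
```

Because the values never become part of the command text, a city name containing a quote mark is inserted as-is instead of breaking the statement.  Note that quote_ident treats its argument as a single name, so a schema-qualified 'myschema.capitols' would need the two parts quoted separately.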

Next week, we will be back with an introduction to the RdbAdmin program, and a walk-through on how to experiment with PL/pgSQL in that software.


Sunday, December 5, 2010

Introduction

The principal language used within PostgreSQL, other than SQL itself, is PL/pgSQL. This language adds looping and conditionals, as well as exception handling, to PostgreSQL.

This language, handy though it is, doesn't seem to get a lot of discussion on web forums. I may be underestimating the coverage, as it seems many of the PostgreSQL superusers assume it is available, and speak of it as though it were plain old PostgreSQL. However that may be, I hope to be helpful in providing some entry level posts on how to program in PL/pgSQL.  My intention is that these posts will be 'cookbookish', with examples that work, and can be cut-and-pasted and then edited to suit your purposes.  I like to 'work from success', and create custom code by iteratively evolving working code.

PL/pgSQL is for writing functions; you cannot inline PL/pgSQL just anywhere.  That said, you can write a function in PL/pgSQL and call it immediately.  It becomes a permanent element of the database you are connected to, and can be called from other sessions later, until you explicitly drop it.
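The life cycle described above, as a minimal sketch; the function name hello is just a placeholder.

```sql
-- create the function; it is stored in the current database
CREATE OR REPLACE FUNCTION hello()
  RETURNS TEXT
AS $$
BEGIN
  RETURN 'hello from PL/pgSQL';
END;
$$ LANGUAGE plpgsql;

-- call it immediately, from this session or any later one
SELECT hello();

-- it stays in the database until explicitly dropped
DROP FUNCTION hello();
```

PostgreSQL 9.0 and later also offer the DO statement for running a one-off block without creating a function, but the posts here stick to functions.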

Some presentation conventions:

Code samples will be in block-quotes, like the following example, with syntax highlighting:

DECLARE
    key TEXT;
    delta INTEGER;
BEGIN
    ...
    UPDATE mytab SET val = val + delta 
     WHERE id = key;
END;

SQL and PL/pgSQL language keywords will be in all caps. Table, schema, and column names will be lower case, which is recommended practice to avoid problems caused by PostgreSQL's folding of unquoted names to lower case.

All examples will have been tested, before posting, using an ordinary account at Rdbhost.com.

You may know PL/pgSQL better than I do, in which case your constructive feedback, via comments, will be welcome.

Edited to add cookbook and 'work from success' items.
Edited to add paragraph on functions