Thursday, January 27, 2011

More elegant forms of returning records.

Like the blog subtitle says, I learn PL/pgSQL as I go, and share the lessons with you.  A couple of weeks ago, I posted on returning a few records using OUT parameters.  I used that method because it is what I knew.  A reader, Tzvi R., kindly commented with a couple of ways to return records more gracefully.  In this post, I am giving those methods more prominence than the comment provided.

SETOF RECORD
Here the record is defined within the function, keeping the parameter list a clean list of inputs.  A limitation of this approach is that since the record definition is hidden from the caller, the column list must be provided as part of the retrieving SELECT.
CREATE OR REPLACE FUNCTION squares(ct INT)
   RETURNS SETOF RECORD
AS $$
DECLARE
  v_rec RECORD;
BEGIN
  FOR i IN 0..ct-1 LOOP
    SELECT i, POWER(i,2)::INT INTO v_rec;
    RETURN NEXT v_rec;
  END LOOP;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;

SELECT * FROM squares(5) AS (A INT, B INT);

Line 5 creates a variable, of indeterminate shape, for the output record.  Line 8 creates the record using a SELECT .. INTO, and inserts it into the variable, giving the variable its shape.

Line 9 is a RETURN NEXT.., which adds the record to the list of records to eventually return.  This form of RETURN does not terminate the function, but rather augments the return value, which gets returned whenever the function returns.  In this example, we just passively fall out the bottom of the function, but a bare RETURN can be used to explicitly exit the function.  The bare RETURN does not interfere with the returning of the built-up result set.

Predefined Types
If you define the record type outside of the function, and use it in the function definition, the caller no longer needs a column list, but determines columns from the record type.
CREATE TYPE square_type AS (a INT, b INT);

CREATE OR REPLACE FUNCTION squares(ct INT)
   RETURNS SETOF square_type
AS $$
DECLARE
  v_rec square_type;
BEGIN
  FOR i IN 0..ct-1 LOOP
    SELECT i, POWER(i,2)::INT INTO v_rec;
    RETURN NEXT v_rec;
  END LOOP;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;

SELECT * FROM squares(5);

Line 1 is the type definition, defining 'square_type' as a pair of INT fields.  Lines 4 and 7 define the return type and the variable type as 'square_type'.  Because the caller can see the return type of the function from the function definition, it knows to treat the return values as pairs of INT columns, and Line 17 can be appealingly spare.

In my opinion, this is the most elegant of the three forms; it does require an additional line to define the type, and an additional addition to the namespace, but it reads easily.  The next best is the OUT parameter method. Having to specify the row shape in the calling SELECT, as in the first example above, is just too awkward.

One caution is that there is no CREATE OR REPLACE TYPE <typename>, so you need to use a DROP TYPE <typename> if you wish to repeat the execution of the above quoted code block complete.   In another week or two, I plan to show how to use Exception handling to gracefully absorb the errors produced by redundantly creating a type.

Thank you, Tzvi, for your assistance.

2 Feb changed function types to IMMUTABLE

Thursday, January 20, 2011

Beyond Little Bobby Tables

The xkcd comic about 'little bobby tables' is a classic, widely known and often quoted:



the strip title is 'Exploits of a Mom', and the punch-line is 'I hope you've learned to sanitize your database inputs'.

The comic gets quoted and linked often, generally to illustrate the point about sanitizing database inputs.  While the verb sanitize is a terse comic-scale expression to represent avoiding SQL injection attacks, it is only part of securing the database.   To sanitize is to remove the dangerous parts of the input.  Sanitizing the input in the xkcd example would presumably involve escaping the singe quote, and would have protected the database.

However, there are other ways to accomplish the desired safety.

  • If you use parameterized queries, the inputs are quoted for you.  
  • If you embed the query in a procedure, with typed parameters, the parameters are firstly forced to match the parameter type, and are then safely interpolated into the embedded query.
  • Using a limited role to run the query, a role that does not have permissions to do more hazardous operations than are required for the query.  For example a role sufficient to the xkcd example would require INSERT privilege, but not DROP nor DELETE nor CREATE.
These three can all be used together, providing multi-layered security.  Using procedures with typed parameters will neutralize SQL injections which are matched to parameter types other than char, varchar, text, and blob.   The SQL injections which map to valid textual types reach the query, but are implicitly quoted and escaped.  Should someone defeat the escaping mechanism somehow (there have been a few exploits in PostgreSQL over the years, not necessarily in the escaping), then the role privilege limits would prevent the most severe of data losses.  


Thursday, January 13, 2011

Returning a few rows

This week, we will look at returning a few rows from a function. Database programming typically involves getting sets of records, so this output ability is natural to look for, in a programming language that runs internally to a database engine.

Here's an example:

CREATE OR REPLACE FUNCTION squares(IN ct INTEGER,
                                   OUT INTEGER, OUT INTEGER)
    RETURNS SETOF RECORD
AS $$
    BEGIN
         FOR i IN 0..ct-1 LOOP
             RETURN QUERY SELECT i, POWER(i,2)::INTEGER; 
        END LOOP;
    END;
$$ LANGUAGE 'plpgsql';

SELECT * FROM squares(3);

Record Type
The third line declares the return type to be SETOF RECORD, indicating that it will return a variable number of elements whose type is RECORD, to be specified elsewhere.  The 'elsewhere' is the set of OUT parameters, which indicate that the RECORD is two integers.  The OUT variables are shown in line 2.  Line 1 names the function, and declares one input parameter to take a count limit.

Lines 6 through 8 define a loop, which iterates over values from 0 to the count limit.  Each iteration runs line 7, which we will discuss in the next paragraph.  Lines 4,5,9,10 are just boilerplate that is found in every PL/pgSQL function. Line 12 provides a context in which to run the function and display the results.

Get Record from SELECT
Line 7 is:  RETURN QUERY SELECT i, POWER(i,2):INTEGER.  The RETURN tells us this line is part of the results.   The QUERY catches the results of the subsequent SELECT, as an ordinary SELECT statement is not permitted in PL/pgSQL.  The SELECTs in PL/pgSQL must be qualified somehow, with an INTO clause or a QUERY as above.  The rest of the line just creates a two part record of the counter index and its square.  The ::INTEGER cast avoids a type error; as POWER is a NUMERIC returning function.

Starting in version 8.4, there is a nice TABLE syntax, for returning records and I will discuss that in a future post.  The Rdbhost server does not support 8.4 at present, but is scheduled for an upgrade.

Multiple Records
If the embedded query returns multiple records, that works too, as all such records get included; witness this variation to the example, which provides both squares and cubes:

BEGIN
  FOR i IN 0..ct-1 LOOP
    RETURN QUERY       SELECT i, POWER(i,2)::INTEGER
                 UNION SELECT i, POWER(i,3)::INTEGER; 
  END LOOP;
END;

RETURN NEXT
An alternate way to return records is with the RETURN NEXT syntax.  This example names the two output parameters, assigns to them, and then adds their aggregate to the return set.

DROP FUNCTION squares(INTEGER);
CREATE OR REPLACE FUNCTION squares(IN ct INTEGER,
                                   OUT a INTEGER, OUT b INTEGER)
    RETURNS SETOF RECORD
AS $$
    BEGIN
         FOR i IN 0..ct-1 LOOP
             a := i;
             b := POWER(i,2)::INTEGER; 
             RETURN NEXT;
        END LOOP;
    END;
$$ LANGUAGE 'plpgsql';

SELECT * FROM squares(3);
Line 3 names the output parameters.  Lines 8 and 9 put values into those named output parameters.  Line 10 RETURN NEXT creates a record from the above parameters, and adds it to the result set.  The columns created are named after the parameters, so you might want names longer and more informative than 'a' and 'b'.

Later PostgreSQL versions support more elegant syntax for returning records, but the above patterns do the job, and can still be used in current versions.

That's enough for this week.  This week, we looked at getting multiple records out of the function.  Next week, we will look at getting multiple rows into the function to work with.

Thursday, January 6, 2011

What's next..

PL/pgSQL is basically a language for writing functions. You cannot just inline a PL/pgSQL statement into a query wherever.  You define functions, and then use those functions to enhance queries, or to perform automated actions as triggers.

I will not be covering PL/pgSQL comprehensively, but in terms of what I would find useful to have.  Here is a short list of stuff I hope to cover:

  1. How to structure a function; what are the minimum set of parts.
  2. How to update or insert a record with a PL function call.
  3. How to return a simple calculated value
  4. How to return a few selected records, with and without enhancing calculations
  5. How to return generated data as if they were a set of records
  6. How to return summaries of selected data
  7. How to write adaptive functions, that change their return sets based on what tables they are used on.
  8. How to create a trigger from a function.

I hope to write a post a week, published on Thursday, but I'm only 4 or 5 weeks in, and am already struggling to keep up, using this outline as a stop-gap post for this week.   I hope to have something more substantial for next week, and stay a post or two ahead; we will see how it goes.