Blog Archives

Sending sockets from PostgreSQL triggers with Python

Picture this: We want to notify to one external service each time that one record is inserted in the database. We can find the place where the insert statement is done and create a TCP client there, but: What happens if the application that inserts the data within the database is a legacy application?, or maybe it is too hard to do?. If your database is PostgreSQL it’s pretty straightforward. With the “default” procedural language of PostgreSQL (pgplsql) we cannot do it, but PostgreSQL allows us to use more procedural languages than plpgsql, for example Python. With plpython we can use sockets in the same way than we use it within Python scripts. It’s very simple. Let me show you how to do it.

First we need to create one plpython with our TCP client

CREATE OR REPLACE FUNCTION dummy.sendsocket(msg character varying, host character varying, port integer)
  RETURNS integer AS
$BODY$
  import _socket
  try:
    s = _socket.socket(_socket.AF_INET, _socket.SOCK_STREAM)
    s.connect((host, port))
    s.sendall(msg)
    s.close()
    return 1
  except:
    return 0
$BODY$
  LANGUAGE plpython VOLATILE
  COST 100;
ALTER FUNCTION dummy.sendsocket(character varying, character varying, integer)
  OWNER TO username;

Now we create the trigger that use our socket client.

CREATE OR REPLACE FUNCTION dummy.myTriggerToSendSockets()
RETURNS trigger AS
$BODY$
   import json
   stmt = plpy.prepare("select dummy.sendSocket($1, $2, $3)", ["text", "text", "int"])
   rv = plpy.execute(stmt, [json.dumps(TD), "host", 26200])
$BODY$
LANGUAGE plpython VOLATILE
COST 100;

As you can see in my example we are sending all the record as a JSON string in the socket body.

And finally we attach the trigger to one table (or maybe we need to do it to more than one table)

CREATE TRIGGER myTrigger
  AFTER INSERT OR UPDATE OR DELETE
  ON dummy.myTable
  FOR EACH ROW
  EXECUTE PROCEDURE dummy.myTriggerToSendSockets();

And that’s all. Now we can use one simple TCP socket server to handle those requests. Let me show you different examples of TCP servers with different languages. As we can see all are different implementations of Reactor pattern. We can use, for example:

node.js:

var net = require('net');

var host = 'localhost';
var port = 26200;

var server = net.createServer(function (socket) {
    socket.on('data', function(buffer) {
        // do whatever that we want with buffer
    });
});

server.listen(port, host);

python (with Twisted):

from twisted.internet import reactor, protocol

HOST = 'localhost'
PORT = 26200

class MyServer(protocol.Protocol):
    def dataReceived(self, data):
        # do whatever that we want with data
        pass

class MyServerFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return MyServer()

reactor.listenTCP(PORT, MyServerFactory(), interface=HOST)
reactor.run()

(I know that we can create the Python’s TCP server without Twisted, but if don’t use it maybe someone will angry with me. Probably he is angry right now because I put the node.js example first :))

php (with react):

<?php
include __DIR__ . '/vendor/autoload.php';

$host = 'localhost';
$port = 26200;

$loop   = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);

$socket->on('connection', function ($conn) {
    $conn->on('data', function ($data) {
        // do whatever we want with data
        }
    );
});

$socket->listen($port, $host);
$loop->run();

You also can use xinet.d to handle the TCP inbound connections.

How to protect from SQL Injection with PHP

Security is a part of our work as developers. We need to ensure our applications against malicious attacks. SQL Injection is one of the most common possible attacks. Basically SQL Injection is one kind of attack that happens when someone injects SQL statements in our application. You can find a lot of info about SQL Injection attack. Basically you need to follow the security golden rule

Filter input
Escape output

If you work with PHP problably you work with PDO Database abstraction layer.
Let’s prepare our database for the examples (I work with PostgreSQL):

CREATE TABLE users
(
  uid integer NOT NULL,
  name character varying(50),
  surname character varying(50),
  CONSTRAINT pk_users PRIMARY KEY (uid)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE users OWNER TO gonzalo;

INSERT INTO users(uid, name, surname) VALUES (0, 'Gonzalo', 'Ayuso');
INSERT INTO users(uid, name, surname) VALUES (1, 'Hans', 'Solo');
INSERT INTO users(uid, name, surname) VALUES (2, 'Luke', 'Skywalker');

OK our database is ready. Now let create a simple query

$dbh = new PDO('pgsql:dbname=db;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $dbh->prepare('select uid, name, surname from users where uid=:ID');
$stmt->execute(array('ID' => 0));
$data = $stmt->fetchAll();

It works. We are using bind parameters, so we need to prepare one statement, execute and fetch the recordset. The use of prepared statements is strongly recommended. We can also use query() function:

$dbh = new PDO('pgsql:dbname=db;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$uid = 0;
$stmt = $dbh->query("select uid, name, surname from users where uid={$uid}");
$data = $stmt->fetchAll();

But what happens if $id came from the request and it’s not propertly escaped

$dbh = new PDO('pgsql:dbname=db;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$uid = "0; drop table users;";
$stmt = $dbh->query("select uid, name, surname from users where uid={$uid}");
$data = $stmt->fetchAll();

basically nothing: SQLSTATE[42601]: Syntax error. That’s because is not allowed to use two prepared statements in a single statement.

If we use an insert:

$dbh = new PDO('pgsql:dbname=db;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$uid     = 20;
$name    = 'Gonzalo';
$surname = "Ayuso'); drop table users; select 1 where ('1' = '1";

$count = $dbh->exec("INSERT INTO users(uid, name, surname) VALUES ({$uid}, '{$name}', '{$surname}')");

Now we have a problem. Our user table will be deleted. Why? That’s because of the user we are using to connect to the database. It’s important especially at production servers.
It’s very important not to use a database superuser in production. Superusers are very comfortable in our development servers, because you don’t need to grant privileges to every tables but if you forget this issue in production you could have Sql-Injection problems. The solution is very simple:

GRANT SELECT, UPDATE, INSERT, DELETE ON TABLE users TO gonzalo2;

And now

$dbh = new PDO('pgsql:dbname=db;host=localhost', 'gonzalo2', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$uid     = 20;
$name    = 'Gonzalo';
$surname = "Ayuso'); drop table users; select 1 where ('1' = '1";

$count = $dbh->exec("INSERT INTO users(uid, name, surname) VALUES ({$uid}, '{$name}', '{$surname}')");

Now we are safe, at least with this possible attack.
Sumarizing:

  • Filter input
  • Escape output
  • Take care about the database users. Don’t use one user that it allowed to perform “not-allowed” operations within our application. It sounds like a pun but is important.

Database abstraction layers in PHP. PDO versus DBAL

I normally use PDO in my PHP projects. I like it because it’s a PHP extension easy to use and shares the same interface between all databases. Normally I use PostgreSQL but if I change to mySql or Oracle I don’t need to use different functions to handle the database connections.

PHP has a great project called Doctrine2. Doctrine2 is a ORM and it uses its own database abstraction layer called DBAL. In fact DBAL isn’t a pure database abstraction layer. It’s built over PDO. It’s a set of PHP classes we can use that gives us features not available with ‘pure’ PDO. If we use Doctrine2 we’re using DBAL behind the scene, but we don’t need to use Doctrine2 to use DBAL. We can use DBAL as a database abstraction layer without any ORM. Obiously this extra PHP layer over our PDO extension needs to pay a fee. I will have a look to this fee in this post. I will take one of my old post about PDO and I will do the same with DBAL to see the performance differences. Let’s start:

The PDO version:

error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

$dbh = new PDO('pgsql:dbname=mydb;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$dbh->beginTransaction();

$smtp = $dbh->prepare('INSERT INTO test.tbl1 (id, field1) values (:ID, :FIELD1)');

for ($i=0; $i<1000; $i++) {
    $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
}

$dbh->commit();

$stmt = $dbh->prepare('SELECT * FROM test.tbl1 limit 10000');
$stmt->execute();

$i=0;
while ($row = $stmt->fetch()) {
	$i++;
}
echo '<h1>PDO</h1>';
echo "<strong>{$i} </strong>";

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

$dbh->beginTransaction();
$smtp = $dbh->prepare('delete from test.tbl1');
$smtp->execute();
$dbh->commit();

The DBAL version:

error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

use Doctrine\DBAL\DriverManager;

$connectionParams = array(
    'dbname'   => 'mydb',
    'user'     => 'gonzalo',
    'password' => 'password',
    'host'     => 'localhost',
    'driver'   => 'pdo_pgsql',
    );

$dbh = DriverManager::getConnection($connectionParams);

$dbh->beginTransaction();

$smtp = $dbh->prepare('INSERT INTO test.tbl1 (id, field1) values (:ID, :FIELD1)');

for ($i=0; $i<1000; $i++) {
    $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
}

$dbh->commit();

$stmt = $dbh->prepare('SELECT * FROM test.tbl1 limit 10000');
$stmt->execute();

$i=0;
while ($row = $stmt->fetch()) {
	$i++;
}
echo '<h1>DBAL</h1>';
echo "<strong>{$i} </strong>";

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

As we can see DBAL is slower than pure PDO (obiously). Anyway the most of the extra time of DBAL is the time we need to include php classes (remember PDO is a PHP extension and we dont need to include any file). If we take times excluding the include time, the memory usage is almost the same and the execution time a little slower.

Autoload for DBAL version:

spl_autoload_register(function ($class) {
        $class = str_replace('\\', '/', $class) . '.php';
        require_once($class);
    }
);

or hardcoded includes for this example

require_once('Doctrine/DBAL/Driver.php');
require_once('Doctrine/DBAL/Driver/Connection.php');
require_once('Doctrine/DBAL/Platforms/AbstractPlatform.php');
require_once('Doctrine/DBAL/Driver/Statement.php');

require_once('Doctrine/DBAL/DriverManager.php');
require_once('Doctrine/DBAL/Configuration.php');
require_once('Doctrine/Common/EventManager.php');
require_once('Doctrine/DBAL/Driver/PDOPgSql/Driver.php');
require_once('Doctrine/DBAL/Driver.php');
require_once('Doctrine/DBAL/Connection.php');
require_once('Doctrine/DBAL/Driver/Connection.php');
require_once('Doctrine/DBAL/Query/Expression/ExpressionBuilder.php');
require_once('Doctrine/DBAL/Platforms/PostgreSqlPlatform.php');
require_once('Doctrine/DBAL/Platforms/AbstractPlatform.php');
require_once('Doctrine/DBAL/Driver/PDOConnection.php');
require_once('Doctrine/DBAL/Driver/PDOStatement.php');
require_once('Doctrine/DBAL/Driver/Statement.php');
require_once('Doctrine/DBAL/Events.php');
require_once('Doctrine/DBAL/Statement.php');

Outcomes of the tests:

With pure PDO:

  • memory: 0.0044288635253906
  • seconds: 0.24748301506042

With DBAL and autoload:

  • memory: 0.97610473632812
  • seconds: 0.29042816162109

With DBAL and hardcoded requires:

  • memory: 0.97521591186523
  • seconds: 0.31192088127136

With DBAL bypassing the include part:

  • memory: 0.0099525451660156
  • seconds: 0.30333304405212

The fee we paid for using DBAL gives us some extra features. OK we don’t need DBAL to get those features. If we code a bit we can get them (remember DBAL is nothing but a PHP extra layer). But DBAL has a great interface a well documented. Now I’m going to list a few extra features from DBAL very interesting, at least for me:

Transactional mode

I really like it. It allows us to create scripts like that:

$dbh->transactional(function($conn) {
    $smtp = $conn->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

    for ($i=0; $i<1000; $i++) {
        $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
    }
});

A simple closure will make the code more concise and it will commit/rollback our transaction for us. In fact I borrowed this function in my PDO projects to use this interface. I love Open source.

Snippet from DBAL library:

    /**
     * Executes a function in a transaction.
     *
     * The function gets passed this Connection instance as an (optional) parameter.
     *
     * If an exception occurs during execution of the function or transaction commit,
     * the transaction is rolled back and the exception re-thrown.
     *
     * @param Closure $func The function to execute transactionally.
     */
    public function transactional(Closure $func)
    {
        $this->beginTransaction();
        try {
            $func($this);
            $this->commit();
        } catch (Exception $e) {
            $this->rollback();
            throw $e;
        }
    }

Types conversion

Really useful, at least for when I work with dates:

$date = new \DateTime("2011-03-05 14:00:21");
$stmt = $conn->prepare("SELECT * FROM articles WHERE publish_date > ?");
$stmt->bindValue(1, $date, "datetime");
$stmt->execute();

List of Parameters Conversion

It’s a cool feature too available in DBAL since Doctrine 2.1

$dbh->executeQuery('SELECT * FROM wf.tbl1 WHERE id IN (?)',
    array(array(1, 2, 3, 4, 5, 6)),
    array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY));

Bind parameters with IN clause with PDO is a bit ugly. We need to create a series of bind parameters depending on our list to map them within the SQL. It’s possible but DBAL interface is smarter.

Transaction Nesting

Another cool feature:

$dbh->beginTransaction();
try {
    $dbh->beginTransaction();
    try {
        $smtp = $dbh->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

        for ($i=0; $i<1000; $i++) {
            $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
        }

        } catch (Exception $e) {
            $dbh->rollback(); //transaction marked for rollback only
            throw $e;
        }
    $smtp = $dbh->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

    for ($i=0; $i<1000; $i++) {
        $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
    }

    $dbh->commit(); // real transaction committed
} catch (Exception $e) {
    $dbh->rollback(); // transaction rollback
    throw $e;
}

This piece of code with PDO will throw the following error:
There is already an active transaction
but it works with DBAL. If we need to do this kind of things with PDO we need to use savepoints and things like that. DBAL does the ugly part for us.

Performance analysis of Stored Procedures with PDO and PHP

Last week I had an interesting conversation on twitter about the usage of stored procedures in databases. Someone told stored procedure are evil. I’m not agree with that. Stored procedures are a great place to store business logic. In this post I’m going to test the performance of a small piece of code using stored procedures and using only PHP code.

Without stored procedures

// Without stored procedures
$time = microtime(TRUE);
$mem = memory_get_usage();

$dsn = 'pgsql:host=localhost;dbname=gonzalo;port=5432';
$user = 'user';
$password = 'password';
$conn = new PDO($dsn, $user, $password);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$conn->beginTransaction();
$stmt = $conn->prepare('delete from web.tbltest');
$stmt->execute();

$stmt = $conn->prepare('INSERT INTO web.tbltest (field1) values (?)');
foreach (range(0,1000) as $i) {
    $stmt->execute(array($i));
}
$conn->commit();

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

With stored procedures:

// With stored procedures:
/*
CREATE OR REPLACE FUNCTION web.method1()
  RETURNS numeric AS
$BODY$
BEGIN
   DELETE FROM web.tbltest;
   FOR i IN 0..1000 LOOP
     INSERT INTO web.tbltest (field1) values (i);
   END LOOP;
   RETURN 1;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
*/
$time = microtime(TRUE);
$mem = memory_get_usage();

$dsn = 'pgsql:host=localhost;dbname=gonzalo;port=5432';
$user = 'user';
$password = 'password';
$conn = new PDO($dsn, $user, $password);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$conn->beginTransaction();
$stmt = $conn->prepare('SELECT web.method1()');
$stmt->execute();
$stmt->setFetchMode(PDO::FETCH_ASSOC);
$out = $stmt->fetchAll();
$conn->commit();

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));
without stored procedures with stored procedures
memory: 0.0023880004882812
seconds: 0.31109309196472
memory: 0.0020713806152344
seconds: 0.065021991729736

So my conclusion: Stored procedures are not evil. The performance is really good. I know maybe it can be a bit mess if we place business logic within database and outside database at the same time, but with a good design and architecture this problem is easy to solve. What do you think?

Follow

Get every new post delivered to your Inbox.

Join 989 other followers