Asynchronous queries to a PostgreSQL database from the browser with node.js and socket.io

Normally we perform our database connections on the server side, with PHP and PDO for example. In this post I will show a simple technique to send queries to our database (PostgreSQL in this example) asynchronously using node.js and socket.io.

The idea is pretty straightforward. We will send the SQL string and the values over a WebSocket and execute a callback in the client when the server (a node.js script) fetches the recordset.

Our server:

var pg = require('pg');

// Connection string: user, password, host, port and database name
var conString = "tcp://user:password@localhost:5432/db";
var client = new pg.Client(conString);
client.connect();

// socket.io server listening on port 8888
var io = require('socket.io').listen(8888);

io.sockets.on('connection', function (socket) {
    // The browser sends the SQL string and the bind values in a 'sql' event
    socket.on('sql', function (data) {
        var query = client.query(data.sql, data.values);
        // Emit each fetched row back to the browser
        query.on('row', function (row) {
            socket.emit('sql', row);
        });
    });
});

And our client:

<script src="http://localhost:8888/socket.io/socket.io.js"></script>
<script>
var socket = io.connect('http://vmserver:8888');

function sendSql(sql, values, cbk) {
    socket.emit('sql', { sql: sql, values : values});
    socket.on('sql', function(data){
        console.log(data);
    });
}
</script>    
<p>
<a href="#" onclick="sendSql('select * from users', [], function(data) {console.log(data);})">select * from users</a>
</p>
<p>
<a href="#" onclick="sendSql('select * from users where uid=$1', [4], function(data) {console.log(data);})">select * from users where uid=$1</a>
</p>

Simple, isn’t it?
You must take care if you use this script in production: our database is exposed to raw SQL sent from the client. It's a concept example. Maybe it would be better not to send the SQL at all: store the queries in a key-value table on the server and send only an ID from the browser.

What do you think?

Database abstraction layers in PHP. PDO versus DBAL

I normally use PDO in my PHP projects. I like it because it's a PHP extension, it's easy to use and it shares the same interface across all databases. Normally I use PostgreSQL, but if I switch to MySQL or Oracle I don't need to use different functions to handle the database connections.
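
For example, in the simplest case (and assuming the SQL itself is portable) switching between drivers is just a matter of changing the DSN; the credentials and the table used here are only placeholders:

// Same PDO code, different driver: only the DSN changes.
// $dsn = 'mysql:host=localhost;dbname=mydb'; // MySQL
$dsn = 'pgsql:host=localhost;dbname=mydb';    // PostgreSQL
$dbh = new PDO($dsn, 'user', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $dbh->prepare('SELECT * FROM users WHERE id = :ID');
$stmt->execute(array('ID' => 1));
$row = $stmt->fetch(PDO::FETCH_ASSOC);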

PHP has a great project called Doctrine2. Doctrine2 is an ORM and it uses its own database abstraction layer called DBAL. In fact DBAL isn't a pure database abstraction layer: it's built on top of PDO. It's a set of PHP classes that gives us features not available with 'pure' PDO. If we use Doctrine2 we're using DBAL behind the scenes, but we don't need Doctrine2 to use DBAL: we can use DBAL as a database abstraction layer without any ORM. Obviously this extra PHP layer over the PDO extension comes at a cost. I will take a look at that cost in this post. I will take one of my old posts about PDO and do the same with DBAL to see the performance differences. Let's start:

The PDO version:

error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

$dbh = new PDO('pgsql:dbname=mydb;host=localhost', 'gonzalo', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$dbh->beginTransaction();

$smtp = $dbh->prepare('INSERT INTO test.tbl1 (id, field1) values (:ID, :FIELD1)');

for ($i=0; $i<1000; $i++) {
    $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
}

$dbh->commit();

$stmt = $dbh->prepare('SELECT * FROM test.tbl1 limit 10000');
$stmt->execute();

$i=0;
while ($row = $stmt->fetch()) {
	$i++;
}
echo '<h1>PDO</h1>';
echo "<strong>{$i} </strong>";

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

$dbh->beginTransaction();
$smtp = $dbh->prepare('delete from test.tbl1');
$smtp->execute();
$dbh->commit();

The DBAL version:

error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

use Doctrine\DBAL\DriverManager;

$connectionParams = array(
    'dbname'   => 'mydb',
    'user'     => 'gonzalo',
    'password' => 'password',
    'host'     => 'localhost',
    'driver'   => 'pdo_pgsql',
    );

$dbh = DriverManager::getConnection($connectionParams);

$dbh->beginTransaction();

$smtp = $dbh->prepare('INSERT INTO test.tbl1 (id, field1) values (:ID, :FIELD1)');

for ($i=0; $i<1000; $i++) {
    $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
}

$dbh->commit();

$stmt = $dbh->prepare('SELECT * FROM test.tbl1 limit 10000');
$stmt->execute();

$i=0;
while ($row = $stmt->fetch()) {
	$i++;
}
echo '<h1>DBAL</h1>';
echo "<strong>{$i} </strong>";

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

As we can see, DBAL is slower than pure PDO (obviously). Anyway, most of DBAL's extra time is the time needed to include the PHP classes (remember PDO is a PHP extension and we don't need to include any file). If we take the measurements excluding the include time, the memory usage is almost the same and the execution time is only slightly slower.

Autoloader for the DBAL version:

spl_autoload_register(function ($class) {
        $class = str_replace('\\', '/', $class) . '.php';
        require_once($class);
    }
);

Or hardcoded includes for this example:

require_once('Doctrine/DBAL/Driver.php');
require_once('Doctrine/DBAL/Driver/Connection.php');
require_once('Doctrine/DBAL/Platforms/AbstractPlatform.php');
require_once('Doctrine/DBAL/Driver/Statement.php');

require_once('Doctrine/DBAL/DriverManager.php');
require_once('Doctrine/DBAL/Configuration.php');
require_once('Doctrine/Common/EventManager.php');
require_once('Doctrine/DBAL/Driver/PDOPgSql/Driver.php');
require_once('Doctrine/DBAL/Driver.php');
require_once('Doctrine/DBAL/Connection.php');
require_once('Doctrine/DBAL/Driver/Connection.php');
require_once('Doctrine/DBAL/Query/Expression/ExpressionBuilder.php');
require_once('Doctrine/DBAL/Platforms/PostgreSqlPlatform.php');
require_once('Doctrine/DBAL/Platforms/AbstractPlatform.php');
require_once('Doctrine/DBAL/Driver/PDOConnection.php');
require_once('Doctrine/DBAL/Driver/PDOStatement.php');
require_once('Doctrine/DBAL/Driver/Statement.php');
require_once('Doctrine/DBAL/Events.php');
require_once('Doctrine/DBAL/Statement.php');

Outcomes of the tests:

With pure PDO:

  • memory: 0.0044288635253906
  • seconds: 0.24748301506042

With DBAL and autoload:

  • memory: 0.97610473632812
  • seconds: 0.29042816162109

With DBAL and hardcoded requires:

  • memory: 0.97521591186523
  • seconds: 0.31192088127136

With DBAL bypassing the include part:

  • memory: 0.0099525451660156
  • seconds: 0.30333304405212

The cost we pay for using DBAL buys us some extra features. OK, we don't strictly need DBAL to get those features; if we code a bit we can build them ourselves (remember DBAL is nothing but an extra PHP layer). But DBAL has a great, well-documented interface. Now I'm going to list a few DBAL features that are very interesting, at least to me:

Transactional mode

I really like it. It allows us to create scripts like this:

$dbh->transactional(function($conn) {
    $smtp = $conn->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

    for ($i=0; $i<1000; $i++) {
        $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
    }
});

A simple closure makes the code more concise and it will commit or roll back our transaction for us. In fact I have borrowed this function for my PDO projects to get the same interface (see the sketch after the snippet below). I love open source.

Snippet from DBAL library:

    /**
     * Executes a function in a transaction.
     *
     * The function gets passed this Connection instance as an (optional) parameter.
     *
     * If an exception occurs during execution of the function or transaction commit,
     * the transaction is rolled back and the exception re-thrown.
     *
     * @param Closure $func The function to execute transactionally.
     */
    public function transactional(Closure $func)
    {
        $this->beginTransaction();
        try {
            $func($this);
            $this->commit();
        } catch (Exception $e) {
            $this->rollback();
            throw $e;
        }
    }
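
This is more or less the helper I use with plain PDO connections, mimicking DBAL's interface. It is only a sketch (the function name and usage are mine, not part of PDO):

// Hypothetical helper: wraps a callback in a PDO transaction,
// imitating DBAL's transactional() method.
function transactional(PDO $conn, Closure $func)
{
    $conn->beginTransaction();
    try {
        $func($conn);
        $conn->commit();
    } catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}

// Usage with a plain PDO connection ($pdo):
transactional($pdo, function ($conn) {
    $stmt = $conn->prepare('INSERT INTO test.tbl1 (id, field1) values (:ID, :FIELD1)');
    $stmt->execute(array('ID' => 1, 'FIELD1' => 'field 1'));
});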

Types conversion

Really useful, at least when I work with dates:

$date = new \DateTime("2011-03-05 14:00:21");
$stmt = $conn->prepare("SELECT * FROM articles WHERE publish_date > ?");
$stmt->bindValue(1, $date, "datetime");
$stmt->execute();
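
With plain PDO we would have to do the conversion by hand. A rough equivalent, assuming the column is a timestamp and that this date format suits the database:

// Manual conversion with plain PDO ($pdo): format the DateTime ourselves.
$date = new \DateTime("2011-03-05 14:00:21");
$stmt = $pdo->prepare("SELECT * FROM articles WHERE publish_date > ?");
$stmt->bindValue(1, $date->format('Y-m-d H:i:s'));
$stmt->execute();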

List of Parameters Conversion

It's a cool feature too, available in DBAL since Doctrine 2.1:

$dbh->executeQuery('SELECT * FROM wf.tbl1 WHERE id IN (?)',
    array(array(1, 2, 3, 4, 5, 6)),
    array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY));

Binding parameters for an IN clause with PDO is a bit ugly: we need to create a series of placeholders depending on the length of our list and map them into the SQL. It's possible, but DBAL's interface is smarter.
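
For comparison, this is roughly the usual workaround with plain PDO: build the placeholder list by hand from the number of values (only a sketch):

// Plain PDO ($pdo): generate one placeholder per value and bind the whole list.
$ids = array(1, 2, 3, 4, 5, 6);
$placeholders = implode(', ', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT * FROM wf.tbl1 WHERE id IN ({$placeholders})");
$stmt->execute($ids);
$data = $stmt->fetchAll();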

Transaction Nesting

Another cool feature:

$dbh->beginTransaction();
try {
    $dbh->beginTransaction();
    try {
        $smtp = $dbh->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

        for ($i=0; $i<1000; $i++) {
            $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
        }

        $dbh->commit(); // inner commit only decreases the nesting level
    } catch (Exception $e) {
        $dbh->rollback(); // transaction marked for rollback only
        throw $e;
    }

    $smtp = $dbh->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');

    for ($i=0; $i<1000; $i++) {
        $smtp->execute(array('ID' => $i, 'FIELD1' => "field {$i}"));
    }

    $dbh->commit(); // real transaction committed
} catch (Exception $e) {
    $dbh->rollback(); // transaction rollback
    throw $e;
}

With plain PDO this piece of code would throw the following error:
There is already an active transaction
But it works with DBAL. If we need to do this kind of thing with PDO we need to use savepoints and the like; DBAL does the ugly part for us.
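
Just to illustrate, this is more or less the savepoint dance we would need with plain PDO and PostgreSQL to emulate the inner block (only a sketch, not production code):

// $pdo is a plain PDO connection (not DBAL)
$pdo->beginTransaction();
try {
    $pdo->exec('SAVEPOINT inner_block');
    try {
        $stmt = $pdo->prepare('INSERT INTO wf.tbl1 (id, field1) values (:ID, :FIELD1)');
        $stmt->execute(array('ID' => 1, 'FIELD1' => 'field 1'));
        $pdo->exec('RELEASE SAVEPOINT inner_block');
    } catch (Exception $e) {
        // Undo only the inner block and let the outer transaction decide
        $pdo->exec('ROLLBACK TO SAVEPOINT inner_block');
        throw $e;
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollback();
    throw $e;
}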

Performance analysis of Stored Procedures with PDO and PHP

Last week I had an interesting conversation on Twitter about the usage of stored procedures in databases. Someone said stored procedures are evil. I don't agree with that. Stored procedures are a great place to store business logic. In this post I'm going to test the performance of a small piece of code using a stored procedure and using only PHP code.

Without stored procedures

// Without stored procedures
$time = microtime(TRUE);
$mem = memory_get_usage();

$dsn = 'pgsql:host=localhost;dbname=gonzalo;port=5432';
$user = 'user';
$password = 'password';
$conn = new PDO($dsn, $user, $password);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$conn->beginTransaction();
$stmt = $conn->prepare('delete from web.tbltest');
$stmt->execute();

$stmt = $conn->prepare('INSERT INTO web.tbltest (field1) values (?)');
foreach (range(0,1000) as $i) {
    $stmt->execute(array($i));
}
$conn->commit();

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

With stored procedures:

// With stored procedures:
/*
CREATE OR REPLACE FUNCTION web.method1()
  RETURNS numeric AS
$BODY$
BEGIN
   DELETE FROM web.tbltest;
   FOR i IN 0..1000 LOOP
     INSERT INTO web.tbltest (field1) values (i);
   END LOOP;
   RETURN 1;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
*/
$time = microtime(TRUE);
$mem = memory_get_usage();

$dsn = 'pgsql:host=localhost;dbname=gonzalo;port=5432';
$user = 'user';
$password = 'password';
$conn = new PDO($dsn, $user, $password);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$conn->beginTransaction();
$stmt = $conn->prepare('SELECT web.method1()');
$stmt->execute();
$stmt->setFetchMode(PDO::FETCH_ASSOC);
$out = $stmt->fetchAll();
$conn->commit();

print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

Without stored procedures:

  • memory: 0.0023880004882812
  • seconds: 0.31109309196472

With stored procedures:

  • memory: 0.0020713806152344
  • seconds: 0.065021991729736

So my conclusion: stored procedures are not evil. The performance is really good. I know it can be a bit of a mess if we place business logic both inside and outside the database at the same time, but with a good design and architecture this problem is easy to solve. What do you think?

Performance analysis of fetching data with PDO and PHP.

Fetching data from databases is a common operation in our work as developers. There are many drivers (normally I use PDO), but the usage of all of them is similar, and switching from one to another is not difficult (they almost share the same interface). In this post I will focus on fetching data. Basically we've got two functions: fetch and fetchAll. I've created two examples, one with fetch and another one with fetchAll:

// Example with fetch
error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

$dbh = new PDO('pgsql:dbname=mydb;host=localhost', 'username', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $dbh->prepare('SELECT * FROM tableName limit 10000');
$stmt->execute();

$i=0;
while ($row = $stmt->fetch()) {
	$i++;
}
echo "<h1>fetch()</h1>";
echo "<strong>{$i}</strong>";
print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));
// Example with fetchAll
error_reporting(-1);
$time = microtime(TRUE);
$mem = memory_get_usage();

$dbh = new PDO('pgsql:dbname=mydb;host=localhost', 'username', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $dbh->prepare('SELECT * FROM tableName limit 10000');
$stmt->execute();

$i=0;
$data = $stmt->fetchAll();
foreach ($data as $row) {
	$i++;
}

echo "<h1>fetchAll()</h1>";
echo "<strong>{$i}</strong>";
print_r(array('memory' => (memory_get_usage() - $mem) / (1024 * 1024), 'seconds' => microtime(TRUE) - $time));

If we execute the test we obtain:

fetchAll: [memory] => 31.305999755859
fetch: [memory] => 0.002532958984375

OK, it's obvious: if we approach the data extraction with the fetchAll method we will use more memory. That's because we're mapping the whole recordset to a variable ($data) at once. With the fetch loop we are mapping only one row per iteration. By the way, if we change the fetch loop to:

$data = array();
while ($row = $stmt->fetch()) {
	$i++;
	$data[] = $row;
}

We will use almost the same amount of memory as with the fetchAll method:
[memory] => 31.267543792725

Conclusion:
Is fetch better than fetchAll? The answer is simple: no. We only need to be aware of what we are doing and use the solution that best fits our needs. If we're handling small recordsets they're similar, but if we work with big ones we need to realize that the memory usage changes drastically depending on which method we use.
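
For big recordsets the trick is to process each row as it arrives instead of accumulating everything in memory. A small sketch (writing the rows out as CSV is just an example of per-row processing):

// Streaming a big recordset row by row keeps memory usage flat.
$stmt = $dbh->prepare('SELECT * FROM tableName');
$stmt->execute();

$fp = fopen('php://output', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, $row); // handle one row and forget it
}
fclose($fp);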

Keep our PostgreSQL databases synchronized with PHP. Database version control.

Last October I attended PHP Barcelona 2010. One of the talks I really wanted to see was “Database version control without pain”. In this talk Harrie Verveer showed us different tools to keep our databases synchronized. This is a really common problem when working with databases. Source code is well covered with source control management tools. I normally use mercurial (hg), but with git, bazaar or svn we can cover all our needs for source code. We create source code on the development server and push the changes to production. It's really easy to keep all our code synchronized. But with databases it's different. Maybe Oracle's people are a few steps ahead of the rest: they have something similar to source control for database schemas natively in the database. If you add a new column to a table, your schema is transformed into a new version. Oracle's database is great. It has incredible features and really good performance, if you can afford the fee. In this post I will try to solve this problem with PostgreSQL, a really good database, similar to Oracle, open source and free (as in “free beer” too).

It's a recurrent problem when working with databases. We create database objects (tables, views, ...) on the development server and when our application is ready to go live we push the changes to the production server. If we are smart developers we save all database scripts in a file, and when we deploy to production we execute the script. There are tools to do this, like dbdeploy or even phing (you should have a look at Harrie's presentation to see all the possibilities). The problem is that we must be very strict and create the script even when we alter a column in a bug fix. It's hard work and it is worthwhile, but we must be mindful that it's really easy and fast to alter a column with an IDE (pgadmin for example), and very easy to forget to save the diff file. If you work alone maybe you can cope with it, but if you work within a team it is difficult to ensure everybody updates the diff file by hand. The purpose of this library is to create the diff script automatically according to the current state of the database. Let's start.

As Harrie said, there isn't a silver-bullet solution (I really went to his talk looking for the silver bullet 🙂 ). My first attempt was to use Doctrine2. Doctrine2 has a great tool called DBAL. I tried to use it, but I ran into a big problem, at least for me. I like to organize my database objects (tables, views, ...) within schemas. With Doctrine2's DBAL I had problems and it doesn't work as I think it should: it assumes all objects are in the default schema and it doesn't add “[schema].” to the information schema queries. Maybe it's my fault, but if I want to use DBAL I need to hack the code. Because of that I started to write pgdbsync.
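
The kind of query I need has to filter the data dictionary by schema, something like this (a simplified example of the idea; the real queries in pgdbsync are more complete):

// Listing the tables of a given schema from PostgreSQL's data dictionary.
$stmt = $dbh->prepare(
    "SELECT table_name
       FROM information_schema.tables
      WHERE table_schema = :SCHEMA
      ORDER BY table_name");
$stmt->execute(array('SCHEMA' => 'web'));
$tables = $stmt->fetchAll(PDO::FETCH_COLUMN);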

It would be cool to create a multi-database library to synchronize our databases, but it's too much work to do alone, so I've focused only on PostgreSQL, because it is the database I mainly use in my daily work.

The idea is to create a command-line tool that generates the script needed to keep two (or more) databases synchronized. We must take into account that we're speaking about the database schema, not the database data.

First I create an ini file with the configuration of each database connection. The user we use to connect must have access to the data dictionary in our PostgreSQL database.

[devel]
TYPE = pgsql
HOST = development
PORT = 5432
DBNAME = development
USER = user
PASSWORD = password

[prod1]
TYPE = pgsql
HOST = production
PORT = 5432
DBNAME = prod1
USER = user
PASSWORD = password

I’ve created a simple CLI script to use the library with getopt. You can see the script here.
The usage of the script is very simple. I have implemented three main actions: diff, summary and run.

  • diff: Calculates the script needed to keep the databases synchronized. It prints the script on the screen but doesn't execute anything.
  • summary: Shows a summary of the differences, not the full script. Useful to see the differences at a glance.
  • run: Calculates the diff and executes it.

Let's start. First we begin with an empty database on the development and production servers, with a schema WEB on both sides (schema creation is not covered by pgdbsync). Then we create a few objects on the development server.

-- TABLE TEST
CREATE TABLE WEB.TEST (
  TEST_NAME CHAR(30) NOT NULL,
  TEST_ID INTEGER NOT NULL,
  TEST_DATE TIMESTAMP NOT NULL
)
TABLESPACE WEB;
ALTER TABLE WEB.TEST ADD CONSTRAINT PK_TEST PRIMARY KEY (TEST_ID);

-- VIEW on TEST
CREATE VIEW WEB.testview(
  TEST_NAME,
  TEST_ID,
  TEST_DATE
) AS
SELECT *
FROM WEB.TEST
WHERE TEST_NAME LIKE 't%';

-- Function hello
CREATE OR REPLACE FUNCTION WEB.hello(item character varying)
  RETURNS character varying AS
$BODY$
DECLARE
BEGIN
   return "Hi " || item;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

Now our development and production servers are different. We need to execute the last script in production, but instead of doing that manually we are going to check the differences with our pgdbsync script.

The usage of the pgdbsync command-line script is the following:

 -s [schema]
 -f [from database]
 -t [to database]
 -a [action: diff | summary | run]

./pgdbsync -s web -f devel -t prod1 -a summary
HOST : production :: prod1
--------------------------------------------
function
 create :: WEB.hello(varchar)
tables
 create :: WEB.test
view
 create :: WEB.testview

[OK]  end process

Here we can see our production server is different from the development one.

./pgdbsync -s web -f devel -t prod1 -a diff
HOST : production :: prod1
--------------------------------------------
CREATE OR REPLACE FUNCTION web.hello(item character varying)
 RETURNS character varying
 LANGUAGE plpgsql
AS $function$
DECLARE
BEGIN
 return "Hi " || item;
END;
$function$

CREATE TABLE web.test(
 test_name character NOT NULL,
 test_id integer NOT NULL,
 test_date timestamp without time zone NOT NULL,
 CONSTRAINT pk_test PRIMARY KEY (test_id)
)
TABLESPACE web;
ALTER TABLE web.test OWNER TO user;

CREATE OR REPLACE VIEW web.testview AS
 SELECT test.test_name, test.test_id, test.test_date FROM web.test WHERE (test.test_name ~~ 't%'::text);;
ALTER TABLE web.testview OWNER TO user;
[OK]  end process

and finally

./pgdbsync -s web -f devel -t prod1 -a run
HOST : production :: prod1
----------------------------------

[OK]  end process

And that’s all. Our databases are synchronized. The library is not finished. Foreign keys are not supported yet, but it checks:

  • Tables
  • Constraints
  • Sequences
  • Views
  • Functions

To finish the demo we are going to drop the new objects on the development server and run the CLI script again:

./pgdbsync -s web -f devel -t prod1 -a diff

HOST : production :: prod1
--------------------------------------------

drop function web.hello(varchar);

DROP TABLE web.test;

drop view web.testview;

[OK]  end process

And we run the script:

./pgdbsync -s web -f devel -t prod1 -a run
HOST : production :: prod1
--------------------------------------------

[OK]  end process

Source code is available at Google Code, here.

An important post I read to understand PostgreSQL's information schema is this one from the great Lorenzo Alberton. It really helped me build pgdbsync.

The library is not finished yet and it may crash in some cases (not all data types are covered). I always check the diff file before executing it. If you want to join me in developing the library, don't hesitate to contact me :).

Database connection pooling with PHP and gearman

Handling database connections with PHP is pretty straightforward: open the database connection, perform the actions we need, and close it. There's a little problem, though: we cannot create a pool of database connections. We need to create the connection in every request. Create and destroy, again and again. There are some third-party solutions like SQL Relay or pgpool-II (if you use PostgreSQL like me). In this post I'm going to try to explain a personal experiment to create connection pooling using gearman. Another purpose of this experiment is to create a cache of prepared statements, not only for the current request but across all of them. Let's start.

This is an example of a SELECT statement with PDO and PHP:

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
$sql = 'SELECT * FROM test.tbl1';
$stmt = $dbh->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();

Basically the idea is to use a gearman worker to perform every database operation. As far as I know we cannot pass PDO instances from the gearman worker to the gearman client, not even with object serialization (PDO objects cannot be serialized). That's a problem.
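
We can check it quickly: trying to serialize a PDO instance throws a PDOException (the exact message may vary between PHP versions):

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
try {
    serialize($dbh);
} catch (PDOException $e) {
    // Something like "You cannot serialize or unserialize PDO instances"
    echo $e->getMessage();
}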

My idea is to use the same interface as PDO, but delegate the database work to the worker and obtain a connection id instead of a real PDO connection.

This is the configuration class. It defines one database connection and two gearman servers on the same host:

class PoolConf
{
    const PG1 = 'PG1';
    static $DB = array(
        self::PG1 => array(
            'dsn'      => "pgsql:dbname=gonzalo;host=localhost",
            'username' => 'user',
            'password' => 'pass',
            'options'  => null),
    );

    static $SERVERS = array(
        array('127.0.0.1', 4730),
        array('127.0.0.1', 4731),
    );
}

We start the gearman servers:

gearmand -d --log-file=/var/log/gearman --user=gonzalo -p 4730
gearmand -d --log-file=/var/log/gearman --user=gonzalo -p 4731

How many gearman servers do we need to start? It depends on your needs. We must realize gearman is not a pool; it's a queue. But we can start as many servers as we want (OK, it depends on our RAM) and create a pool of queues. We need to remember that if we start only one gearman server we can handle only one database operation at a time (it's a queue), and that will be a huge bottleneck if the application scales. So you need to assess your site, evaluate how many concurrent operations you normally have, and start as many gearman servers as you need.

Maybe it's difficult to explain, but the final outcome will be something like this:

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$sql = "SELECT * FROM TEST.TBL1";
$stmt = $conn->prepare($sql);

$stmt->execute();
$data = $stmt->fetchall();
echo "<p>count: " . count($data) . "</p>";

We must take into account that $stmt is not a “real” PHP statement. The real PHP statement is stored in a static variable within the worker. Our $stmt is an instance of the Pool\Server\Stmt class. This class has some public methods with the same names as the real statement (because of that it behaves like a real statement), and internally those methods are calls to the gearman worker. The same goes for the $conn variable: it's not a real PDO connection, it's an instance of the Pool\Server\Connection class.

// Pool/Server/Stmt.php
namespace Pool\Server;

class Stmt
{
    private $_stmt   = null; // statement id returned by the worker
    private $_cid    = null;
    private $_client = null;

    function __construct($stmtId, $cid, $client)
    {
        $this->_stmt = $stmtId;
        $this->_cid = $cid;
        $this->_client = $client;
    }

    public function execute($parameters=array())
    {
        $out = $this->_client->do('execute', serialize(array(
            'parameters' => $parameters,
            'stmt'       => $this->_stmt,
            )));
        $error = unserialize($out);
        if (is_a($error, '\Pool\Exception')) {
            throw $error;
        }
        $this->_stmt = $out;
        return $this;
    }

    public function fetchAll()
    {
        $data = $this->_client->do('fetchAll', serialize(array(
            'stmt' => $this->_stmt,
            )));
        return unserialize($data);
    }
}
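
The Pool\Server\Connection proxy follows the same pattern. This is only a sketch of the idea (the payload format is an assumption; the real class is in the repository): the method names mirror PDO and every call is forwarded to the worker.

// Pool/Server/Connection.php (sketch)
namespace Pool\Server;

class Connection
{
    private $_cid    = null; // connection id returned by the worker
    private $_client = null; // GearmanClient instance

    function __construct($cid, $client)
    {
        $this->_cid    = $cid;
        $this->_client = $client;
    }

    public function prepare($sql)
    {
        // Ask the worker to prepare the statement and give us back its id.
        $stmtId = $this->_client->do('prepare', serialize(array(
            'cid' => $this->_cid,
            'sql' => $sql,
            )));
        return new Stmt($stmtId, $this->_cid, $this->_client);
    }

    public function beginTransaction()
    {
        return $this->_client->do('beginTransaction', serialize(array('cid' => $this->_cid)));
    }

    public function commit()
    {
        return $this->_client->do('commit', serialize(array('cid' => $this->_cid)));
    }

    public function rollback()
    {
        return $this->_client->do('rollback', serialize(array('cid' => $this->_cid)));
    }
}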

The worker: the following code is an extract. You can see the full code here.

# Create our worker object.
$worker= new GearmanWorker();
foreach (PoolConf::$SERVERS as $server) {
    $worker->addServer($server[0], $server[1]);
}

\Pool\Server::init();

$worker->addFunction('getConnection', 'getConnection');
$worker->addFunction('prepare', 'prepare');
$worker->addFunction('execute', 'execute');
$worker->addFunction('fetchAll', 'fetchAll');
$worker->addFunction('info', 'info');
$worker->addFunction('release', 'release');
$worker->addFunction('beginTransaction', 'beginTransaction');
$worker->addFunction('commit', 'commit');
$worker->addFunction('rollback', 'rollback');

while (1) {
    try {
        $ret = $worker->work();
        if ($worker->returnCode() != GEARMAN_SUCCESS) {
            break;
        }
    } catch (Exception $e) {
        echo $e->getMessage();
    }
}

function fetchAll($job)
{
    echo __function__."\n";
    $params = unserialize($job->workload());
    $stmtId = $params['stmt'];
    return serialize(\Pool\Server::fetchAll($stmtId));
}

function execute($job)
{
    echo __function__."\n";
    $params = unserialize($job->workload());
    $stmtId = $params['stmt'];
    $parameters = $params['parameters'];
    return \Pool\Server::execute($stmtId, $parameters);
}
...

The heart of the worker is the \Pool\Server class. This class performs the real PDO operations and stores statements and connections in private static variables.
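
To give an idea of what that means, here is a very small sketch of the concept (not the actual implementation, which is in the repository): the real PDO objects live in static arrays inside the worker and only ids travel over gearman.

// Sketch of the worker-side registry: real PDO objects never leave the worker.
namespace Pool;

class Server
{
    private static $_conns = array(); // connection id => PDO instance
    private static $_stmts = array(); // statement id  => PDOStatement instance

    public static function getConnection($name)
    {
        if (!isset(self::$_conns[$name])) {
            $conf = \PoolConf::$DB[$name];
            self::$_conns[$name] = new \PDO($conf['dsn'], $conf['username'], $conf['password']);
        }
        return $name; // the client only receives the id
    }

    public static function prepare($cid, $sql)
    {
        $stmtId = md5($cid . $sql); // reuse the statement if it was prepared before
        if (!isset(self::$_stmts[$stmtId])) {
            self::$_stmts[$stmtId] = self::$_conns[$cid]->prepare($sql);
        }
        return $stmtId;
    }

    public static function execute($stmtId, $parameters)
    {
        self::$_stmts[$stmtId]->execute($parameters);
        return $stmtId;
    }

    public static function fetchAll($stmtId)
    {
        return self::$_stmts[$stmtId]->fetchAll();
    }
}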

And now we can use the database pool, reusing connections and prepared statements. You can see here a small performance test of reusing prepared statements in an older post.

I've also implemented some small error handling: errors in the worker are serialized and re-thrown on the client, simulating the normal behaviour of standard PDO usage.

Now a set of examples:

Simple queries, and simple error handling:

include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$sql = "SELECT * FROM TEST.TBL1";
$stmt = $conn->prepare($sql);

$stmt->execute();
$data = $stmt->fetchall();
echo "<p>count: " . count($data) . "</p>";

try {
    $sql = "SELECT * TEST.NON_EXISTENT_TABLE";
    $stmt = $conn->prepare($sql);

    $stmt->execute();
    $data = $stmt->fetchall();
    echo "<p>count: " . count($data) . "</p>";

} catch (Exception $e) {
    echo "ERROR: " . $e->getMessage();
}

print_r(Client::singleton()->info(PoolConf::PG1));

Now with bind parameters:

<?php
include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$data = $conn->prepare("SELECT * FROM TEST.TBL1 WHERE SELECCION=:S")->execute(array('S' => 1))->fetchall();

echo count($data);

print_r(Client::singleton()->info(PoolConf::PG1));

And now a transaction:

<?php
include('../conf/PoolConf.php');
include('../lib/Pool/Client.php');
include('../lib/Pool/Server.php');
include('../lib/Pool/Exception.php');
include('../lib/Pool/Server/Connection.php');
include('../lib/Pool/Server/Stmt.php');

use Pool\Client;
$conn = Client::singleton()->getConnection(PoolConf::PG1);

$conn->beginTransaction();
$data = $conn->prepare("SELECT * FROM TEST.TBL1 WHERE SELECCION=:S")->execute(array('S' => 1))->fetchall();
$conn->rollback();

print_r(Client::singleton()->info(PoolConf::PG1));

Conclusion.

This is a personal experiment. It works, indeed, but it's probably riddled with bugs. You can use it in production if you are a brave developer :).

Source code.
The full source code is available here at Google Code.

Speed up PHP scripts with asynchronous database queries

That's the situation: a web application with four heavy queries. Yes, I know you can use a cache or things like that, but imagine you must perform the four queries no matter what. As I said before, the four database queries are heavy ones: two seconds each. Done sequentially, that means our script will take at least eight seconds (without adding the extra PHP time). Eight seconds is a huge time for a page load.

So here I will show a technique to speed up our website using asynchronous calls.

One possible solution is to fork each query into a separate thread, but does PHP support threads? The short answer is no. OK, that's not 100% true: we can use the PCNTL functions, but they are a bit tricky for me to use. I prefer another solution to fork the work and perform asynchronous operations.

The basic idea is to isolate each query in a separate script.

// fork.php
$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'nov', 'nov');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$t = filter_input(INPUT_GET, 't', FILTER_SANITIZE_STRING);
switch ($t) {
    // In a real application each case would run a different heavy query
    case 0:
    case 1:
    case 2:
        $sql = "select * from test.tbl1";
        break;
}
$stmt = $dbh->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();
echo json_encode($data);

And now we are going to use curl's multi functions to execute the database queries at the same time:

// index.php
class Fork
{
    private $_handles = array();
    private $_mh      = null;

    function __construct()
    {
        $this->_mh = curl_multi_init();
    }

    function add($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($this->_mh, $ch);
        $this->_handles[] = $ch;
        return $this;
    }

    function run()
    {
        $running = null;
        $data    = array();
        do {
            curl_multi_exec($this->_mh, $running);
            usleep(250000);
        } while ($running > 0);
        for ($i = 0; $i < count($this->_handles); $i++) {
            $out = curl_multi_getcontent($this->_handles[$i]);
            $data[$i] = json_decode($out);
            curl_multi_remove_handle($this->_mh, $this->_handles[$i]);
        }
        curl_multi_close($this->_mh);
        return $data;
    }
}

$fork = new Fork;

$output = $fork->add("http://localhost/fork.php?t=0")
    ->add("http://localhost/fork.php?t=1")
    ->add("http://localhost/fork.php?t=2")
    ->run();

echo "<pre>";
print_r($output);
echo "</pre>";

And that's all. The queries will be executed at the same time, so our script will take about 2 seconds (the slowest query) instead of 8. But there's a small inconvenience with this solution: we need to open one database connection per fork instead of a single one, because of the lack of connection pooling in PHP. Another problem with this technique appears if we need to debug our script: the debugger won't step through our “forks” at the same time (we will need to debug them separately). Because of that we need to weigh whether or not to use this technique, but it's another weapon we have to improve our scripts.

Performance analysis using bind parameters with PDO and PHP.

Some months ago a workmate asked me about the differences between using bind variables and executing the SQL statement directly as a string through a PDO connection.

Basically the workflow of almost all database drivers is the same: prepare the statement, execute it and fetch the results. We have the following small example with a simple update:

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->beginTransaction();
$stmt = $dbh->prepare("UPDATE test.tbl1 set field1=:F1 where id=1");
$stmt->execute(array('F1' => $field1));
$dbh->commit();

And we can also get the same result with the following code:

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->beginTransaction();
$dbh->prepare("UPDATE test.tbl1 set field1='{$field1}' where id=1")->execute();
$dbh->commit();

Which one is the best? Both methods work properly. The difference is how the database manages the operation internally. When we prepare a statement we are compiling the string in the database within the current connection. After that we can execute the statement, and if our statement is ready to receive parameters we can bind those parameters to PHP values or variables. With this idea in mind we can guess that if we need to perform several executions of the same prepared statement, it is better to use bind parameters instead of compiling and executing the string directly.

I have created a simple benchmark to show it.

<?php
error_reporting(-1);
function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'pass');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$time_start = microtime_float();
$dbh->beginTransaction();
$field1 = uniqid();
$stmt = $dbh->prepare("UPDATE test.tbl1 set field1=:F1 where id=1");
$stmt->execute(array('F1' => $field1));
$dbh->commit();
$time_end = microtime_float();
$time = round($time_end - $time_start, 4);
echo "<p>Single UPDATE with bind parameters: $time seconds</p>";

$time_start = microtime_float();
$dbh->beginTransaction();
$field1 = uniqid();
$stmt = $dbh->prepare("UPDATE test.tbl1 set field1='{$field1}' where id=1");
$stmt->execute();
$dbh->commit();
$time_end = microtime_float();
$time = round($time_end - $time_start, 4);
echo "<p>Single UPDATE without bind parameters: $time seconds</p>";

$time_start = microtime_float();
$dbh->beginTransaction();
$stmt = $dbh->prepare('UPDATE test.tbl1 set field1=:F1 where id=1');
foreach (range(1, 5000, 1) as $i) {
   $field1 = $i;
   $stmt->execute(array('F1' => $field1));
}
$dbh->commit();
$time_end = microtime_float();
$time = round($time_end - $time_start, 4);
echo "<p>Multiple UPDATE with bind parameters: $time seconds</p>";

$time_start = microtime_float();
$dbh->beginTransaction();
foreach (range(1, 5000, 1) as $i) {
   $field1 = $i;
   $stmt = $dbh->prepare("UPDATE test.tbl1 set field1='{$field1}' where id=1");
   $stmt->execute();
}
$dbh->commit();
$time_end = microtime_float();
$time = round($time_end - $time_start, 4);
echo "<p>Multiple UPDATE without bind parameters: $time seconds</p>";

The output of this benchmark is the following one:

Single UPDATE with bind parameters: 0.2623 seconds
Single UPDATE without bind parameters: 0.0195 seconds
Multiple UPDATE with bind parameters: 4.1123 seconds
Multiple UPDATE without bind parameters: 8.1732 seconds

As we can see in the output of the benchmark, a single update is slightly faster without binding any parameter, but if we need to execute the same update with different parameters, the bind-parameters way is significantly faster than parsing and executing again and again.

There's another benefit of using bind parameters: databases normally have an internal cache for prepared statements, and in theory they reuse them. The main problem for us (in our PHP world) is that normally we create a new connection to the database at the beginning of the script (or a lazy connection) and close it at the end. We don't natively have connection pooling like J2EE, so I'm not 100% sure whether using bind parameters helps the database reuse statements.

There's another issue we must take into account. Without prepared statements the multiple-update example may throw a database exception (depending on our database configuration): "Too many open cursors in active transaction". That error appears because, for our database, each update is a different statement (5000 in the example), whereas with the other method there's only one.