Speed up PHP scripts with asynchronous database queries


Here’s the situation: a web application with four heavy queries. Yes, I know you can use a cache or something similar, but imagine the four queries really must be executed. As I said, they are heavy ones, about 2 seconds each. Run sequentially, that means our script will take at least 8 seconds (before adding any extra PHP time). Eight seconds is a huge amount of time for a page load.

So here I will show a technique to speed up our website using asynchronous calls.

One possible solution is to fork each query into a separate thread, but does PHP support threads? The short answer is no. OK, that’s not 100% true: we can use the PCNTL functions, but I find them a bit tricky to work with. I prefer another way of forking work and performing asynchronous operations.
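For reference, a minimal sketch of that PCNTL approach could look like the following (just a sketch, assuming the pcntl extension is available and the script runs from the CLI; runQuery() is a hypothetical helper that opens its own connection and executes one of the heavy queries):

// pcntl-fork.php (CLI only; requires the pcntl extension)
$pids = array();
for ($i = 0; $i < 3; $i++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('could not fork');
    } elseif ($pid == 0) {
        // Child process: run one query and exit.
        runQuery($i); // hypothetical helper, one connection per child
        exit(0);
    }
    // Parent process: remember the child's pid and keep forking.
    $pids[] = $pid;
}
// Wait for every child to finish.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}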

The basic idea is to isolate each query into a separate script:

// fork.php
$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'nov', 'nov');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Each value of t selects one of the heavy queries
// (they all run the same SQL here just to keep the example short).
$t = filter_input(INPUT_GET, 't', FILTER_VALIDATE_INT);
switch ($t) {
    case 0:
    case 1:
    case 2:
        $sql = "select * from test.tbl1";
        break;
    default:
        exit(json_encode(array()));
}
$stmt = $dbh->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();
echo json_encode($data);

And now we are going to use curl's multi functions to execute the database queries at the same time:

// index.php
class Fork
{
    private $_handles = array();
    private $_mh;

    function __construct()
    {
        $this->_mh = curl_multi_init();
    }

    function add($url)
    {
        // One curl handle per fork script.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($this->_mh, $ch);
        $this->_handles[] = $ch;
        return $this;
    }

    function run()
    {
        // Execute all the handles in parallel and wait until they finish.
        $running = null;
        do {
            curl_multi_exec($this->_mh, $running);
            usleep(250000);
        } while ($running > 0);

        // Collect and decode the output of every fork.
        $data = array();
        for ($i = 0; $i < count($this->_handles); $i++) {
            $out = curl_multi_getcontent($this->_handles[$i]);
            $data[$i] = json_decode($out);
            curl_multi_remove_handle($this->_mh, $this->_handles[$i]);
        }
        curl_multi_close($this->_mh);
        return $data;
    }
}

$fork = new Fork;

$output = $fork->add("http://localhost/fork.php?t=0")
    ->add("http://localhost/fork.php?t=1")
    ->add("http://localhost/fork.php?t=2")
    ->run();

echo "<pre>";
print_r($output);
echo "</pre>";

And that’s all. The queries will be executed at the same time, so our script will take about 2 seconds (the slowest query) instead of 8. But there is a small inconvenience with this solution: we need to open one database connection per fork instead of a single one, because of the lack of connection pooling in PHP. Another problem appears when we need to debug the script: the debugger won’t step through our “forks” at the same time (we will need to debug them separately). Because of that we need to weigh whether this technique is worth using in each case, but it’s another weapon we have for improving our scripts.


About Gonzalo Ayuso

Web Architect specialized in Open Source technologies. PHP, Python, JQuery, Dojo, PostgreSQL, CouchDB and node.js but always learning.

Posted on October 11, 2010, in databases, php, PostgreSQL, Technology, tips.

  1. Very clever move. IMHO it sounds like a good approach for database tasks/jobs (e.g. backups, purging) instead of simple transactions.

    Regards,
    George

    • Yes, but if those jobs/tasks are not 100% necessary within the execution of the script (from the user's point of view), it's better to move them out of it. I have a new post (still only in my mind :) ) about that kind of thing.

  2. Nice trick!

    However there’s an error in the code on line 31.

  3. It feels like you should optimize your queries; two seconds is a long time for a single query. Another thing I'm wondering about: do you really need the return value of those scripts? If you don't, you could write the queries to a file and execute them later and/or asynchronously in some other way.

    • 1) It depends on the database and the query you are executing. Some data-mining queries can take seconds, even optimized ones. It also depends on the database location: retrieving data from localhost is not the same as doing it from a database in another location (we need to consider network speed). Of course this method is not good for fast queries (it can even be worse).

      2) I return data in those scripts because I need the information online. Imagine I want to populate a datagrid with this information when the user clicks a button. We can also use another communication technique instead of JSON encode/decode, for example memcached (write in the fork and read in the main script). If I do what you said and execute the queries offline (save them and run them later), I need an event or something like that to populate the datagrid, and then we are speaking about Comet or web workers. That's another approach, but definitely a more complex solution.
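      Just to illustrate, a rough sketch of the memcached variant (assuming the pecl/memcached extension and a memcached server on localhost; the key name is invented for the example):

      // In fork.php, instead of echoing JSON:
      $m = new Memcached();
      $m->addServer('localhost', 11211);
      $m->set('fork_result_' . $t, $data, 60); // keep the result for 60 seconds

      // In the main script, once the forks have finished:
      $m = new Memcached();
      $m->addServer('localhost', 11211);
      $result = $m->get('fork_result_0');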

  4. Well,

    That was a smart solution. I see the problem of not being able to use a connection pool to manage them. But my concern is about security.

    Do you use some kind of hash through the GET parameter or some other kind of security measure?

    Cheers,
    Lucas

    • The lack of a real pool in PHP is a problem indeed. You can use SQL Relay but I don’t really like it.

      BTW your security point is a good one. To solve the issue I like to use a separate virtualhost for the “forks”, only available from localhost. That means only the server itself can reach it. With this setup I trust the localhost connections (the server acting as a client via curl) and browser clients can never reach it. I didn't add this to the post because my aim was to show only the idea of threads with curl.
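      If a separate virtualhost is not an option, a cruder check inside fork.php itself could look something like this (just a sketch; it assumes the forks are always requested from the same machine and that no proxy rewrites REMOTE_ADDR):

      // At the top of fork.php: only accept requests from the local machine.
      $allowed = array('127.0.0.1', '::1');
      if (!in_array($_SERVER['REMOTE_ADDR'], $allowed, true)) {
          header('HTTP/1.1 403 Forbidden');
          exit;
      }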

      • That's a good solution, mate.

        I would also use, just as a precaution, a hash or a number combination (with a check digit) that both sides knew, so an attacker would need access to both sources to be a threat.
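        Something along these lines could do it (only a sketch; the shared secret and the sig parameter are invented for the example):

        // index.php: sign each request before handing it to the Fork class.
        $secret = 'shared-secret-known-by-both-scripts';
        $t      = 0;
        $sig    = hash_hmac('sha256', (string) $t, $secret);
        $url    = "http://localhost/fork.php?t=$t&sig=$sig";

        // fork.php: reject requests whose signature does not match.
        $secret = 'shared-secret-known-by-both-scripts';
        $t      = filter_input(INPUT_GET, 't', FILTER_VALIDATE_INT);
        $sig    = (string) filter_input(INPUT_GET, 'sig');
        if ($sig !== hash_hmac('sha256', (string) $t, $secret)) {
            header('HTTP/1.1 403 Forbidden');
            exit;
        }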

  5. I am curious as to why you created your own approach here. Things like Gearman can handle this kind of optimization far better.

    For instance, if you are looking just to offload different types of SQL selects or anything else, have a worker pool that does them and send them through asynchronously.

    This way you are not using cURL and adding an extra HTTP request, which could eventually become a bottleneck on a large enough site.

    • Is it possible to do that with Gearman? As far as I know, with Gearman I can enqueue jobs. I use Gearman when I want to execute jobs offline (like mailing, logs…). But can I execute several jobs at the same time (not through a queue), as we do with the curl multi functions? If that's possible it would be great for me.

  6. Very clever, but hard to implement and debug, especially if you use MVC or other patterns. In the enterprise world, to speed up the DB work you can use a technique called CQRS. The idea is to denormalize your DB so you never need to run 3 complicated, long-lasting queries.
    Back to your article: as I said, it is a very interesting idea.

    • Yes, probably the main problem is debugging. I solve it with a simple trick: I use a flag on the development server to run the queries inline instead of going through curl. It's not the same, but it allows me to debug them.
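      A rough sketch of that flag (the DEV_MODE constant and the runQueries() helper are invented for the example; the real code depends on how the forks are wired up):

      // index.php: in development, skip curl and run the queries inline
      // so the debugger can step through them in the same process.
      define('DEV_MODE', true); // invented flag for the example

      if (DEV_MODE) {
          $output = array();
          foreach (array(0, 1, 2) as $t) {
              $output[$t] = runQueries($t); // sequential, but debuggable
          }
      } else {
          $fork   = new Fork;
          $output = $fork->add("http://localhost/fork.php?t=0")
              ->add("http://localhost/fork.php?t=1")
              ->add("http://localhost/fork.php?t=2")
              ->run();
      }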

  7. Thank you for the information provided….

  8. If you are willing to drop PDO support, PHP's PostgreSQL library has supported asynchronous queries since at least 2002:

    http://www.php.net/manual/en/function.pg-send-query.php
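    For completeness, a minimal sketch of that approach (one dedicated connection per in-flight query, forced with PGSQL_CONNECT_FORCE_NEW; the connection details are the ones from the post's example):

    // async-pg.php: fire the queries and collect the results afterwards.
    $queries = array("select * from test.tbl1", "select * from test.tbl1", "select * from test.tbl1");
    $conns   = array();

    foreach ($queries as $i => $sql) {
        $conns[$i] = pg_connect("host=localhost dbname=pg1 user=nov password=nov", PGSQL_CONNECT_FORCE_NEW);
        pg_send_query($conns[$i], $sql);   // returns immediately, the query runs server-side
    }

    $data = array();
    foreach ($conns as $i => $conn) {
        $result   = pg_get_result($conn);  // blocks until this query has finished
        $data[$i] = pg_fetch_all($result);
        pg_close($conn);
    }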

  9. Async is great, but cURL is still overhead.

    If you want native async calls to MySQL with PHP and PDO you should instead look into the new native MySQL client for PHP 5.3: mysqlnd.

    http://blog.ulf-wendel.de/?p=201
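    As far as I know the asynchronous API that mysqlnd enables is exposed through mysqli rather than PDO, so a rough sketch (connection details invented for the example) could look like this:

    // async-mysqli.php: requires PHP 5.3 with mysqlnd.
    $queries = array("select * from tbl1", "select * from tbl2", "select * from tbl3");
    $links   = array();

    foreach ($queries as $i => $sql) {
        $links[$i] = mysqli_connect('localhost', 'user', 'pass', 'db');
        $links[$i]->query($sql, MYSQLI_ASYNC);            // returns immediately
    }

    $data    = array();
    $pending = $links;
    while ($pending) {
        $read = $error = $reject = $pending;
        if (!mysqli_poll($read, $error, $reject, 1)) {
            continue;                                     // nothing ready yet
        }
        foreach ($read as $link) {
            if ($result = $link->reap_async_query()) {
                $data[] = $result->fetch_all(MYSQLI_ASSOC);
                $result->free();
            }
            unset($pending[array_search($link, $pending, true)]);
        }
    }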

  10. Very good code. Performance is always a great concern for the majority of our projects, and asynchronous calls are one good way to speed things up. Thanks for the post.

  11. Here is another solution for parallel processing without redundant HTTP requests:

    http://php.net/manual/en/function.proc-open.php

    see the comments on that page for code samples

    • Gonzalo Ayuso

      Yes. In fact the motivation for the example in the post was to replace an implementation I have in one project that uses proc_open. It works, but it's a mess; at least for me the curl solution is simpler and cleaner. With curl we can also use different servers and scale out, but as you said, proc_open is another choice.
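      For the record, a bare-bones proc_open version might look roughly like this (fork-cli.php is a hypothetical command-line variant of fork.php that takes the query number as an argument and prints JSON; it assumes the php binary is on the PATH):

      // proc-fork.php: launch each query script as a separate CLI process.
      $procs = $pipes = array();
      foreach (array(0, 1, 2) as $t) {
          $spec = array(1 => array('pipe', 'w'));           // we only need stdout
          $procs[$t] = proc_open("php fork-cli.php $t", $spec, $pipes[$t]);
      }

      // All processes are already running, so reading them one by one is fine.
      $data = array();
      foreach ($procs as $t => $proc) {
          $data[$t] = json_decode(stream_get_contents($pipes[$t][1]));
          fclose($pipes[$t][1]);
          proc_close($proc);
      }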

  1. Pingback: Parallel processing in PHP | davblog: webdev and stuff
