Speed up PHP scripts with asynchronous database queries


That’s the situation: a web application with four heavy queries. Yes, I know you can use a cache or similar tricks, but imagine you must perform the four queries no matter what. As I said before, the four database queries are heavy ones: 2 seconds each. Done sequentially, that means our script will take at least 8 seconds (without counting the extra PHP time). Eight seconds is a huge amount of time in a page load.

So here I will show a technique to speed up our website using asynchronous calls.

One possible solution is to fork each query into a separate thread, but does PHP support threads? The short answer is no. OK, that’s not 100% true: we can use the PCNTL functions, but they are a bit tricky to use. I prefer another solution to create forks and perform asynchronous operations.
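
For completeness, the PCNTL route looks roughly like this (a minimal sketch: pcntl_fork() only works in CLI contexts, and the children need extra IPC work to hand their results back, which is part of why I find it tricky):

// pcntl.php - sketch only; most web SAPIs cannot fork
$pids = array();
for ($i = 0; $i < 4; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // child process: run one heavy query here, then exit
        // (returning the result to the parent needs shared memory, sockets, etc.)
        exit(0);
    }
    $pids[] = $pid;
}
// parent process: wait for every child to finish
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}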

The basic idea is to isolate each query into a separate script.

// fork.php
$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'nov', 'nov');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// pick the query to run from the 't' GET parameter
$t = filter_input(INPUT_GET, 't', FILTER_VALIDATE_INT);
switch ($t) {
    case 0:
    case 1:
    case 2:
        // each case would hold its own heavy query; one placeholder is enough here
        $sql = "select * from test.tbl1";
        break;
    default:
        exit;
}
$stmt = $dbh->prepare($sql);
$stmt->execute();
$data = $stmt->fetchAll();
echo json_encode($data);

And now we are going to use curl’s multi functions to execute the database queries at the same time:

// index.php
class Fork
{
    private $_handles = array(); // individual curl handles
    private $_mh;                // the curl multi handle

    function __construct()
    {
        $this->_mh = curl_multi_init();
    }

    function add($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($this->_mh, $ch);
        $this->_handles[] = $ch;
        return $this;
    }

    function run()
    {
        $running = null;
        // poll until every handle has finished
        do {
            curl_multi_exec($this->_mh, $running);
            usleep(250000);
        } while ($running > 0);
        $data = array();
        for ($i = 0; $i < count($this->_handles); $i++) {
            $out = curl_multi_getcontent($this->_handles[$i]);
            $data[$i] = json_decode($out);
            curl_multi_remove_handle($this->_mh, $this->_handles[$i]);
        }
        curl_multi_close($this->_mh);
        return $data;
    }
}

$fork = new Fork;

$output = $fork->add("http://localhost/fork.php?t=0")
    ->add("http://localhost/fork.php?t=1")
    ->add("http://localhost/fork.php?t=2")
    ->run();

echo "<pre>";
print_r($output);
echo "</pre>";

And that’s all. The queries will be executed at the same time, so our script will take around 2 seconds (the slowest query) instead of 8. But there is a small inconvenience with this solution: we need to open one database connection per fork instead of a single one, because of the lack of connection pooling in PHP. Another problem with this technique appears if we need to debug our script: the debugger won’t step through our “forks” at the same time (we will need to debug them separately). Because of that we need to weigh whether to use this technique or not, but it’s another weapon we have to improve our scripts.


32 thoughts on “Speed up PHP scripts with asynchronous database queries”

    1. Yes, but if those jobs/tasks are not 100% necessary within the execution of the script (from the user’s point of view), it’s better to move them out of it. I have a new post (still only in my mind 🙂 ) about those kinds of things.

  1. It feels like you should optimize your queries; two seconds is a long time for a single query. Another thing I’m wondering about: do you really need the return value of those scripts? If you don’t, you could write the queries to a file and execute them later and/or asynchronously in another way, as sketched below.
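
    For example, a sketch of that idea (the spool file and the separate runner script are hypothetical):

    // enqueue.php: spool the SQL instead of running it now
    $sql = "select * from test.tbl1";
    file_put_contents('/tmp/query.spool', $sql . "\n", FILE_APPEND | LOCK_EX);

    // runner.php: replayed later (e.g. by cron); nothing is returned to the original request
    $dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'nov', 'nov');
    foreach (file('/tmp/query.spool', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $sql) {
        $dbh->exec($sql);
    }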

    1. 1) It depends on the database and the query you are executing. There are data-mining queries that may take seconds, even optimized ones. It also depends on the database location: it’s not the same to retrieve data from localhost as from a database in another location (we need to consider network speed). Of course this method is not good for fast queries (it can even be worse).

      2) I return data in those scripts because I need the information online. Imagine I want to populate a datagrid with this information when the user clicks on a button. We can also use another communication technique instead of json encode/decode: we can use memcached (write in the fork and read in the main script). If I do what you said and execute the queries offline (save them and run them later), I need an event or something like that to populate the datagrid. If we are speaking about that, we are speaking about comet or webworkers. That’s another approach, but definitely a more complex solution.
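
      For example, the fork could store its result in memcached and the main script could read it back. A sketch, assuming a memcached server on localhost and an agreed key per fork:

      // in fork.php: store the result instead of echoing it
      $m = new Memcached();
      $m->addServer('localhost', 11211);
      $m->set('fork_result_' . $t, $data, 60); // the key convention is just an example

      // in index.php: read it back once the forks have finished
      $m = new Memcached();
      $m->addServer('localhost', 11211);
      $result = $m->get('fork_result_0');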

  2. Well,

    That was a smart solution. I see the problem of not being able to use a connection pool to manage the connections. But my concern is about security.

    Do you use some kind of hash through the GET parameter or some other kind of security measure?

    Cheers,
    Lucas

    1. The lack of a real pool in PHP is a problem indeed. You can use SQL Relay but I don’t really like it.

      BTW, your security observation is a good one. To solve the issue I like to use a separate virtualhost for the “forks”, only available at localhost. That means only the server itself can reach them: I trust the localhost connections (server-side curl clients), and browser clients never reach localhost. I haven’t added this to the post because my aim was to show only the idea of threads with curl.
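
      If a separate virtualhost is not an option, a rough PHP-level equivalent of that restriction would be (a sketch):

      // at the top of fork.php: accept only requests coming from the server itself
      if (!in_array($_SERVER['REMOTE_ADDR'], array('127.0.0.1', '::1'))) {
          header('HTTP/1.1 403 Forbidden');
          exit;
      }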

      1. That’s a good solution, mate.

        I would also use, just as a precaution, a hash or a number combination (with digit verification) that both sides know, so an attacker would need access to both sources to be a threat.
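
        A sketch of that idea applied to the article’s code (the shared secret is a hypothetical value both index.php and fork.php would know):

        // index.php: sign the parameter before building the fork URL
        $secret = 'shared-secret';
        $sig    = hash_hmac('sha256', '0', $secret);
        $fork->add("http://localhost/fork.php?t=0&sig=" . $sig);

        // fork.php: reject requests whose signature does not match
        $t   = filter_input(INPUT_GET, 't', FILTER_VALIDATE_INT);
        $sig = filter_input(INPUT_GET, 'sig');
        if (!hash_equals(hash_hmac('sha256', (string) $t, $secret), (string) $sig)) {
            header('HTTP/1.1 403 Forbidden');
            exit;
        }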

  3. I am curious as to why you created your own approach here. Things like Gearman can handle this kind of offloading far better.

    For instance, if you are looking just to offload different types of SQL selects or anything else, have a worker pool that does them and send them through asynchronously.

    This way you are not using cURL and having an additional http request which could eventually become a bottleneck in a large enough site.

    1. Is it possible to do that with Gearman? As far as I know, with Gearman I can enqueue jobs. I use Gearman when I want to execute jobs offline (like mailing, logs…). But can I execute several jobs at the same time (not through a queue), as we do with the curl multi functions? If that’s possible it would be great for me.
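
      For reference, Gearman’s task API does allow this: tasks added to one client run in parallel when runTasks() is called. A sketch, assuming a hypothetical 'heavy_query' worker that runs one query and returns JSON:

      $results = array();
      $client  = new GearmanClient();
      $client->addServer(); // defaults to 127.0.0.1:4730
      $client->setCompleteCallback(function (GearmanTask $task) use (&$results) {
          $results[] = json_decode($task->data());
      });
      $client->addTask('heavy_query', '0'); // 'heavy_query' is a hypothetical worker
      $client->addTask('heavy_query', '1');
      $client->addTask('heavy_query', '2');
      $client->runTasks(); // blocks until all tasks finish, running them concurrently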

  4. Very clever, but hard to implement and debug, especially if you use MVC or other patterns. In the enterprise world, to speed up the DB job you can use a technique called CQRS. The idea is to denormalize your DB so you never ever need to run 3 complicated, long-lasting queries.
    Back to your article: as I said, it is a very interesting idea.

    1. Yes. Probably the main problem is debugging. I solve it with a simple trick: I use a flag on the development server to perform the queries inline instead of going through curl. It’s not the same, but it allows me to debug them.
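
      The trick could look something like this (a sketch; DEBUG_FORKS is a hypothetical development-server flag):

      if (DEBUG_FORKS) {
          // run the queries inline so the debugger can step through them
          $dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'nov', 'nov');
          $output = array();
          foreach (array("select * from test.tbl1" /* ...the other queries... */) as $sql) {
              $stmt = $dbh->prepare($sql);
              $stmt->execute();
              $output[] = $stmt->fetchAll();
          }
      } else {
          $fork = new Fork();
          $output = $fork->add("http://localhost/fork.php?t=0")
              ->add("http://localhost/fork.php?t=1")
              ->add("http://localhost/fork.php?t=2")
              ->run();
      }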

    1. Great thing. I didn’t know that. I need to test it deeply. I’m a PDO fan but I’ll test it. Many thanks.

  5. Very good code. Performance is always a great concern for the majority of our projects, and asynchronous calls are one good way to speed things up. Thanks for the post.

    1. Yes. But in fact the motivation for the example in the post was to replace an implementation I have in one project that uses proc_open. It works, but it’s a mess. At least for me, the curl solution is simpler and cleaner. With curl we can also use different servers and scale out, but as you said, proc_open is another choice.

  6. Very nice. So maybe you can use it in some microservice environment where you have to request multiple services. And there you need, as someone mentioned before, the returned data. Anyway, nice idea to run multiple queries asynchronously.

  7. You can write your code like this:

    $obj = new asyncRedisProxy();
    $obj->connect("127.0.0.1", "6379");
    $obj->set("a", 11111, function ($obj, $ret) {
        $obj->get("a", function ($obj, $data) {
            var_dump($data);
            $obj->release(); // release to connection pool
        });
    });

    $obj2 = new asyncPdoProxy('mysql:host=192.168.1.19;dbname=mz_db', "public_user", "1qa2ws3ed");
    $obj2->query("select 1 from mz_user where user_id=299", function ($obj, $stmt) {
        $arr = $stmt->fetchAll();
        var_dump($arr);
        $obj->query("select 2 from mz_user where user_id=299", function ($obj, $stmt) {
            $arr = $stmt->fetchAll();
            var_dump($arr);
            $obj->release(); // release to connection pool
        });
    });

    $obj3 = new asyncPdoProxy('mysql:host=192.168.1.19;dbname=mz_db', "public_user", "1qa2ws3ed");
    $obj3->exec("insert into t1(name) values('111111')", function ($obj, $data) {
        var_dump($data);
        $obj->release(); // release to connection pool
    });

    $obj4 = new asyncPdoProxy('mysql:host=192.168.1.19;dbname=mz_db', "public_user", "1qa2ws3ed");
    $stmt = $obj4->prepare("select * from mz_account where user_id=:user_id");
    $stmt->bindParam(':user_id', "311");
    $stmt->execute(function ($stmt, $data) {
        var_dump($stmt->fetchAll());
        $stmt->release(); // release to connection pool
    });

    Amazing, right? This solution is based on https://github.com/swoole/php-cp

    1. Looks great! I must put this library at the top of my tools-to-study list. I’ll check it out.

  8. This is good. The only issue, a minor one, is that you’re now blocking 4 connections on port 80. Additionally, how are you managing transactions, i.e. row/table level locking? We’ve tackled these issues by relegating the queries to separate scripts managed by either a remote process controller (like a cron) or placed on an MQueue.

  9. My script searches through the database and might get 100 or more results, each of which would be identified like:

    ->add('https://mysite.com/dev/fork.php?id=' . $someId)

    How can these URLs be added to the fork ($fork) object when they change and cannot be hard coded?

    Thanks
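
    (A sketch of how that could work: add() can simply be called in a loop, so the URLs don’t need to be hard-coded; $someIds is assumed to hold the ids returned by the search.)

    $fork = new Fork();
    foreach ($someIds as $someId) {
        $fork->add('https://mysite.com/dev/fork.php?id=' . urlencode($someId));
    }
    $output = $fork->run();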

