Thursday, February 9, 2012

Xdebug PHP script profiling examples

Look at this presentation to become familiar with PHP profiling:
And also look into this article for more details:

Edit php.ini and add the line:
xdebug.profiler_enable = 1
Restart Apache and you will find the profiling dumps in /tmp/.
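A minimal php.ini fragment for Xdebug 2.x (the version current at the time of writing) might look like this; the output directory and trigger settings are optional, and the values shown are typical rather than taken from any particular server:

```ini
; Profile every request (Xdebug 2.x)
xdebug.profiler_enable = 1
; Where the cachegrind.out.* files are written (defaults to /tmp)
xdebug.profiler_output_dir = /tmp
; Or, instead of profiling everything, profile only requests that
; carry an XDEBUG_PROFILE GET/POST parameter or cookie:
; xdebug.profiler_enable_trigger = 1
```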

The format of those files is "Cachegrind": essentially a plain text file with execution trace information recorded line by line. I wasn't able to find a plugin to visualize that kind of data inside Eclipse, so I use a third-party tool for that. More information about using Cachegrind with Xdebug can be found here:
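For orientation, a Cachegrind dump is roughly this shape: a header declares the cost unit ("events"), then each fl=/fn= pair names a source file and function, followed by "line cost" pairs. All paths and numbers below are invented for illustration, not taken from a real dump:

```
version: 1
creator: xdebug 2.1.0

events: Time

fl=/var/www/index.php
fn={main}

1 42

fl=/var/www/index.php
fn=load_file
10 61432
```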

I use a third-party GUI tool called "KCachegrind". Installing KCachegrind is very simple, as it is available from all distro-specific repositories. You can just start the application from your menu and open the dump file.
NOTE: If your profiling dump files were created on and downloaded from a remote server, you should configure the local source directory in KCachegrind for the "Source code" visualization tab to work.

There is a project called Valgrind that can work with Cachegrind dumps, but I haven't had a chance to use it. You can also look at the project that has a very interesting web interface.

Let's look at some specific profiling examples from my practice that describe situations where it is hard to investigate performance issues.

Case 1. Analyze bottlenecks and improve script performance.

There was a script, created a couple of years ago, that calls an XML-RPC service and, based on the information retrieved from the remote service, returns a file from the filesystem. It wasn't developed to run in a high-load environment, so our aim is to profile it and introduce performance improvements right before a release where quite large loads are planned.

The initial profiling chart looked like this (click to enlarge):

The first two columns show time intervals in time units, where 1 unit equals 1/1,000,000 of a second. So the total execution time is about 1.4 seconds. The main challenges are:

  • we can't rewrite the script from scratch because there is a lot of business logic and we don't have enough time to understand how it works
  • we have to bootstrap the Drupal 6 core in our content handling script, as huge parts of the functionality we need are based on Drupal modules that utilize the Drupal API
  • we should put as much information as we can into memcache to avoid expensive calls
  • we should move file handling to a CDN, with the current script responsible for redirecting to the appropriate URL on the CDN
Let's cache the expensive XML-RPC service calls in memcache via cacheRouter and look at the profiling charts again after the caches are created:
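The caching step follows the usual cache-aside pattern. Here is a minimal sketch: the get/set operations are passed in as callables so the same logic works whether they wrap the Memcache extension or Drupal's cache_get()/cache_set() backed by cacheRouter. The function and key names are mine, not from the original script:

```php
<?php
// Cache-aside sketch: return the cached value when present, otherwise
// perform the expensive XML-RPC call once and store the result.
// $cache_get must return false on a miss (as Memcache::get() does).
function cached_call($key, $ttl, $expensive, $cache_get, $cache_set)
{
    $value = $cache_get($key);
    if ($value !== false) {
        return $value;              // cache hit: no XML-RPC round trip
    }
    $value = $expensive();          // cache miss: slow call happens once
    $cache_set($key, $value, $ttl); // e.g. Memcache::set() with a TTL
    return $value;
}
```

With the plain Memcache extension, the two callables would wrap $memcache->get($key) and $memcache->set($key, $value, 0, $ttl).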

Obviously we've got a significant performance improvement. But our script still returns the file by reading it from the filesystem. As there is no actual need (e.g. permission restrictions) to serve the file through the script, we can just move the file to the CDN and redirect the user with header('Location: ...'). That allows us to avoid the 60ms readfile() call and should give a 2x increase in script execution performance.
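The redirect itself is only a couple of lines. The CDN base URL below is a made-up placeholder, and the helper function is mine:

```php
<?php
// Build the CDN URL for a file path; the base URL is an invented example.
function cdn_location($cdn_base, $file_path)
{
    return rtrim($cdn_base, '/') . '/' . ltrim($file_path, '/');
}

// Instead of readfile($path), send the client to the CDN copy:
// header('Location: ' . cdn_location('http://cdn.example.com', $path), true, 302);
// exit; // nothing else should be sent after the redirect
```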

Now we have about 90ms for a single script run, which is less than 10% of the execution time we had initially. Note that this result does not reflect real delivery performance, because the Xdebug profiler greatly degrades script execution. Once the Xdebug Apache module is disabled and an opcode cacher like APC or eAccelerator is enabled, the real performance will be about 20-25ms per script call, which is more than enough for our production requirements.

There are still a lot of possibilities for performance improvement. Some of them are clearly visible on the last profiling chart:
  • we can move to autoloading and skip the inclusion of the PEAR HTTP_Request class file: as the XML-RPC calls are cached, we don't need that class until the cache is invalidated and we really have to initialize it;
  • we can avoid the database connection: as all data is retrieved from memcache, we don't actually send any queries to the database. Drupal 6 doesn't support establishing lazy connections, so the only way I see is to decrease the Drupal bootstrap level to DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE, which doesn't initialize the DB API at all;
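The first point can be done with a tiny autoloader. The underscore-to-directory mapping is the real PEAR naming convention; the rest is a sketch (it assumes the PEAR library root is on the include path), not the actual change made to the script:

```php
<?php
// Map a PEAR-style class name to its file,
// e.g. HTTP_Request -> HTTP/Request.php
function pear_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

// Register a lazy loader so HTTP_Request is only included when the
// cache is invalid and the class is actually instantiated.
// (The closure needs PHP 5.3+; use a named function on 5.2.)
spl_autoload_register(function ($class) {
    @include pear_class_to_path($class); // relies on the include path
});
```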
So in a short period of time we were able to analyze the script execution, locate bottlenecks, find some ways to optimize, and introduce significant performance improvements with the help of Xdebug profiling.

Case 2. Investigate exceptional and unpredictable situations when a script loops forever.

While running load tests of a Drupal website we found an interesting issue that appeared unexpectedly under high load, and we didn't have any exact steps to reproduce the problem.

A PHP script unpredictably starts to run in an infinite loop; within seconds a couple of "httpd" processes appear in "top", consuming 100% of the server's CPU.

There was no obvious reason for that, as the script itself is simple and doesn't contain any complex constructs apart from Drupal API calls to save/get data from memcache.

We enabled Xdebug profiling and started to run different kinds of load tests to somehow reproduce the issue. Finally we got the situation where a couple of "httpd" processes appeared, actively consuming CPU. We waited a couple of seconds to dump more profiling data and then restarted Apache with Xdebug turned off.

We received thousands of charts, one for each page call (we tried to run about 150 requests per second). Each of them was only about ~100KB, but a couple of files were really huge, up to hundreds of megabytes. These are the files with dump information from the PHP scripts that got stuck in the infinite loop, dumping huge amounts of profiling data.

According to the profiling chart we can clearly see that the problem occurs when a theme registry rebuild is triggered and previously saved values in the cache are cleared. The infinite loop occurs inside the cacheRouter memcache engine, which implements cache locks by holding a "lock" semaphore in memcache. I'm not familiar with that functionality, but I tried to report as much information as I have to the author in the Drupal module issue tracker. You can look at this issue in detail under the following comment:

For now I've added a minor workaround to this code to have the script wait idle (a sleep() call) instead of intensively spinning in a while() loop and consuming CPU. After that minor change we were not able to reproduce the issue anymore, at least for now.
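The idea of the workaround can be sketched like this. This is not cacheRouter's actual code: the function name, key name, and timeout are my own assumptions. It only relies on the documented behavior of Memcache::add(), which fails when the key already exists and is therefore commonly used as a lock primitive:

```php
<?php
// Try to acquire a memcache-based lock, idling between attempts
// instead of spinning in a tight while() loop at 100% CPU.
function acquire_lock($memcache, $lock_key, $timeout = 10)
{
    for ($waited = 0; $waited < $timeout; $waited++) {
        // add() only succeeds when the key does not exist yet, so the
        // first process wins the lock and the others keep waiting.
        if ($memcache->add($lock_key, 1, 0, $timeout) !== false) {
            return true;  // lock acquired
        }
        sleep(1);         // back off instead of burning CPU
    }
    return false;         // give up instead of looping forever
}
```

The important difference from the original code is the sleep() back-off plus the bounded timeout: a process that can never get the lock eventually bails out instead of spinning forever.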