Tag - performance

Ada, Java and Python database access
By Stephane Carrez, 2018-11-17 14:02:00
How do Ada, Java and Python compare with each other when they are used to connect to a database? This was the main motivation for me to write the SQL Benchmark and this article.
REST API Benchmark comparison between Ada and Java
By Stephane Carrez, 2017-03-21 22:55:00 (3 comments)
Arcadius Ahouansou from Menelic.com made an interesting benchmark to compare several Java web servers: Java REST API Benchmark: Tomcat vs Jetty vs Grizzly vs Undertow, Round 3. His benchmark is not as broad as the TechEmpower Benchmark, but it has the merit of being simple to understand, and anyone can run it very easily. I decided to make a similar benchmark for Ada web servers with the same REST API so that it would be possible to compare the Ada and Java implementations.
World IPv6 Day
By Stephane Carrez, 2013-12-31 14:31:15
Today, June 8th 2011, is the World IPv6 Day. Major organisations such as Google, Facebook and Yahoo! will offer native IPv6 connectivity.
To check your IPv6 connectivity, you can run a test from your browser: Test your IPv6 connectivity.
If you install the ShowIP Firefox plugin, you will see the IP address of the web sites you browse and can therefore quickly tell whether you are navigating over IPv4 or IPv6.
Below are some basic performance results comparing IPv4 and IPv6. Since most routers are tuned for IPv4, the IPv6 path is not yet as fast as the IPv4 one. This (small) performance degradation has nothing to do with the IPv6 protocol itself.
Google IPv4 vs IPv6 ping
$ ping -n www.google.com
PING www.l.google.com (209.85.146.103) 56(84) bytes of data.
64 bytes from 209.85.146.103: icmp_seq=1 ttl=55 time=9.63 ms
$ ping6 -n www.google.com
PING www.google.com(2a00:1450:400c:c00::67) 56 data bytes
64 bytes from 2a00:1450:400c:c00::67: icmp_seq=1 ttl=56 time=11.6 ms
Yahoo IPv4 vs IPv6 ping
$ ping -n www.yahoo.com
PING fpfd.wa1.b.yahoo.com (87.248.122.122) 56(84) bytes of data.
64 bytes from 87.248.122.122: icmp_seq=1 ttl=58 time=25.7 ms
$ ping6 -n www.yahoo.com
PING www.yahoo.com(2a00:1288:f00e:1fe::3000) 56 data bytes
64 bytes from 2a00:1288:f00e:1fe::3000: icmp_seq=1 ttl=60 time=31.3 ms
Facebook IPv4 vs IPv6 ping
$ ping -n www.facebook.com
PING www.facebook.com (66.220.156.25) 56(84) bytes of data.
64 bytes from 66.220.156.25: icmp_seq=1 ttl=247 time=80.6 ms
$ ping6 -n www.facebook.com
PING www.facebook.com(2620:0:1c18:0:face:b00c:0:1) 56 data bytes
64 bytes from 2620:0:1c18:0:face:b00c:0:1: icmp_seq=1 ttl=38 time=98.6 ms
Optimization with Valgrind Massif and Cachegrind
By Stephane Carrez, 2013-03-02 22:51:00
Memory optimization sometimes reveals nice surprises. I was interested in analyzing the memory used by the Ada Server Faces framework, so I profiled its unit test program, which runs 130 tests covering almost all the features of the framework.
Memory analysis with Valgrind Massif
Massif is a Valgrind tool used for heap analysis. It does not require the application to be recompiled and is easy to use: the application is simply executed under Valgrind with the Massif tool. The command that I used was:
valgrind --tool=massif --threshold=0.1 \
--detailed-freq=1 --alloc-fn=__gnat_malloc \
bin/asf_harness -config test.properties
The valgrind tool creates a file massif.out.NNN which contains the analysis. The massif-visualizer is a graphical tool that reads this file and lets you analyze the results. It is launched as follows:
massif-visualizer massif.out.19813
(the number is the pid of the process that was running; replace it accordingly).
The tool provides a graphical representation of the memory used over time. It lets you highlight a given memory snapshot and understand roughly where the memory is used.
While looking at the result, I was intrigued by a 1MB allocation that was made several times and then released (it creates those visual spikes and corresponds to the big red horizontal bar that appears in the graph). It was within the sax-utils.adb file that is part of the XML/Ada library. Looking at the implementation, it turns out that it allocates a hash table with 65536 entries, and this allocation is done each time the SAX parser is created. I reduced the size of this hash table to 1024 entries. If you want to do it, change the following line in sax/sax-symbols.ads (line 99):
Hash_Num : constant := 2**16;
by:
Hash_Num : constant := 2**10;
After rebuilding and checking that there is no regression (yes, it works), I re-ran the Massif tool and here are the results.
The peak memory was reduced from 2.7MB to 2.0MB. The memory usage is now easier to understand and analyse because the 1MB allocation is gone, and the other memory allocations now stand out. But wait. There is more! My program is now faster!
Cache analysis with cachegrind
To understand why the program is now faster, I used Cachegrind, a cache and branch-prediction profiler provided as another Valgrind tool. I executed it with the following command:
valgrind --tool=cachegrind \
bin/asf_harness -config test.properties
I launched it once before the hash table correction and once after. Similar to Massif, Cachegrind generates a file cachegrind.out.NNN that contains the analysis. You can analyze the result by using either cg_annotate or kcachegrind. Having two Cachegrind files, I used cg_diff to get a diff between the two executions:
cg_diff cachegrind.out.24198 cachegrind.out.23286 > cg.out.1
cg_annotate cg.out.1
Before the fix, we can see in the Cachegrind report that the most intensive memory operations are performed by the Sax.Htable.Reset operation and by the GNAT operation that initializes the Sax.Symbols.Symbol_Table_Record type, which contains the big hash table. Dr is the number of data reads, D1mr the L1 cache read misses, Dw the number of data writes, and D1mw the L1 cache write misses. Having a lot of cache misses will slow down the execution: an L1 cache access requires a few cycles while a main memory access can cost several hundred.
--------------------------------------------------------------------------------
Dr D1mr Dw D1mw
--------------------------------------------------------------------------------
212,746,571 2,787,355 144,880,212 2,469,782 PROGRAM TOTALS
--------------------------------------------------------------------------------
Dr D1mr Dw D1mw file:function
--------------------------------------------------------------------------------
25,000,929 2,081,943 27,672 244 sax/sax-htable.adb:sax__symbols__string_htable__reset
508 127 33,293,050 2,080,768 sax/sax-htable.adb:sax__symbols__symbol_table_recordIP
43,894,931 129,786 7,532,775 8,677 ???:???
15,021,128 4,140 5,632,923 0 pthread_getspecific
7,510,564 2,995 7,510,564 10,673 ???:system__task_primitives__operations__specific__selfXnn
6,134,652 41,357 4,320,817 49,207 _int_malloc
4,774,547 22,969 1,956,568 4,392 _int_free
3,753,930 0 5,630,895 5,039 ???:system__task_primitives__operations(short,...)(long, float)
With a smaller hash table, the Cachegrind report indicates a reduction of 24,543,482 data reads and 32,765,323 data writes. The cache read misses were reduced by 2,086,579 (74%) and the cache write misses by 2,056,247 (an 83% reduction!).
With a small hash table, the Sax.Symbols.Symbol_Table_Record gets initialized quicker and its cleaning needs fewer memory accesses, hence fewer CPU cycles. By having a smaller hash table, we also benefit from fewer cache misses: using a 1MB hash table flushes a big part of the data cache.
--------------------------------------------------------------------------------
Dr D1mr Dw D1mw
--------------------------------------------------------------------------------
188,203,089 700,776 112,114,889 413,535 PROGRAM TOTALS
--------------------------------------------------------------------------------
Dr D1mr Dw D1mw file:function
--------------------------------------------------------------------------------
43,904,760 120,883 7,532,577 8,407 ???:???
15,028,328 98 5,635,623 0 pthread_getspecific
7,514,164 288 7,514,164 9,929 ???:system__task_primitives__operations__specific__selfXnn
6,129,019 39,636 4,305,043 48,446 _int_malloc
4,784,026 18,626 1,959,387 3,261 _int_free
3,755,730 0 5,633,595 4,390 ???:system__task_primitives__operations(short,...)(long, float)
2,418,778 65 2,705,140 14 ???:system__tasking__initialization__abort_undefer
3,839,603 2,605 1,283,289 0 malloc
Conclusion
Running Massif and Cachegrind is very easy, but it may take some time to figure out how to understand and use the results. A big hash table is not always a good thing for an application: by creating cache misses, it may in fact slow the application down. To learn more about this subject, I recommend the excellent paper What Every Programmer Should Know About Memory written by Ulrich Drepper.
Thread safe cache updates in Java and Ada
By Stephane Carrez, 2011-04-28 22:01:14 (2 comments)
Problem Description
The problem is to update a cache that is almost never modified and only read in a multi-threaded context. The read performance is critical, and the goal is to reduce thread contention as much as possible to obtain a fast, non-blocking path when reading the cache.
Cache Declaration
Java Implementation
Let's define the cache using the HashMap class.
public class Cache {
private HashMap<String,String> map = new HashMap<String, String>();
}
Ada Implementation
In Ada, let's instantiate the Indefinite_Hashed_Maps package for the cache.
with Ada.Strings.Hash;
with Ada.Containers.Indefinite_Hashed_Maps;
...
package Hash_Map is
  new Ada.Containers.Indefinite_Hashed_Maps (Key_Type     => String,
                                             Element_Type => String,
                                             Hash         => Ada.Strings.Hash,
                                             "="          => "=");
Map : Hash_Map.Map;
Solution 1: safe and concurrent implementation
This solution is a straightforward use of the language's thread-safe constructs. In Java, this solution does not allow several threads to look at the cache at the same time: cache accesses are serialized. This is not a problem with Ada, where multiple concurrent readers are allowed; only writing locks the cache object.
Java Implementation
The thread-safe implementation is protected by the synchronized keyword, which guarantees mutual exclusion of the threads invoking the getCache and addCache methods.
public synchronized String getCache(String key) {
return map.get(key);
}
public synchronized void addCache(String key, String value) {
map.put(key, value);
}
Ada Implementation
In Ada, we can use a protected type. The cache could be declared as follows:
protected type Cache is
function Get(Key : in String) return String;
procedure Put(Key, Value: in String);
private
Map : Hash_Map.Map;
end Cache;
and the implementation is straightforward:
protected body Cache is
function Get(Key : in String) return String is
begin
return Map.Element (Key);
end Get;
procedure Put(Key, Value: in String) is
begin
Map.Insert (Key, Value);
end Put;
end Cache;
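A protected object of this type is then used like any other object; a small usage sketch (assuming Ada.Text_IO is visible):
declare
   C : Cache;
begin
   C.Put ("color", "blue");
   Ada.Text_IO.Put_Line (C.Get ("color"));
end;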
Pros and Cons
+: This implementation is thread safe.
-: In Java, thread contention is high as only one thread can look in the cache at a time.
-: In Ada, thread contention occurs only if another thread updates the cache (which is far better than Java, but could be annoying for realtime performance if the Put operation takes time).
Solution 2: weak but efficient implementation
Solution 1 does not allow multiple threads to access the cache at the same time, which creates a contention point. The second solution proposed here removes this contention point by relaxing some thread-safety guarantees at the expense of the cache behavior.
In this second solution, several threads can read the cache at the same time. The cache can be updated by one or several threads, but an update does not guarantee that all added entries will be present in the cache: if two threads update the cache at the same time, the resulting cache will contain only one of the new entries. This behavior can be acceptable in some cases, but it does not fit all uses and must be used with great care.
Java Implementation
A cache entry can be added in a thread-safe manner using the following code:
private volatile HashMap<String, String> map = new HashMap<String, String>();
public String getCache(String key) {
return map.get(key);
}
public void addCache(String key, String value) {
HashMap<String, String> newMap = new HashMap<String, String>(map);
newMap.put(key, value);
map = newMap;
}
This implementation is thread safe because the hash map is never modified in place: when a modification is made, it is done on a separate hash map object. The new hash map is then installed by the map = newMap assignment, which is atomic (the volatile qualifier guarantees that the new reference is visible to other threads). Again, this code extract does not guarantee that all the cache entries added will be part of the cache.
Ada Implementation
The Ada implementation is slightly more complex, basically because there is no garbage collector: if we allocate a new hash map and update the access pointer, we still have to free the old hash map once no other thread is accessing it.
The first step is to use a reference counter to automatically release the hash table when the last thread finishes its work; the reference counter handles the memory management issues for us. An implementation of a thread-safe reference counter is provided by Ada Util. In this implementation, counters are updated using specific hardware instructions (see Showing multiprocessor issue when updating a shared counter).
with Util.Refs;
...
type Cache is new Util.Refs.Ref_Entity with record
Map : Hash_Map.Map;
end record;
type Cache_Access is access all Cache;
package Cache_Ref is new Util.Refs.References (Element_Type => Cache,
Element_Access => Cache_Access);
C : Cache_Ref.Atomic_Ref;
Source: Util.Refs.ads, Util.Refs.adb
The References package defines a Ref type representing a reference to a Cache instance. To be able to replace a reference by another one in an atomic manner, it is necessary to use the Atomic_Ref type. This is necessary because the Ada assignment of a Ref type is not atomic (the assignment copy and the call to the Adjust operation that updates the reference counter are not atomic). The Atomic_Ref type is a protected type that provides a getter and a setter; using them guarantees atomicity.
function Get(Key : in String) return String is
R : constant Cache_Ref.Ref := C.Get;
begin
return R.Value.Map.Element (Key); -- concurrent access
end Get;
procedure Put(Key, Value: in String) is
R : constant Cache_Ref.Ref := C.Get;
N : constant Cache_Ref.Ref := Cache_Ref.Create;
begin
N.Value.all.Map := R.Value.Map;
N.Value.all.Map.Insert (Key, Value);
C.Set (N); -- install the new map atomically
end Put;
Pros and Cons
+: high performance in SMP environments
+: no thread contention in Java
-: a cache update can lose some entries
-: still some thread contention in Ada, but it is limited to copying a reference (C.Set)
Showing multiprocessor issue when updating a shared counter
By Stephane Carrez, 2011-03-06 09:52:43
When working on several Ada concurrent counter implementations, I was interested in pointing out the concurrency issue that exists in multi-processor environments. This article explains why you really have to take this issue seriously in multi-task applications, especially since multi-core processors are now quite common.
What's the issue
Let's say we have a simple integer shared by several tasks:
Counter : Integer;
And several tasks will use the following statement to increment the counter:
Counter := Counter + 1;
We will see that this implementation is wrong (even if a single instruction is used).
Multi task increment sample
To demonstrate the issue, let's define two counters: one not protected, and another protected from concurrent accesses by a specific data structure provided by the Ada Util library.
with Util.Concurrent.Counters;
...
Unsafe : Integer := 0;
Counter : Util.Concurrent.Counters.Counter;
In our testing procedure, let's declare a task type that will increment both versions of our counters. Several tasks will run concurrently so that the shared counter variables experience a lot of concurrent accesses. The task type is declared in a declare block inside our procedure so that we benefit from task synchronisation at the end of the block (see RM 7.6 and RM 9.3).
Each task will increment both counters in a loop. We should expect the two counters to get the same value at the end. We will see this is not the case in multi-processor environments.
declare
task type Worker is
entry Start (Count : in Natural);
end Worker;
task body Worker is
Cnt : Natural;
begin
accept Start (Count : in Natural) do
Cnt := Count;
end;
for I in 1 .. Cnt loop
Util.Concurrent.Counters.Increment (Counter);
Unsafe := Unsafe + 1;
end loop;
end Worker;
Now, in the same declaration block, we will define an array of tasks to show up the concurrency.
type Worker_Array is array (1 .. Task_Count) of Worker;
Tasks : Worker_Array;
Our tasks are activated and they are waiting to get their count. Let's make our tasks count 10 million times in total.
begin
for I in Tasks'Range loop
Tasks (I).Start (10_000_000 / Task_Count);
end loop;
end;
Before leaving the declare scope, Ada will wait until the tasks have finished (yes, there is no need to write any pthread_join code). After this block, we can just print out the value stored in each counter and compare them:
Log.Info ("Counter value at the end : " & Integer'Image (Value (Counter)));
Log.Info ("Unprotected counter at the end : " & Integer'Image (Unsafe));
The complete source is available in the Ada Util project in multipro.adb.
The Results
With one task, everything is Ok (Indeed!):
Starting 1 tasks
Expected value at the end : 10000000
Counter value at the end : 10000000
Unprotected counter at the end : 10000000
With two tasks, the problem appears:
Starting 2 tasks
Expected value at the end : 10000000
Counter value at the end : 10000000
Unprotected counter at the end : 8033821
And it gets worse as the number of tasks increases.
Starting 16 tasks
Expected value at the end : 10000000
Counter value at the end : 10000000
Unprotected counter at the end : 2496811
(The above results were produced on an Intel Core Quad; similar problems show up on Atom processors as well.)
Explanation
On x86 processors, the compiler can use an incl instruction for the unsafe counter increment, so our increment is a single instruction. You thought it was thread safe? Big mistake!
incl (%eax)
This instruction is atomic in a mono-processor environment, meaning that it cannot be interrupted. However, in a multi-processor environment, each processor has its own memory cache (L1 cache) and will read and increment the value in its own cache. Caches are synchronized, but this is almost always too late: two processors can each read their L1 cache, increment the value and save it at the same time (thus losing one increment). This is what happens with the unprotected counter.
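The lost update can be illustrated by the following interleaving of the read-modify-write steps performed by two processors (a simplified view, with the counter initially at 5):
CPU 1: read Counter  -> 5      CPU 2: read Counter  -> 5
CPU 1: compute 5 + 1           CPU 2: compute 5 + 1
CPU 1: write Counter <- 6      CPU 2: write Counter <- 6    (one increment is lost)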
Let's see how to do the protection.
Protection with specific assembly instruction
To avoid this, it is necessary to use special instructions that force the memory location to be synchronized and locked until the instruction completes. On x86, this is achieved by the lock instruction prefix. The following is guaranteed to be atomic on multi-processors:
lock
incl (%eax)
The lock instruction prefix introduces a delay in the execution of the instruction it protects. This delay increases slightly when concurrency occurs, but it remains acceptable (up to 10 times slower). For SPARC, MIPS and other processors, the implementation requires looping until either a lock is acquired (a spinlock) or it is guaranteed that no other processor has modified the counter at the same time.
Source: Util.Concurrent.Counters.ads, Util.Concurrent.Counters.adb
Protection with an Ada protected type
A safe and portable counter implementation can be made by using an Ada protected type. The protected type allows us to define a protected procedure Increment which provides exclusive read-write access to the data (RM 9.5.1). The protected function Value offers concurrent read-only access to the data.
package Util.Concurrent.Counters is
type Counter is limited private;
procedure Increment (C : in out Counter);
function Value (C : in Counter) return Integer;
private
protected type Cnt is
procedure Increment;
function Get return Integer;
private
N : Integer := 0;
end Cnt;
type Counter is limited record
Value : Cnt;
end record;
end Util.Concurrent.Counters;
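The post shows only the package specification; a minimal body matching it could look as follows (a sketch: the actual Ada Util implementation may differ, in particular to use the atomic instructions discussed above):
package body Util.Concurrent.Counters is
   protected body Cnt is
      procedure Increment is
      begin
         N := N + 1;
      end Increment;
      function Get return Integer is
      begin
         return N;
      end Get;
   end Cnt;
   procedure Increment (C : in out Counter) is
   begin
      C.Value.Increment;
   end Increment;
   function Value (C : in Counter) return Integer is
   begin
      return C.Value.Get;
   end Value;
end Util.Concurrent.Counters;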
Source: Util.Concurrent.Counters.ads, Util.Concurrent.Counters.adb
Installing an SSD device on Ubuntu
By Stephane Carrez, 2011-02-20 17:29:17
This article explains the steps for the installation of an SSD device on an existing Ubuntu desktop PC.
Disk Performances
First of all, let's have a look at the disk read performance with the hdparm utility. The desktop PC has three disks, /dev/sda being the new SSD device (an OCZ Vertex 2 SATA II 3.5" SSD). The three disks have the following performance:
sda: OCZ-VERTEX2 3.5 229.47 MB/sec
sdb: WDC WD3000GLFS-01F8U0 122.29 MB/sec
sdc: ST3200822A 59.23 MB/sec
The SSD device appears to be about twice as fast as the 10,000 rpm disk.
Plan for the move
The first step is to plan for the move and define what files should be located on the SSD device.
Identify files used frequently
To benefit from the high read performance, files used frequently could be moved to the SSD device. To identify them, you can use the find command with the -amin option, which takes a number of minutes (note that this option will not work if the file system is mounted with noatime). To find the files that were accessed during the last 24 hours (1440 minutes), you may use a command like the following, here scanning the whole system:
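$ find / -amin -1440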
In most cases, files accessed frequently are the system files (in /bin, /etc, /lib, ..., /usr/bin, /usr/lib, /usr/share, ...) and users' files located in /home.
Identify Files that change frequently
Some people argue that files modified frequently should not be located on an SSD device (because of write endurance and write performance).
On a Linux system, the system files that change on a regular basis are in general grouped together in the /var directory. Some configuration files are modified by system daemons while they are running. The list of system directories that change can be limited to:
/etc (cups/printers.conf.0, mtab, lvm/cache, resolv.conf, ...)
/var (log/*, cache/*, tmp/*, lib/*, ...)
/boot (grub/grubenv modified after booting)
Temporary Files
On Linux, temporary files are stored in one of the following directories. Several KDE applications save temporary files in the .kde/tmp-$host directory of each user.
/tmp
/var/tmp
/home/$user/.kde/tmp-$host
These temporary files could be moved to a ram file system, for example with an /etc/fstab entry such as the one below (options are illustrative).
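tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0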
Move plan
The final plan was to create one partition for the root file system and three LVM partitions for the /usr, /var and /home directories.
Partition the drive
The drive must be partitioned with fdisk. I created one bootable partition and a second partition with what remains.
Disk /dev/sda: 120.0 GB, 120034123776 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00070355
Device Boot Start End Blocks Id System
/dev/sda1 * 1 1295 10402056 83 Linux
/dev/sda2 1296 14593 106816185 83 Linux
To ease future management of partitions, it is useful to use LVM and create a volume group.
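For example, using the second partition created above:
$ sudo vgcreate vg01 /dev/sda2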
Volume group "vg01" successfully created
The partitions are then created by using lvcreate; more space can be allocated to them later by using the lvextend utility.
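For example, for the system volume (the 10G size is illustrative):
$ sudo lvcreate -L 10G -n sys vg01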
Logical volume "sys" created
$ sudo lvcreate -L 10G -n var vg01
Logical volume "var" created
$ sudo lvcreate -L 4G -n swap vg01
Logical volume "swap" created
$ sudo lvcreate -L 60G -n home vg01
Logical volume "home" created
The LVM partitions are available through the device mapper and they can be accessed by their name:
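$ ls -l /dev/vg01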
total 0
lrwxrwxrwx 1 root root 19 2011-02-20 14:03 home -> ../mapper/vg01-home
lrwxrwxrwx 1 root root 19 2011-02-20 14:03 swap -> ../mapper/vg01-swap
lrwxrwxrwx 1 root root 18 2011-02-20 14:03 sys -> ../mapper/vg01-sys
lrwxrwxrwx 1 root root 18 2011-02-20 14:03 var -> ../mapper/vg01-var
Format the partitions
Format the file systems with ext4, as it integrates various improvements which are useful for SSD storage (extents, delayed allocation). Other file systems will work very well too. For example (the volume names follow the layout created above):
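$ sudo mkfs.ext4 /dev/vg01/sys
$ sudo mkfs.ext4 /dev/vg01/var
$ sudo mkfs.ext4 /dev/vg01/home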
Move the files
To move files from one place to another, it is safer to use the tar command instead of a simple cp. Indeed, the tar command is able to copy special files without problems, while not all cp commands support the copy of special files.
$ sudo -i
# cd /usr
# tar --one-file-system -cf - . | (cd /target; tar xf -)
If the file system to move is located on another LVM partition, it is easier and safer to use the pvmove utility to move the physical extents from one physical volume to another.
Change the mount point
Edit the /etc/fstab file and change the old mount points to the new ones. The noatime mount option tells the kernel to avoid updating the file access time when a file is read.
/dev/vg01/sys /usr ext4 noatime 0 2
/dev/vg01/home /home ext4 noatime 0 2
/dev/vg01/var /var ext4 noatime 0 2
Tune the I/O Scheduler
For the SSD drive, it is best to disable the Linux I/O scheduler. For this, we will activate the noop I/O scheduler; other disks will keep the default I/O scheduler or another one. Add the following lines in the /etc/rc.local file:
test -f /sys/block/sda/queue/scheduler &&
echo noop > /sys/block/sda/queue/scheduler
References
http://www.ocztechnologyforum.com/forum/showthread.php?54379-Linux-Tips-tweaks-and-alignment
Boost your PHP web site by installing eAccelerator
By Stephane Carrez, 2010-10-23 06:29:00 (1 comment)
This article explains how to boost the performance of a PHP site by installing a PHP accelerator software.
Why is PHP slow
PHP is an interpreted language that requires parsing the PHP files for each request received by the server. With a compiled language such as Java or Ada, this long and error-prone process is done beforehand. Even if the PHP interpreter is optimized, this parsing step can be long, and the situation is worse when you use a framework (Symfony, CakePHP, ...) that requires many PHP files to be scanned.
eAccelerator to the rescue
eAccelerator
is a module that reduces this performance issue by introducing a shared cache for the PHP pre-compiled files. The module somehow compiles the PHP files in some internal compiled state and makes this available to the apache2
processes through a shared memory segment.
Installing eAccelerator
First get the eAccelerator sources at http://eaccelerator.net/. Then extract the tar.bz2 file on your server:
$ tar xvjf eaccelerator-0.9.6.1.tar.bz2
eaccelerator-0.9.6.1/
eaccelerator-0.9.6.1/COPYING
...
Build eAccelerator module
Before building the module, you must first run the phpize command to prepare the module for compilation:
$ cd eaccelerator-0.9.6.1/
$ phpize
Then, launch the configure script:
$ ./configure --enable-eaccelerator=shared \
--with-php-config=/usr/bin/php-config
Finally build the module:
$ make
Install eAccelerator
Installation is done with the following command:
$ sudo make install
Don't forget to copy the configuration file (have a look at its content, but in most cases it works as is):
$ sudo cp eaccelerator.ini /etc/php5/conf.d/
Restart Apache server
To make the module available, you have to restart the Apache server:
$ sudo /etc/init.d/apache2 restart
Performance improvements
What performance gain can you expect? That will depend on the PHP software and the page, but it's easy to get an idea. To measure the performance improvement, you can use the Apache benchmarking tool ab: do a performance measurement on the web site before the installation and another one after, and be sure to benchmark the same page. The following command will benchmark the http://mysite.mydomain.com/index.php page 100 times with only one connection:
$ ab -n 100 http://mysite.mydomain.com/index.php
Below is an extract of the percentage of the requests served within a certain time (in ms) for one of my web pages served by Dotclear:
              Without          With
              eAccelerator     eAccelerator
  50%              383              236
  66%              384              237
  75%              387              238
  80%              388              239
  90%              393              258
  95%              425              265
  98%              536              295
  99%              796              307
 100%              796              307 (longest request)
The gain varies from 38% to 60%, so it is quite interesting. The other benefit is that the variance is smaller, meaning that requests are served in roughly the same time overall.
Solving Linux system lock up when intensive disk I/O are performed
By Stephane Carrez, 2010-08-28 08:02:43
When a system lock-up occurs, we often blame applications, but when you look carefully you may see that despite your multi-core CPU, your applications are sleeping! No CPU activity! So what happens? Check the I/Os, they could be the root cause!
With Ubuntu 10.04, my desktop computer was freezing when the ReadyNAS Bacula backup was running. Indeed, the Bacula daemon was performing intensive disk operations (on a fast SATA hard disk). The situation was such that it was impossible to use the system: the interface was freezing for several seconds, then working for a few seconds and freezing again.
Linux I/O Scheduler
The I/O scheduler is responsible for organizing the order in which disk operations are performed. Some algorithms minimize disk head movements, while others try to anticipate read operations. When I/O operations are not scheduled correctly, an interactive application such as a desktop or a browser can be blocked until its I/O operations are scheduled and executed (the situation can be even worse for applications that use the O_SYNC writing mode).
By default, the Linux kernel is configured to use the Completely Fair Queuing (CFQ) scheduler. This I/O scheduler does not provide any time guarantee but it gives good performance in general. Linux provides other I/O schedulers such as the Noop scheduler, the Anticipatory scheduler and the Deadline scheduler.
The Deadline scheduler puts an execution time limit on requests to make sure each I/O operation is executed before an expiration time: typically, a read operation will wait at most 500 ms. This is the I/O scheduler we need to avoid the system lock-up.
Checking the I/O Scheduler
To check which I/O scheduler you are using, you can use the following command:
$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
where sda is the device name of your hard disk (or try hda). The result indicates the list of supported I/O schedulers as well as the scheduler currently in use (here, the Completely Fair Queuing scheduler).
Changing the I/O Scheduler
To change the scheduler, you can echo the desired scheduler name to activate it (you must be root):
# echo deadline > /sys/block/sda/queue/scheduler
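Once the Deadline scheduler is active, its tuning limits appear under the iosched directory of the device; for example, you can check the read expiration limit (in milliseconds):
$ cat /sys/block/sda/queue/iosched/read_expire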
To make sure the I/O scheduler is configured after each system startup, you can add the following lines to your /etc/rc.local startup script:
test -f /sys/block/sda/queue/scheduler &&
echo deadline > /sys/block/sda/queue/scheduler
test -f /sys/block/sdb/queue/scheduler &&
echo deadline > /sys/block/sdb/queue/scheduler
test -f /sys/block/hda/queue/scheduler &&
echo deadline > /sys/block/hda/queue/scheduler
You may have to change sda and sdb into hda and hdb if you have an IDE hard disk.
Conclusion
After changing the I/O scheduler to the Deadline scheduler, the desktop no longer freezes when backups are running.
Experience feedback in running a SaaS application
By Stephane Carrez, 2010-07-14 16:02:10
When you go in production with a new service, you may not know whether your application will have the necessary performance to serve your customers. Can the application support the growth? Should you deploy early? What do you do if you reach performance pr...
How Google Analytics can alter your web performance
By Stephane Carrez, 2009-04-28 19:55:27
Google Analytics is often used by marketing teams to get feedback on web site usage and to track visits, entry and leave points. Google Analytics is easy to use but it has some drawbacks that you don't see at the beginning. Altering the performance of yo...