Java 2 Ada

Optimization with Valgrind Massif and Cachegrind

By stephane.carrez

Memory optimization sometimes reveals nice surprises. I wanted to analyze the memory used by the Ada Server Faces framework, so I profiled its unit test program. It runs 130 tests that cover almost all the features of the framework.

Memory analysis with Valgrind Massif

Massif is a Valgrind tool for heap analysis. The application does not need to be recompiled, and the tool is easy to use: you simply run the application under Valgrind with the Massif tool. The command I used was:

valgrind --tool=massif --threshold=0.1 \
   --detailed-freq=1 --alloc-fn=__gnat_malloc \
   bin/asf_harness -config test.properties

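Valgrind writes the analysis to a file named massif.out.NNN. The options work as follows:

- --alloc-fn=__gnat_malloc tells Massif to treat __gnat_malloc, the GNAT runtime allocator, as an allocation function. The memory is then attributed to the Ada code that calls it.
- --detailed-freq=1 makes every snapshot a detailed one.
- --threshold=0.1 aggregates only the allocation sites below 0.1% of the memory, instead of the default 1%.

The massif-visualizer graphical tool reads the file and lets you explore the results. It is launched as follows: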

massif-visualizer massif.out.19813

(The number is the PID of the profiled process; replace it accordingly.)

The tool draws a graph of memory usage over time. You can select a given snapshot to see roughly where the memory is used.
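You may prefer a text summary to the graphical view; Valgrind also ships ms_print for that. In that case, note that the massif.out file is plain text. The following Python sketch is a minimal illustration. It assumes the snapshot layout Massif writes: snapshot=, time=, mem_heap_B= and mem_heap_extra_B= lines. It reports the snapshot with the largest heap, counting allocator overhead. The function names are hypothetical.

```python
import sys
from typing import Dict, List, Tuple

def massif_snapshots(text: str) -> List[Tuple[int, int, int]]:
    """Return (snapshot number, time, heap bytes + allocator overhead) per snapshot."""
    snapshots: List[Dict[str, int]] = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no key=value pair on this line (separators, cmd:, heap tree)
        if key == "snapshot":
            snapshots.append({"snapshot": int(value)})
        elif snapshots and key in ("time", "mem_heap_B", "mem_heap_extra_B"):
            snapshots[-1][key] = int(value)
        # every other key (mem_stacks_B, heap_tree, ...) is ignored
    return [
        (s["snapshot"], s.get("time", 0),
         s.get("mem_heap_B", 0) + s.get("mem_heap_extra_B", 0))
        for s in snapshots
    ]

def peak_snapshot(text: str) -> Tuple[int, int, int]:
    """Return the snapshot with the largest heap usage."""
    return max(massif_snapshots(text), key=lambda snap: snap[2])

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:  # e.g. massif.out.19813
        number, when, heap = peak_snapshot(f.read())
    print(f"peak at snapshot {number} (time={when}): {heap} bytes")
```

Running it on the before and after files gives a quick way to compare peak heap sizes without opening the visualizer.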

Memory consumption with Massif [before]

While looking at the result, I was intrigued by a 1MB allocation that was made several times and then released. It creates the visual spikes and corresponds to the big red horizontal bar in the graph. It was within the sax-utils.adb file, which is part of the XML/Ada library. Looking at the implementation, it turns out that it allocates a hash table with 65536 (2**16) entries, and this allocation is done each time a SAX parser is created. I reduced the size of this hash table to 1024 entries. If you want to do the same, change the following line in sax/sax-symbols.ads (line 99):

   Hash_Num : constant := 2**16;

by:

   Hash_Num : constant := 2**10;

After rebuilding and checking that there was no regression (yes, it works), I re-ran Massif. Here are the results.

Memory consumption with Massif [after]

The peak memory was reduced from 2.7MB to 2.0MB. The memory usage is now easier to understand and analyze because the 1MB allocation is gone, and the other memory allocations now stand out. But wait, there is more! My program is now faster!
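The size of the bucket array follows directly from Hash_Num. The Python sketch below is a back-of-the-envelope check that rests on two assumptions:

- Each bucket takes 16 bytes. This estimate is derived from the observed 1MB for 2**16 entries, not taken from the XML/Ada sources.
- The L1 data cache is 32 KiB, a typical size. Check your own processor.

```python
BUCKET_BYTES = 16       # assumption: 1 MiB observed / 2**16 buckets
L1D_BYTES = 32 * 1024   # assumption: a typical L1 data cache size

def table_bytes(hash_num: int, bucket_bytes: int = BUCKET_BYTES) -> int:
    """Size in bytes of a bucket array with hash_num entries."""
    return hash_num * bucket_bytes

for exponent in (16, 10):
    size = table_bytes(2 ** exponent)
    print(f"Hash_Num = 2**{exponent}: {size // 1024} KiB, "
          f"{size / L1D_BYTES:.2f}x the assumed L1 data cache")
```

Under these assumptions the original table is 32 times larger than the L1 data cache, while the reduced 16 KiB table fits in half of it.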

Cache analysis with Cachegrind

To understand why the program is now faster, I used Cachegrind, another Valgrind tool. It is a cache and branch-prediction profiler that simulates how the program uses the processor caches. I ran it with the following command:

valgrind --tool=cachegrind \
    bin/asf_harness -config test.properties

I ran it once before the hash table change and once after. Like Massif, Cachegrind writes its analysis to a file, cachegrind.out.NNN. You can examine the result with either cg_annotate or kcachegrind. With two Cachegrind files, I used cg_diff to compute the difference between the two executions, and cg_annotate to display it:

cg_diff cachegrind.out.24198 cachegrind.out.23286 > cg.out.1
cg_annotate cg.out.1

Before the fix, the Cachegrind report shows that the most intensive memory operations are performed by two operations:

- the Sax.Htable.Reset operation,
- the GNAT operation that initializes the Sax.Symbols.Symbol_Table_Record type, which contains the big hash table.

The report columns are:

- Dr: the number of data reads,
- D1mr: the number of L1 data cache read misses,
- Dw: the number of data writes,
- D1mw: the number of L1 data cache write misses.

Many cache misses slow down execution: an L1 cache access takes a few cycles, while a main memory access can cost several hundred.

--------------------------------------------------------------------------------
         Dr      D1mr          Dw      D1mw 
--------------------------------------------------------------------------------
212,746,571 2,787,355 144,880,212 2,469,782  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Dr      D1mr         Dw      D1mw  file:function
--------------------------------------------------------------------------------
25,000,929 2,081,943     27,672       244  sax/sax-htable.adb:sax__symbols__string_htable__reset
       508       127 33,293,050 2,080,768  sax/sax-htable.adb:sax__symbols__symbol_table_recordIP
43,894,931   129,786  7,532,775     8,677  ???:???
15,021,128     4,140  5,632,923         0  pthread_getspecific
 7,510,564     2,995  7,510,564    10,673  ???:system__task_primitives__operations__specific__selfXnn
 6,134,652    41,357  4,320,817    49,207  _int_malloc
 4,774,547    22,969  1,956,568     4,392  _int_free
 3,753,930         0  5,630,895     5,039  ???:system__task_primitives__operations(short,...)(long, float)

With the smaller hash table, the Cachegrind report shows 24,543,482 fewer data reads and 32,765,323 fewer data writes. L1 read misses dropped by 2,086,579 (about 75%), and L1 write misses dropped by 2,056,247 (an 83% reduction!).
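These figures are simple differences between the PROGRAM TOTALS lines of the two reports. The short Python sketch below recomputes them. The counts are copied from the two reports in this article.

```python
# PROGRAM TOTALS from the Cachegrind reports (before and after the fix)
BEFORE = {"Dr": 212_746_571, "D1mr": 2_787_355, "Dw": 144_880_212, "D1mw": 2_469_782}
AFTER = {"Dr": 188_203_089, "D1mr": 700_776, "Dw": 112_114_889, "D1mw": 413_535}

def reductions(before: dict, after: dict) -> dict:
    """Map each event to (absolute reduction, reduction in percent of 'before')."""
    return {
        event: (before[event] - after[event],
                100.0 * (before[event] - after[event]) / before[event])
        for event in before
    }

for event, (delta, pct) in reductions(BEFORE, AFTER).items():
    print(f"{event:5s} -{delta:,} ({pct:.1f}%)")
```

It prints reductions of 11.5% for data reads, 74.9% for L1 read misses, 22.6% for data writes and 83.3% for L1 write misses.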

With a small hash table, Sax.Symbols.Symbol_Table_Record is initialized faster, and resetting it needs fewer memory accesses, hence fewer CPU cycles. A smaller hash table also causes fewer cache misses, because a 1MB hash table flushes a large part of the data cache.

--------------------------------------------------------------------------------
         Dr    D1mr          Dw    D1mw 
--------------------------------------------------------------------------------
188,203,089 700,776 112,114,889 413,535  PROGRAM TOTALS

--------------------------------------------------------------------------------
        Dr    D1mr        Dw   D1mw  file:function
--------------------------------------------------------------------------------
43,904,760 120,883 7,532,577  8,407  ???:???
15,028,328      98 5,635,623      0  pthread_getspecific
 7,514,164     288 7,514,164  9,929  ???:system__task_primitives__operations__specific__selfXnn
 6,129,019  39,636 4,305,043 48,446  _int_malloc
 4,784,026  18,626 1,959,387  3,261  _int_free
 3,755,730       0 5,633,595  4,390  ???:system__task_primitives__operations(short,...)(long, float)
 2,418,778      65 2,705,140     14  ???:system__tasking__initialization__abort_undefer
 3,839,603   2,605 1,283,289      0  malloc
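Why does resetting the table cost so much? A reset has to clear every bucket, whatever the number of symbols actually stored. The separate-chaining table below is a toy Python model, not XML/Ada's Sax.Htable. It counts writes to the bucket array to show that initialization and reset costs grow with Hash_Num rather than with the amount of data. The class name is hypothetical.

```python
from typing import Any, Optional

class ChainedTable:
    """Toy separate-chaining hash table that counts writes to its bucket array."""

    def __init__(self, hash_num: int) -> None:
        # Allocation and initialization touch every bucket once.
        self.buckets: list = [None] * hash_num
        self.bucket_writes = hash_num

    def _index(self, key: str) -> int:
        return hash(key) % len(self.buckets)

    def set(self, key: str, value: Any) -> None:
        # Prepend a (key, value, next) node to the chain (no duplicate check: toy model).
        i = self._index(key)
        self.buckets[i] = (key, value, self.buckets[i])
        self.bucket_writes += 1

    def get(self, key: str) -> Optional[Any]:
        node = self.buckets[self._index(key)]
        while node is not None:
            node_key, value, node = node  # unpack and move to the next node
            if node_key == key:
                return value
        return None

    def reset(self) -> None:
        # Every bucket is cleared, including the empty ones.
        for i in range(len(self.buckets)):
            self.buckets[i] = None
            self.bucket_writes += 1

for hash_num in (2 ** 16, 2 ** 10):
    table = ChainedTable(hash_num)
    for n in range(100):
        table.set(f"symbol{n}", n)
    table.reset()
    print(f"Hash_Num={hash_num}: {table.bucket_writes} bucket writes")
```

The test stores 100 symbols, then performs one initialization and one reset. The 2**16 table needs 131172 bucket writes for this, against 2148 for the 2**10 table.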

Conclusion

Running Massif and Cachegrind is easy, but it may take some time to learn how to interpret and use the results. A big hash table is not always a good thing for an application: by creating cache misses, it may actually slow the application down. To learn more about this subject, I recommend the excellent document What Every Programmer Should Know About Memory written by Ulrich Drepper.


Deploying a J2EE application behind an Apache server in a production environment

By stephane.carrez

In a production environment, you should not expose your JBoss application directly as the Web front-end. Instead, use an Apache server and configure it to forward specific Web application requests to your J2EE server. There are many advantages in doing this:

  • The Apache server can serve static files (CSS, images, JavaScript files) faster than JBoss/Tomcat.
  • When you need it, you can activate SSL on Apache without having to change your application.
  • The Apache SSL implementation is faster than the Tomcat implementation (and a lot easier to configure!).
  • You get better control of HTTP headers, with no need to develop a servlet filter for that.
  • You get compression out of the box: no need to develop another servlet filter (or to configure the Tomcat connector either!).

I assume here that the Apache server is already installed and that the following modules are enabled:

jk headers expires ssl deflate rewrite

If one of them is not enabled, you can enable it with the a2enmod command, for example:

sudo a2enmod jk
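To see at a glance which of these modules are still missing, you can look at the mods-enabled directory. The Python sketch below assumes the Debian/Ubuntu layout, where a2enmod creates a <module>.load link in /etc/apache2/mods-enabled. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

REQUIRED = ["jk", "headers", "expires", "ssl", "deflate", "rewrite"]

def missing_modules(mods_enabled: Path = Path("/etc/apache2/mods-enabled"),
                    required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required modules that have no <name>.load entry in mods_enabled."""
    return [name for name in required if not (mods_enabled / f"{name}.load").exists()]

if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        # a2enmod accepts several module names at once
        print("sudo a2enmod " + " ".join(missing))
```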

Step 1: Explode your Web or J2EE application

For Apache to serve the static files, they must be available in a directory that the Apache server can access. For this, explode your J2EE application (EAR file) and all the Web applications that have static files to be served by Apache. Do this in a working directory with the following commands:

mkdir myapplication
cd myapplication && jar xf ../myapplication.ear
mv mywebapp.war mywebapp-new.war && mkdir mywebapp.war
cd mywebapp.war && jar xf ../mywebapp-new.war

If your EAR file contains a WAR file, you have to explode it as well (the static files to be served are there!). It is also good practice to explode it into a directory with the same name as the WAR. Once everything is exploded, you can also configure JBoss to use the exploded directory directly (this will speed up JBoss startup significantly).
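EAR and WAR files are ZIP archives, so the same steps can be scripted. The following Python sketch is one possible alternative to the jar commands above; it is not a JBoss tool. It extracts an EAR, then explodes each WAR found at its top level, following the same rename-then-extract approach: mywebapp.war becomes a directory with the same name. Unlike the commands above, it also deletes the packed copy afterwards. The function name is hypothetical.

```python
import zipfile
from pathlib import Path

def explode(archive: Path, target: Path) -> None:
    """Extract a ZIP-format archive (EAR/WAR/JAR) into target, then explode nested WARs."""
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    for war in list(target.glob("*.war")):
        if war.is_file():
            packed = war.with_name(war.stem + "-new.war")
            war.rename(packed)      # mywebapp.war -> mywebapp-new.war
            explode(packed, war)    # mywebapp.war/ is recreated as a directory
            packed.unlink()         # drop the packed copy once extracted

# Usage: explode(Path("myapplication.ear"), Path("myapplication"))
```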

Step 2: Create a site configuration file

As a good practice, write a configuration file dedicated to the site you are going to manage. This lets you enable or disable that server configuration on its own, which is useful during maintenance. Create a file in /etc/apache2/sites-available with the following initial content. Replace myserver.mydomain.com with your server name and server-installation-dir with the path of your installation directory:

 <VirtualHost _default_:80>
       ServerAdmin webmaster@localhost
       ServerAlias myserver.mydomain.com
       ServerName myserver.mydomain.com
       DocumentRoot /server-installation-dir
       <Directory />
               Options FollowSymLinks
               AllowOverride None
       </Directory>
       ErrorLog /var/log/apache2/myserver-error.log
       LogLevel warn
       CustomLog /var/log/apache2/myserver-access.log combined
 </VirtualHost>

The server-installation-dir should point to the exploded WAR directory.

It is also good practice to use a specific log file for each server (virtual host) that you configure in Apache. Keep the options to a minimum. This avoids activating an option that could compromise security, and it keeps the configuration understandable and manageable.

You may find additional information about virtual hosts in the Apache Virtual Host documentation.

Step 3: Configure Apache mod_jk

The Apache server can redirect requests to JBoss/Tomcat by using the mod_jk module (jk). Edit the file /etc/apache2/mods-available/jk.load and define the following properties:

 
JkWorkersFile /etc/apache2/worker.properties
JkShmFile     /var/log/apache2/mod_jk.shm
JkLogFile     /var/log/apache2/mod_jk.log
JkLogLevel    info
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "

Create the /etc/apache2/worker.properties file from the following example (be sure to replace the paths):

workers.tomcat_home=<directory where Tomcat is installed>
workers.java_home=<directory of the JDK installation home>
ps=/
worker.list=workerName
worker.workerName.port=8009
worker.workerName.host=localhost
worker.workerName.type=ajp13
worker.workerName.lbfactor=1

workerName is the name you will use in your site configuration file to tell Apache which JBoss/Tomcat server the requests are forwarded to. The port must match the AJP connector port of JBoss/Tomcat (8009 by default).
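A misspelled worker name in a property key is easy to miss: the property simply ends up attached to a worker that is never declared. The Python sketch below is a hypothetical sanity check, not part of mod_jk. It parses simple key=value lines and reports worker.<name>.<property> keys whose name is neither listed in worker.list nor a load-balancer member listed in a balance_workers property.

```python
from typing import Dict, List

def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def unknown_worker_keys(props: Dict[str, str]) -> List[str]:
    """Return worker.<name>.<property> keys whose <name> is not a declared worker."""
    def names(value: str) -> List[str]:
        return [n.strip() for n in value.split(",") if n.strip()]

    known = set(names(props.get("worker.list", "")))
    for key, value in props.items():
        if key.startswith("worker.") and key.endswith(".balance_workers"):
            known.update(names(value))  # load-balancer members are declared here
    return [key for key in props
            if key.count(".") >= 2 and key.split(".")[0] == "worker"
            and key.split(".")[1] not in known]
```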

You may find additional information in The Apache Tomcat Connector - Webserver HowTo.

Step 4: Configure mod_jk in your site configuration file

Now that mod_jk is configured, you have to set up your site configuration file to forward some of your URLs to your JBoss/Tomcat server through the AJP connector. This is done with the JkMount and JkUnMount directives. Add the following lines:

<VirtualHost _default_:80>
 ....
 JkMount /mywebapp/* workerName
 JkOptions +ForwardURICompat
</VirtualHost>

where mywebapp is the context of your Web application when it is running. All requests to /mywebapp/ will be forwarded to JBoss/Tomcat. If Apache has to serve static files located in the same context, add:

   JkUnMount /mywebapp/*.css workerName
   JkUnMount /mywebapp/*.js workerName
   JkUnMount /mywebapp/*.html workerName
   JkUnMount /mywebapp/*.png workerName

JavaScript, CSS, PNG and HTML files will then be served by Apache instead of JBoss/Tomcat, because the JK connector is not activated for these URLs.

Alternatively, you could mount only the dynamic URLs that your Web application serves (like *.jsp, *.do, *.jsf or *.seam). This may not be the best solution if your Web application has servlets that are mapped without any extension (like an XML-RPC servlet, a Seam resource servlet and others). This is why it is best to mount everything to your Web application and then explicitly list what is static and served by Apache directly. This will save you from big surprises!
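The following Python sketch is a simplified model of the JkMount/JkUnMount rules above. It helps to reason about which server handles a given URL. It is not mod_jk's matching code, and it makes two assumptions:

- `*` matches any sequence of characters, approximated here with fnmatchcase.
- An unmount pattern takes precedence over a mount pattern, as documented for mod_jk.

The names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List

MOUNTS = ["/mywebapp/*"]
UNMOUNTS = ["/mywebapp/*.css", "/mywebapp/*.js", "/mywebapp/*.html", "/mywebapp/*.png"]

def served_by(path: str, mounts: List[str] = MOUNTS, unmounts: List[str] = UNMOUNTS) -> str:
    """Return 'apache' or 'jboss' for a request path under this simplified model."""
    if any(fnmatchcase(path, pattern) for pattern in unmounts):
        return "apache"   # unmount wins over mount
    if any(fnmatchcase(path, pattern) for pattern in mounts):
        return "jboss"    # forwarded through the AJP worker
    return "apache"       # not mounted at all: Apache serves it

for path in ("/mywebapp/login.jsf", "/mywebapp/xmlrpc", "/mywebapp/css/site.css",
             "/mywebapp/images/logo.gif"):
    print(path, "->", served_by(path))
```

The model also shows a caveat. With only the four JkUnMount lines above, a GIF or JPEG image under /mywebapp would still be forwarded to JBoss/Tomcat. Each extension that Apache should serve needs its own JkUnMount line.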

Step 5: Configure caching and compression

Compression can be activated easily by adding the following line at the top of your site configuration file (see the deflate module):

       AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/x-javascript
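mod_deflate compresses responses with the DEFLATE algorithm, the same one used by gzip. The Python sketch below uses the standard gzip module rather than Apache, and the compression level 6 is an arbitrary choice. It gives an idea of the gain on repetitive HTML. It also shows why the list above only names text types: data that is already compressed, such as PNG or JPEG images, looks random and does not shrink.

```python
import gzip
import os

def compression_ratio(body: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(body, compresslevel=level)) / len(body)

html = ("<tr><td class='name'>item</td><td class='value'>42</td></tr>\n" * 500).encode()
noise = os.urandom(len(html))  # stands in for already-compressed data (PNG, JPEG)

print(f"HTML table:   {compression_ratio(html):.3f}")
print(f"random bytes: {compression_ratio(noise):.3f}")
```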

Browser caching is enabled and controlled by the Apache expires module. Add the following directives within the <Directory> section that controls the static files:

       # enable expirations
       ExpiresActive On
       # Activate the browser caching (CSS, images and scripts should not
       # change); A1296000 = 1,296,000 seconds (15 days) after access
       ExpiresByType text/css A1296000
       ExpiresByType image/png A1296000
       ExpiresByType image/gif A1296000
       ExpiresByType image/jpeg A1296000
       ExpiresByType text/javascript A1296000

You may find additional information in Apache Module mod_expires and Apache Module mod_deflate.
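The A1296000 code means "1,296,000 seconds after the client accessed the file", that is 15 days. From it, mod_expires sets the Expires header and the max-age directive of the Cache-Control header. The Python sketch below shows roughly what those values look like for a given access time. It only handles A codes, and the function is hypothetical, not part of Apache.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict

def expires_headers(code: str, access_time: datetime) -> Dict[str, str]:
    """Headers for an access-relative ExpiresByType code such as A1296000."""
    if not code.startswith("A"):
        raise ValueError("only A<seconds> (access time) codes are handled in this sketch")
    seconds = int(code[1:])
    expires = access_time + timedelta(seconds=seconds)
    return {
        "Expires": format_datetime(expires, usegmt=True),  # HTTP date, needs a UTC datetime
        "Cache-Control": f"max-age={seconds}",
    }

access = datetime(2011, 1, 1, tzinfo=timezone.utc)
print(timedelta(seconds=1296000))   # 15 days, 0:00:00
print(expires_headers("A1296000", access))
```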

Step 6: Harden your Apache configuration

JBoss can add headers to the HTTP response. The X-Powered-By header exposes which implementation is behind your site. This header is created by a servlet filter that is activated by default in the JBoss Web configuration files (server/default/deploy/jbossweb-tomcat55.sar/conf/web.xml). You can disable this filter by commenting out the following lines:

  <!-- <filter-mapping>
     <filter-name>CommonHeadersFilter</filter-name>
     <url-pattern>/*</url-pattern>
  </filter-mapping> -->

If you cannot change this, don't worry! The Apache server can remove those headers for you. Just add the following directives in your site configuration file:

   # For security reasons, do not expose who serves the page
   <LocationMatch "^/mywebapp/.*">
       Header unset X-Powered-By
   </LocationMatch>

Removing this header also slightly reduces the size of every response. See the Apache documentation Apache Module mod_headers.

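By default, the Apache server sends a detailed signature in the Server response header. Check the /etc/apache2/apache2.conf file and make sure it contains the following options. ServerTokens Prod reduces the Server header to "Apache", and ServerSignature Off removes the signature line from server-generated pages such as error pages.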

ServerSignature Off
ServerTokens Prod

To verify that your server generates the right response headers, you can use the wget -S command or Firebug to inspect them.
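The same check can be scripted. The Python sketch below is a hypothetical helper, not a standard tool. It sends a HEAD request with urllib and flags two problems:

- an X-Powered-By header,
- a Server header that says more than "Apache", which is the value ServerTokens Prod produces.

```python
import sys
from http.client import HTTPMessage
from typing import List
from urllib.request import Request, urlopen

def header_problems(headers) -> List[str]:
    """Report headers that reveal too much; accepts a dict or an HTTPMessage."""
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    if "x-powered-by" in lowered:
        problems.append(f"X-Powered-By is still sent: {lowered['x-powered-by']}")
    server = lowered.get("server")
    if server is not None and server != "Apache":  # ServerTokens Prod sends just "Apache"
        problems.append(f"Server header is too detailed: {server}")
    return problems

def fetch_headers(url: str) -> HTTPMessage:
    """Send a HEAD request and return the response headers."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return response.headers

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_headers.py http://myserver.mydomain.com/mywebapp/
    for problem in header_problems(fetch_headers(sys.argv[1])) or ["headers look fine"]:
        print(problem)
```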