Ada BFD 1.3.0

By Stephane Carrez

Ada BFD is an Ada binding for the GNU Binutils BFD library. It lets your program read binary files such as ELF and COFF through GNU BFD: you can read ELF sections, access the symbol table and use the disassembler. The new version fixes compilation issues with recent GNU Binutils versions (starting with 2.39) and fixes the loading of the mini-symbol table of shared libraries.

Integration with Alire

For Linux users only, Ada BFD has an associated Alire crate that makes it easy to use. To get access to the crate, add the AWA Alire index to your Alire configuration as follows:

alr index --add=https://github.com/stcarrez/awa-alire-index.git --name awa

Then, you can get access to the crate by using:

alr with bfdada

Let's see how to use this library...

Declarations

The Ada BFD library provides a set of Ada bindings that give access to the BFD library. A binary file such as an object file, an executable or an archive is represented by the Bfd.Files.File_Type limited type. The symbol table is represented by the Bfd.Symbols.Symbol_Table limited type. These two types hold internal data used and managed by the BFD library.

with Bfd.Files;
with Bfd.Sections;
with Bfd.Symbols;
...
  File    : Bfd.Files.File_Type;
  Symbols : Bfd.Symbols.Symbol_Table;

Opening the BFD file

The first step is to use the Open procedure to read the object or executable file whose path is given as argument. The File_Type parameter will be initialized to get access to the binary file information. The Check_Format function must then be called to let the BFD library gather the file format information and verify that it is an object file or an executable.

Bfd.Files.Open (File, Path, "");
if Bfd.Files.Check_Format (File, Bfd.Files.OBJECT) then
    ...
end if;

The File_Type uses finalization so that it will close and reclaim resources automatically.

Loading the symbol table

The symbol table is loaded by using the Read_Symbols procedure.

   Bfd.Symbols.Read_Symbols (File, Symbols);

The resources used by the symbol table will be freed when the symbol table instance is finalized.

Find nearest line

Once the symbol table is loaded, we can use the Find_Nearest_Line procedure to find the source file, function name and line number nearest to a given address. This is roughly what the addr2line (1) command does.

File_Name, Func_Name : Ada.Strings.Unbounded.Unbounded_String;
Text_Section : Bfd.Sections.Section;
Line : Natural;
Pc : constant Bfd.Vma_Type := ...;
...
   Text_Section := Bfd.Sections.Find_Section (File, ".text");
   Bfd.Symbols.Find_Nearest_Line (File    => File,
                                  Sec     => Text_Section,
                                  Symbols => Symbols,
                                  Addr    => Pc,
                                  Name    => File_Name,
                                  Func    => Func_Name,
                                  Line    => Line);

One tricky aspect of using Find_Nearest_Line is that the address we give it must sometimes be converted to an offset within the text region. With Address Space Layout Randomization (ASLR), a program is mapped at a random address when it executes, so before calling Find_Nearest_Line we must subtract the base address of the memory region where the program is mapped. To do that, we need the virtual address at which the program starts in memory.

While the program is running, you can find its base address by looking at the /proc/self/maps file. This special file lists the memory regions used by the process with their addresses, flags and other information. Without ASLR, the program is almost always loaded at the 0x00400000 address.

00400000-007f9000 r-xp 00000000 fd:01 12067645          /home/...
009f8000-009fa000 r--p 003f8000 fd:01 12067645          /home/...
009fa000-00a01000 rw-p 003fa000 fd:01 12067645          /home/...

But when it is mapped at a random address, we get a different address each time the program is launched:

55d5983d9000-55d598592000 r--p 00000000 fc:02 1573554   /...
55d598592000-55d599376000 r-xp 001b9000 fc:02 1573554   /...
55d599376000-55d5997ed000 r--p 00f9d000 fc:02 1573554   /...
55d5997ee000-55d5998bb000 r--p 01414000 fc:02 1573554   /...
55d5998bb000-55d5998c6000 rw-p 014e1000 fc:02 1573554   /...

In that case, the value to use is the start address of the first r--p region associated with the program (here 0x55d5983d9000).

Another way to find the virtual base address is to use the dl_iterate_phdr (3) function and look at the shared objects that are loaded. This function must be executed by the program itself: it takes a callback function, which is called for each loaded shared object, and a data parameter that is passed to the callback.

#define _GNU_SOURCE
#include <link.h>
static int dl_callback (struct dl_phdr_info* info, size_t size, void* data) {

  /* VM base address is: info->dlpi_addr */
  return 0;
}
...
   dl_iterate_phdr (dl_callback, 0);

When the callback is called, you can get the name of the shared object by looking at info->dlpi_name and the virtual base address by looking at info->dlpi_addr.
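As a more complete illustration, here is a minimal sketch that uses the data parameter to return the base address of the main program. It assumes Linux with glibc, where dl_iterate_phdr is declared in <link.h> when _GNU_SOURCE is defined, and where the man page states that the first object visited is the main program (its dlpi_name is an empty string). The names first_object_callback and main_program_load_base are hypothetical. On glibc, dlpi_addr is the load offset, so it should be 0 for an executable that is not position independent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for dl_iterate_phdr; must precede includes */
#endif
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: store the base address of the first object
   reported (the main program on glibc) and stop the iteration. */
static int first_object_callback (struct dl_phdr_info *info, size_t size,
                                  void *data)
{
  uintptr_t *base = data;

  (void) size;   /* size of the info structure, not needed here */
  *base = (uintptr_t) info->dlpi_addr;
  return 1;      /* a non-zero value stops dl_iterate_phdr */
}

/* Hypothetical helper: return the load base of the main program. */
static uintptr_t main_program_load_base (void)
{
  uintptr_t base = 0;

  dl_iterate_phdr (first_object_callback, &base);
  return base;
}
```

The value returned by main_program_load_base () can then be subtracted from a run-time address before it is given to Find_Nearest_Line. To resolve addresses that belong to a shared library instead, the callback would have to return 0 to continue the iteration and compare info->dlpi_name with the library path.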

Ada BFD is a very specific library that is not always easy to use due to the complexity of binary program representation (ELF, DWARF, ...) and program execution. It is however used in very specific contexts such as the Muen Separation Kernel and the Memory Analysis Tool.
