A Dynamic Initialization Deep-Dive: Abusing Initialization Side Effects

I recently built a configuration system for a piece of C++ software I maintain. One feature I wanted: I want to be able to create new configuration items at arbitrary locations in my code without “registering” them somewhere centrally. The config system should just “pick them up” on startup.

One prominent example of a library using similar functionality is Google Test, Google’s testing framework. In Google Test, you can just write something like

    TEST(TestSuiteName, TestName) {
      ... test body ...
    }

somewhere in your code, and when Google Test’s main function runs, it picks up your test and runs it.

How does one achieve that?

On a more abstract level, we need to run a function with side effects before main() runs. The side effect can then be used to register the test function or the config item in some central data structure.

So the question boils down to: How can we run some code before main()?

Initialization Primer

I’m not sure if this is the only trick C++ offers us, but it’s the trick Google Test uses: When you have some global variable in your code, you can (and should!) initialize it, i.e., assign a value that the global variable will have when your program starts. In its simplest form, this can look like this:

    namespace whatever {
      int64_t simpleGlobal = 42; // initialized to 42
    }

For the global variable simpleGlobal, your compiler will use static initialization (the standard actually requires it here, since 42 is a constant expression). This means that the compiler reserves space for the variable (8 bytes in this case) in the .data section of the resulting file (an ELF binary). When your OS loads an ELF file, the contents of the .data section are loaded into memory as-is. So, if the compiler writes the value 42 into the space reserved for simpleGlobal in the .data section, the variable is automatically initialized to the correct value.
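Static initialization is not limited to literal values. As far as I understand the rules, what matters is whether the initializer is a constant expression, and a call to a constexpr function with constant arguments qualifies. So in the following sketch (answer and computedGlobal are made-up names), computedGlobal should still be constant-initialized rather than computed by code running at startup:

```cpp
#include <cstdint>

namespace whatever {
  // A constexpr function can be evaluated by the compiler...
  constexpr std::int64_t answer() { return 6 * 7; }

  // ...so this initializer is a constant expression, and the variable is
  // constant-initialized: no startup code is needed to give it its value.
  std::int64_t computedGlobal = answer();

  // Compile-time proof that answer() is usable in a constant expression.
  static_assert(answer() == 42, "answer() must be a constant expression");
}
```

The function call alone therefore does not force dynamic initialization; a call the compiler cannot evaluate at compile time does.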

However, static initialization is not our only option. We can also write something like this:

    namespace whatever {
      int64_t someFunction(); // defined in some other translation unit

      int64_t dynamicGlobal = someFunction();
    }

In this case, the initial value of dynamicGlobal is determined by executing someFunction(). If someFunction() is complex enough (and here the compiler does not even see its body, since it is defined in another translation unit), the compiler clearly cannot evaluate it at compile or link time; instead, it somehow has to be executed before main(). There is no rule that someFunction() has to be side-effect free! Can we use that side effect to register our test / config item / whatever?
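A minimal sketch of that idea, with hypothetical names (configRegistry, registerConfigItem): the initializer of each global returns a dummy value, and its side effect appends an entry to a central list. The list lives in a function-local static, which is constructed on first use. That sidesteps the question of whether the registry itself has already been initialized when another translation unit’s initializer calls into it.

```cpp
#include <string>
#include <vector>

// Hypothetical central registry. The function-local static is constructed the
// first time configRegistry() is called, even if that happens before main().
std::vector<std::string>& configRegistry() {
  static std::vector<std::string> items;
  return items;
}

// Registers an item as a side effect; the return value only exists so that
// the call can be used to initialize a global variable.
bool registerConfigItem(const std::string& name) {
  configRegistry().push_back(name);
  return true;
}

// Each of these could live in a different translation unit. Their dynamic
// initialization is what performs the registration.
bool logLevelRegistered = registerConfigItem("log_level");
bool cacheSizeRegistered = registerConfigItem("cache_size");
```

Within one translation unit the entries appear in definition order; if the globals live in different translation units, the order across them is unspecified.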

Dynamic Initialization

The answer is: probably. The C++ standard states in basic.start.dynamic/5 (slightly simplified):

It is implementation-defined whether the dynamic initialization of a […] variable with static storage duration is sequenced before the first statement of main or is deferred. If it is deferred, it strongly happens before any […] odr-use of any […] function or […] variable defined in the same translation unit as the variable to be initialized.

So basically the standard explicitly says that you may not rely on the side effects of someFunction already being visible when main() starts. In fact, the side effects may never occur if you have no ODR-use of anything in the respective TU!
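Read the other way round, the quoted rule also offers a way to get a guarantee back, although it gives up exactly the “no central registration” property I was after: if main() calls any non-inline function defined in the translation unit, any deferred initialization from that TU must have happened before the call. A sketch with hypothetical names, imagining this block as its own translation unit:

```cpp
// Imagine this block is its own translation unit (hypothetical plugin.cpp).
int pluginInitCount = 0;  // constant-initialized, so it is 0 before any dynamic init

// The side effect we care about.
bool initPlugin() {
  ++pluginInitCount;
  return true;
}

// Dynamic initialization: per [basic.start.dynamic] this may be deferred.
bool pluginInitialized = initPlugin();

// Calling this from main() odr-uses a non-inline function defined in this TU,
// so the initialization of pluginInitialized must happen before the call.
int pluginAnchor() { return pluginInitCount; }
```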

The fact that Google Test uses something very similar¹ makes one hopeful that this trick is pretty portable, even if it’s not guaranteed by the standard. Google Test advertises platform support according to Google’s “foundational C++” rules, which would mean pretty widespread support.

What’s Going On Under the Hood?

In this section I’m going to take a closer look at how the dynamic initialization is achieved in practice. The details of this will vary depending on the compiler, standard library implementation, CPU architecture and probably also the linker. I’m using GCC 14.2.0, glibc 2.40-1ubuntu3.1, libstdc++ 14.2.0-4ubuntu on an x86_64 system.

This is my example “project”, where the side effect of the dynamic initialization function is just setting a global variable called global to the value 23:

Let this be main.cpp:

    #include <iostream>

    int global = 0;

    int main() {
      std::cout << "Global value: " << global << "\n";
    }

And this is a second translation unit, other.cpp:

    extern int global;

    bool setGlobal() {
      global = 23;
      return true;
    }

    bool dummy = setGlobal();

If you build and execute this, it should output:

    > ./example
    Global value: 23

So setGlobal did indeed run before main(). How? Let’s fire up gdb and set a breakpoint in setGlobal():

    gdb ./example
    GNU gdb (Ubuntu 15.1-1ubuntu1~24.04) 15.1
    Reading symbols from ./example...
    (gdb) break setGlobal()
    Breakpoint 1 at 0x1184: file …/other.cpp, line 4.
    (gdb) r
    Starting program: …/example
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

    Breakpoint 1, setGlobal () at …/other.cpp:4
    4           global = 23;
    (gdb) bt
    #0  setGlobal () at ~/some_dir/other.cpp:4
    #1  0x000055555555518f in __static_initialization_and_destruction_0 ()
        at ~/some_dir/other.cpp:8
    #2  0x00005555555551a5 in _GLOBAL__sub_I__Z9setGlobalv ()
        at ~/some_dir/other.cpp:8
    #3  0x00007ffff782a4f4 in call_init (argc=1, argv=0x7fffffffde28, env=<optimized out>)
        at ../csu/libc-start.c:145
    #4  __libc_start_main_impl (main=0x5555555551a7 <main()>, argc=1, argv=0x7fffffffde28,
        init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
        stack_end=0x7fffffffde18) at ../csu/libc-start.c:347
    #5  0x00005555555550a5 in _start ()
    (gdb)

We see that the chain of calls reaching our setGlobal() is:

  • _start()
  • __libc_start_main_impl()
  • call_init()
  • _GLOBAL__sub_I__Z9setGlobalv()
  • __static_initialization_and_destruction_0()

For frames #1 and #2 of the backtrace (__static_initialization_and_destruction_0() and _GLOBAL__sub_I__Z9setGlobalv()), gdb claims that they originate from our own code (other.cpp:8). This tells us that these two stack frames do not correspond to library functions; instead, they are somehow generated by the compiler for our variable initialization.

The two frames below that (call_init() and __libc_start_main_impl()) point to csu/libc-start.c, which is part of the GNU C Library.

Below that, there is a mysterious _start, for which gdb shows no source location at all.

_start

Let’s start with that _start stack frame. Even though gdb doesn’t tell us so, this function actually also comes from the GNU C Library.

In our built executable, it looks like this (as generated by objdump -Crawd ./example):

     1  0000000000001080 <_start>:
     2      1080:	f3 0f 1e fa          	endbr64
     3      1084:	31 ed                	xor    %ebp,%ebp
     4      1086:	49 89 d1             	mov    %rdx,%r9
     5      1089:	5e                   	pop    %rsi
     6      108a:	48 89 e2             	mov    %rsp,%rdx
     7      108d:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
     8      1091:	50                   	push   %rax
     9      1092:	54                   	push   %rsp
    10      1093:	45 31 c0             	xor    %r8d,%r8d
    11      1096:	31 c9                	xor    %ecx,%ecx
    12      1098:	48 8d 3d 08 01 00 00 	lea    0x108(%rip),%rdi        # 11a7 <main>
    13      109f:	ff 15 3b 2f 00 00    	call   *0x2f3b(%rip)        # 3fe0 <__libc_start_main@GLIBC_2.34>
    14      10a5:	f4                   	hlt
    15      10a6:	66 2e 0f 1f 84 00 00 00 00 00 	cs nopw 0x0(%rax,%rax,1)

We see the call to __libc_start_main in line 13 (gdb’s backtrace shows it as __libc_start_main_impl, the name of glibc’s implementation). Don’t be confused by the reference to main in the line above it: that’s not a call to main(). That instruction (lea, “load effective address”) loads the address of the main() function into the %rdi register, which holds the first argument for the function called by the following call instruction. Thus, __libc_start_main_impl takes the address of the main() function to execute as a parameter.

__libc_start_main

To find __libc_start_main (or rather __libc_start_main_impl) in libc-start.c, one needs to chase a couple of #defines and then ends up here:

    /* Note: The init and fini parameters are no longer used.  fini is
        completely unused, init is still called if not NULL, but the
        current startup code always passes NULL.  (In the future, it would
        be possible to use fini to pass a version code if init is NULL, to
        indicate the link-time glibc without introducing a hard
        incompatibility for new programs with older glibc versions.)

        For dynamically linked executables, the dynamic segment is used to
        locate constructors and destructors.  For statically linked
        executables, the relevant symbols are access directly.  */
    STATIC int
    LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                     int argc, char **argv,
    #ifdef LIBC_START_MAIN_AUXVEC_ARG
                     ElfW(auxv_t) *auxvec,
    #endif
                     __typeof (main) init,
                     void (*fini) (void),
                     void (*rtld_fini) (void), void *stack_end)
    {
      …

The comment is interesting: it says that historically, the init parameter was used to pass initialization routines², but today “the dynamic segment” is used to locate “constructors and destructors”.
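In glibc’s wording, these “constructors” are simply functions to run before main(). GCC and Clang let you write such a function directly with the constructor function attribute. This is a compiler extension rather than standard C++, and the sketch below (constructorRan and markConstructorRan are made-up names) only illustrates the attribute itself; I haven’t checked exactly where the toolchain records it in this particular binary.

```cpp
// Constant-initialized, so it is already false before any startup code runs.
static bool constructorRan = false;

// GCC/Clang extension: a function marked constructor is called automatically
// before execution enters main().
__attribute__((constructor)) static void markConstructorRan() {
  constructorRan = true;
}
```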

The body of __libc_start_main does a lot of initialization and then calls call_init():

      /* Call the initializer of the program, if any.  */
    #ifdef SHARED

      if (init != NULL)
        /* This is a legacy program which supplied its own init
           routine.  */
        (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
      else
        /* This is a current program.  Use the dynamic segment to find
           constructors.  */
        call_init (argc, argv, __environ);

    #else /* !SHARED */
      call_init (argc, argv, __environ);

    #endif

call_init()

Here is a slightly shortened (and reformatted…) version of call_init(), also from libc-start.c:

     1  /* Initialization for dynamic executables.  Find the main executable
     2     link map and run its init functions.  */
     3  static void
     4  call_init (int argc, char **argv, char **env)
     5  {
     6    /* Obtain the main map of the executable.  */
     7    struct link_map *l = GL(dl_ns)[LM_ID_BASE]._ns_loaded;
     8
     9
    10
    11    ElfW(Dyn) *init_array = l->l_info[DT_INIT_ARRAY];
    12    if (init_array != NULL) {
    13      unsigned int jm = l->l_info[DT_INIT_ARRAYSZ]->d_un.d_val / sizeof (ElfW(Addr));
    14
    15      ElfW(Addr) *addrs = (void *) (init_array->d_un.d_ptr + l->l_addr);
    16      for (unsigned int j = 0; j < jm; ++j) {
    17        ((dl_init_t) addrs[j]) (argc, argv, env);
    18      }
    19    }
    20  }

Lines 7 and 11 access the “link map” of the executable. The link map is a data structure created by the dynamic linker when the executable is loaded into memory. It basically contains information about where in memory the loaded ELF file and its various parts ended up. Its definition can be found in the glibc sources.

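Line 11 uses the DT_INIT_ARRAY tag to look up, in the link map, the entry that records the address of the .init_array section of the ELF file; line 15 adds the load address l->l_addr to get the address where the section actually ended up in memory. Line 13 uses the DT_INIT_ARRAYSZ tag (SZ for “size”…) to get the size of the .init_array section in bytes and divides it by the size of an address to determine the number of entries.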

The content of the .init_array section is a list of addresses of functions to be called for initialization, and lines 15 to 18 call each of them in order. Each of these initialization functions gets argc, argv and a pointer to the environment variables (env) passed in.
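To make the loop concrete, here is a simplified, self-contained imitation of what call_init does, with an ordinary array standing in for the loaded .init_array section. All names here are hypothetical; the real code obtains the pointer and the byte size from the link map instead.

```cpp
#include <cstddef>

// Hypothetical stand-in for glibc's dl_init_t: init functions get argc, argv, env.
using InitFn = void (*)(int, char**, char**);

int initCalls = 0;
void initA(int, char**, char**) { initCalls += 1; }
void initB(int, char**, char**) { initCalls += 10; }

// Plays the role of the .init_array section: a contiguous array of function pointers.
InitFn fakeInitArray[] = {initA, initB};

// Mirrors call_init: the size is given in bytes (like DT_INIT_ARRAYSZ), so the
// number of entries is the byte size divided by the size of one pointer.
void callInit(InitFn* initArray, std::size_t sizeInBytes,
              int argc, char** argv, char** env) {
  std::size_t count = sizeInBytes / sizeof(InitFn);
  for (std::size_t i = 0; i < count; ++i) {
    initArray[i](argc, argv, env);
  }
}
```

Calling callInit(fakeInitArray, sizeof(fakeInitArray), 0, nullptr, nullptr) runs initA and then initB, just as call_init walks the real section from its first entry to its last.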

Let’s look at the contents of the .init_array in our example:

    > objdump -s -j .init_array ./example

    ./example:     file format elf64-x86-64

    Contents of section .init_array:
     3d98 60110000 00000000 98110000 00000000  `...............

The .init_array section apparently contains two 64-bit values, i.e., two addresses. Converting from little-endian, this tells us that the functions at addresses 0x1160 and 0x1198 should be called. Let’s have a quick look at the next function on our call stack, _GLOBAL__sub_I__Z9setGlobalv:

    > objdump -Cdwr ./example
    …
    0000000000001198 <_GLOBAL__sub_I__Z9setGlobalv>:
        1198:	f3 0f 1e fa          	endbr64
        119c:	55                   	push   %rbp
    …

It’s at address 0x1198! So now we know how it ends up being called from call_init().

The other function, at 0x1160, (indirectly) calls register_tm_clones, which is a setup function for some inner mechanics of the (not yet working, I think?) GNU Transactional Memory Library. We can ignore that.
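The little-endian conversion of the .init_array dump can be double-checked with a few lines of C++. The bytes are copied from the objdump output above (objdump prints them in file order); readLittleEndian64 is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>

// Interpret 8 bytes as an unsigned 64-bit value stored in little-endian order:
// the byte at the lowest address is the least significant one.
std::uint64_t readLittleEndian64(const std::uint8_t* bytes) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// "60110000 00000000 98110000 00000000": two 8-byte .init_array entries.
const std::uint8_t initArrayBytes[16] = {
    0x60, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x98, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

readLittleEndian64(initArrayBytes) yields 0x1160 and readLittleEndian64(initArrayBytes + 8) yields 0x1198, matching the two addresses above.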

_GLOBAL__sub_I__Z9setGlobalv and __static_initialization_and_destruction_0()

Let’s take a closer look at _GLOBAL__sub_I__Z9setGlobalv:

    0000000000001198 <_GLOBAL__sub_I__Z9setGlobalv>:
        1198:	f3 0f 1e fa          	endbr64
        119c:	55                   	push   %rbp
        119d:	48 89 e5             	mov    %rsp,%rbp
        11a0:	e8 dd ff ff ff       	call   1182 <__static_initialization_and_destruction_0()>
        11a5:	5d                   	pop    %rbp
        11a6:	c3                   	ret

That’s simple enough. Nothing going on here, just forwarding the call to __static_initialization_and_destruction_0().

And that function looks like this:

    0000000000001182 <__static_initialization_and_destruction_0()>:
        1182:	f3 0f 1e fa          	endbr64
        1186:	55                   	push   %rbp
        1187:	48 89 e5             	mov    %rsp,%rbp
        118a:	e8 da ff ff ff       	call   1169 <setGlobal()>
        118f:	88 05 bc 2f 00 00    	mov    %al,0x2fbc(%rip)        # 4151 <dummy>
        1195:	90                   	nop
        1196:	5d                   	pop    %rbp
        1197:	c3                   	ret

Again not much going on, basically only the call to setGlobal().

I assume that these two compiler-generated functions look more complicated when a translation unit contains several dynamically initialized variables.
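One thing the standard does pin down for that case: within a single translation unit, dynamically initialized namespace-scope variables like these are initialized in the order of their definitions. Here is a small sketch (hypothetical names) to experiment with; compiling it and looking at the result with objdump should show what the generated functions look like with several initializers.

```cpp
#include <string>
#include <vector>

// Records the order in which initializers run; constructed on first use.
std::vector<std::string>& initOrder() {
  static std::vector<std::string> order;
  return order;
}

// A class type whose constructor has a side effect, so every namespace-scope
// object of this type needs dynamic initialization.
struct Recorder {
  explicit Recorder(const char* name) { initOrder().push_back(name); }
};

// Same translation unit: initialized in order of definition.
Recorder firstRecorder("first");
Recorder secondRecorder("second");
Recorder thirdRecorder("third");
```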

So, is this safe?

So, can we (ab)use the side effects of dynamic initialization, e.g. for “plugin registration” the way Google Test does?

My understanding is that if we use it the way we want to (say, a TU which is never called into from main(), but which contains something like Google Test’s TEST(…) {…}), the standard gives us zero guarantees about when these side effects run, or whether they run at all. If the compiler (or rather: the linker) detects that no symbols from a translation unit are ODR-used at all, it may optimize everything from that TU away, I think.

Aside from optimizing away the call, the other wrench the compiler could throw into the works is deferred dynamic initialization, i.e., deferring the calls until the first ODR-use of anything in the same TU instead of running them in the _start phase. However, I fail to see how this could be achieved without massive overhead. The only way I see would be to generate a check that runs before every ODR-use of any symbol of a TU and makes sure that all dynamic initialization from that TU has run. That seems insane.

So, in conclusion: while removing all dynamic initialization of a TU may seem realistic when the linker can prove that nothing in that TU is ever ODR-used, my guess is that such an optimization would break a lot of things, Google Test among them. So I’d guess that it’s pretty safe to depend on this behavior.
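For completeness, here is a stripped-down sketch of the Google Test-style pattern from the beginning of this post, with a hypothetical MY_TEST macro and testRegistry() function. Unlike Google Test, which uses a static class data member, this sketch uses a namespace-scope object whose constructor performs the registration:

```cpp
#include <string>
#include <vector>

// One registered test: a display name and the function implementing the body.
struct TestCase {
  std::string name;
  void (*body)();
};

// Hypothetical central registry, constructed on first use.
std::vector<TestCase>& testRegistry() {
  static std::vector<TestCase> tests;
  return tests;
}

// Constructing a TestRegistrar is the side effect that registers a test.
struct TestRegistrar {
  TestRegistrar(const char* name, void (*body)()) {
    testRegistry().push_back(TestCase{name, body});
  }
};

// Hypothetical macro: declares the body function, defines a registrar object
// whose dynamic initialization registers it, then starts the body definition.
#define MY_TEST(suite, name)                                  \
  static void suite##_##name##_body();                        \
  static TestRegistrar suite##_##name##_registrar(            \
      #suite "." #name, &suite##_##name##_body);              \
  static void suite##_##name##_body()

int bodyRuns = 0;

MY_TEST(Example, Increments) {
  ++bodyRuns;
}
```

A runner’s main() would then loop over testRegistry() and call each body, without ever naming the individual tests.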


  1. Actually, they use dynamic initialization of a static class member variable instead of a namespace-scope variable; see the Google Test source.

  2. I did not check, but I assume that this was essentially the .init section of the ELF file containing the executable… As far as I can tell, glibc hasn’t used the .init section since 1999.

Comments

You can use your Mastodon account to reply to this post.

Reply to tinloaf's post
