The perils of transition to 64-bit time_t

(please note that there’s a correction at the bottom)

In the Overview of cross-architecture portability problems, I have dedicated a section to the problems resulting from the use of a 32-bit time_t type. This design decision, still affecting Gentoo systems using glibc, means that 32-bit applications will suddenly start failing in horrible ways in 2038: they will be getting a -1 error instead of the current time, and they won’t be able to stat() files. In short: complete mayhem will emerge.
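
To put a number on that cutoff: a signed 32-bit time_t can count at most 2147483647 seconds past the epoch, which corresponds to 2038-01-19 03:14:07 UTC. Below is a minimal sketch of the range check; the helper name fits_in_time32 is made up for illustration and is not part of any libc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2147483647 seconds after the epoch is 2038-01-19 03:14:07 UTC. */
static_assert(INT32_MAX == 2147483647, "range of a signed 32-bit time_t");

/* Hypothetical helper: does this timestamp (seconds since the epoch)
 * still fit into a signed 32-bit time_t? */
bool fits_in_time32(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}
```

A timestamp from late 2024 such as 1727154919 passes the check, while 2147483648 (one second past the limit) does not.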

There is a general agreement that the way forward is to change time_t to a 64-bit type. Musl has already switched to that, glibc supports it as an option. A number of other distributions such as Debian have taken the leap and switched. Unfortunately, source-based distributions such as Gentoo don’t have it that easy. So we are still debating the issue and experimenting, trying to figure out a maximally safe upgrade path for our users.

Unfortunately, that’s nowhere near trivial. Above all, we are talking about a breaking ABI change. It’s all-or-nothing. If a library uses time_t in its API, everything linking to it needs to use the same type width. In this post, I’d like to explore the issue in detail — why is it so bad, and what we can do to make it safer.

Going back to Large File Support

Before we get into the time64 change, as I’m going to call it for short, we need to go back in history a bit and consider another similar problem: Large File Support.

Long story short, 32-bit architectures originally specified two important file-related types as 32 bits wide: off_t, used to specify file offsets (signed to support relative offsets), and ino_t, used to specify inode numbers. This had two implications: you couldn’t open files larger than 2 GiB, and you couldn’t open files whose inode numbers exceeded the 32-bit unsigned integer range.

To resolve this problem, Large File Support was introduced. It involved replacing these two types with 64-bit variants, and on glibc it is still optional today. In its case, we didn’t take the leap and transition globally. Instead, packages generally started enabling LFS support upstream, also taking care to resolve any ABI breakage in the process. While many packages did that, we shouldn’t consider the problem solved.

The important point here is that time64 support in glibc requires LFS to be used. This makes sense — if we are going to break stuff, we may as well solve both problems.

What ABIs are we talking about?

To put it simply, we have three possible sub-ABIs here:

  1. the original ABI with 32-bit types,
  2. LFS: 64-bit off_t and ino_t, 32-bit time_t,
  3. time64: LFS + 64-bit time_t.

What’s important here is that a single glibc build remains compatible with all three variants. However, libraries that use these types in their API are not.

Today, 32-bit systems roughly use a mix of the first and second ABI — the latter including packages that enabled LFS explicitly. For the future, our goal is to focus on the third option. We are not concerned about providing full-LFS systems with 32-bit time_t.
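
Which of the three sub-ABIs a given translation unit was compiled for can be told from the widths of off_t and time_t, which glibc selects through the _FILE_OFFSET_BITS and _TIME_BITS macros. The following is a minimal sketch; the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

static_assert(sizeof(time_t) == 4 || sizeof(time_t) == 8,
              "unexpected time_t width");

/* Illustrative names for the three sub-ABIs listed above. */
enum sub_abi { ABI_ORIGINAL, ABI_LFS, ABI_TIME64, ABI_UNEXPECTED };

/* Classify a sub-ABI by the widths (in bytes) of off_t and time_t. */
enum sub_abi classify_sub_abi(size_t off_size, size_t time_size)
{
    if (off_size == 4 && time_size == 4) return ABI_ORIGINAL;
    if (off_size == 8 && time_size == 4) return ABI_LFS;
    if (off_size == 8 && time_size == 8) return ABI_TIME64;
    return ABI_UNEXPECTED;  /* e.g. 64-bit time_t without LFS */
}

/* The sub-ABI this very translation unit was compiled for. */
enum sub_abi current_sub_abi(void)
{
    return classify_sub_abi(sizeof(off_t), sizeof(time_t));
}
```

On 64-bit targets both types are always 8 bytes wide, so the distinction is only meaningful when building for a 32-bit target, e.g. with -m32 and optionally -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64.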

Why is the ABI change so bad?

Now, the big deal is that we are replacing a 32-bit type with a 64-bit type, in place. Unlike with LFS, glibc does not provide any transitional API that could be used to enable new functions while preserving backwards compatibility — it’s all-or-nothing.

Let’s consider structures. If a structure contains time_t with its natural 32-bit alignment, then there’s no padding for the type to extend to. Inevitably, all fields that follow will have to shift to make room for the new type. Let’s consider a trivial example:

struct {
    int a;
    time_t b;
    int c;
};

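With 32-bit time_t, the offset of c is 8. With the 64-bit type, it’s 12 on x86, where 64-bit integers are only 4-byte aligned inside structures (on 32-bit ABIs that align them to 8 bytes, such as ARM, it’s 16). If you mix binaries using different time_t widths, they are inevitably going to read or write the wrong fields! Or perhaps even read or write out of bounds!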
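
The shift can be checked with offsetof. The exact time64 offset depends on the alignment that the ABI gives to 64-bit integers; the sketch below models i386, where that alignment is 4 bytes and c lands at offset 12 (ABIs with 8-byte alignment put it at 16). To get the same layout on any host, the 64-bit time_t is spelled out as two 32-bit halves. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The example structure as a time32 binary sees it. */
struct rec_time32 {
    int32_t a;
    int32_t b;       /* 32-bit time_t */
    int32_t c;
};

/* The same structure as a time64 binary sees it on i386. */
struct rec_time64 {
    int32_t a;
    uint32_t b_lo;   /* low half of the 64-bit time_t (little endian) */
    uint32_t b_hi;   /* high half */
    int32_t c;
};

static_assert(offsetof(struct rec_time32, c) == 8, "time32 offset of c");
static_assert(offsetof(struct rec_time64, c) == 12, "time64 offset of c");

/* What a time32 reader gets for c when handed a time64 record. */
int32_t misread_c(const struct rec_time64 *rec)
{
    struct rec_time32 view;

    assert(rec != NULL);
    memcpy(&view, rec, sizeof view);  /* reinterpret the leading 12 bytes */
    return view.c;                    /* actually the high half of b */
}
```

For a record holding a = 1, b = 2, c = 3, the time32 reader sees c as 0, i.e. the upper half of b rather than the real field.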

Let’s just look at the size of struct stat, as an example of a structure that uses both file and time-related types. On plain 32-bit x86 glibc it’s 88 bytes long. With LFS, it’s 96 bytes long (size and inode number fields are expanded). With LFS + time64, it’s 108 bytes long (three timestamps are expanded).

However, you don’t even need to use structures. After all, we are talking about x86, where function parameters are passed on the stack. If one of the parameters is time_t, then the positions of all subsequent parameters on the stack change, and we find ourselves seeing the exact same problem! Consider the following prototype:

extern void foo(int a, time_t b, int c);

Let’s say we’re calling it as foo(1, 2, 3). With 32-bit types, the call looks like the following:

	pushl	$3
	pushl	$2
	pushl	$1
	call	foo@PLT

However, with 64-bit time_t, it changes to:

	pushl	$3
	pushl	$0
	pushl	$2
	pushl	$1
	call	foo@PLT

An additional 32-bit value (zero) is pushed between the “old” b and c. Once again, if we mix both kinds of binaries, they are going to fail to read the parameters correctly!
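
The same mismatch can be modelled in plain C by treating the argument area as an array of 32-bit words. This is only a sketch of the i386 convention shown above, not real calling-convention code, and every name in it is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Argument area as the callee finds it: words[0] is the first argument. */
struct arg_area {
    uint32_t words[4];
    size_t count;
};

/* A time64 caller lays out foo(a, b, c): b occupies two words. */
struct arg_area push_args_time64(int32_t a, int64_t b, int32_t c)
{
    struct arg_area area = { {0}, 4 };

    area.words[0] = (uint32_t)a;
    area.words[1] = (uint32_t)((uint64_t)b & 0xffffffffu);  /* low half */
    area.words[2] = (uint32_t)((uint64_t)b >> 32);          /* high half */
    area.words[3] = (uint32_t)c;
    return area;
}

/* A time32 callee expects one word per argument, so it reads c from
 * the third word. */
int32_t read_c_time32(const struct arg_area *area)
{
    assert(area != NULL && area->count >= 3);
    return (int32_t)area->words[2];
}
```

For foo(1, 2, 3), the time64 caller produces the words 1, 2, 0, 3, and the time32 callee reads c as 0 instead of 3.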

So yeah, it’s a big deal. And right now, there are no real protections in place to prevent mixing these ABIs. So what you actually may get is runtime breakage, potentially going as far as to create security issues.

You don’t have to take my word for it. You can reproduce it yourself on x86/amd64 easily enough. Let’s take the more likely case of a time32 program linked against a library that has been rebuilt for time64:

$ cat >libfoo.c <<EOF
#include <stdio.h>
#include <time.h>

void foo(int a, time_t b, int *c) {
   printf("a = %d\n", a);
   printf("b = %lld\n", (long long) b);
   printf("%s", ctime(&b));
   printf("c = %d\n", *c);
}
EOF
$ cat >foo.c <<EOF
#include <stddef.h>
#include <time.h>

extern void foo(int a, time_t b, int *c);

int main() {
    int three = 3;
    foo(1, time(NULL), &three);
    return 0;
}
EOF
$ cc -m32 libfoo.c -shared -o libfoo.so
$ cc -m32 foo.c -o foo -Wl,-rpath,. libfoo.so
$ ./foo
a = 1
b = 1727154919
Tue Sep 24 07:15:19 2024
c = 3
$ cc -m32 -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 \
  libfoo.c -shared -o libfoo.so
$ ./foo 
a = 1
b = -34556652301432063
Thu Jul 20 06:16:17 -1095054749
c = 771539841
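
The garbage value of b is not random. Splitting it into 32-bit halves gives a low half that looks like a perfectly reasonable timestamp (a few seconds after the first run), and a high half that looks like a 32-bit stack address, which is consistent with the time64 library reading the caller’s pointer argument as the upper part of time_t. A minimal sketch of that split, with made-up helper names:

```c
#include <assert.h>
#include <stdint.h>

/* The value of "b" printed by the second run above. */
#define OBSERVED_B (-INT64_C(34556652301432063))

/* Low 32 bits: the slot where the time32 caller put its time_t. */
uint32_t low_word(int64_t value)
{
    return (uint32_t)((uint64_t)value & 0xffffffffu);
}

/* High 32 bits: the next stack slot, which held the caller's third
 * argument (presumably the pointer to "three"). */
uint32_t high_word(int64_t value)
{
    return (uint32_t)((uint64_t)value >> 32);
}

/* Compile-time check of the split for the observed value. */
static_assert(((uint64_t)OBSERVED_B & 0xffffffffu) == 1727154945u,
              "low half is a plausible time32 timestamp");
static_assert(((uint64_t)OBSERVED_B >> 32) == 0xff853ae8u,
              "high half looks like a 32-bit stack address");
```

With the pointer consumed as part of b, the library then takes whatever word follows on the stack as c and dereferences it, which would explain the meaningless value printed for c.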

On top of that, the source-first nature of Gentoo amplifies these problems. An average binary distribution rebuilds all binary packages — and then the user upgrades the system in a single, relatively atomic step. Sure, if someone uses third-party repositories or has locally built programs that link to system libraries, problems can emerge but the process is relatively safe.

On the other hand, in Gentoo we are talking about rebuilding @world while breaking ABI in place. For a start, we are talking about prolonged periods of time between two packages being rebuilt, during which they would actually be mixing incompatible ABIs. Then, there is a fair risk that some rebuild will fail and leave your system half-transitioned with no easy way out. Then, there is a real risk that cyclic dependencies will actually make the rebuild impossible: rebuilding a dependency will break build-time tools, preventing stuff from being rebuilt. It’s a true horror.

What can we do to make it safer?

Our deliberations currently revolve around three ideas that are semi-related, though not necessarily dependent on one another:

  1. Changing the platform tuple (CHOST) for the new ABIs, to clearly distinguish them from the baseline 32-bit ABI.
  2. Changing the libdir for the new ABIs, effectively permitting the rebuilt libraries to be installed independently of the original versions.
  3. Introducing a binary-level ABI distinction that could prevent binaries using different sub-ABIs from being linked to one another.

The subsequent sections will focus on each of these changes in detail. Note that all the values used there are just examples, and not necessarily the strings used in a final solution.

The platform tuple change

The platform tuple (generally referenced through the CHOST variable) identifies the platform targeted by the toolchain. For example, it is used as a part of GCC/binutils install paths, effectively allowing toolchains for multiple targets to be installed simultaneously. In clang, it can be used to switch between supported cross-compilation targets, and can control the defaults to match the specified ABI. In Gentoo, it is also used to uniquely identify ABIs for the purpose of multilib support. Because of that, we require that no two co-installable ABIs share the same tuple.

A tuple consists of four parts, separated by hyphens: architecture, vendor, operating system and libc. Of these, vendor is generally freeform but the other three are restricted to some degree. A few semi-equivalent examples of tuples used for 32-bit x86 platform include:

i386-pc-linux-gnu
i686-pc-linux-gnu
i686-unknown-linux-gnu

Historically, two approaches were used to introduce new ABIs. Either the vendor field was changed, or an additional ABI specification was appended to the libc field. For example, Gentoo historically used two different kinds of tuples for ARM ABIs with a hardware floating-point unit:

armv7a-hardfloat-linux-gnueabi
armv7a-unknown-linux-gnueabihf

The former approach was used earlier, to avoid incompatibility problems resulting from altering other tuple fields. However, as these were fixed and upstreams normalized on the latter solution, Gentoo followed suit.

Similarly, the discussion of time64 ABIs resurfaced the same dilemma: should we just “abuse” the vendor field for this, or instead change libc field and fix packages? The main difference is that the former is “cleaner” as a downstream solution limited to Gentoo, while the latter generally opens up discussions about interoperability. Therefore, the options look like:

i686-gentoo_t64-linux-gnu
i686-pc-linux-gnut64
armv7a-gentoo_t64-linux-gnueabihf
armv7a-unknown-linux-gnueabihft64

Fortunately, changing the tuple should not require much patching. The GNU toolchain and GNU build system both ignore everything following “gnu” in the libc field. Clang will require patching — but upstream is likely to accept our patches, and we will want to make patches anyway, as they will permit clang to automatically choose the right ABI based on the tuple.
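
The reason a suffix such as t64 is mostly harmless can be illustrated with a prefix match on the last tuple field. This is only a sketch of the idea, not the actual logic of config.sub, GCC or any other tool, and the function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return a pointer to the last hyphen-separated field (libc) of a tuple. */
const char *tuple_libc_field(const char *tuple)
{
    const char *last;

    assert(tuple != NULL);
    last = strrchr(tuple, '-');
    return last != NULL ? last + 1 : tuple;
}

/* Prefix match: anything following "gnu" in the libc field is ignored. */
bool tuple_is_gnu(const char *tuple)
{
    return strncmp(tuple_libc_field(tuple), "gnu", 3) == 0;
}
```

Under such a match, i686-pc-linux-gnu, i686-pc-linux-gnut64 and armv7a-unknown-linux-gnueabihft64 are all recognized as glibc targets, while a tool that compares the whole field for equality would need patching.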

The libdir change

The term “libdir” refers to the base name of the library install directory. Having different libdirs, and therefore separate library install directories, makes it possible to build multilib systems, i.e. installing multiple ABI variations of libraries on a single system, and making it possible to run executables for different ABIs. For example, this is what makes it possible to run 32-bit x86 executables on amd64 systems.

The libdir values are generally specified in the ABI. Naturally, the baseline value is plain lib. As a historical convention (since 32-bit architectures were first), usually 32-bit platforms (arm, ppc, x86) use lib, whereas their more modern 64-bit counterparts (amd64, arm64, ppc64) use lib64 — even if a particular architecture never really supported multilib on Gentoo.

Architectures that support multiple ABIs also define different libdirs. For example, the additional x32 ABI on x86 uses libx32. MIPS n32 ABI uses lib32 (with plain lib defining the o32 ABI).

Now, we are considering changing the libdir value for time64 variants of 32-bit ABIs, for example from lib to libt64. This would make it possible to install the rebuilt libraries separately from the old libraries, effectively bringing three advantages:

  1. reducing the risk of time64 executables accidentally linking to time32 libraries,
  2. enabling Portage’s preserved-libs feature to preserve time32 libraries once the respective packages have been rebuilt for time64, and before their reverse dependencies have been rebuilt,
  3. optionally, making it possible to use time32 + time64 multilib profiles, which could be used to preserve compatibility with prebuilt time32 applications linking to system libraries.

In my opinion, the second point is a killer feature. As I’ve mentioned before, we are talking about the kind of migration that would break executables for a prolonged time on production systems, and possibly break build-time tools, preventing the rebuild from proceeding further. By preserving original libraries, we are minimizing the risk of actual breakage, since the existing executables will keep using the time32 libraries until they are rebuilt and linked to the time64 libraries.

The libdir change is definitely going to require some toolchain patching. We may want to also consider special-casing glibc, as the same set of glibc libraries is valid for all of the sub-ABIs we were considering. However, we will probably want a separate ld.so executable, as it would need to load libraries from the correct libdir, and then we will want to set .interp in time64 executables to reference the time64 ld.so.

Note that due to how multilib is designed in Gentoo, a proper multilib support for this (i.e. the third point) requires a unique platform tuple for the ABI as well — so that specific aspect is dependent on the tuple change.

Ensuring binary incompatibility

In general, you can’t mix binaries using different ABIs. For example, if you try to link a 64-bit program to a 32-bit library, the linker will object:

$ cc foo.c libfoo.so 
/usr/lib/gcc/x86_64-pc-linux-gnu/14/../../../../x86_64-pc-linux-gnu/bin/ld: libfoo.so: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status

Similarly, the dynamic loader will refuse to use a 32-bit library with 64-bit program:

$ ./foo 
./foo: error while loading shared libraries: libfoo.so: wrong ELF class: ELFCLASS32

There are a few mechanisms that are used for this. As demonstrated above, architectures with 32-bit and 64-bit ABIs use two distinct ELF classes (ELFCLASS32 and ELFCLASS64). Additionally, some architectures use different machine identifiers (EM_386 vs. EM_X86_64, EM_PPC vs. EM_PPC64). The x32 ABI on x86 “abuses” this by declaring its binaries as ELFCLASS32 + EM_X86_64 (and therefore distinct from ELFCLASS32 + EM_386 and from ELFCLASS64 + EM_X86_64).
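
As a rough sketch of how the class and machine fields identify the x86 family ABIs, the function below inspects the first 20 bytes of an ELF header. The numeric values are the standard ones (ELFCLASS32 = 1, ELFCLASS64 = 2, EM_386 = 3, EM_X86_64 = 62); the enum and function names are invented, and little-endian encoding is assumed, as used by all x86 variants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum elf_abi { ELF_ABI_X86, ELF_ABI_AMD64, ELF_ABI_X32, ELF_ABI_UNKNOWN };

/* Classify an x86 family binary from the start of its ELF header.
 * hdr[4] is e_ident[EI_CLASS]; e_machine sits at offset 18 in both
 * the 32-bit and the 64-bit header layout. */
enum elf_abi abi_from_elf_header(const unsigned char *hdr, size_t len)
{
    unsigned elf_class, machine;

    assert(hdr != NULL);
    if (len < 20 || memcmp(hdr, "\177ELF", 4) != 0)
        return ELF_ABI_UNKNOWN;

    elf_class = hdr[4];
    machine = hdr[18] | ((unsigned)hdr[19] << 8);  /* little endian */

    if (elf_class == 1 && machine == 3)  return ELF_ABI_X86;
    if (elf_class == 2 && machine == 62) return ELF_ABI_AMD64;
    if (elf_class == 1 && machine == 62) return ELF_ABI_X32;
    return ELF_ABI_UNKNOWN;
}
```

Note that time32 and time64 builds for 32-bit x86 both carry ELFCLASS32 + EM_386, so a check of this kind cannot tell them apart.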

Both ARM and MIPS use the flags field (it is a bit-field with architecture-specific flags) to distinguish different ABIs (hardfloat vs. softfloat, n32 ABI on MIPS…). Additionally, both feature a dedicated attribute section — and again, the linker refuses to link incompatible object files.

It may be desirable to implement a similar mechanism for time32 and time64 systems. Unfortunately, it’s not a trivial task. It doesn’t seem that there is a reusable generic mechanism that could be used for that. On top of that, we need a solution that would fit a fair number of different architectures. It seems that the most reasonable solution right now would be to add a new ELF note section dedicated to this feature, and implement complete toolchain support for it.

However, whatever we decide to do, we need to take into consideration that the user may want to disable it. In particular, there is a fair amount of prebuilt software that has no sources available, and it may continue working correctly against system libs, provided it does not call into any API using time_t. The cure of unconditionally preventing such software from working might be worse than the disease.

On the bright side, it should be possible to create a non-fatal QA check for this without much hacking, provided that we go with separate libdirs. We can distinguish time64 executables by their .interp section, pointing to the dynamic loader in the appropriate libdir, and then verify that time32 programs will not load any libraries from libt64, and that time64 programs will not load any libraries directly from lib.
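
The core rule of such a QA check could look like the sketch below. It assumes the example libdir name libt64 used in this post, takes the interpreter path (the contents of .interp) and the resolved library path as plain strings, and leaves out any exceptions that might be needed, e.g. for glibc itself. The function names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Example directory component for time64 libraries (an assumption,
 * not a final name). */
#define T64_LIBDIR "/libt64/"

/* Does this path point into the time64 libdir? */
bool path_in_t64_libdir(const char *path)
{
    assert(path != NULL);
    return strstr(path, T64_LIBDIR) != NULL;
}

/* A program and a library match when both live on the same side:
 * time64 interpreter with libt64 libraries, or time32 interpreter
 * with plain lib libraries. */
bool qa_library_matches(const char *interp, const char *library)
{
    return path_in_t64_libdir(interp) == path_in_t64_libdir(library);
}
```

A non-fatal check would then report every (program, library) pair for which qa_library_matches() returns false, rather than refusing to install the package.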

What about old prebuilt applications?

So far we were concerned with packages that are built from source. However, there is still a fair number of old applications, usually proprietary, that are available only as prebuilt binaries, particularly for x86 and PowerPC architectures. These packages are going to face two problems: firstly, compatibility issues with system libraries, and secondly, the y2k38 problem itself.

For the compatibility problem, we have a reasonably good solution already. Since we already had to make them work on amd64, we have a multilib layout in place, along with necessary machinery to build multiple library versions. In fact, given that the primary purpose of multilib is compatibility with old software, it’s not even clear if there is much of a point in switching amd64 multilib to use time64 for 32-bit binaries. Either way, we can easily extend our multilib machinery to distinguish the regular abi_x86_32 target from abi_x86_t64 (and we probably should do that anyway), and then create new multilib x86 profiles that would support both ABIs.

The second part is much harder. Obviously, as soon as we’re past the 2038 cutoff date, all 32-bit programs — using system libraries or not — will simply start failing in horrible ways. One possibility is to work with faketime to control the system clock. Another is to run a whole VM that’s moved back in time.

Summary

As 2038 is approaching, 32-bit applications exercising 32-bit time_t are bound to stop working. At this point, it is pretty clear that the only way forward is to rebuild these applications with 64-bit time_t (and while at it, force LFS as well). Unfortunately, that’s not a trivial task since it involves an ABI change, and mixing time32 and time64 programs and libraries can lead to horrible runtime bugs.

While the exact details are still in the making, the proposed changes revolve around three ideas that can be implemented independently to some degree: changing the platform tuple (CHOST), changing libdir and preventing accidentally mixing time32 and time64 binaries.

The tuple change is mostly a more formal way of distinguishing builds for the regular time32 ABI (e.g. i686-pc-linux-gnu) from ones specifically targeting time64 (e.g. i686-pc-linux-gnut64). It should be relatively harmless and easy to carry out, with minimal amount of fixing necessary. For example, clang will need to be updated to accept new tuples.

The libdir change is probably the most important of all, as it permits a breakage-free transition, thanks to Portage’s preserved-libs feature. Long story short, time64 libraries get installed to a new libdir (e.g. libt64), and the original time32 libraries remain in lib until the applications using them are rebuilt. Unfortunately, it’s a bit harder to implement — it requires toolchain changes, and ensuring that all software correctly respects libdir. The extra difficulty is that with this change alone, the dynamic loader won’t ignore time32 libraries if e.g. -Wl,-rpath,/usr/lib is injected somewhere.

The incompatibility part is quite important, but also quite difficult. Ideally, we’d like to stop the linker from trying to accidentally link time32 libraries with time64 programs, and likewise the dynamic loader from trying to load them. Unfortunately, so far we weren’t able to come up with a realistic way of doing that, short of actually making some intrusive changes to the toolchain. On the positive side, writing a QA check to detect accidental mixing at build time shouldn’t be that hard.

Doing all three should enable us to provide a clean and relatively safe transition path for 32-bit Gentoo systems using glibc. However, these only solve problems for packages built from source. Prebuilt 32-bit applications, particularly proprietary software like old games, can’t be helped that way. And even if time64 changes won’t break them via breaking the ABI compatibility with system libraries, then year 2038 will. Unfortunately, there does not seem to be a good solution to that, short of actually running them with faked system time, one way or another.

Of course, all of this is still only a rough draft. A lot may still change, following experiments, discussion and patch submission.

Acknowledgements

I would like to thank the following people for proof-reading and suggestions, and for their overall work towards time64 support in Gentoo: Arsen Arsenović, Andreas K. Hüttel, Sam James and Alexander Monakov.

2024-09-30 correction

Unfortunately, my original ideas were too optimistic. I’ve entirely missed the fact that all libdirs are listed in ld.so.conf, and therefore we cannot rely on hardcoding the libdir path inside ld.so itself. In retrospect, I should have seen that coming — after all, we already adjust these paths for custom LLVM prefix, and that one would require special handling too.

This effectively means that the libdir change probably needs to depend on the binary incompatibility part. Overall, we need to meet three basic goals:

  1. The dynamic loader needs to be able to distinguish time32 and time64 binaries. For time32 programs, it needs to load only time32 libraries; for time64 programs, it needs to load only time64 libraries. In both cases, we need to assume that both kinds of libraries will appear in the search path.
  2. For backwards compatibility, we need to assume that all binaries that do not have an explicit time64 marking are time32.
  3. Therefore, all newly built binaries must carry an explicit time64 marking. This includes binaries built by non-C environments, such as Rust, even if they do not interact with time_t ABI at all. Otherwise, these binaries would forever depend on time32 libraries.
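
Expressed as code, the first two goals boil down to a very small rule. The sketch below uses invented names and says nothing about how the marking would actually be stored in a binary.

```c
#include <assert.h>
#include <stdbool.h>

enum time_marking { MARK_NONE, MARK_TIME64 };
enum time_abi { TIME32, TIME64 };

/* Goal 2: anything without an explicit marking is treated as time32. */
enum time_abi effective_abi(enum time_marking marking)
{
    assert(marking == MARK_NONE || marking == MARK_TIME64);
    return marking == MARK_TIME64 ? TIME64 : TIME32;
}

/* Goal 1: the loader only pairs programs and libraries of the same kind. */
bool loader_may_load(enum time_marking program, enum time_marking library)
{
    return effective_abi(program) == effective_abi(library);
}
```

The third goal follows directly: a newly built binary that lacks the marking is treated as time32 by effective_abi(), regardless of what ABI it actually uses.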

Meeting all these goals is a lot of effort. None of the hacks we debated so far seem sufficient to achieve that, so we are probably talking about the level of effort on par with patching multiple toolchains for a variety of programming languages. Naturally, this is not something we can carry locally in Gentoo, so it also requires cooperation from multiple parties. All that for architectures that are largely considered legacy, and sometimes not even really supported anymore.

Of course, another problem is whether these other toolchains are actually going to produce correct time64 executables. After all, unless they are specifically adapted to respect _TIME_BITS the way C programs do, they are probably going to hardcode specific time_t width, and break horribly when it changes. However, that’s really an upstream problem to solve, and tangential to the issues we are discussing here.

On top of that, we are talking about a major incompatibility. All binaries that aren’t explicitly marked as time64 are going to use time32 libraries, even if they use the time64 ABI. Gentoo won’t be able to run third-party executables unless they are patched to carry the correct marking.

Perhaps a better solution is to set our aims lower. Rather than actually distinguishing time32 and time64 binaries, we could instead inject RPATH to all time64 executables, directly forcing the time64 libdir there. This definitely won’t prevent the dynamic loader from using time32 libraries, but it should help transition without causing major incompatibility concerns.

Alternatively, we could consider the problem the other way around. Rather than changing libdir permanently for time64 libraries, we could change it temporarily for time32 libraries. This would imply injecting RPATH into all existing programs and renaming the libdir. Newly built time64 libraries would be installed back into the old libdir, and newly built time64 programs would lack the RPATH forcing time32 libraries. A clear advantage of this solution is that it would remain entirely compatible with other distributions that have taken the leap already.

As you can see, the situation is developing rapidly. Every day is bringing new challenges, and new ideas for how to overcome them.

9 thoughts on “The perils of transition to 64-bit time_t”

  1. To nitpick, in the example of how the size of the struct changes the offset of member c with 64-bit time_t is 16 and not 12 as claimed. This is because natural alignment of 64-bit b requires 4 bytes of padding between a and b.

      1. Actually, now that I think about it a bit more, your original statement is correct, kind of. Or both our statements are correct depending on which target we’re looking at. The i386 ABI is one of those weird ones where 64-bit integer (and floating point) types only have 4 byte alignment. So on i386 there’s no need for that extra 4 bytes of padding between members a and b.

        However, other 32-bit ABI’s, like aarch32, do require 8 byte alignment for 64-bit types.

        Argh!

  2. Where you say 32-bit time_t you should say 31-bit time_t (signed int), since AFAIK a true 32-bit time_t (unsigned int) won’t have any problem until 2106.

    With that being said, have you explored the possibility to just change it to an unsigned time_t? This would keep time_t variables to 4 bytes.

    1. Well, 32-bit signed. While I get the potential for confusion, this is how it is usually referred to.

      As for switching to unsigned, that doesn’t really solve anything. You still get incompatible values, everything still breaks — just that it breaks in less dangerous way.

  3. Slightly off-topic: I write apps in C. How can I make sure my code uses 64-bit `time_t`? I’m not doing anything special; I `#include ` and use it normally. I’d hate to have a dependency on 32-bit `time_t` without realizing it.
