Metadata Cache Backend Based on Extended Attributes

I got the idea of writing a Portage metadata cache backend based on extended file attributes. We are talking about file metadata after all, and the key=value format fits the cache quite well. I have it working now. Along the way I hit a couple of interesting issues. Cache entries can be arbitrarily long, but every file system I tested limits how long an attribute value can be, so I decided to split values that are too long across multiple attributes. I also found out that ext4 and btrfs use the wrong errno to signal that a value is too long: man xattr_set says it should be E2BIG, but both of those file systems return ENOSPC. I opened an upstream kernel bug about this to see what they think:

This is what it looks like currently:

betelgeuse@pena /mnt/test/dev-java/java-config $ getfattr -d java-config-2.1.7.ebuild  | head
# file: java-config-2.1.7.ebuild
user.DEFINED_PHASES="1:compile install postinst postrm unpack"
user.DEPEND="1:dev-lang/python >=sys-apps/sed-4 virtual/python"
user.DESCRIPTION="1:Java environment configuration tool"
user.KEYWORDS="1:~alpha ~amd64 ~arm ~ia64 ~ppc ~ppc64 ~x86 ~x86-fbsd"
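The splitting described above could be sketched roughly as follows in Python. This is not the actual backend code, only an illustration under assumptions: it guesses that the "1:" prefix in the listing is a count of parts, and the continuation attribute names (user.KEY.1, user.KEY.2, ...), the helper names, MIN_LIMIT, and the halve-and-retry strategy are all hypothetical. It treats both E2BIG and ENOSPC as "value too long", since ext4 and btrfs return the latter.

```python
import errno
import functools
import os

# errno values treated as "value too long": E2BIG is the documented one,
# ENOSPC is what ext4 and btrfs were observed to return.
TOO_LONG = (errno.E2BIG, errno.ENOSPC)
MIN_LIMIT = 16  # hypothetical floor; keeps the "<count>:" prefix in one part


def encode(key, data, limit):
    """Split b"<count>:<data>" into attribute values of at most `limit` bytes."""
    count = 1
    # The prefix is part of the payload, so grow count until everything fits.
    while len(b"%d:" % count) + len(data) > count * limit:
        count += 1
    payload = b"%d:" % count + data
    # Assumed naming: user.KEY for the first part, user.KEY.<i> for the rest.
    names = ["user." + key] + ["user.%s.%d" % (key, i) for i in range(1, count)]
    return {n: payload[i * limit:(i + 1) * limit] for i, n in enumerate(names)}


def store(setter, key, value, limit=65536):
    """Write value via setter(name, bytes); halve the part size when too long."""
    data = value.encode("utf-8")
    while True:
        try:
            for name, part in encode(key, data, limit).items():
                setter(name, part)
            return limit  # the part size that finally worked
        except OSError as exc:
            if exc.errno not in TOO_LONG or limit // 2 < MIN_LIMIT:
                raise
            limit //= 2


def load(getter, key):
    """Read the first part, parse the count, and join the remaining parts."""
    first = getter("user." + key)
    count, _, head = first.partition(b":")
    rest = [getter("user.%s.%d" % (key, i)) for i in range(1, int(count))]
    return b"".join([head] + rest).decode("utf-8")


def xattr_io(path):
    """Bind setter and getter to a real file (Linux only, Python 3.3+)."""
    return functools.partial(os.setxattr, path), functools.partial(os.getxattr, path)
```

With a real ebuild one would call setter, getter = xattr_io(path) and pass those to store() and load(). A real backend would also need to remove stale continuation attributes when a value shrinks, and ENOSPC can of course also mean that the file system is genuinely full, which this sketch does not distinguish.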

As for performance, the current implementation seems to perform about the same as the default cache for emerge -uDpv world.
These results are with a warm file system cache.

Results on btrfs/xattrs:
real 0m13.194s
user 0m10.906s
sys 0m1.811s
real 0m12.101s
user 0m10.847s
sys 0m0.980s

xfs does a little better because it has a higher limit for attribute values. I guess that most of the time is spent doing something other than cache lookups, but I will try to profile later. The code isn’t committed anywhere outside my Portage trunk git svn checkout yet, but I will see whether this is something zmedico would accept into Portage trunk. It is probably not going to be a documented option any time soon, though.

3 thoughts on “Metadata Cache Backend Based on Extended Attributes”

  1. Awesome! I’m so glad that you implemented and tested this, since it’s the same as an idea I had a while back.

    Profiling sounds like a great idea. It might be that performance would outdo the current cache once a bug or two gets fixed.

    How’s the cold-cache comparison? I’d expect that accessing half the number of files (ebuild only instead of cache+ebuild) improves performance.

  2. It was interesting to implement and test in order to find out how extended attributes work. Not everything that gets implemented has to become something users would actually use, or even be usable in the general sense.
