Categories: Gentoo, Gentoo-Stats, SoC, Portage
Time to say goodbye
So, time has come for me to realize that my time with Gentoo is over. I
haven't actually been doing much Gentoo work over the last months due
to personal reasons (nothing Gentoo related), and I don't see that
situation changing in the near future. In fact I've already reassigned
or dropped most of my responsibilities in Gentoo a while ago, so there
are just a few pet projects left to give away:
- my gentoo-stats project (in the portage/gentoo-stats svn repository).
I know quite a few people are interested in the idea of collecting
various statistical data from Gentoo user systems, and I'd encourage
everyone who wants to implement such a system to at least look at it (I
might even have finished it if I hadn't wasted my time focusing on
the wrong problems). There is also quite a bit of documentation that
should help you get started.
- a graphical security update tool (see bug #190397)
So if anyone wants to adopt those, complete or just parts, just take
them. As for Portage, Zac has practically already filled my role.
So I guess that wraps it up. It's been a nice ride most of the time,
but now it's time for me to leave the Gentoo train.
More extensions to package set support
After writing my previous post about set operators I've added a few more things related to package sets to portage. First, operators can now also be used inside sets.conf files using the extend, remove and intersect options, each taking a whitespace-separated list of set names (without the @ prefix) and working analogously to the operators in set expressions described in the previous post. The main difference is that the evaluation order is fixed now (unions come first, differences second and intersections last) while in expressions it's left-to-right.
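The fixed evaluation order can be pictured with plain Python sets. This is only a sketch under stated assumptions: the function name evaluate_set and the example package atoms are made up for illustration, and this is not the code portage itself runs.

```python
def evaluate_set(base, all_sets, extend=(), remove=(), intersect=()):
    """Combine package sets in the fixed order: unions, differences, intersections."""
    result = set(base)
    # 1. unions first: add every atom from the sets listed under "extend"
    for name in extend:
        result |= all_sets[name]
    # 2. differences second: drop every atom from the sets listed under "remove"
    for name in remove:
        result -= all_sets[name]
    # 3. intersections last: keep only atoms also found in each "intersect" set
    for name in intersect:
        result &= all_sets[name]
    return result


# Hypothetical set contents, for illustration only.
sets = {
    "world": {"app-editors/vim", "sys-apps/portage", "kde-base/kdelibs"},
    "system": {"sys-apps/portage", "sys-libs/glibc"},
}
world_without_system = evaluate_set(set(), sets, extend=["world"], remove=["system"])
print(sorted(world_without_system))
```

Because the order is fixed, it does not matter in which order the three options appear in the file.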
The second new feature is that package sets can now be (re)defined on the emerge command line. This is done using the following syntax:
emerge '@setname{key1=value1,key2=value2}'
where setname can either be an existing package set, or a new one to define a set without having to modify any files. Note the quotes that are necessary to ensure that emerge gets the argument as-is without interference from the shell. The nice thing is this syntax also works inside set expressions. The not-so-nice thing is that for now there are a few restrictions about the values you can use, as there is no quoting mechanism implemented yet (this is planned however). So using any of the following characters or whitespace inside the braces will lead to undefined behavior: { } @ = ,
Another restriction is that you may not redefine package sets that are created by a multiset section in sets.conf (as those use different options that only make sense when defining multiple sets at once).
Note that for redefining existing package sets you only have to pass those options that should be different from the sets.conf definition.
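To make the argument syntax concrete, here is a minimal parser sketch for '@setname{key1=value1,key2=value2}'. The function name parse_set_argument and the regular expression are assumptions made for this example and are not taken from portage. In line with the restrictions above, it simply refuses whitespace and the reserved characters where they cannot be interpreted.

```python
import re

# Hypothetical pattern: "@name" optionally followed by "{key=value,...}".
SET_ARG = re.compile(r"^@(?P<name>[^{}@=,\s]+)(?:\{(?P<opts>[^{}@\s]*)\})?$")


def parse_set_argument(arg):
    """Split a set argument into (set name, dict of options)."""
    match = SET_ARG.match(arg)
    if match is None:
        raise ValueError("not a package set argument: %r" % (arg,))
    options = {}
    if match.group("opts"):
        for pair in match.group("opts").split(","):
            key, sep, value = pair.partition("=")
            if not sep or not key:
                raise ValueError("expected key=value, got %r" % (pair,))
            options[key] = value
    return match.group("name"), options


print(parse_set_argument("@setname{key1=value1,key2=value2}"))
```

When redefining an existing set, the returned options would be merged over the ones from sets.conf, so only the differing keys need to be given.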
And last but not least, to make the above features a bit easier to use there is also a new DummyPackageSet class that can be used to build a package set only by using operators, and/or to include a few packages without having to edit an external file. So it's even easier than before to define a new set @world-without-system, using
[world-without-system]
class=portage.sets.base.DummyPackageSet
extend=world
remove=system
Package set operators
Ok, just a quick notice that portage 2.2_rc10 (or 2.2 final if there isn't another RC) will not only support package sets as defined in sets.conf, but also expressions to generate unions, intersections and differences of multiple package sets. This allows you, for example, to temporarily exclude @system from @world (assuming you have @system in your world_sets file) by running emerge @world-@system.
Other operators are / for intersections (select only atoms included in both sets) and + for unions. The latter is useful as expressions can contain more than one operator, e.g. emerge @kde+@gnome/@installed to reinstall all kde and gnome packages that are already installed (assuming kde and gnome sets are defined somewhere).
This feature is just a few minutes old, so it will probably be extended or otherwise changed in the future. Current restrictions include
- strict left-to-right evaluation order
- only defined package sets can be used as operands (no package names)
- feature is currently only available on the commandline, not via sets.conf
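The strict left-to-right evaluation can be sketched in a few lines of Python. The function evaluate_expression and the example set contents are hypothetical; the sketch only mirrors the behavior described above (+ for union, - for difference, / for intersection, operands must be defined sets).

```python
import re


def evaluate_expression(expression, all_sets):
    """Evaluate e.g. "@kde+@gnome/@installed" strictly from left to right."""
    # Split on an operator only when it is directly followed by "@",
    # so a "-" inside a set name is not mistaken for an operator.
    tokens = re.split(r"([+/-])(?=@)", expression)
    for operand in tokens[0::2]:
        if not operand.startswith("@") or operand[1:] not in all_sets:
            raise ValueError("not a defined package set: %r" % (operand,))
    result = set(all_sets[tokens[0][1:]])
    for operator, operand in zip(tokens[1::2], tokens[2::2]):
        other = all_sets[operand[1:]]
        if operator == "+":
            result |= other   # union
        elif operator == "-":
            result -= other   # difference
        else:
            result &= other   # "/" is intersection
    return result


# Hypothetical set contents, for illustration only.
sets = {
    "kde": {"kde-base/kdelibs", "kde-base/konsole"},
    "gnome": {"gnome-base/gnome-panel", "gnome-base/nautilus"},
    "installed": {"kde-base/konsole", "gnome-base/nautilus", "app-editors/vim"},
}
print(sorted(evaluate_expression("@kde+@gnome/@installed", sets)))
```

Since there is no precedence, @kde+@gnome/@installed is read as (kde + gnome) / installed, which is exactly what the reinstall example relies on.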
And while I'm on it, I've also added a new AgeSet class to select installed packages that are older/newer than a given number of days.
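The idea behind such an age-based set can be illustrated as follows. This is a sketch under assumptions, not the AgeSet implementation: the function select_by_age, its mode parameter and the input mapping of package to install timestamp are all invented for the example.

```python
import time

SECONDS_PER_DAY = 86400


def select_by_age(install_times, days, mode="older", now=None):
    """Return packages installed more ("older") or less ("newer") than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    if mode == "older":
        return {pkg for pkg, stamp in install_times.items() if stamp < cutoff}
    if mode == "newer":
        return {pkg for pkg, stamp in install_times.items() if stamp > cutoff}
    raise ValueError("mode must be 'older' or 'newer'")


# Hypothetical data: package -> installation time as a Unix timestamp.
now = 1_000_000_000
install_times = {
    "app-editors/vim": now - 40 * SECONDS_PER_DAY,
    "sys-apps/portage": now - 5 * SECONDS_PER_DAY,
}
print(select_by_age(install_times, 30, "older", now=now))
```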
Portage-2.2_pre2 is in the tree
As of a few minutes ago a portage-2.2 test release is finally available
for public consumption. This is a test release (somewhere between
alpha and beta I'd say), NOT a release candidate, so expect a few rough
edges and not always up2date/complete documentation.
Please see the shipped NEWS and RELEASE-NOTES for changes from the 2.1
series, and check bugs.gentoo.org before reporting issues in
#gentoo-portage.
Note for Ebuild developers: This test release includes a partially
rewritten version of repoman that's not heavily tested, so do not use
it for committing anything to the tree and double check its reports
with other tools or a 2.1 version.
Marius
Portage-2.2 preview
So, while Zac has been keeping everyone distracted with new portage-2.1 releases over the last months, I've been mostly working on the new features in trunk, which will become portage-2.2. I think it's time to give a short preview of things to expect: we plan to release it before the end of the year, so the feature set probably won't change much from now on:
- The most important new things will be package sets. Sounds boring at first, I know, but due to a flexible framework they allow us (and you) to do interesting things, like eventually replacing glsa-check and revdep-rebuild (while the security set is pretty much identical to what glsa-check did, the set for rebuilding packages with broken linkage is very experimental, incomplete and not enabled by default yet). Or simply update or remerge all packages in a specific category. And that's not even touching the power of the CommandOutputSet class

- Support for GLEP 42 news, as an alternative for package maintainers to the elog framework
- Visibility filtering based on licenses, aka ACCEPT_LICENSE, which allows you for example to build a RMS-approved system and will render the interactive license prompts currently found in some packages obsolete
- A new FEATURES flag to keep libraries that are still used on a soname change, including a simple way to rebuild all packages using the old library (using another package set). A bit too late for the expat issue, but hopefully it helps to prevent future incidents of that kind
- And of course all the things that already appeared in portage-2.1.3
But no light without darkness, there will be some important changes requiring your attention:
- While not set in stone yet, the behavior of system and world will likely change to match that of other package sets and single packages. Currently `emerge world` is the same as `emerge --noreplace world`, meaning that installed packages aren't rebuilt (unlike `emerge $foo`, which will rebuild $foo). With 2.2 `emerge world` is likely going to be the same as `emerge $(< /var/lib/portage/world)`; if you want the old behavior you'll have to use --noreplace. That change also has other benefits beyond consistency, like removing the restriction that world/system could not be combined with other packages on the commandline.
- "world" will likely no longer include "system", if you want to update both you'll have to specify both
- Due to a change in the namespace many portage related tools will require an update or generate a lot of deprecation warnings.
As said, it's just a preview, and some things are still work in progress, but it should give you a first impression what portage-2.2 will be about. I think we might create the first test releases in late November, but that's no promise. Though if you want to test it you don't have to wait that long, just install subversion and read http://www.gentoo.org/proj/en/portage/doc/testing.xml (that's especially recommended for maintainers of portage related tools), just don't expect everything to work perfectly yet.
[RFC] Properties of package sets
One missing feature in portage is the lack of package sets. Before we
(re)start working on that however I'd like to get some feedback about
what properties/features people would expect from portage package set
support.
Some key questions:
- should they simply act like aliases for multiple packages? E.g.
should `emerge -C sets/kde` be equivalent to `emerge -C kdepkg1 kdepkg2
kdepkg3 ...`? Or does the behavior need to be "smarter" in some ways?
- what kind of atoms should be supported in sets? Simple and versioned
atoms for sure, but what about complex atoms (use-conditional, any-of,
blockers)?
- should sets be supported everywhere, or only in selected use cases?
(everywhere would include depstrings for example)
- what use cases are there for package sets? Other than the established
"system" and "world", and the planned "all" and "security" sets.
- how/where should sets be stored/distributed?
News on the portage front
So, it's been a while since my last post, so people may wonder what happened since then within portage. Well, besides the usual maintenance releases of 2.1.2 there hasn't been a lot of exciting stuff as I've been mostly inactive, but there are still a number of interesting things:
- We've decided that trunk will be released as 2.2, not 2.1.3, due to the structural changes in the codebase (which aren't complete yet)
- I've finished Alec's work on the portage implementation of GLEP 42, added Portage support to the eselect module that was shipped with Paludis and made it compliant with the GLEP. Now we just have to wait for the eselect and Paludis people to get their act together for the module to be released (bug 179064)
- The new preserve-libs feature is now more or less complete except for the support in revdep-rebuild (more on this in a later post)
- KEYWORDS="-*" is now completely unsupported, gvisible() will throw a warning if it encounters packages using it (see KEYWORDS.stupid for reasons)
- Zac merged the license visibility code (aka ACCEPT_LICENSE)
- lots of other minor things Zac merged that I don't remember now, but most of those are also in 2.1.2
- I've added some basic instructions to our project page on how interested people can use/test portage versions or svn without having to install them system-wide
There are still a lot of things I'd like to do, but most of those have been on the todo list for so long that it's unlikely they'll get into 2.2, as my time and motivation are quite limited these days.
As if we needed a confirmation ...
... that perl is evil I just noticed this in the output of my Manifest version checker:
dev-perl 666
(= there are 666 packages in dev-perl) This also means that dev-perl is by far the category with the most packages, almost 300 more than the second-largest category dev-java.
diet for portage/__init__.py
So, as I said earlier I've now moved the dbapi stuff into its own subpackage, and portage/__init__.py (formerly portage.py) has now shrunk to 5k lines. However, that's still way too much for me, so I'll see what I can remove from it next; likely candidates are config() and/or doebuild stuff.
Hopefully at some point no module will have more than 1k lines, so things get manageable again and we can start working again without getting lost in files that span hundreds of pages, and maybe even break some of the larger functions/classes (config, fetch, treewalk, ...) down into smaller pieces. Now what's the point of breaking things up? Well, one thing is that the smaller a code block, the easier it usually is to reuse it. Same for replacing it with something better. Also, as I have to determine what symbols each new module actually uses to rewrite the import statements, it might give us a better view of which symbols are actually used and of the dependencies between modules, and eventually give us a clue how to group them better (so that semantically related symbols are in the same namespace).
Namespace sanitizing and splitting up the tree
Something that's bugged me for a while in portage was the crappy namespace handling we've had ever since we moved the python modules to /usr/lib/portage/pym. Originally there was no real problem as we only had a single module portage.py, so all you needed was an 'import portage', but over time more modules were created, which Nick started to name portage_foo.py due to the lack of a "portage" python package to use as container. Also there were a number of modules without any "portage" part in the name, such as xpak, cvstree, output or the cache package, which could potentially cause a namespace collision with other packages in site-packages or even the standard library, not a very pleasant thought.
But as of today that's history, I finally fixed this annoyance and moved all the portage related code into the new "portage" package (so portage.py is now portage/__init__.py and portage_foo.py is now portage/foo.py). For now the code is mostly a 1:1 translation, but over time it hopefully gets a bit cleaner by removing redundant qualifiers. Also this now allows us to split the big portage.py (or now __init__.py) up further without fearing namespace collisions, I'll probably move the dbapi classes into their own package later this week.
But what does this all mean to you? If you're just a normal user it shouldn't affect you in any way (assuming I didn't screw up anything and Zac updates the ebuild accordingly). If you have some custom scripts or are a developer of a tool using the portage API you should prepare for updating it after portage-2.1.3 is released, though for the time being the old names should just continue to work as I've also added some symlinks to avoid a large-scale API breakage.
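For tool authors who want to support both layouts during the transition, one common pattern is to try the new module name first and fall back to the old one. The helper below is a generic sketch; import_first is a made-up name, and the portage module names in the comment just follow the portage_foo to portage.foo renaming described above.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate name
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# In a portage tool this might look like (placeholder module names):
#   foo = import_first("portage.foo", "portage_foo")
# Demonstration with a module that exists everywhere:
module = import_first("no_such_module_xyz_12345", "json")
print(module.__name__)
```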
On another note I fully agree with Diego on the idea of splitting the tree up. I've never been a big fan of the recent overlay hype, but at this point it's still manageable. Also besides any technical problems a tree split would increase the "repo hunting" problem which we're already starting to see and is IMHO one of the major downsides of most other (rpm-based) distributions, and that's something I'd like to avoid in Gentoo.
Getting rid of KEYWORDS=-*, step 2
After raising awareness over the last months that KEYWORDS="-*" is a stupid thing to use, today I decided to eliminate the remaining reason for using it (one couldn't unmask a package that had KEYWORDS="" without editing it) by adding support for a new token in package.keywords. So once portage-2.1.3 goes live all these live-cvs-completely-unsupported packages can stop using the broken KEYWORDS="-*" and use KEYWORDS="" instead without losing functionality. And once we get the tree clean of those KEYWORDS="-*" abusers we can also finally fix the -* handling for package.keywords to do what it should do (act like ACCEPT_KEYWORDS).
Gentoo-Stats isn't dead (yet)
I assume some of you have been wondering what has happened to my gentoo-stats project as there haven't been any news or updates recently. Well, unfortunately there isn't much going on, I guess I've been just a bit too frustrated with it to work on it in the last weeks/months. That frustration mainly comes from the package-filemap module and its crappy performance, and from the conceptual failure of the auth encryption I had planned/implemented. The latter is frustrating simply due to the wasted time, but the former means the lack of a key feature, namely finding which packages provide a given file even for uninstalled packages. I've already tried several things to get it faster, but without real success so far.
Now I have two more ideas how to still get it working: the first is to simply reduce the amount of data to the bare minimum (e.g. just recording executables and libraries), the second is using a custom storage backend for filenames instead of using MySQL for everything (as the DBMS is the slow part). I really want to avoid the first (as it would reduce functionality and likely just delay the problem a bit) and only use it as a last resort before dropping the module completely, so a while ago I wrote a custom backend for storing filenames efficiently, but haven't integrated it into the processing module yet. We'll see if I can find some time in the coming days/weeks to get this project back on track.
If you're interested in helping with it:
- I don't have any design for the web interface yet, so far it's just basic HTML-2.0 or so. I'm not a big designer, so this is something where I'd definitely welcome external help
- A GUI for the client would be nice (like for selecting data modules or performing complex queries), but I'm not a big fan of GUI programming (though I could help with any missing backend parts in the client)
- Wouldn't hurt to have someone else who's an expert with (My)SQL/mod_python/security have a look at the current code/db schema before this service goes into public testing.
what to do with deleted ebuilds
So today I wrote a little script to store the ebuilds and associated files of installed packages into a separate overlay as a first measure to solve bug #126059. It's still far from perfect as it only adds to the overlay, isn't limited to ebuilds deleted by an `emerge --sync` (which can be seen as a benefit as well) and most importantly doesn't deal with Manifests yet.
The last point makes it rather useless for general usage, but it's just a prototype script to see if this is a viable solution to the problem of disappearing ebuilds.
So any feedback about this would be appreciated.
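For readers who want to experiment with the idea, a rough Python sketch of such a script could look like this. It is not the script mentioned above: the function name, the simplified name/version pattern and the overlay path in the usage note are assumptions. It relies only on the fact that the installed-package database (usually /var/db/pkg) keeps a copy of each installed ebuild, and like the prototype it only ever adds to the overlay and ignores Manifests and other associated files.

```python
import re
import shutil
from pathlib import Path

# Simplified split of "name-version"; portage's real version syntax is richer.
PKG_VERSION = re.compile(r"^(?P<name>.+?)-(?P<version>\d[^-]*(?:-r\d+)?)$")


def save_installed_ebuilds(vdb_root, overlay_root):
    """Copy the ebuild of every installed package into overlay_root (add-only)."""
    copied = []
    # Layout of the installed-package database: <category>/<name-version>/<name-version>.ebuild
    for ebuild in sorted(Path(vdb_root).glob("*/*/*.ebuild")):
        category = ebuild.parent.parent.name
        match = PKG_VERSION.match(ebuild.stem)
        if match is None:
            continue  # skip names the simplified pattern cannot split
        # Layout of an overlay: <category>/<name>/<name-version>.ebuild
        target_dir = Path(overlay_root) / category / match.group("name")
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / ebuild.name
        if not target.exists():
            shutil.copy2(ebuild, target)
            copied.append(target)
    return copied
```

A call such as save_installed_ebuilds("/var/db/pkg", "/usr/local/saved-ebuilds") would populate the (hypothetical) overlay directory; the copied ebuilds stay unusable for general purposes until the Manifest question is solved.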
more elog goodness
After venting about annoying users last week I'll try to post something a bit more useful today, maybe this will even evolve into a series about what's new in portage land.
Well, today I added a little extension to the elog subsystem to make multi-target logging a bit more useful by extending the PORTAGE_ELOG_SYSTEM syntax a bit. Now you can override PORTAGE_ELOG_CLASSES per module, so for example one might send all messages into a file (using the save-summary module added in 2.1.2) and additionally send the important ones by mail. Another related extension is that you can now use a * wildcard wherever a loglevel is wanted to include all loglevels.
So using the example above you would put the following in your make.conf:
PORTAGE_ELOG_SYSTEM="save-summary:* mail:log,warn,error"
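How such a value breaks down per module can be shown with a small parser sketch. The function parse_elog_system is invented for this example and is not portage's code, and the list of loglevels (info, warn, error, log, qa) is an assumption about what * expands to.

```python
# Assumed full list of elog loglevels; "*" expands to all of them.
ALL_LEVELS = frozenset({"info", "warn", "error", "log", "qa"})


def parse_elog_system(value, default_classes):
    """Map each module in PORTAGE_ELOG_SYSTEM to the loglevels it receives."""
    result = {}
    for entry in value.split():
        module, sep, levels = entry.partition(":")
        if not sep:
            # no per-module override: fall back to PORTAGE_ELOG_CLASSES
            result[module] = set(default_classes)
            continue
        selected = set(levels.split(","))
        if "*" in selected:
            selected = set(ALL_LEVELS)
        result[module] = selected
    return result


config = parse_elog_system("save-summary:* mail:log,warn,error", ["warn", "error"])
print(config["mail"])
```

With the make.conf line above, save-summary receives every loglevel while mail only gets log, warn and error.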
Dear users, ...
sometimes I just hate you. Why? Because every now and then someone you haven't heard of before comes around and asks a question, then doesn't like the answer and calls you an idiot in one way or the other. Like today when someone joined our portage IRC channel, asked how to test a package on an arch it wasn't keyworded for and then refused to understand how the keyword system works. He even called using package.keywords for anything other than ~arch a misuse of that feature (which is funny as I wrote that feature initially).
And after trying to explain it three times he still insisted that we should implement another way because the current one (with ACCEPT_KEYWORDS and package.keywords) wasn't "logical".
Instead he insisted that we add a feature to allow users to change package metadata (like KEYWORDS) with a simple config file, which isn't just redundant but as any dev will assure you quite stupid and possibly harmful.
In the end he left after some nasty remarks, without having accomplished anything other than to upset a couple of devs due to simple ignorance. Now I assume he probably feels the same about us, but sometimes you just have to accept that the other party is right, and in this case we have >99% of the userbase and several years of experience backing our way up.
Now I know that such incidents are the rare exception and in most cases dealing with users is a nice experience, but they can make you pretty upset for a while sometimes.
So what's the moral of the story? If you're in a (heated) discussion, always consider that you may be wrong, and try to look at the other party's arguments from a different angle. If you notice that you're hitting a wall go away for a while so everyone can cool down (and try the first advice again), and maybe try to resume the discussion at a later date. And if you're talking with someone who's an expert in a domain while you're not, don't try to beat him in that domain (e.g. don't say that you change package metadata with package.keywords when you aren't sure what package metadata is in the first place). That doesn't mean that you can't talk about it, but realize that the other person likely knows a lot more about the topic (and maybe doesn't want to explain every little detail to you when rejecting an idea).
In the end it's just for your own benefit (and nobody likes grumpy devs)
transport code works, performance issues
Made a big step forward in the last few days as I've implemented the basic client ID management and the DRF transport code, in other words I can now register a client and upload stats data to the server (in theory at least).
Of course this is very fragile at the moment, partially because the client is a bit too optimistic when generating deltas, and sending a delta record if the server doesn't have the matching base record isn't all that useful.
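The base/delta dependency can be illustrated with plain dictionaries. Everything here is hypothetical (the record layout, make_delta and apply_delta are not taken from the gentoo-stats code); the sketch only shows why a delta is worthless to a server that never received the matching base record.

```python
def make_delta(base, current):
    """Describe how to turn the record `base` into the record `current`."""
    return {
        "changed": {key: value for key, value in current.items()
                    if key not in base or base[key] != value},
        "removed": [key for key in base if key not in current],
    }


def apply_delta(stored_base, delta):
    """Server side: rebuild the full record, refusing deltas without a base."""
    if stored_base is None:
        raise LookupError("no matching base record; the client must send a full record")
    record = dict(stored_base)
    record.update(delta["changed"])
    for key in delta["removed"]:
        record.pop(key, None)
    return record


# Hypothetical records, for illustration only.
base = {"arch": "x86", "profile": "2006.1", "cflags": "-O2"}
current = {"arch": "x86", "profile": "2007.0"}
delta = make_delta(base, current)
print(apply_delta(base, delta) == current)
```

A less optimistic client would only send a delta after the server has confirmed that it stored the base record.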
On another note I'm thinking about disabling the packagefilemap module on the serverside as processing the data for it simply takes way too much time in the current state, at least on my box. All I can hope right now is that a box with better IO and/or CPU is going to make a very large difference.
The main problem is simply that inserting a DRF with the packagefilemap data included can result in several 100k or even a few million inserts (one or two per file) and selects (to look up foreign keys). It's not going to be a very common operation but will happen regularly (at least whenever a client submits the first DRF with packagefilemap data). Now I expected this to take some time, but I didn't expect it to take over ten minutes, and currently it's close to 30 (with an almost empty db, would probably be worse for a populated db).
While I know of a few ways to decrease the number of inserts and selects quite a bit I don't like them as they require dropping certain functionality (like not being able to associate a filemap entry with package use flags anymore).
Well, we'll see how it goes once I can test this on a not-so-crappy box.
how to authenticate
So now I'm at the point where I need to work on the authentication part for the stats server code, and I noticed that my plan to use http digest authentication doesn't work, as that requires storing the plaintext password of clients on the server, which I'd like to avoid (generally one should only store a hash of the passwords in the authentication backend).
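As an illustration of what "only store a hash" means in practice, here is a minimal sketch using Python's standard library. The function names, the choice of PBKDF2 and the iteration count are assumptions made for the example, not part of the stats server.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, iterations, digest); only this record is stored, never the password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, record):
    """Check a password received from the client against the stored record."""
    salt, iterations, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # constant-time comparison to avoid leaking how many bytes matched
    return hmac.compare_digest(candidate, expected)


record = hash_password("secret")
print(verify_password("secret", record), verify_password("wrong", record))
```

With a record like this the server can check a password it receives, but it cannot compute a digest response from it, which is the conflict with digest authentication described above.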
Before going into alternatives let me list a few requirements I have for them:
- don't require the real password in the auth backend
- don't transmit the real password unsecured over the network
- must work with only http headers, don't touch the body in any way
- must be easily scriptable
- preemptive authorization (e.g. send the auth data with the first request)
- should work within a webbrowser
So, what options do I have now? Well, I can't see a single alternative that fits all requirements (if you know one let me know), the closest is http basic auth, but I really don't want to send the password over the network as almost-plaintext. This led me to the idea of extending it with gpg-encrypting the password, but that's not transparent when you use the browser (not that important for the current use case) and more importantly gpg adds about 600 bytes of protocol overhead for encrypted data (without using --armor); with the base64 encoding required for http that's almost one kilobyte just for a password that originally only had a few bytes.
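Both points are easy to check with a few lines of Python. The client name and password below are made up; the code only shows that a basic auth header can be reversed without any key, and how base64 inflates a payload of roughly the size mentioned above.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP "Authorization" header for basic auth."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def base64_size(raw_bytes):
    """Length in bytes of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


header = basic_auth_header("client42", "secret")
# Anyone who can read the header gets the password back, no key needed:
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(recovered)

# About 600 bytes of binary data grow by a third when base64-encoded:
print(base64_size(600))
```

Base64 turns every 3 bytes into 4 characters, so 600 bytes become 800 before any header name or scheme prefix is added.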
So, right now I have to select between a rather hackish, inefficient and untested but secure solution and a well-tested, relatively efficient and well-specified but insecure one. What would people prefer here?
Or does anyone know another solution to the problem that satisfies the above requirements? (the first four are hard requirements, the other two I could work around)
gentoo-stats test request 1
Don't get too excited about the title, most stuff isn't usable yet, though I think it doesn't hurt if a few people start testing the parts in the client that are supposed to work. If you feel brave enough start reading the little test-howto (work in progress).
gentoo-stats status
So after slacking for about a week or two due to the crappy weather
here (I guess most people would call >=30
Oh, and I just got the cheque with the initial payment an hour ago. It's about 390 Euro (stupid exchange rates) and was delivered by FedEx, but no tracking mail, so I was quite surprised (as the "surprise" package was delivered by DHL and included a tracking mail).