Aug 16

Quick summary:

I’m writing a CMS for the Gentoo website, that will offer an LDAP web interface, plus it will replace Gorg and provide Beacon as WYSIWYG editor to edit the XML files.

There were some serious bugs in the edit account page. The ACL is very complex there, since there are public attributes (readable by everyone), semi-private attributes (readable only by the user and the admins, e.g. birthday), and private ones (readable only by admins). Keep in mind that everything is configurable, but there is some duplication between the Django and LDAP ACLs: there is no easy way to parse the LDAP slapd.conf yet, and we need to migrate our infra to cn=config first, which is not an easy task and will take a long time. The bug was not in the LDAP part. Remember that a user changes his own attributes (or those of others, if he has the right privileges) with his own account, not through a global admin account. The bug was in the Django part, where the system expected to be able to change some data, and threw weird error messages/exceptions. Unfortunately the fix is not complete yet; it needs more investigation to ensure we are not opening any security holes here. The good news is that I tested with our current official configuration, and with various tweaks on it, and it seems to work fine. Plus, it seems ready for the improvements I intend to make (adding regular users to LDAP etc).
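The three visibility tiers described above can be sketched in a few lines of Python. This is only an illustration and not the Okupy code: the attribute names, the `ATTRIBUTE_TIERS` mapping and the `visible_attributes` function are all hypothetical.

```python
# Hypothetical tiers; in the real app these lists come from the settings.
ATTRIBUTE_TIERS = {
    "public": {"uid", "cn", "mail"},      # readable by everyone
    "semi_private": {"birthday"},         # readable by the owner and admins
    "private": {"internalNote"},          # readable by admins only
}


def visible_attributes(viewer, owner, is_admin=False):
    """Return the attribute names `viewer` may read on `owner`'s entry."""
    allowed = set(ATTRIBUTE_TIERS["public"])
    if viewer == owner or is_admin:
        allowed |= ATTRIBUTE_TIERS["semi_private"]
    if is_admin:
        allowed |= ATTRIBUTE_TIERS["private"]
    return allowed
```

Keeping the tiers in one mapping means the Django side has a single place to check, even though the same rules still have to be repeated in the LDAP ACL.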

I was also able to plug in some CSS/JS written by my mentor. It is just some preliminary work, nothing complete yet. We’ll need more help on this, especially from people with some experience in web design.

Beacon didn’t work out as expected. It became too complex, consisting of lots of JS and XSLT for reading the XML files and printing them. It even stores accounts in its own DB to keep track of the documents that users edit. This goes way beyond our needs: we only need the WYSIWYG part, plugged into a separate web app. Obviously, in its current state it is not a workable solution without significant additional effort. What we could do for now is split out some parts of its code, like the python scripts for converting XML to HTML and back, which is also not an easy task.

I must admit that I am really happy that the GSoC is coming to its end, and the real fun begins :)

Jul 30

Introduction

As a KDE packager, I usually have to write and test patches, especially build system related (Examples: Choqok, Amarok, Plasma and happy KStatusNotifierItem’d Akregator and Kaffeine, they don’t look ancient any more :) ). Gentoo, as a source based distro, has the ability to provide packages that clone/checkout the source from upstream’s SCM and compile it directly (called live ebuilds). For KDE, it provides live ebuilds from KDE SC master/trunk and the branch(es) (currently 4.7), plus live ebuilds for many extragear/playground and other packages (all of the above are available in the kde overlay). Also, we provide Qt live ebuilds both from master and many branches, in the qting-edge overlay. I wanted to use our Gentoo live ebuilds in order to test patches, but there were multiple problems. Emerge downloads the sources in $DISTDIR and stores them as the portage user. Plus, the git eclass was using bare repos, and it would reset the repo to master before each emerge. In order to solve those problems, I created a few scripts and wrappers, and convinced Tomas to introduce two new variables in the new git-2 eclass to fit my needs (thanks a lot bro, you owe me a beer).

Define the needs

In short, what I want is:

  • download the sources somewhere in my homedir
  • my everyday user to have write permissions to them
  • non-bare clones
  • url = anongit.kde.org AND pushUrl = git.kde.org, if possible directly on initial clone
  • if possible, have a live and a regular release side by side

The last point used to be solvable, but not any more. We used to provide a kdeprefix USE flag, that allowed us to do exactly this: install multiple KDE versions using different prefixes (eg /usr/kde/4.7 and /usr/kde/live). It had many problems though, that forced us to remove it. The problems were mostly in non-KDE packages, eg Sip, which also needed to be prefixed, and that was too much work. Anyway, a chroot could solve that issue.

As for the permission issue, I asked Zac if portage itself could provide something like this (using my user instead of the portage user), and he suggested that creating a git wrapper would be a clean solution.

After a while I was able to extend the above for my gentoo overlays (unofficial ebuild git repositories), since I have write access to most of the ones I use in my system.

Configuration

All the scripts mentioned can be found here. Although well tested here for the past few months, use them at your own risk. In the following examples I’m going to use the configurations for both the KDE and Gentoo git repos. Of course, you can ignore them (“Gentoo repos” blocks in the following scripts).

First, we need to set the following in /etc/make.conf:

# Needed by the Git wrapper
KDE_DEVELOPER=1 # For the KDE repos
GENTOO_DEVELOPER=1 # For the Gentoo repos
EGIT_NONBARE=1 # This one sets the git-2 eclass to clone non-bare repos

Next, we set up some git aliases in ~/.gitconfig, as suggested here:

# KDE Repos
[url "git://anongit.kde.org/"]
    insteadOf = kde:
[url "git@git.kde.org:"]
    pushInsteadOf = kde:
# Gentoo Repos
[url "git://git.overlays.gentoo.org/"]
    insteadOf = gentoo:
[url "git@git.overlays.gentoo.org:"]
    pushInsteadOf = gentoo:

And the git wrapper, which should be put in /usr/local/sbin/git:

#!/bin/bash
# Git wrapper: keeps the KDE/Gentoo clones under $DISTDIR/egit-src owned by a regular user
source /etc/make.conf # Provides DISTDIR, KDE_DEVELOPER and GENTOO_DEVELOPER
USER="tampakrap"
GROUP="tampakrap"
GIT="/usr/bin/git"

if [[ $1 == 'clone' ]]; then
    # KDE Repos
    if [[ $2 == "git://anongit.kde.org/"* ]] && [[ $KDE_DEVELOPER == 1 ]]; then
        KDE_REPO=${2#git://anongit.kde.org/} # Repo path without the URL prefix
        "$GIT" "$@" && chown -R "$USER:$GROUP" "$DISTDIR/egit-src/$KDE_REPO"
    # Gentoo Repos
    elif [[ $2 == "git://git.overlays.gentoo.org/"* ]] && [[ $GENTOO_DEVELOPER == 1 ]]; then
        GENTOO_REPO=${2#git://git.overlays.gentoo.org/} # Repo path without the URL prefix
        "$GIT" "$@" && chown -R "$USER:$GROUP" "$DISTDIR/egit-src/$GENTOO_REPO"
    else
        "$GIT" "$@"
    fi
else
    # Inside a KDE/Gentoo repo under $DISTDIR/egit-src: run git as the regular user
    if [[ ${PWD%/*} == "$DISTDIR/egit-src" ]] && ( grep -s -q gentoo .git/config || grep -s -q kde .git/config ); then
        sudo -u "$USER" "$GIT" "$@"
    else
        "$GIT" "$@"
    fi
fi

The above script consists of two parts. If the git command is clone, it checks whether the URL is a KDE or Gentoo one and chowns the repo after cloning. For any other command (eg pull), it checks whether the current directory is a KDE or Gentoo repo under $DISTDIR/egit-src, and if so it runs git through sudo -u $USER to preserve the permissions of the repo. The repos are still in the $DISTDIR/egit-src dir ($DISTDIR is usually /usr/portage/distfiles, but it can be changed in /etc/make.conf), so the following script creates symlinks to them somewhere in the homedir (put it in /usr/local/bin/create_repolinks):

#!/bin/bash

# Headers
source /etc/make.conf # Provides DISTDIR
. /etc/init.d/functions.sh # Provides einfo, ewarn and eerror

# Variables
REPO_DIR="/home/tampakrap/Source_Code/" # Where to store the symlinks of the repos
GENTOO_REPO_DIR="${REPO_DIR}gentoo/" # Gentoo repos
KDE_REPO_DIR="${REPO_DIR}kde/" # KDE repos
OVERLAY_DIR="/var/lib/layman"

# No root
if [[ $UID == 0 ]]; then
    eerror 'root is forbidden'
    exit 1
fi

# Gentoo Overlays
cd "$OVERLAY_DIR" || exit 1
for repo in */
do
    pushd "$repo" > /dev/null || continue
    einfo "Checking $repo overlay"
    # Replace the anonymous URL with the gentoo: alias from ~/.gitconfig
    if grep -q -s git.overlays.gentoo.org .git/config; then
        sed -i -e 's:git\://git.overlays.gentoo.org/:gentoo\::' .git/config
        ewarn "gentoo git url corrected for $repo overlay"
    fi
    [[ -L ${GENTOO_REPO_DIR}${repo%/} ]] || (ln -s "${OVERLAY_DIR}/${repo%/}" "${GENTOO_REPO_DIR}" && ewarn "New symlink for $repo overlay")
    popd > /dev/null
done

# KDE Repositories
cd "$DISTDIR/egit-src" || exit 1
for repo in */
do
    pushd "$repo" > /dev/null || continue
    einfo "Checking $repo repository"
    # Replace the anonymous URL (including its trailing slash) with the kde: alias
    if grep -q -s anongit.kde.org .git/config; then
        sed -i -e 's:git\://anongit.kde.org/:kde\::' .git/config
        ewarn "kde git url corrected for $repo"
    fi
    # Only link the repos that use the kde: alias
    if grep -q -s kde: .git/config; then
        [[ -L ${KDE_REPO_DIR}${repo%/} ]] || (ln -s "${DISTDIR}/egit-src/${repo%/}" "${KDE_REPO_DIR}" && ewarn "New symlink for $repo")
    fi
    popd > /dev/null
done

Last but not least, we need the kde overlay, to get the live ebuilds:

layman -f -a kde

For more information on this, take a look at the Gentoo KDE Guide.

Usage

With the above configuration, we can use:

emerge -av =amarok-9999
create_repolinks

and get the amarok repository in our homedir, ready to be patched. As I said, if we modify the code and re-emerge the ebuild to get our patch in action, emerge will reset our repo to master again. Thus, we need to use the following variable:

EVCS_OFFLINE=1 emerge -av1 amarok

This will prevent the reset of the repo. If we want a full live environment, we can even put that variable in make.conf, but this is not recommended; it is better to set it for single emerge runs like the one above.

That’s it. I plan to write a PyKDE UI for easy installation of the scripts, and maybe write a proper techbase article for it. Any feedback is appreciated.

Jul 28

Quick summary:

I’m writing a CMS for the Gentoo website, that will offer an LDAP web interface, plus it will replace Gorg and provide Beacon as WYSIWYG editor to edit the XML files.

Two important things happened: 1) I passed the midterm (thanks to my mentor and everyone involved) 2) I graduated YEY!

I’ve left the LDAP bits behind for now (apart from bugfixing here and there). It is working fine, and supports:

  • login (with any of the user’s email addresses)
  • registration (the admin can specify which OU will be used for initial user creation) (for development purposes, it can even create top O and OU in an empty LDAP server)
  • map LDAP ACL to Django ACL
  • view some user’s data (in settings we can specify which attrs the user himself can see, and which ones privileged users can see)
  • edit own data (again, only specific attrs based on perms)
  • edit other user’s data (if the logged in user has the correct permissions for that)
  • An addressbook (list of users, separated in developers, exdevs, others (the lists are configurable))

I’m still working on the UI, and started messing around with Beacon. It is a very interesting project, which is getting more love again through a Fedora GSoC project (it even started as a GSoC project). It has two backends, a PHP and a Django one. I already talked to the upstream guys, and they showed me their TODO list [1]. Some of those items are needed for me as well, which is very nice, since my patches can go upstream directly. I was going to write a custom script to export the generated XML output, which is one of the things Beacon itself needs as well. Another important thing is to load external files in order to edit them. Finally, the git integration I was going to implement also sounded like a nice feature to them. I’m really glad to see that we are on the same road; my plan was not to fork the project but to keep the changes upstream as much as possible. Matt, my mentor, has been helping Beacon with the Django part since the beginning. I plan to work on those three features for the next week (weekend included).

Apart from the above, I’m working on our XSLT and Python’s decorators to create Django templates based on our XML files.

Okupy is deployed on the server. I need a final review from my mentor, and will open it to some people for testing really soon (target: this weekend).

[1] http://tinyurl.com/3g4424o

Jul 12

Quick summary:

I’m writing a CMS for the Gentoo website, that will offer an LDAP web interface, plus it will replace Gorg and provide Beacon as WYSIWYG editor to edit the XML files.

This is going to be small but really important. Robin set up an LDAP instance on vulture for me, plus reviewed my cfengine patches for OpenLDAP, Django and the various dependencies, thanks a lot for this! I’m in the process of deploying the web application to the server, and will move development fully there. I plan to open it to a few people for more beta testing in the following week. There has also been some internal Infra discussion on whether to use multiple OUs (OU=users, OU=developers etc), without an agreement yet, but my code works either way. Also, I need to expand our LDAP configs and add a few more groups there, like a user.group, and some other privileged groups like devrel and pr (currently we have only infra, recruiters and devrel, I think).

As for the development of the app itself, the past days I’ve been doing various bugfixing in the LDAP frontend and playing around with the UI mostly. It is very configurable: the admin can choose which LDAP values to print, and in which form (eg human readable: username / first name / last name OR keep the LDAP names: uid / givenName / sn). The user can view his own attributes or someone else’s public attributes. A privileged user can see more attributes of other users, plus add/remove another user to/from some groups. There is some ACL duplication here, but unfortunately there isn’t a better way to do it at the moment. Robin proposed another long term solution: if we move our LDAP configs to the new cn=Config style, the app could then parse that config and generate the ACL for the Django settings accordingly. It can’t be done now though, since Infra needs to migrate LDAP to that style first, which I know is going to be painful (I’ve done it already for a uni server about a year ago). I’m working on the UI of the edit view now, which is a form generated from the user profile model. Although it works (the user can edit his data successfully, and admins (eg infra/recruiters in the Gentoo case) can edit other users’ data as well), there has been some pain in printing the multivalued attributes of LDAP nicely. Currently, the multivalued attrs are transferred to a TextField in the DB, and the values are separated with :: for easy splitting and joining. With the help of Matt I wrote a form widget, but it still needs to look prettier when the user wants to add or delete a value.
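The :: packing mentioned above could look roughly like the following. It is a minimal sketch, not the actual model code; the helper names `join_values` and `split_values` are made up for the example.

```python
SEPARATOR = "::"  # the separator named in the post


def join_values(values):
    """Pack a list of LDAP values into one string for a text column."""
    for value in values:
        if SEPARATOR in value:
            # Such a value could not be split back correctly.
            raise ValueError("value contains the separator: %r" % value)
    return SEPARATOR.join(values)


def split_values(packed):
    """Unpack the stored string; an empty string means no values."""
    return packed.split(SEPARATOR) if packed else []
```

One caveat of this design is that a value which itself contains :: cannot survive the round trip, which is why the sketch rejects it up front.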

Apart from the above, I’ve also started working on the UI in general, and on the front page. Matt gave me some CSS to plug in to my templates, but my overall goal would be to provide an easy way to create new themes for the app, instead of having to touch the templates (should be easy in Django). The UI and the front page are what I’m going to do for the next few days, and then I’ll start working on the Beacon and XSLT/XML parts. Last but not least, I wrote an addressbook as a replacement for userinfo.xml.

Jun 30

Quick summary:

I’m writing a CMS for the Gentoo website, that will offer an LDAP web interface, plus it will replace Gorg and provide Beacon as WYSIWYG editor to edit the XML files.

The past two weeks I’ve finished the LDAP bits, plus I’ve added some more features mostly needed for development purposes. In the settings files, the administrator can provide a bunch of variables:
  • the OU(s) the users are stored (there is support for multiple OUs, for example to separate users from developers with ou=users and ou=developers, while keeping unique usernames)
  • the credentials for the anon user (minimal privileged user to perform LDAP queries in case the anonymous search is disabled, both cases are covered in the app)
  • credentials of the admin user (needed mostly for user creation), the objectClasses for new users, the base attribute to search for users (uid and cn are the most common)
  • a map with user profile attributes (Django has only username/password/email/real name in its base profile, it is easily extendable though by specifying a connection between user profile fields and LDAP attributes)
  • a map with LDAP and ACL groups (for example, is_infra, is_devrel etc; depending on the LDAP permissions, the user is able to view or touch other users’ data)
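To make the list above more concrete, here is what such a settings file might look like. Every variable name and value below is hypothetical (the post does not give the real ones); the sketch only shows the shape of the configuration.

```python
# settings/ldap.py -- all names and values are illustrative assumptions
LDAP_BASE_DN = "dc=example,dc=org"
LDAP_USER_OUS = ["ou=users", "ou=developers"]     # searched in this order
LDAP_ANON_USER_DN = "cn=anon,dc=example,dc=org"   # empty if anonymous search is allowed
LDAP_ANON_USER_PW = "changeme"
LDAP_ADMIN_USER_DN = "cn=admin,dc=example,dc=org"  # used mostly for user creation
LDAP_ADMIN_USER_PW = "changeme"
LDAP_NEW_USER_OBJECTCLASSES = ["person", "inetOrgPerson"]
LDAP_BASE_ATTR = "uid"                            # uid and cn are the most common

# Django profile field -> LDAP attribute
LDAP_PROFILE_MAP = {"first_name": "givenName", "last_name": "sn", "all_mails": "mail"}

# Django-side flag -> LDAP group entry
LDAP_ACL_GROUPS = {"is_infra": "infra.group", "is_devrel": "devrel.group"}


def user_search_bases():
    """Build the full search base for every configured OU."""
    return ["%s,%s" % (ou, LDAP_BASE_DN) for ou in LDAP_USER_OUS]
```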

The login system had to change though. Robin wanted mail logins instead of username logins. This needed a lot of changes, since in LDAP mail is a multi-valued attribute, while in Django it is a single-valued field. I created an all_mails field in the user profile instead, which holds all the mails, but the user has to verify them first. On initial registration, the user’s mail is stored in a DB table along with a 30-character string, and a mail is sent to the user which contains the same string in the form of a URL. The system checks whether those two match, and if they do, it removes the entry from that table and moves the mail to the user’s LDAP mail attribute (and to the all_mails field in the DB, if applicable). The same procedure is followed when the user wants to add a new email to his account: he has to verify it before it gets into the list. Afterwards, the user can log in with any of the emails he has verified. For password recovery, the user fills in the mail he wants to use for that session.
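The verification round trip can be sketched as follows. This is a simplified illustration under my own assumptions: two dictionaries stand in for the DB table and for the LDAP mail attribute, and the function names are invented for the example.

```python
import secrets
import string

TOKEN_LENGTH = 30  # length given in the post
_ALPHABET = string.ascii_letters + string.digits

pending = {}   # token -> (username, email); stands in for the DB table
verified = {}  # username -> verified emails; stands in for the LDAP mail attribute


def request_verification(username, email):
    """Store a pending address and return the token to embed in the mailed URL."""
    token = "".join(secrets.choice(_ALPHABET) for _ in range(TOKEN_LENGTH))
    pending[token] = (username, email)
    return token


def confirm(token):
    """Move the address to the verified list if the token is known."""
    entry = pending.pop(token, None)
    if entry is None:
        return False
    username, email = entry
    verified.setdefault(username, []).append(email)
    return True
```

Because the entry is removed on the first successful match, a verification link cannot be replayed.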

The user profile is extendable, if other people want to use the LDAP frontend. For now there is a GentooProfile class that extends the UserProfile class, that has gentoo-specific fields based on the LDAP attributes Gentoo uses, plus the custom gentoo LDAP schema.

User settings are available, under accounts/$USER subURL. The system checks if the URL maps to the user currently logged in, or another user in the LDAP server, then checks if the user is in the DB, migrates it if not, and shows the fields according to the logged in user’s permissions. Edit settings is also available and works with the same logic.

I’ve also added a lot of docstrings there, and started messing around with sphinx.

The logging system is improved as well. The errors are printed in console if the project is run with Django’s runserver for development purposes, and in /var/log/messages (which is configurable, it can go to a dedicated dir easily) for production use.
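A configuration along these lines would give that behaviour. It is a sketch only: the function name, the handler choice and the log levels are assumptions, and the post does not say whether the production messages are written directly or through syslog. The dictionary uses the standard dictConfig layout, which is also the format of Django’s LOGGING setting.

```python
import logging.config


def build_logging_config(debug, log_file="/var/log/messages"):
    """Return a dictConfig-style dict: console in development, file in production."""
    if debug:
        handler = {"class": "logging.StreamHandler"}
    else:
        handler = {"class": "logging.FileHandler", "filename": log_file}
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {"default": handler},
        "root": {"handlers": ["default"], "level": "DEBUG" if debug else "WARNING"},
    }


# Development: messages go to the console that runs the server.
logging.config.dictConfig(build_logging_config(debug=True))
```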

More tests were written, and the ebuild is almost complete. I’ve set up an instance in one of my home servers, which will run tests automatically and notify me for failures.

There is an addressbook available, as a replacement to userinfo.xml we currently have. I’m going to play around with genmap as well to replace the developer map.

Since the LDAP work is done, with only bugfixes and small improvements needed here and there, I’ve started working on the front page. It will follow the steps of the one we currently have. It will be a syndication-like page, combining the info from planet/blogs, news items written by PR team, new packages etc. I also started working on the lxml scripts to parse our XML documentation, and next week I’ll plug in the design done in www-redesign repo, and improve it as possible.

PS. The report was delayed, because I’ve been offline pretty frequently for multiple reasons. I had my last exams, which went well, and I probably graduated (finally!). I had to be in another city without internet for some days. And finally, the frequent power cuts in Greece (as part of the general strikes, riots and frustration of the economic crisis here) not only kept me offline, but also destroyed one of the drives in my desktop, and one of my home servers completely. I learned from that though, and now follow their website for announcements of future power cuts.

Jun 11

This comes with a delay, as I’ve been sick the past days. The LDAP related code is 90% done. It now has the following features:

  • Login to the system (report #1 explains in detail how login works). It previously used only the basic info (real name, primary email), but it is now configurable to use more info, which the sysadmin can define in the config files. This was easy to do, by creating a second dictionary that maps the django user profile fields to LDAP attributes.
  • Signup. For this, an admin LDAP account is needed to be put in the config file. The admin account, contrary to other backends, is used only to create new users. Other LDAP implementations use that admin account for everything though. So, now the user declares username/password, the anon account searches if the user already exists (both the username and the email have to be unique), and if not, it creates the account, using the same dictionary to map django DB fields with LDAP attributes.
  • User settings. There are some forms that allow the user to change his data. This is done by using his own account, and not by using the admin account to do that. A second password is being created for the session, since we didn’t want to cache the regular password. (again, report #1 has more info about it).
  • Map LDAP ACL to Django groups. For that, a special multivalued attribute is used (in gentoo it is called gentooAccess), which contains some *.group entries that specify the user’s special permissions. This gives a special team, eg infra, the ability to touch other users’ data. While the mapping is complete, the UI is not yet.
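As an illustration of the last item, the translation from *.group entries to Django-side flags might look like this. The `GROUP_FLAGS` table and the `flags_from_access` function are hypothetical; only the gentooAccess attribute and the *.group naming come from the text.

```python
# Hypothetical table: LDAP group entry -> Django-side flag.
GROUP_FLAGS = {
    "infra.group": "is_infra",
    "devrel.group": "is_devrel",
    "recruiters.group": "is_recruiter",
}


def flags_from_access(gentoo_access):
    """Translate the values of the multivalued gentooAccess attribute to flags."""
    return {flag: entry in gentoo_access for entry, flag in GROUP_FLAGS.items()}
```

Entries that are not listed in the table are simply ignored, so unrelated values in the attribute do not grant anything.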

Other things that I did:

  • I set up the service in one of my home servers, so that Matt can test it too. The LDAP used there is very minimalistic.
  • I gave Robin some cfengine patches for both the webapp and the LDAP (which should be as close to the official one as possible). They are not complete yet though. Once the webapp is up and running on vulture (the soc.dev server) I’ll be able to test it with our official configuration.

What I’m going to do during the weekend:

  • Improve documentation (docstrings) and fire up sphinx
  • Improve logging system
  • I started writing some tests for the backend. I’m going to finish them, and write tests for all of the above as well.
  • Create an ebuild to automate tests
  • Finish the “touch other users’ data” UI

After that, the LDAP system will be finished, and I’ll let the tests show me the bugs. Next week I’ll start working on the website part, beginning with the LXML parsing of our docs.

Time to sleep, it is 0640 already here, I didn’t even realize the sun is up.

Jun 03

The past week I’ve been experimenting with LDAP mostly. I set up a clone of ldap1.gentoo.org (with fake data of course) on my home server, and gave a Cfengine patch to Robin, pending review, in order to have the testing LDAP service on vulture (the soc.dev box). I also set up my git repo, and split the settings.py file into many files, under a settings/ dir. This is how the transifex guys and my mentor Matt do it, and this approach allows us to put the config files even in /etc for example.

My major goal was to finish the LDAP backend part, either by using an existing library or by writing it from scratch. Finally, after taking a look at many libraries and implementations, I wrote my own. More specifically, what the Django LDAP authentication backend does is to override the default Django DB authentication backend. When a user logs in, the backend checks if the user exists in the LDAP server. In order to do the search, the OpenLDAP server has to either provide an account for that (an “anonymous” account with minimal privileges, just to do those kinds of ldap queries) or allow anonymous searches. If such an account is used, it has to be declared in the settings file. I actually took it one step further than the other backends did: in a common ldap configuration, all users are under OU=users or something similar. I intend to split it into OU=users and OU=developers though, so I allowed the backend to search in multiple organizational units, by converting the variable to a python list. If the query returns a result (meaning the user is found), the backend tries to bind with the credentials provided in the login form. If it succeeds, the user data (apart from the password) are transferred to the django db, where they are going to be used for the rest of the session. Django actually has only email, real name and username in its accounts, but it gives the opportunity to extend those by creating a profile. That is technically an extra table in the db, with the ability to add custom fields, really handy.
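The search-then-bind flow described above can be sketched without a real server. In the example below an in-memory `FakeDirectory` class stands in for OpenLDAP, and `authenticate` is a made-up name; a real backend would issue the equivalent search and bind operations against the LDAP server instead.

```python
class FakeDirectory:
    """In-memory stand-in for an LDAP server, for illustration only."""

    def __init__(self, entries):
        self.entries = entries  # dn -> {"password": ..., "attrs": {...}}

    def search(self, base, uid):
        """Return the DN of the user under `base`, or None if not found."""
        dn = "uid=%s,%s" % (uid, base)
        return dn if dn in self.entries else None

    def bind(self, dn, password):
        """Return True if the credentials are accepted."""
        return self.entries[dn]["password"] == password

    def attributes(self, dn):
        return dict(self.entries[dn]["attrs"])


def authenticate(directory, search_bases, uid, password):
    """Search each OU for the user, then bind with the supplied credentials."""
    for base in search_bases:
        dn = directory.search(base, uid)
        if dn is None:
            continue  # not in this OU, try the next one
        if not directory.bind(dn, password):
            return None  # the user exists but the password is wrong
        return directory.attributes(dn)  # the password itself is never copied
    return None
```

The returned attributes are what would then be copied into the Django user and profile tables for the rest of the session.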

The major problem that I’ve faced with all those ldap backends was that they all asked for an admin account, and they performed all changes with that account. That is acceptable for user creation but not for all the other cases. If a user wants to change his data, he should be able to do it with his regular account. Another problem emerged here though. OpenLDAP asks for the password before every action, and one solution was to do it that way. Bcooksley had another idea though: to create a second 50-character password, which stays valid for one session only, and is destroyed using itself at the end. I liked that idea very much, as asking for the password every time is not too user friendly in my opinion, and the web frontend loses its purpose. For important changes though the password will be required (that includes uploading a new SSH/GPG key, or resetting the password).

The LDAP backend is now working, which is really cool, I didn’t expect it to be done so fast. My next step is to write some tests and documentation, for which I’ll use sphinx. Also, I plan to continue working on the ldap web frontend, by finishing the login/signup systems, and the user settings page, and then start messing with the UI.

Jun 02

Planet KDE readers: this is not a KDE GSoC project, feel free to skip it

I’ve been accepted for this year’s GSoC yey!! Many thanks to my mentor, Matthew Summers, for all the help. I’ll do a complete rewrite of the gentoo.org website using Django. The project actually consists basically of three parts: 1) create a django web frontend for our LDAP server 2) Create a django CMS replacement for the current website that will be able to read the XML files 3) Plug in the Beacon Editor, a WYSIWYG GuideXML editor, to the web app. Below is a part of my full proposal, which can be found at google-melange and has more technical information.

Currently Gentoo employs a number of websites with separate user accounts. LDAP servers are used for the developers only. Developers update their LDAP data through a perl script. Complaints about difficult handling of forgotten passwords occur frequently. A web interface around LDAP will make LDAP more user friendly, plus it can be easily extended to support non-developers, just by adding a new schema and OU (organizational unit) for them. The choice of Django was made for several reasons. It is a web framework based on Python, that is a programming language very easy to learn and maintain. A Django project is equally easy to maintain and extend. In Django parlance, the term “project” refers to a collection of pluggable, modular applications, that together form a more complex web application, the project. This modularity will let the Gentoo Community easily write new Django applications on top of the main instance, in order to enhance its features. Apart from the above, the Django and Python upstream communities are doing a great job, notably by releasing often, especially for security reasons.

Today, documentation is stored in XML files in a CVS repository. An XML to HTML converter (wrapped up with Ruby), called Gorg, is currently used to display the actual result. People who want to contribute to documentation or translations have to locally install a Gorg instance and check out the CVS repository, which can be rather cumbersome. Having a WYSIWYG editor embedded in the gentoo.org website would be preferred, as it will make it really easy to contribute to documentation, or even send improvements through bugzilla to the appropriate documentation or translation team. By all appearances Gorg has been abandoned as a project. This has resulted in difficulty in maintaining the Gentoo website, and also makes the work of updating the website’s design or functionality all the more troublesome. Therefore, a complete redesign is easier than resurrecting the old system. The overall end result of the new system will be a complete rewrite of the old Gentoo website, including everything under www.gentoo.org

  • A new user registration system, that will import the data in the LDAP server.
  • A System Admin, with support of different group permissions, based on the current LDAP ACL, e.g. recruiters, sysadmins, documentation moderators, Gentoo developers, users. For more strict LDAP related operations, the Django Admin will be used, as it will be better especially for following the LDAP ACL. The Django Admin will be used mainly by privileged users, like Infra and Recruiters. Anyone else will be able to view and edit his data through his account profile page and a custom made System Admin. There will be no required data for users apart from a primary email address and a nickname/password.
  • A syndicator-like frontpage, that will display recent GLSAs, planet posts, PR team’s news.
  • A continuous integration testing model. There will be automated regression testing, by using Portage to build and test the package on a regular basis.
  • Support for viewing XML documents.
  • Beacon, a WYSIWYG editor, for easily editing those XML documents.
  • Git backend support for storing the XML files, using a dummy Git account.
  • An easy to translate interface. Translators will be able to select a document and translate it using Beacon.
  • Statistics for translators.
  • A “send for review” button, so that end users will be able to file a bug to Gentoo Documentation team attaching the XML diff. It needs a custom pybugz script to file the bug. The same system will be used for translations.
  • A new improved look, using JavaScript/jQuery. In this area, there will be strict adherence to web standards and accessibility best practices, since the Gentoo website and Documentation pages are viewed by a great many people every day.
  • An improved devmap, where developers will be able to click their position on the map.

The name I chose is okupy (you know, k instead of c, and py because it is a python app :P ) Well, it is not related to KDE, but I am :) Not to mention the new fashion in KDE apps not having K-names. I’m bringing a new fashion to the world, non-KDE apps with K-names.

Robin Johnson, the Gentoo Infra lead, will co-mentor me on the LDAP specific parts. Matt will be my guide for the django parts. The idea of this project was mine, inspired by both my Django/LDAP thesis project and the identity.kde.org work. I came in touch with the KDE sysadmins as well, and especially Ben Cooksley, who’s been great help so far and gave me lots of ideas already based on his experience maintaining the identity website. I’d be really happy to provide a Django LDAP that could suit both my favorite projects. I created a gsoc tag, where I’ll be posting weekly updates in Planet Gentoo, and the gentoo-soc mailing list. That’s it, time to get back to work.

Jun 01

This blog post comes about a year late, but who cares. This particular project is very important to me, as it was really fun, and my first steps to python/django and sysadmin tasks like LDAP, SVN/Git and friends. The project was written along with my friend Cephalon (Γιάννης Σπανός) under the guidance of our professor Χρήστος Σωμαράς. It is still available at cronos.teilar.gr, but you won’t be able to sign up unless you are a student at TEI of Larissa.

The Problem

My school had an irrational number of websites, providing crucial announcements. It is very hard to follow all of them (the knowledge gets lost when it is spread around), not to mention the unavailability of some during most of the weekends :P Actually, that belongs to the past I hope, I haven’t noticed anything recently, but in the past the problem was really obvious. Every one or two weekends a power cut was happening, even in exam periods. The web services were too many though, and some students were not even aware of some of them (for example the career.teilar.gr website has info about job positions, grad school programs etc). Another very important problem is that most of those websites require different accounts, and I couldn’t even imagine how many students didn’t remember their credentials, and did a password reset every new semester (remember, we are not talking only about CS students here, actually they are the minority, and unfortunately they are not an exception to that). Last but not least, none of the mentioned websites provided RSS feeds, making it even harder to track them (some of the major ones do now though).

Brainstorming

So, what would be really useful is a web application that combines all those announcements in a syndicator-like page. Plus, various other information (like grades, the semester’s declaration, the student mail account, even a list of teachers) could also be included in that app. And, most importantly, we need a unified RSS feed for all those announcements, since some people (like me) prefer desktop applications to read their news. Well, we don’t need the announcements of every school or every teacher though; the student should be able to select what they want to view. Since the uni websites don’t provide RSS feeds, we’ll have to parse the HTML pages instead and store the output in a DB. Some of them require logging in first, so we’d need the user’s credentials, which should be encrypted with Blowfish (reversibly, rather than hashed) so that the system can reuse them when needed. Everything looks good so far, let’s move on:

Implementation

The first step was to set up the server. We got a new box and needed to do everything from scratch. We installed and configured OpenLDAP, MySQL, Django with mod_python, Subversion, phpLDAPadmin, phpMyAdmin, WebSVN, and the Apache vhost files. Afterwards, we started messing with pycurl and BeautifulSoup. It took us some days, but we finally had two Python scripts (Django standalone files actually). The first parsed the names of the schools, the teachers and the lessons, as well as the names of the web services from which we were going to collect announcements. The second parsed the actual announcements of about 10 websites and put them in a database. As I said, there were no RSS feeds, so we had to send thousands of HTTP requests with pycurl and parse the responses with BeautifulSoup. I know, very unreliable, but there was no better way we could think of, plus it worked just fine. Some of the websites:

  • www.teilar.gr: The most important website, it contains the announcements of all the teachers, school specific announcements, even college and various other announcements.
  • e-class.teilar.gr: Equally important. e-class is a very popular webapp among Greek universities, which offers tools for better communication between teachers and students. Apart from announcements, it also provides the ability to upload and store files (like presentation files or weekly projects), a calendar, a mini-chat, a contact form and others. Many teachers prefer it over www.teilar.gr for their announcements.
  • dionysos.teilar.gr: This one has the accounts of the students. This is where the student sees his grades, declares classes for the semester, etc. School wide announcements are also provided.
  • myweb.teilar.gr: webmail service (no announcements provided here obviously :P )
  • some informational only websites: LinuxTeam, PR, Library, Career, NOC and others (still adding new ones actually).
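
The parsing step described above can be sketched roughly as follows. This is only an illustration, not the code we actually ran: it swaps in the standard library’s html.parser for BeautifulSoup so that it runs without extra packages, it leaves out the pycurl fetching entirely, and the page markup it expects (links inside a div with class "announcement") is a made-up assumption, since every real website had its own layout.

```python
from html.parser import HTMLParser


class AnnouncementParser(HTMLParser):
    """Collect (title, url) pairs from links inside <div class="announcement">.

    The markup is hypothetical; a real scraper needs one rule set per website.
    """

    def __init__(self):
        super().__init__()
        self.announcements = []
        self._div_depth = 0   # > 0 while inside an announcement div
        self._href = None     # set while inside a link within such a div
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Track nesting so that inner divs do not end the block early.
            if self._div_depth or attrs.get("class") == "announcement":
                self._div_depth += 1
        elif tag == "a" and self._div_depth and "href" in attrs:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.announcements.append((title, self._href))
            self._href = None
        elif tag == "div" and self._div_depth:
            self._div_depth -= 1


def parse_announcements(html):
    """Return the list of (title, url) announcements found in an HTML page."""
    parser = AnnouncementParser()
    parser.feed(html)
    parser.close()
    return parser.announcements
```

The returned pairs would then be stored in the database, one row per announcement, together with the website they came from.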

The above Python scripts were put in cron.d; the first runs weekly and the second hourly. The second takes about half an hour to complete, and on the old box (a very old Celeron laptop used as a server) it took 45-50 minutes. With all that data in the DB, it was time to create the signup/login systems and then show the data to the users. The login system took a while to get ready. We played a lot with LDAP, python-ldap and some Django authentication backends to fully understand everything. We used the ldap_groups library, which had pretty much everything we wanted, although we had to tweak it a bit. In short, Django allows the use of another authentication backend in addition to its default ModelBackend, by adding the following in settings.py:

 
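# Backends are tried in order until one of them authenticates the user.
AUTHENTICATION_BACKENDS = (
    'path.to.custom.backend',  # placeholder path to the LDAP backend class
    'django.contrib.auth.backends.ModelBackend',  # Django's default backend
)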
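
To give an idea of what such a custom backend does, here is a minimal sketch of the lookup order: local store first, LDAP as a fallback, with the LDAP data copied over on the first login. It is deliberately framework-free, so it is not our actual backend and not the ldap_groups code: FakeDirectory, LdapFallbackBackend and the dictionaries standing in for the Django DB and the LDAP server are all hypothetical names made up for this example. A real Django backend would also implement get_user() and would never keep plain-text passwords.

```python
class FakeDirectory:
    """Stand-in for an LDAP server: maps username -> (password, attributes)."""

    def __init__(self, entries):
        self.entries = entries

    def bind_and_fetch(self, username, password):
        # A real backend would bind to the LDAP server with the user's own
        # credentials; here we only compare against the stored password.
        entry = self.entries.get(username)
        if entry is None or entry[0] != password:
            return None
        return dict(entry[1])


class LdapFallbackBackend:
    """Look in the local user store first, then fall back to the directory."""

    def __init__(self, local_users, directory):
        self.local_users = local_users  # stand-in for the Django DB
        self.directory = directory

    def authenticate(self, username=None, password=None):
        user = self.local_users.get(username)
        if user is not None:
            # Known locally: no directory lookup is needed.
            return user if user["password"] == password else None
        attrs = self.directory.bind_and_fetch(username, password)
        if attrs is None:
            return None
        # First login: copy the directory data into the local store.
        # (Plain-text password only for the sake of the sketch.)
        user = {"username": username, "password": password, **attrs}
        self.local_users[username] = user
        return user
```

For example, with directory = FakeDirectory({"maria": ("s3cret", {"semester": 5})}) and backend = LdapFallbackBackend({}, directory), the first successful backend.authenticate("maria", "s3cret") call creates the local entry, and later logins are answered from the local store.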

When a user logs in, the custom backend (assuming it is an LDAP backend) searches for the user in the Django DB first and, if the user is not found there, in the LDAP server. If the user is found in the LDAP server, their data is transferred to the Django DB. With that system, the data stays in the LDAP server (and can be easily reused) even if the DB gets wiped out. Similarly, the signup system searched for duplicate entries only in the LDAP server. The signup got a bit more complex though. The system asks for the accounts of dionysos, e-class (optional) and webmail (optional), and in order to verify them, it uses them immediately: it performs a pycurl request and parses the output. If successful, it also parses the student’s data, like his name, semester, registration number, grades and the e-class lessons he is subscribed to, and puts them in the LDAP server. Using Django’s syndication framework we were able to create a unified RSS feed of all those announcements. The pages we created were the following:

  • The first page which prints some personal information
  • A syndicator-like page with all the announcements
  • An e-class specific page, which shows the lessons a student follows, recently uploaded files, and pending deadlines for projects
  • An e-mail page, which only shows the mails
  • A library page, where the user can get results for books through the library.teilar.gr web page
  • A list of all the teachers, with the school they belong and their emails
  • An about page
  • A settings page, where the student can change his password, his credentials of the other websites, update his grades/declaration/e-class lessons list, and select the teacher/other announcements he wants to follow
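
As for the unified RSS feed mentioned earlier, the idea is simply to merge the announcements of all websites into one channel, newest first. The sketch below shows that idea with the standard library’s xml.etree.ElementTree instead of Django’s syndication framework, so it is not what the application used; build_feed and the dictionary keys (source, title, url, date) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, announcements):
    """Return an RSS 2.0 document (as a string) for the given announcements.

    Each announcement is a dict with the hypothetical keys
    "source", "title", "url" and "date" (an ISO date string).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Unified announcements"
    # Newest first, regardless of which website the item came from.
    for item in sorted(announcements, key=lambda a: a["date"], reverse=True):
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = f"[{item['source']}] {item['title']}"
        ET.SubElement(node, "link").text = item["url"]
    return ET.tostring(rss, encoding="unicode")
```

Filtering the list down to the teachers and websites a student has selected would happen before calling build_feed, so that each user gets their own feed.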

The future

Our goal was to merge the LDAP server we created with the one the school uses. But due to some changes the NOC made, that turned out to be impossible, so I had to drop the LDAP support in our application as well and use the DB only.

After getting it online, two other guys created a similar Django project as their thesis project, which would offer online registration to labs. The original plan was to merge those projects, due to many similarities in the code (especially regarding the registration system, which was also using the same LDAP authentication), but in the end it became a separate web app, diogenis.teilar.gr (by Στέφανος Χρούσης and Γιώργος Τσιώκος).

About a month ago, the two services were moved to a new box, which was put under LinuxTeam control and features two VMs (more on this in a separate blog post). We have a long todo list now, and people who are interested in contributing. The service is still online, and I hope it will stay that way for a while, thanks to Δημήτρης Παπαπούλιος and Άλεξ-Π. Νάτσιος for taking care of the server and the VMs, Γιώργος Κούτσικος for the artwork and Γιώργος Τσιώκος for the design! Let’s hope it will stay online for much longer, and that more thesis projects on top of this will follow.

May 31

Added a few more domains for the lulz:

  • ServerName blog.tampakrap.gr
  • ServerAlias theo.chatzimichos.gr
  • ServerAlias theo.chatz.gr
  • ServerAlias thodoris.chatzimichos.gr
  • ServerAlias thodoris.chatz.gr
  • ServerAlias theodore.chatzimichos.gr
  • ServerAlias theodore.chatz.gr

Also for my brother:

  • ServerName yannis.chatz.gr
  • ServerAlias yannis.chatzimichos.gr
  • ServerAlias y.chatz.gr
  • ServerAlias y.chatzimichos.gr

We’re under the same umbrella now: chatzimichos.gr. Don’t forget to look at his blog, it is amazing.