Future Proofing

I’d like to talk about future-proofing for a bit. One of the things that’s niggled at me for quite a few years is the file suffixes you find in website URLs. In the beginning you had either “.htm” or “.html”, and even that was annoying. It’s one thing to have everything be .html; but then (I think FrontPage or its ilk were responsible, but someone please set the record straight here) we got .htm pages as well. Suddenly it was a 50-50 chance that your memory of a URL was correct. In time, we’ve gotten .jsp (with the hideous jsessionid nonsense), .php, .cgi, and (God help us) .pl or .py pages, and probably a whole host more. Is there *any* good reason for this rubbish?

You know, the technologies underlying your web pages are going to change. There’s nothing you can do about it; they just will. Slashdot, with its .pl extensions, is a good example. And again, why? Why use any extension at all? What happens when you switch from Perl to Ruby or Python?

The point is that the web is evolving. So why deliberately lock your site into one web technology? Say your website has been up for 5 years with your ugly-assed .jsp pages, and you’ve climbed up in your Google rankings and whatnot. Then along come new business needs, driving your web infrastructure away from JSP to PHP. All of a sudden your extensions change, and now you have to set up a whole bunch of redirects. Do you see how this doesn’t scale at all? Is it just me?

Honestly, .html extensions are all there need to be, if anything at all. What I really like are no extensions whatsoever. Django embraces this idea. You get really beautiful URLs with no extensions or other ugliness. Down the road, when you switch to RoR or Java or whatever, your URLs (gasp!) will not have to change!!
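
To make that concrete, here is roughly what a Django URLconf for extension-free URLs looks like (a minimal sketch in Django’s URLconf style; the app name and view names are made up):

    # urls.py -- a minimal sketch; "myapp" and its view names are invented
    from django.conf.urls.defaults import *

    urlpatterns = patterns('',
        (r'^articles/$', 'myapp.views.article_index'),
        (r'^articles/(?P<slug>[-\w]+)/$', 'myapp.views.article_detail'),
    )

Visitors see /articles/future-proofing/ — not a file extension in sight — and if the backend changes later, these public URLs never have to.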

OK, that came off pretty rant-like, but the ultimate point I wanted to make is this: my employers are cool like that. They’re paying attention to the things I say 🙂 I brought up the point of future-proofing URLs, and if you look around on the site (especially the revamped DevZone), you’ll see a lot more future-proofed URLs. There’s still some way to go before we stop exposing the technology behind our web infrastructure, but it’s a great start.

Tell me your thoughts on this — I’m especially interested in you .jsp and .php people. How do you possibly justify that nonsense?

19 thoughts on “Future Proofing”

  1. I guess it’s been that long since I’ve visited Slashdot, but I see that they’ve dispensed with their .pl URLs. How did they map from the new scheme (.shtml, for crying out loud) to the old, I wonder?

  2. The point is not that someone would actually “justify” using extensions; it is just one of those legacy things that stick around.
    People still do not think of a website as a set of resources that people might want to access (and that therefore deserve decent names), but as a directory full of files.

    We are just too used to the directory/file analogy and it’s always hard to think outside of the box, so many people just don’t.

  3. Well, this is an old problem. Solved by Apache IMO. No matter what backend you use, clever Apache configuration can give you consistent URLs.

    For example, I always think up creative file extensions for the websites I work on (.mnit for my institute), but I stick to them throughout the website no matter what the backend is. Sure, the extensions differ across different websites, but that doesn’t matter, does it?

  4. Extensions in URLs are usually caused by the website relying on a physical FS layout. So the question becomes: why do you use file extensions? Because they make it easier to identify the type of a file. While there are other ways as well (e.g. using mime-magic), having that information directly in the name can be very useful, for example to enable syntax highlighting in your editor when you work on your website.

    Of course there are ways to make the website independent from the physical FS layout, but that usually means extra work for something most people just don’t care about.
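
    As a tiny illustration of that decoupling (a sketch; the page names are made up), a WSGI app can keep its public URLs in a lookup table that never touches the filesystem, while the developer still gets .py files with nice extensions in the editor:

        # A tiny WSGI sketch (the page names are made up): public URLs
        # live in a lookup table, decoupled from the physical file layout.
        ROUTES = {
            '/about':   lambda: 'About us',
            '/contact': lambda: 'How to reach us',
        }

        def application(environ, start_response):
            handler = ROUTES.get(environ['PATH_INFO'])
            if handler is None:
                start_response('404 Not Found', [('Content-Type', 'text/plain')])
                return [b'Not found']
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return [handler().encode('utf-8')]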

  5. tante++

    Besides, web frameworks are partisan. They want your site to have telltale URLs that let visitors see what framework/language is being used. Then, assuming the sites look good (or the site owner is well respected), the framework gets some credit.

    And of course the widely used proprietary frameworks, like MS tech, don’t want you to be able to switch frameworks easily.

  6. Dan nailed it pretty precisely: almost every kit exposes itself for publicity reasons.

    That is of course understandable, and many people will argue that “having the website mirror the file layout is logical”. What they mean when they say “logical” is “I’m used to it”. People are not usually very keen on abstracting away from things, so abstracting away from actual files is a hard thing for many to wrap their minds around.

  7. Thanks for the comments, everyone. I’m just of the mind that it’s worthwhile for a web developer to consider these things (I’m not expecting it of the higher-level people in an organisation, who don’t care about the nitty-gritty details but rather hear buzzwords like “Flash” and think “ooh, we have to use that”).

    The web is a lot richer an experience than Nautilus or Windows Explorer, surely. Granted, Nautilus and KDE’s file manager are richer experiences than Windows Explorer, what with their previews, but that’s not the point :p

  8. Displaying extensions is also a potential security risk! PHP and Java doubtless have language-specific holes or potential vulnerabilities that crackers could try to exploit based on the .php or .jsp they see in the URL.

  9. Seemant, your concerns over extensions are moot.

    When you see a site using .php, .asp or .jsp extensions, more often than not they’re using some kind of web framework or CMS, and these tend to use not just extensions peculiar to their language, but their own URL schemes as well — resource names, hierarchies, query formats, etc.

    Now, when a site changes its web framework or CMS, not only could the extensions change, but the URL schemes will probably be quite different as well. So you’re going to need a ton of redirects anyhow, even if the extension hasn’t changed.

  10. This isn’t just another layer of abstraction but a very important one: separation of implementation (the web framework) from the interface (the URLs). I think that’s the best way to make people understand the issue quickly.

    Alex, I think you’re missing the point. Not just the file extensions but the entire URL scheme should, ideally, be framework-independent. Most frameworks don’t encourage this, so you end up building a redirection layer for your very first framework and replacing it along with the framework. But at least it’ll be a well-designed redirection layer, because you’ll have been able to think it out in advance (a rough sketch follows at the end of this comment).

    Another point: this isn’t just about switching frameworks. It’s about having URLs visitors can remember (without bookmarking); which they can understand when they see them for the first time; and, sometimes, which they can predict, so they can go directly to a new page without ever having been there before.

    Very few big websites, today, allow any of these three things, and I think that’s bad design on their part. Compare these two extreme examples:

    http://en.wikipedia.org/wiki/Java_(programming_language)

    https://sdlc5a.sun.com:443/ECom/EComActionServlet;
    jsessionid=221C711469A4F7593641A100402675EF

    The second URL (broken here into two lines) is for downloading the Sun JRE. Look how many ugly implementation details it exposes! Even part of the DNS name looks autogenerated. As if it were designed to scare non-techie users away from trying to understand or use URLs manually. And I’m not even sure I can bookmark it or put it in an ebuild’s manual download instructions – will the jsessionid value I got work for others, and will it expire someday?

    Having autogenerated URLs for more-or-less static web pages is bad.
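
    Here is that rough sketch of a redirection layer (all the mappings below are made up): the public scheme on the left is permanent, and only the right-hand side changes when you swap frameworks:

        # Sketch of a framework-independent redirection layer (all the
        # mappings are made up). Visitors and search engines only ever
        # see the permanent public URLs on the left.
        LEGACY_MAP = {
            '/products/widget': '/catalog.jsp?item=42',
            '/about':           '/pages/about.php',
        }

        def backend_url(public_path):
            """Map a permanent public URL to whatever the current backend uses."""
            return LEGACY_MAP.get(public_path, public_path)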

  11. Alex, you’d be surprised at how intuitive people expect URLs to be. While subdomains are popular, and fine, a lot of people in my experience actually do try to guess URLs.

    I’m OK with Google handling a lot of it, but the point still remains: ugly URLs are just that, UGLY; and they don’t actually add anything to the experience. Dan’s jsessionid URL example is a perfect illustration.

  12. Alex: even if most people won’t benefit from it, that’s no reason not to design things well so that people like you and me do benefit. After all, there’s a lot of stuff in my Linux system most people wouldn’t understand!

    Of course there’s no magic answer that always works. You have to design a URL scheme appropriate to your site. And the user has to know what content to expect in order to be able to predict URLs. But far too many sites don’t do even obvious and simple things that could be easily achieved with some trivial URL rewriting.

    Take amazon.com. A very popular site, and many other sites link to it. Would it kill them to support URLs of the form “www.amazon.com/isbn/[ISBN]”? Or “www.amazon.com/search/[terms]”? No, I have to go to their front page, locate the search field and use that. They’re the owners of the one-click patent, but it seems they don’t understand how important it is to let people find books easily as well as buy them. Even if only 10% of their clientele profits by it.

    I mean, how ridiculous is this:

    Homepage: http://www.amazon.com

    Books section homepage (multiple lines): http://www.amazon.com/
    books-used-books-textbooks/b/
    ref=gw_br_bo/103-7715330-7318231
    ?%5Fencoding=UTF8&node=283155

    Don’t tell me you can’t easily improve on that with existing tech!
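
    Even a trivial rewriting front end would get you there (a sketch; the right-hand backend target URLs are invented for illustration):

        # A trivial rewrite layer (the backend target patterns are
        # invented): clean, guessable public URLs map onto whatever
        # the backend actually wants.
        import re

        REWRITES = [
            (re.compile(r'^/isbn/(\d{9}[\dXx])$'), '/backend/item?isbn=%s'),
            (re.compile(r'^/search/(.+)$'),        '/backend/search?q=%s'),
        ]

        def rewrite(path):
            for pattern, target in REWRITES:
                match = pattern.match(path)
                if match:
                    return target % match.group(1)
            return path  # anything else passes through unchanged

        # rewrite('/isbn/020161622X') -> '/backend/item?isbn=020161622X'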

  13. Here is our fundamental point of disagreement then.

    You guys are saying, “Let’s use the URL according to its original conception as a human-grokkable string.” A noble goal, but unfortunately we humans aren’t very good at taking strings which have a technical purpose and making them grokkable by both machines and humans. XML is one of the latest failures in this struggle — machines aren’t very adept at parsing it, and most people hate to read it.

    My position is that we should just abandon this URL-monkeying approach as a limitation and embrace higher-level interfaces to the URL (hyperlinks, menus, Google, etc.). This to me seems an area not fully explored and gives me at least some hope we might finally discover adequate solutions. 🙂

  14. Dan:
    Amazon is a particularly bad example – because it actually wants to encode lots of hidden information about the user, obfuscated in the URL. There is no easier way to link “consumer interests” and users …
    (For example, if you are logged in, find an interesting book and mail it to a friend who opens the link -> Amazon’s profiler links your two user profiles and can do better targeting of ads and products …)

  15. Alex:

    “we humans aren’t very good at taking strings which have a technical purpose and making them grokkable by both machines and humans. XML is one of the latest failures in this struggle — machines aren’t very adept at parsing it, and most people hate to read it.”

    Counterexample: filesystem paths. Counterexample: source code. There’s lots of stuff which is perfectly well grokkable by both machines and people, failures like XML notwithstanding. And URLs should be compared to filenames rather than the far-more-complex XML.


  16. RoR is a pretty bad example, I think. I’m pretty sure it relies heavily on a certain URL format (something like /app/action/object/, although I may be wrong — see the toy sketch at the end of this comment).

    Django is awesome. Apache’s mod_rewrite is a solution, but it has some significant downsides (too much voodoo).

    P.S. If you want a buzzword, they have one: REST
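
    A toy sketch of that sort of convention (hedged as above — the exact RoR format may differ, and the example names are made up): the URL encodes dispatch information rather than a file location:

        # Toy sketch of a /controller/action/id dispatcher (the exact
        # RoR convention may differ): the URL names a handler, not a file.
        def dispatch(path):
            parts = [p for p in path.split('/') if p]
            controller, action, obj = (parts + [None, None, None])[:3]
            return controller, action, obj

        # dispatch('/articles/show/42') -> ('articles', 'show', '42')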

  17. Dan: I don’t agree with your counter-examples.

    “Counterexample: filesystem paths.”

    My mother has been using computers for 25 years, and even she has trouble navigating filesystem hierarchies that she herself devised. The problem is that the filesystem hierarchy scheme relies heavily on one’s ability to remember hierarchical associations, and if you make the hierarchy too broad or too deep in places, it places significant demands on the user’s memory. This format is too easily (and too often) abused to be considered a success IMO.

    When you design a human/machine-grokkable string format, you need to take into account that many people (especially the middle aged and elderly) are to varying degrees challenged in the memory department.

    “Counterexample: source code. There’s lots of stuff which is perfectly well grokkable by both machines and people,”

    Does your mother grok source code? Mine doesn’t very easily, and she used to program.

    For that matter, can you, being an actual coder, usually grok source code written by others, without first having to study it somewhat? (Have you had a look around CPAN?)

  18. You took my words out of context. I wasn’t saying all computer or web users understand code, or even paths. I am saying there are things, like paths, which a large majority of users do understand. And you can make URLs at least as simple as ordinary filesystem paths.

    It would be a lot better than the situation today. How many people can really understand the URLs generated by amazon.com in the example above? And even those few who can understand them perfectly may not be able to type URLs in manually to navigate directly to a page (a server-generated session id is required), or to store a bookmark if a URL includes a time-limited token. These aren’t failures of understanding but of technology, and they can be remedied by better site design.

  19. I think you guys have missed the point. Why not forget all about URLs from a user-interface perspective? Not only is the extension dictated by the underlying technology, but the whole concept of a URL in the first place is dictated by the technology! Let’s improve the technology so that using the URL bar is no longer necessary.

    I want to type in or remember a URL no more than I want to type in or remember a phone number!
