I had a totally bullshit research-free day today because I was busy trying to fix a bunch of SQL databases and PHP scripts living on two antiquated Linux boxes (incidentally, both displaying DNS bugs I've never seen before, which had me completely stumped). I finally gave up trying to fix them and decided to set up a third, fresh Ubuntu server machine to migrate the dbs onto. So there I was ... and if you know me, you know "so there I was" is the cue that the story turns bad at this juncture.
I decided it might be nice to have X and GNOME on the server, just in case someone other than me has to administer it. I'm perfectly happy with just a terminal, but the people in my lab aren't computer scientists. So I apt-get install gdm gnome, and I'll be damned if it isn't more than an hour and a half later that the bugger finishes installing all the dependencies. If you want to be able to press every button a default GNOME installation comes with and get the right answer, I can see how an hour and a half's worth of dependencies are legitimately needed. If, however, you needed GNOME because you're insecure about your ifconfig skills and would rather use network-manager or whatever the little thingamajiggy is called, as I expected would be the case for anyone other than me logging into this server, then you do not need CUPS and screensavers and game-data and all the other shit that GNOME depends on. In fact, it will be nowhere near a tragedy if at some later point you click on the games menu folder and find it empty, or try to run the printer manager and get a message that you need to apt-get install CUPS first. Far from it.
So this got me thinking about how the idea of "Just in Time", JiT for short, could be applied in this context. Let's define it first. JiT compilation combines elements of static (ahead-of-time) and dynamic code translation. In static compilation, all the code is compiled ahead of time. If a function is never encountered in a particular execution path, then the time and resources spent compiling it have been wasted. In the fully dynamic approach, which is essentially interpretation, each unit of code, usually a statement or line, is translated as it is encountered in the execution path, and translated again every time it comes up. In JiT compilation the unit of compilation is increased from a line of code to, usually, a function or block. When the execution path first enters the block or function, all the code in that "compilation unit" is compiled at once. In this way it is available "just in time" for its first execution (incurring some latency) and, since the result is cached, about as cheaply as statically compiled code thereafter.
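To make the compile-on-first-use-then-cache idea concrete, here is a toy sketch. It is only an analogy: it uses Python's built-in compile() and exec() to turn source text into bytecode the first time a function is called, which is not what a real machine-code JiT does. The JitCache class and the two example functions are names I made up for illustration.

```python
from typing import Callable, Dict


class JitCache:
    """Compile each named unit of source on first use, then reuse the result."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.sources = sources                   # unit name -> source of one function
        self.compiled: Dict[str, Callable] = {}  # cache of already-compiled units
        self.compile_count = 0                   # how many units were actually compiled

    def call(self, name: str, *args):
        if name not in self.compiled:
            # First entry into this unit: pay the compilation latency once.
            namespace: dict = {}
            code = compile(self.sources[name], f"<jit:{name}>", "exec")
            exec(code, namespace)
            self.compiled[name] = namespace[name]
            self.compile_count += 1
        # Every later call goes straight to the cached, compiled function.
        return self.compiled[name](*args)


# Hypothetical units of code; "never_used" is never called, so never compiled.
sources = {
    "square": "def square(x):\n    return x * x\n",
    "never_used": "def never_used(x):\n    return x + 1\n",
}

jit = JitCache(sources)
first = jit.call("square", 4)   # compiles "square", then runs it
second = jit.call("square", 5)  # cache hit, no compilation
```

After both calls, only one unit has been compiled, and the function that never appears on the execution path has cost nothing.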
Now, what does any of this have to do with installation? Well, if we borrow the nomenclature of compilation, the only game in town when it comes to installing files supplied by a distribution (which are almost exclusively binaries, libraries or config files) is static installation. What might dynamic installation look like? Well, as soon as a request for access to a file is made, the file is fetched. Why is this kind of silly? Well, because while the file is the unit on the filesystem, and certainly is a unit of access, it is not really the unit of installation. Instead, the unit of installation is the "package", a .deb, .rpm, .tar.gz or whatever, containing all the files necessary to successfully do something, as well as information about dependencies on other packages.
So what might a JiT installation strategy look like? When a user first tries to execute a binary, if the binary is present on the system, it is executed. If it is not, some latency is incurred while the package containing the binary, along with any immediate dependencies, is installed, and then the binary is run. Can the notion of an immediate dependency be defined? Perhaps it is subjective, but I do think a common-sense, useful choice can be made in most cases.
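One possible reading of "immediate dependency" is "whatever the package lists as a direct dependency, and nothing further down the chain". The sketch below compares that with what a static install pulls in, which is the full transitive closure. The dependency graph is invented for illustration and is not real Ubuntu package metadata; the function names are mine too.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> packages it directly depends on.
DEPENDS: Dict[str, List[str]] = {
    "gnome": ["gdm", "network-manager-gnome", "gnome-games", "cups"],
    "network-manager-gnome": ["network-manager"],
    "network-manager": [],
    "gnome-games": ["gnome-games-data"],
    "gnome-games-data": [],
    "cups": ["cups-common"],
    "cups-common": [],
    "gdm": [],
}


def immediate_install_set(package: str, depends: Dict[str, List[str]]) -> Set[str]:
    """JiT reading: the package itself plus its direct dependencies only."""
    return {package, *depends.get(package, [])}


def full_install_set(package: str, depends: Dict[str, List[str]]) -> Set[str]:
    """Static reading: the package plus everything reachable from it."""
    seen: Set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends.get(current, []))
    return seen


jit_set = immediate_install_set("network-manager-gnome", DEPENDS)
static_set = full_install_set("gnome", DEPENDS)
```

In this made-up graph, asking for the network applet just in time installs two packages, while installing the whole desktop statically installs eight. The catch is that a depth-one cutoff can leave out a library the binary needs at run time, so the cutoff really is a judgment call.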
So how might something like JiT installation be implemented? The obvious way seems to me to be as a shell. Instead of running bash, one would run ubuntu-jit-bash. When a user types the name of a binary, say foo, into bash, the shell looks in the search path (which lives in $PATH) for a binary with that name. This can be simulated by running 'which' on the name of the binary, i.e. which foo. So what ubuntu-jit-bash will need to do, when which foo comes up empty, is consult a list of all binaries provided by all the .deb files in all the apt-accessible repositories for that machine, and find the package (or packages, and this is a problem) that includes foo. Then ubuntu-jit-bash will download the package and its immediate dependencies, install them, and run the binary.
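Here is a rough sketch of the decision ubuntu-jit-bash would have to make for each command, under some loud assumptions: BINARY_INDEX is a hypothetical in-memory stand-in for the list of binaries provided by the repositories (on a real system something would have to build it from the repository metadata), the package names in it are invented, and plan_command is my own name. The sketch only decides what to do and never actually runs apt-get.

```python
import shutil
from typing import Dict, List, Optional

# Hypothetical index: binary name -> packages that ship a binary with that name.
BINARY_INDEX: Dict[str, List[str]] = {
    "foo": ["foo-tools"],
    "editor": ["vim", "nano"],  # more than one candidate: the problem case
}


def plan_command(name: str, index: Dict[str, List[str]],
                 path: Optional[str] = None) -> dict:
    """Decide what a JiT shell should do when the user types `name`."""
    # Same lookup as `which name`: search the directories in the search path.
    found = shutil.which(name, path=path)
    if found is not None:
        return {"action": "run", "binary": found}

    candidates = index.get(name, [])
    if not candidates:
        return {"action": "not-found", "name": name}
    if len(candidates) > 1:
        # Several packages provide this binary: leave the choice to the user.
        return {"action": "ask", "candidates": candidates}
    # Exactly one provider: install it, then the shell can run the binary.
    return {"action": "install",
            "command": ["apt-get", "install", "-y", candidates[0]]}


# Use a search path that does not exist so nothing counts as already installed.
EMPTY_PATH = "/nonexistent-dir-for-jit-sketch"
plan_foo = plan_command("foo", BINARY_INDEX, path=EMPTY_PATH)
plan_editor = plan_command("editor", BINARY_INDEX, path=EMPTY_PATH)
plan_bar = plan_command("bar", BINARY_INDEX, path=EMPTY_PATH)
```

Keeping the planning step separate from the step that actually runs the install makes it easy to try the logic out without root and without touching the machine.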
Are there potential issues? Sure there are, as always. Security, for one, but that's always a problem, so let's drop it. Then there's the dependency on the distribution and the packaging mechanism. I certainly hope serious distributions provide their package information in some legitimate database format, but I don't know this for a fact. What about binaries with multiple installation candidates? Query the user, maybe. What about the latency? Sucks, but maybe not as badly as waiting an hour and a half for every imaginable dependency to be satisfied. What about non-immediate dependencies on libraries? There could be a JiT linker. Bottom line, this gets a little complicated, but I think the JiT installation model is not a complete bust. For one, it can seamlessly coexist with the static installation model, so it need not be imposed on anyone. But it does provide a way, when there is time pressure, to add features, services and packages to a barebones machine as they are needed.
Maybe I'll try to implement it one of these days.