EBD, EBuild Daemon

This page mirrors most of a blog entry from March 2005; the details are still the same.

Way back around June/July of 2004, I had a crazy idea for how to kill off a bunch of bugs and optimize regen runs. A regen run is basically a massive set of calls to bash, each sourcing an ebuild in a carefully defined environment to get its 'metadata' (DEPEND, RDEPEND, SRC_URI, etc.), which is then stored by python portage in a cache backend. This process is a bit slow: regenerating the full tree on a 2.4GHz P4 takes well over 30 minutes if the cache is empty. Why do we need this process? Because if you didn't have the auxdb cache (which holds said metadata), you'd have to re-source the ebuild each time, which is *slow*. Jason Stubbs pegged it at around 400x slower than using the cache.

The regen time is directly affected by a bunch of things. For example, if an eclass used by an ebuild is modified (its mtime changes), then all ebuilds that use that eclass must be re-sourced. There is no way to determine what was changed in the eclass, so the only safe course is to re-source every ebuild that uses it.
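To make the invalidation rule concrete, here is a minimal sketch in Python. It is not portage's actual cache code: `CacheEntry`, `is_stale` and `ebuilds_to_regen` are hypothetical names, and mtimes are plain integers for simplicity. The only point is that one touched eclass marks every ebuild that inherits it as stale.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CacheEntry:
    """Hypothetical cache record: metadata plus the mtimes it was built from."""
    ebuild_mtime: int
    eclass_mtimes: Dict[str, int] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)


def is_stale(entry: CacheEntry, ebuild_mtime: int,
             eclass_mtimes: Dict[str, int]) -> bool:
    """Return True if the ebuild has to be re-sourced."""
    if entry.ebuild_mtime != ebuild_mtime:
        return True
    # We cannot tell *what* changed in an eclass, so any mtime difference
    # (or a vanished eclass) invalidates the entry.
    return any(eclass_mtimes.get(name) != mtime
               for name, mtime in entry.eclass_mtimes.items())


def ebuilds_to_regen(cache: Dict[str, CacheEntry],
                     ebuild_mtimes: Dict[str, int],
                     eclass_mtimes: Dict[str, int]) -> List[str]:
    """List ebuilds whose cache entry is missing or stale, sorted by name."""
    return sorted(
        name for name, mtime in ebuild_mtimes.items()
        if name not in cache or is_stale(cache[name], mtime, eclass_mtimes)
    )
```

With a widely used eclass such as eutils, a single mtime bump puts a large share of the tree back on the regen list, which is why the per-ebuild cost matters so much.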

Like I was saying, it's a slow process. I figured the startup of bash, and the initialization of the bash environment for getting the metadata keys, could likely be sidestepped. Why do this? Because in a full regen, you must suffer the cost of bash startup/re-initialization for every ebuild. Currently that means 18,913 ebuilds (8,976 distinct packages), which adds up.

Essentially, what is needed is to rewrite ebuild.sh (bash portage) such that it's a callable library. More importantly, you need to be able to save and load the environment via function calls, and this must work. Consider it akin to how the kernel suspends a process: when restarted, the process must be in the same state it was in previously. The environment between 'phases' (unpack/compile/setup/install) is the same way. Just as importantly, you cannot have the environment from one 'depends' phase (the specific phase that gets metadata from an ebuild) bleed into another ebuild.

So long story short, you need to load/dump environments on the fly, and contain each executing ebuild/phase such that it doesn't taint the environment for when another ebuild is processed. This is tricky, but required if you want to avoid the bash startup costs for a regen. Yes, this is a lot of crap to fix/implement for a potentially crazy idea/scheme, but I still tried it. :)
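As a rough illustration of the save/load idea (not the real ebuild.sh code), the Python sketch below runs each 'phase' in a bash process that first sources a saved env file and afterwards dumps a chosen set of variables back with bash's `declare -p`, which also records attributes such as export. `run_phase`, `demo`, the file name and the variable list are all assumptions made up for the example; real ebd has to deal with the whole environment, functions included.

```python
import os
import shlex
import subprocess
import tempfile
from typing import List


def run_phase(script: str, env_file: str, names: List[str]) -> str:
    """Load the saved env (if any), run one phase, then save the env again."""
    quoted = shlex.quote(env_file)
    program = "\n".join([
        # Restore what the previous phase left behind, if it left anything.
        f"[ -s {quoted} ] && source {quoted}",
        script,
        # 'declare -p' prints re-sourceable definitions, attributes included.
        f"declare -p {' '.join(names)} > {quoted}",
    ])
    result = subprocess.run(
        ["bash", "--norc", "--noprofile", "-c", program],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def demo() -> str:
    """Set variables in one phase and read them back in the next."""
    with tempfile.TemporaryDirectory() as tmp:
        env_file = os.path.join(tmp, "saved.env")
        names = ["S", "CFLAGS"]
        run_phase('S=/tmp/work; export CFLAGS="-O2"', env_file, names)
        return run_phase('echo "$S $CFLAGS"', env_file, names)
```

The second phase never sees the shell that ran the first one; everything it knows comes from the saved file, which is the property the phases need.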

In doing the env fix-ups, a lot of long-standing bugs were fixed too. The restructuring detailed above allows portage to run completely from a saved env; more than that, it requires portage to run from saved envs for everything past the setup phase (the exception is binpkgs, which is a matter for another blog entry). This means installed ebuilds no longer rely on eclasses in the tree: they just use the saved env, which already contains the eclass. This fixes bug 46223, and also allows a clean break from forced backwards compatibility for all eclass APIs/existence (detailed in GLEP 33). Beyond that, env attributes (export/readonly) are tracked, and a host of other niggles were nailed down and fixed. Honestly, getting the env handling right and making this shift fixes a *lot* of issues with ebuild processing. Continuing on, however...

So env handling is now sane, and a nasty collection of long-standing bugs is waxed... but I started this thing because I wanted to see what could be done to speed up regens. Effectively, what I call 'ebd', the ebuild-daemon, is an ebuild processor. Python portage spawns ebuild-daemon with a set of pipes into the bash side of portage. This allows the daemon and python portage to have nice little chats, including things like dumping the env straight through the pipes for an ebuild to process, telling it which phases to process, telling it to start processing, and having it report the results. This is not a one-sided conversation though: bash portage can command python portage too, within limits. It allows bash portage to:

* request a confcache be transferred in (confcache ~== global autoconf cache, speeds configures up)

* report detected sandbox failures back to the python side. Why is this needed? Because previously, sandbox errors were dumped only when the sandbox binary exited. This isn't viable with a sandbox'd ebd: you can't exit the ebd just to find out if a sandbox failure occurred. So an alternative method is used.

* hijack (yes, hijack) portageq calls directly into the existing portage process. This is a major speed-up: the portage import on its own typically takes over half a second. With the hijack, it's well below .1s, since all that must be done is process the request, rather than load the relevant portage modules, *then* process the request, then go through shutdown code.

* Not implemented yet, but the same hijack approach can be used to have useradd/groupadd run in a de-priv'ed environment, with the request 'hijacked' back to the python process (which has higher privs). This basically allows pkg_setup to run de-priv'ed, which is a good thing.

* Various crazy ways to improve performance, like preloading an eclass into memory if you know it's going to be used heavily (eutils for example).
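The basic shape of the pipe conversation can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the ebd protocol: `BashWorker`, the `__EBD_DONE__` marker and `demo` are invented for the example, and it assumes each snippet prints newline-terminated output and does not read from stdin. One bash process stays alive for many requests, and each request runs in a subshell so its variables cannot bleed into the next one.

```python
import subprocess
from typing import List, Tuple

MARKER = "__EBD_DONE__"  # hypothetical end-of-reply marker, not ebd's own


class BashWorker:
    """Keep a single bash process alive and talk to it over pipes."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["bash", "--norc", "--noprofile"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )

    def run(self, snippet: str) -> List[str]:
        """Run a snippet in a subshell and return its output lines."""
        # The parentheses start a subshell: assignments die with it.
        self.proc.stdin.write(f"(\n{snippet}\n)\necho {MARKER}\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            line = line.rstrip("\n")
            if line == MARKER:
                break
            lines.append(line)
        return lines

    def close(self) -> None:
        """Closing stdin makes bash see EOF and exit."""
        self.proc.stdin.close()
        self.proc.wait()


def demo() -> Tuple[List[str], List[str]]:
    """Two requests to one bash process; the second must not see DEPEND."""
    worker = BashWorker()
    try:
        first = worker.run('DEPEND="dev-libs/foo"; echo "$DEPEND"')
        second = worker.run('echo "${DEPEND:-unset}"')
    finally:
        worker.close()
    return first, second
```

Bash is started once here, however many requests follow, which is the cost a regen over thousands of ebuilds wants to avoid paying per ebuild. Requests flowing the other way (the portageq hijack, sandbox reports) would need the daemon to write commands that the python side answers, which this sketch leaves out.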

"Yeah yeah yeah. Give me stats, before I wallop you with a red herring" you're thinking... ok. Original data sets, and methods are available here. Basic summary, wiped the cache between every run, ran each target 6 times via time emerge ${TARGET} --nopspinner --quiet &> /dev/null avgs the datasets, and compared the resultant run times. These stats were collected with portage 2.0.51-r2 as the base, and ebd patch 20041027-2.

| target | vanilla/ebd | real | user | sys |
|---|---|---|---|---|
| timed @gnome | van | 00:59.13 | 00:38.28 | 00:18.04 |
| timed @gnome | ebd | 00:43.46 | 00:28.12 | 00:13.41 |
| timed @dev-util | van | 00:52.59 | 00:33.21 | 00:17.54 |
| timed @dev-util | ebd | 00:36.16 | 00:23.46 | 00:12.10 |
| timed @sys | van | 02:29.43 | 01:35.21 | 00:49.10 |
| timed @sys | ebd | 01:50.10 | 01:11.53 | 00:36.42 |
| timed @php | van | 00:38.27 | 00:25.34 | 00:11.12 |
| timed @php | ebd | 00:18.04 | 00:12.15 | 00:05.13 |
| timed regen | van | 34:06.46 | 21:25.41 | 11:48.25 |
| timed regen | ebd | 22:52.19 | 14:23.35 | 07:39.23 |
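To turn the raw timings into something easier to compare, here is a small Python sketch that converts the 'real' column to seconds and computes how much wall-clock time ebd shaves off each target. The numbers are copied from the stats above; `to_seconds`, `reduction_percent` and `REAL_TIMES` are just names picked for the example.

```python
from typing import Dict, Tuple


def to_seconds(stamp: str) -> float:
    """Convert an 'MM:SS.ss' timing into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def reduction_percent(vanilla: str, ebd: str) -> float:
    """Percentage of wall-clock time saved by ebd relative to vanilla."""
    van_s, ebd_s = to_seconds(vanilla), to_seconds(ebd)
    return 100.0 * (van_s - ebd_s) / van_s


# 'real' times per target: (vanilla, ebd)
REAL_TIMES: Dict[str, Tuple[str, str]] = {
    "@gnome": ("00:59.13", "00:43.46"),
    "@dev-util": ("00:52.59", "00:36.16"),
    "@sys": ("02:29.43", "01:50.10"),
    "@php": ("00:38.27", "00:18.04"),
    "regen": ("34:06.46", "22:52.19"),
}

for target, (van, ebd) in REAL_TIMES.items():
    saved = reduction_percent(van, ebd)
    print(f"{target:10s} {saved:5.1f}% less wall-clock time")
```

For the full regen that works out to roughly a third less wall-clock time, and about half for the @php target.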