wiki:SearchSpeed

My (marienz) answer to "Why does pkgcore not provide a fast 'emerge -S' like eix, esearch and others do?" and similar questions:

Because the cache esearch (and afaik also eix) uses is way too specific. To explain this properly, a little background on the various caches involved is necessary.

The normal cache (used by both portage and pkgcore) contains various pieces of metadata (description, dependencies, etc) for every ebuild in simple flat text files, stored in directories and files named after the ebuild category and ebuild name. Contrary to popular belief, this approach is quite fast as long as you know the ebuild category and name and need to look up the other data. Think of it as a database table indexed by category/package-version. Normal operations like "emerge -e world" will actually not be much faster if you use a db backend instead of the standard flat file backend.
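To make the "indexed by category/package-version" point concrete, here is a minimal sketch of such a keyed lookup. It assumes a simplified layout of one file per ebuild at cache_root/category/package-version containing KEY=value lines; the function name read_entry and that exact layout are illustrative assumptions, not pkgcore's or portage's actual code or on-disk format.

```python
from pathlib import Path


def read_entry(cache_root, category, pf):
    """Return the metadata of one ebuild as a dict.

    Hypothetical helper: assumes one "KEY=value" line per metadata field
    in the file cache_root/category/pf (pf = package name plus version).
    """
    entry = {}
    # The path is computed directly from the key, so exactly one file is
    # opened no matter how many ebuilds the cache holds.
    with open(Path(cache_root) / category / pf, encoding="utf-8") as handle:
        for line in handle:
            key, sep, value = line.rstrip("\n").partition("=")
            if sep:  # ignore lines without a "="
                entry[key] = value
    return entry
```

A call like read_entry(root, "sys-apps", "pkgcore-0.12") touches a single file, which is why a db backend has little to win for this kind of access.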

Where the flat file backend breaks down is for things like description searches. For those it has to walk through the entire cache, opening thousands of files to look at their description text. Tools like esearch (and probably eix too) "solve" this by keeping their own secondary cache around, containing a subset of the data in the full cache that can be opened and searched much faster than the "full" cache can be. The disadvantage of this approach is that it only speeds up a handful of specific operations, and requires those to go through a different cache lookup mechanism. Things like --revdep cannot be sped up this way, unless a third specific extra cache is introduced. This gets really messy really fast.
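The difference between the two approaches can be sketched roughly like this. The code assumes the same simplified layout as before (one file per ebuild at cache_root/category/package-version with a DESCRIPTION= line) and a made-up one-file secondary index with tab-separated lines. All function names are hypothetical and this is not how esearch or eix actually store their data.

```python
import os


def iter_descriptions(cache_root):
    """Yield (category/package-version, description) by opening every file."""
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        category = os.path.relpath(dirpath, cache_root)
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if line.startswith("DESCRIPTION="):
                        text = line[len("DESCRIPTION="):].rstrip("\n")
                        yield f"{category}/{name}", text
                        break


def search_full_cache(cache_root, needle):
    """Description search on the full cache: one open() per ebuild."""
    needle = needle.lower()
    return sorted(cpv for cpv, text in iter_descriptions(cache_root)
                  if needle in text.lower())


def build_secondary_index(cache_root, index_path):
    """Copy only the descriptions into a single file (the secondary cache)."""
    with open(index_path, "w", encoding="utf-8") as out:
        for cpv, text in iter_descriptions(cache_root):
            out.write(f"{cpv}\t{text}\n")


def search_secondary_index(index_path, needle):
    """Description search on the secondary cache: one open() in total."""
    needle = needle.lower()
    matches = []
    with open(index_path, encoding="utf-8") as handle:
        for line in handle:
            cpv, _sep, text = line.rstrip("\n").partition("\t")
            if needle in text.lower():
                matches.append(cpv)
    return sorted(matches)
```

Note that the index only knows about descriptions and has to be rebuilt whenever the full cache changes. A reverse dependency lookup would need yet another file with its own build and search code, which is the messiness described above.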

(An extra disadvantage of those external tools is that they cannot regenerate their own cache if an ebuild changes: they rely on the "normal" cache being up to date. Integrating the secondary cache into pkgcore would solve that, but keeping both caches up to date would still be rather messy.)

A "proper" db (sqlite is the most obvious example) with the proper indices could handle all this nicely, but this requires adding a mechanism for converting pkgcore's cache lookups into SQL queries. Implementing a trivial sqlite cache backend is easy (there is one in the tree, although it may have bitrotted a bit), but this would convert "pquery -S pkgcore" into an SQL "SELECT *" and do the actual matching in Python, so it does not buy you any speed. To speed things up, the cache backend has to "translate" (part of) the restriction object handed to it into an SQL query so the db backend does more of the hard work. This could then also speed up things like --revdep by doing part of the work in the db backend.
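As a rough illustration of the difference between the trivial backend and a translating one, here is a sketch using Python's sqlite3 module. The DescriptionContains class is a made-up stand-in for a restriction object and the single-table schema is an assumption for the example; neither reflects pkgcore's real restriction API or the sqlite backend in the tree.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptionContains:
    """Hypothetical restriction: description contains a substring."""
    needle: str

    def match(self, description):
        return self.needle.lower() in description.lower()


def make_db(rows):
    """Create an in-memory cache table from (cpv, description) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cache (cpv TEXT PRIMARY KEY, description TEXT)")
    db.executemany("INSERT INTO cache VALUES (?, ?)", rows)
    return db


def search_in_python(db, restriction):
    """Trivial backend: fetch every row, do the matching in Python."""
    rows = db.execute("SELECT cpv, description FROM cache")
    return sorted(cpv for cpv, desc in rows if restriction.match(desc))


def to_sql(restriction):
    """Translate the restriction into a WHERE clause plus parameters."""
    # Escape LIKE wildcards so the needle is matched literally.
    escaped = (restriction.needle.replace("\\", "\\\\")
               .replace("%", "\\%").replace("_", "\\_"))
    # sqlite's LIKE is case-insensitive for ASCII only, so this agrees
    # with match() above for ASCII text.
    return "description LIKE ? ESCAPE '\\'", ("%" + escaped + "%",)


def search_in_db(db, restriction):
    """Translating backend: the db does the matching, returns only hits."""
    where, params = to_sql(restriction)
    rows = db.execute(f"SELECT cpv FROM cache WHERE {where}", params)
    return sorted(row[0] for row in rows)
```

Both functions return the same packages, but the second one only moves matching rows out of the db. A plain substring LIKE still scans the whole table inside sqlite, so getting the full benefit would presumably also need suitable indices, and a real translator would have to cover many more restriction types than this single one.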