=======================================================
 How to use guppy/heapy for tracking down memory usage
=======================================================

This is a work in progress. It will grow a bit and it may not be
entirely accurate everywhere.

Tutorial of sorts
=================

All this was done on a checkout of
marienz@gentoo.org-20060908111256-540d8fb3db5b337e. You should be able
to check that out and follow along using something like::

  bzr revert -rrevid:marienz@gentoo.org-20060908111256-540d8fb3db5b337e

in a pkgcore branch.

Heapy is powerful but has a learning curve. The problems are that the
documentation (http://guppy-pe.sourceforge.net/heapy_Use.html among
others) is a bit unusual, and that various dynamic importing and
other tricks are in use, which makes things like dir() less helpful
than they are on more "normal" python objects. This document's main
purpose is to show you how to ask heapy various kinds of questions. It
may or may not also show a few cases where pkgcore uses more memory
than it should.

First, get an x86. Heapy currently does not like 64-bit archs much.

Emerge it::

  emerge guppy

Fire up an interactive python prompt, set stuff up::

  >>> from guppy import hpy
  >>> from pkgcore.config import load_config
  >>> c = load_config()
  >>> hp = hpy()

Just to show how annoying heapy's internal tricks are::

  >>> dir(hp)
  ['__doc__', '__getattr__', '__init__', '__module__', '__setattr__', '_hiding_tag_', '_import', '_name', '_owner', '_share']
  >>> help(hp)
  Help on class _GLUECLAMP_ in module guppy.etc.Glue:

  _GLUECLAMP_ = <guppy.heapy.Use interface at 0x-484b8554>

This object is your "starting point", but as you can see the
underlying machinery is not giving away any useful usage instructions.

Do everything that allocates some memory but is not the problem you
are tracking down now. Then do::

  >>> hp.setrelheap()

Everything allocated before this call will not be in the data sets you
get later.

Now do your memory-intensive thing::

  >>> l = list(x for x in c.repo["portdir"] if x.data)

Keep an eye on system memory consumption. You want to use up a lot but
not all of your system ram for nicer statistics. The python process
was eating about 109M res in top when the above stuff finished, which
is pretty good (for my 512mb ram box).

::

  >>> h = hp.heap()

The fun one. This object is basically a snapshot of what's reachable
in ram (minus the stuff excluded through setrelheap earlier) which you
can do various fun tricks with. Its str() is a summary::

  >>> h
  Partition of a set of 1449133 objects. Total size = 102766644 bytes.
   Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
       0 985931  68 46300932  45   46300932  45 str
       1  24681   2 22311624  22   68612556  67 dict of pkgcore.ebuild.ebuild_src.package
       2  49391   3 21311864  21   89924420  88 dict (no owner)
       3 115974   8  3776948   4   93701368  91 tuple
       4 152181  11  3043616   3   96744984  94 long
       5  36009   2  1584396   2   98329380  96 weakref.KeyedRef
       6  11328   1  1540608   1   99869988  97 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
       7  24702   2   889272   1  100759260  98 types.MethodType
       8  11424   1   851840   1  101611100  99 list
       9  24681   2   691068   1  102302168 100 pkgcore.ebuild.ebuild_src.package
  <54 more rows. Type e.g. '_.more' to view.>

(You might want to keep an eye on ram usage: heapy made the process
grow another dozen mb here. It gets painfully slow if it starts
swapping, so if that happens reduce your data set).

Notice the "Total size" in the top right: about 100M. That's what we
need to compare later numbers with.

So here we can see that (surprise!) we have a ton of strings in
memory. We also have various kinds of dicts. Dicts are treated a bit
specially: the "dict of pkgcore.ebuild.ebuild_src.package" simply
means "all the dicts that are __dict__ attributes of instances of that
class". "dict (no owner)" are all the dicts that are not used as a
__dict__ attribute.
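To make the "dict of class" versus "dict (no owner)" distinction more concrete, here is a rough stdlib-only sketch of that kind of grouping, written for a current Python 3. It is not how heapy is implemented: "clodo_partition" and "Package" are made-up names, the helper only looks at the objects you hand it, and sys.getsizeof is only an approximation of the sizes heapy reports.

```python
import sys
from collections import defaultdict


def clodo_partition(objects):
    """Group objects by class, keeping instance __dict__s apart by owner."""
    objects = list(objects)
    # First pass: remember which dicts are the __dict__ of some instance.
    owners = {}
    for obj in objects:
        attrs = getattr(obj, "__dict__", None)
        if type(attrs) is dict:
            owners[id(attrs)] = type(obj).__name__
    # Second pass: count objects and add up (shallow) sizes per kind.
    stats = defaultdict(lambda: [0, 0])
    for obj in objects:
        if type(obj) is dict:
            owner = owners.get(id(obj))
            kind = "dict of %s" % owner if owner else "dict (no owner)"
        else:
            kind = type(obj).__name__
        stats[kind][0] += 1
        stats[kind][1] += sys.getsizeof(obj)
    # Biggest kinds first, like the heapy summary.
    return sorted(((kind, count, size) for kind, (count, size) in stats.items()),
                  key=lambda row: row[2], reverse=True)


class Package(object):
    """Hypothetical stand-in for a class without __slots__."""

    def __init__(self, data):
        self.data = data


pkg = Package({"DESCRIPTION": "example"})
for row in clodo_partition([pkg, pkg.__dict__, pkg.data, "a string", (1, 2)]):
    print("%-20s count=%d size=%d" % row)
```

The instance dict shows up as "dict of Package" while the dict stored in its "data" attribute is a "dict (no owner)", which is the same split heapy makes in the table above.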

You probably guessed what you can use "index" for::

  >>> h[0]
  Partition of a set of 985931 objects. Total size = 46300932 bytes.
   Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
       0 985931 100 46300932 100   46300932 100 str

Ok, that looks pretty useless, but it really is not. The "sets" heapy
gives you (like "h" and "h[0]") are a bunch of objects, grouped
together by an "equivalence relation". The default one (with the crazy
name "Clodo" for "Class or dict owner") groups together all objects of
the same class and dicts with the same owner. We can also partition
the sets by a different equivalence relation. Let's do a silly example
first::

  >>> h.bytype
  Partition of a set of 1449133 objects. Total size = 102766644 bytes.
   Index  Count   %     Size   % Cumulative   % Type
       0 985931  68 46300932  45   46300932  45 str
       1  85556   6 45226592  44   91527524  89 dict
       2 115974   8  3776948   4   95304472  93 tuple
       3 152181  11  3043616   3   98348088  96 long
       4  36009   2  1584396   2   99932484  97 weakref.KeyedRef
       5  24702   2   889272   1  100821756  98 types.MethodType
       6  11424   1   851840   1  101673596  99 list
       7  24681   2   691068   1  102364664 100 pkgcore.ebuild.ebuild_src.package
       8  11328   1   317184   0  102681848 100 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
       9    408   0    26112   0  102707960 100 types.CodeType
  <32 more rows. Type e.g. '_.more' to view.>

As you can see this is the same thing as the default view, but with
all the dicts lumped together. A more useful one is::

  >>> h.byrcs
  Partition of a set of 1449133 objects. Total size = 102766644 bytes.
   Index  Count   %     Size   % Cumulative   % Referrers by Kind (class / dict of class)
       0 870779  60 43608088  42   43608088  42 dict (no owner)
       1  24681   2 22311624  22   65919712  64 pkgcore.ebuild.ebuild_src.package
       2 221936  15 20575932  20   86495644  84 dict of pkgcore.ebuild.ebuild_src.package
       3 242236  17  8588560   8   95084204  93 tuple
       4      6   0  1966736   2   97050940  94 dict of weakref.WeakValueDictionary
       5  36009   2  1773024   2   98823964  96 dict (no owner), dict of
                                                pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef
       6  11328   1  1540608   1  100364572  98 pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
       7  26483   2   800432   1  101165004  98 list
       8  11328   1   724992   1  101889996  99 dict of pkgcore.ebuild.ebuild_src.ThrowAwayNameSpace
       9      3   0   393444   0  102283440 100 dict of pkgcore.repository.prototype.IterValLazyDict
  <132 more rows. Type e.g. '_.more' to view.>

What this does is:

- For every object, find all its referrers.
- Classify those referrers using the "Clodo" relation you saw earlier.
- Create a set of those classifiers of referrers. That means a set of
  things like "tuple, dict of someclass", *not* of actual referring objects.
- Group together all the objects with the same set of classifiers of referrers.
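The steps above can be sketched in a few lines of stdlib Python 3. This is only an illustration under some loud assumptions: "partition_by_referrer_kinds" is a made-up helper, it only considers referrers from an explicit "universe" you pass in (heapy walks the whole heap), and it classifies referrers by plain type name instead of the Clodo relation.

```python
import gc
from collections import defaultdict


def partition_by_referrer_kinds(targets, universe):
    """Group targets by the set of type names of their referrers in universe."""
    # Invert the "refers to" edges: id(referent) -> kinds of its referrers.
    referrer_kinds = defaultdict(set)
    for container in universe:
        for referent in gc.get_referents(container):
            referrer_kinds[id(referent)].add(type(container).__name__)
    # Objects with the same set of referrer kinds end up in the same group.
    groups = defaultdict(list)
    for obj in targets:
        groups[frozenset(referrer_kinds.get(id(obj), ()))].append(obj)
    return dict(groups)


shared = "referenced by a dict and a tuple"
dict_only = "referenced by a dict only"
universe = [{"a": shared, "b": dict_only}, (shared,)]
result = partition_by_referrer_kinds([shared, dict_only], universe)
for kinds, members in result.items():
    print(sorted(kinds), members)
```

Note that the group key is a set of kinds, so a string referenced by a hundred dicts and a string referenced by one dict both land in the "dict" group, just as in the byrcs table.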

So now we know that we have a lot of objects referenced *only* by one
or more dicts (still not very useful) and also a lot of them
referenced by one "normal" dict, referenced by the dict of (meaning
"an attribute of") ebuild_src.package, and referenced by a weakref
(weakref.KeyedRef). Hmm, I wonder what those are. But let's store this
view of the data first, since it took a while to generate ("_" is a
feature of the python interpreter, it's always the last result)::

  >>> byrcs = _
  >>> byrcs[5]
  Partition of a set of 36009 objects. Total size = 1773024 bytes.
   Index  Count   %     Size   % Cumulative   % Referrers by Kind (class / dict of class)
       0  36009 100  1773024 100    1773024 100 dict (no owner), dict of
                                                pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef

Erm, yes, we knew that already. If you look in the top right of the
table you can see it is still grouping the items by the kind of their
referrer, which is not very useful here. To get more information we
can change what they are grouped by::

  >>> byrcs[5].byclodo
  Partition of a set of 36009 objects. Total size = 1773024 bytes.
   Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
       0  36009 100  1773024 100    1773024 100 str
  >>> byrcs[5].bysize
  Partition of a set of 36009 objects. Total size = 1773024 bytes.
   Index  Count   %     Size   % Cumulative   % Individual Size
       0  10190  28   489120  28     489120  28              48
       1   7584  21   394368  22     883488  50              52
       2   7335  20   322740  18    1206228  68              44
       3   3947  11   221032  12    1427260  80              56
       4   3364   9   134560   8    1561820  88              40
       5   1903   5   114180   6    1676000  95              60
       6    877   2    56128   3    1732128  98              64
       7    285   1    19380   1    1751508  99              68
       8    451   1    16236   1    1767744 100              36
       9     57   0     4104   0    1771848 100              72

This took the set of objects with that odd set of referrers and
redisplayed them grouped by "clodo" and then by size. So now we know
they're all strings. Most of them are pretty small too. To get some
idea of what we're dealing with we can pull some random examples out::

  >>> byrcs[5].byid
  Set of 36009 <str> objects. Total size = 1773024 bytes.
   Index  Size     % Cumulative     % Representation (limited)
       0    80   0.0         80   0.0 'media-plugin...re20051219-r1'
       1    76   0.0        156   0.0 'app-emulatio...4.20041102-r1'
       2    76   0.0        232   0.0 'dev-php5/ezc...hemaTiein-1.0'
       3    76   0.0        308   0.0 'games-misc/f...wski-20030120'
       4    76   0.0        384   0.0 'mail-client/...pt-viewer-0.8'
       5    76   0.0        460   0.0 'media-fonts/...-100dpi-1.0.0'
       6    76   0.0        536   0.0 'media-plugin...gdemux-0.10.4'
       7    76   0.0        612   0.0 'media-plugin...3_pre20051219'
       8    76   0.0        688   0.0 'media-plugin...3_pre20051219'
       9    76   0.0        764   0.0 'media-plugin...3_pre20060502'
  >>> byrcs[5].byid[0].theone
  'media-plugins/vdr-streamdev-server-0.3.3_pre20051219-r1'

A pattern emerges! (Sets with one item have a "theone" attribute with
the actual item; all sets have a "nodes" attribute that you can
iterate over to get the items.)

We could have used another heapy trick to get a better idea of what
kind of string this was::

  >>> byrcs[5].byvia
  Partition of a set of 36009 objects. Total size = 1773024 bytes.
   Index  Count   %     Size   % Cumulative   % Referred Via:
       0      1   0       80   0         80   0 "['cpvstr']", '.key', '.keys()[23147]'
       1      1   0       76   0        156   0 "['cpvstr']", '.key', '.keys()[12285]'
       2      1   0       76   0        232   0 "['cpvstr']", '.key', '.keys()[12286]'
       3      1   0       76   0        308   0 "['cpvstr']", '.key', '.keys()[16327]'
       4      1   0       76   0        384   0 "['cpvstr']", '.key', '.keys()[17754]'
       5      1   0       76   0        460   0 "['cpvstr']", '.key', '.keys()[19079]'
       6      1   0       76   0        536   0 "['cpvstr']", '.key', '.keys()[21704]'
       7      1   0       76   0        612   0 "['cpvstr']", '.key', '.keys()[23473]'
       8      1   0       76   0        688   0 "['cpvstr']", '.key', '.keys()[24239]'
       9      1   0       76   0        764   0 "['cpvstr']", '.key', '.keys()[3070]'
  <35999 more rows. Type e.g. '_.more' to view.>

Ouch, 36009 total rows for 36009 objects. What this did is similar to
what "byrcs" did: for every object in the set it determined how it
can be reached through its referrers, then grouped together the
objects that can be reached in the same ways. Unfortunately it treats
every dictionary key position ('.keys()[N]') as a different way, so
every object ends up in its own group and this is not very useful.
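To see why every object got its own row, it helps to look at what a "referred via" label encodes. The following Python 3 sketch imitates the notation from the table for a few simple referrer types. It is an assumption-laden toy, not heapy's code: "referred_via" and "CPV" are invented names, and only dicts, lists, tuples and plain attributes are handled.

```python
_MISSING = object()


def referred_via(referrer, target):
    """Describe how referrer refers to target, in a byvia-like notation."""
    paths = []
    if isinstance(referrer, dict):
        for index, (key, value) in enumerate(referrer.items()):
            if key is target:
                # Keys are labelled by position, so each key gets its own label.
                paths.append(".keys()[%d]" % index)
            if value is target:
                paths.append("[%r]" % (key,))
    elif isinstance(referrer, (list, tuple)):
        paths.extend("[%d]" % i for i, item in enumerate(referrer) if item is target)
    else:
        # Attribute names from __slots__ (assumed to be a tuple) and __dict__.
        names = list(getattr(type(referrer), "__slots__", ()))
        names.extend(getattr(referrer, "__dict__", {}))
        paths.extend(".%s" % name for name in names
                     if getattr(referrer, name, _MISSING) is target)
    return paths


class CPV(object):
    """Hypothetical slotted class with a 'key' attribute."""
    __slots__ = ("key",)


name = "dev-lang/python-2.4"
cpv = CPV()
cpv.key = name
cache = {name: cpv, "other": name}
print(referred_via(cpv, name))
print(referred_via(cache, name))
```

Because the dictionary-key label contains the position of the key, two strings that are keys of the same dict never share a label, which matches the one-row-per-object result above.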

XXX WTF XXX

It is not likely this accomplishes anything, but let's assume we want
to know if there are any objects in this set *not* reachable as the
"key" attribute. Heapy can tell us (although this is *very* slow...
there might be a better way but I do not know it yet)::

  >>> nonkeys = byrcs[5] & hp.Via('.key').alt('<')
  >>> nonkeys.byrcs
  hp.Nothing

(remember "hp" was our main entrance into heapy, the object that gave
us the set of all objects we're interested in earlier).

What does this do? "hp.Via('.key')" creates a "symbolic set" of "all
objects reachable *only* as the 'key' attribute of something" (it's a
"symbolic set" because there are no actual objects in it). The "alt"
method gives us a new symbolic set of everything reachable via "less
than" this way. We then intersect this with our set and discover there
is nothing left.

A similar construct that does not do what we want is::

  >>> nonkeys = byrcs[5] & ~hp.Via('.key')

The "~" operator inverts the symbolic set, giving a set matching
everything not reachable *exactly* as a "key" attribute. The key word
here is "exactly": since everything in our set was also reachable in
two other ways this intersection matches everything.

Ok, let's get back to the stuff actually eating memory::

  >>> h[0].byrcs
   Index  Count   %     Size   % Cumulative   % Referrers by Kind (class / dict of class)
       0 670791  68 31716096  68   31716096  68 dict (no owner)
       1 139232  14  6525856  14   38241952  83 tuple
       2 136558  14  6042408  13   44284360  96 dict of pkgcore.ebuild.ebuild_src.package
       3  36009   4  1773024   4   46057384  99 dict (no owner), dict of
                                                pkgcore.ebuild.ebuild_src.package, weakref.KeyedRef
       4   1762   0   107772   0   46165156 100 list
       5    824   0    69476   0   46234632 100 types.CodeType
       6    140   0    31312   0   46265944 100 function, tuple
       7    194   0    11504   0   46277448 100 dict of module
       8     30   0     6284   0   46283732 100 dict of type
       9     55   0     1972   0   46285704 100 dict of module, tuple

Remember h[0] gave us all str objects, so this is all string objects
grouped by the kind(s) of their referrers. Also notice index 3 here is
the same set of stuff we saw earlier::

  >>> h[0].byrcs[3] ^ byrcs[5]
  hp.Nothing

Most operators do what you would expect: & intersects, for example,
and ^ (used above) is the symmetric difference, so getting hp.Nothing
back means the two sets contain the same objects.

"We have a lot of strings in dicts" is not that useful either, so
let's see if we can narrow that down a little::

  >>> h[0].byrcs[0].referrers.byrcs
  Partition of a set of 44124 objects. Total size = 18636768 bytes.
   Index  Count   %     Size   % Cumulative   % Referrers by Kind (class / dict of class)
       0  24681  56 12834120  69   12834120  69 dict of pkgcore.ebuild.ebuild_src.package
       1  19426  44  5371024  29   18205144  98 dict (no owner)
       2      1   0   393352   2   18598496 100 dict of pkgcore.repository.prototype.IterValLazyDict
       3      1   0     6280   0   18604776 100 __builtin__.set
       4      1   0     6280   0   18611056 100 dict of module, guppy.heapy.heapyc.RootStateType
       5      1   0     6280   0   18617336 100 dict of pkgcore.ebuild.eclass_cache.cache
       6      1   0     6280   0   18623616 100 dict of
                                                pkgcore.repository.prototype.PackageIterValLazyDict
       7      4   0     5536   0   18629152 100 type
       8      4   0     3616   0   18632768 100 dict of type
       9      1   0     1672   0   18634440 100 dict of module, dict of os._Environ

(Broken down: h[0].byrcs[0] is the set of all str objects referenced
only by dicts, h[0].byrcs[0].referrers is the set of those dicts, and
the final .byrcs displays those dicts grouped by *their* referrers)

Keep an eye on the size column. We have over 12M worth of just dicts
(not counting the stuff in them) referenced only as an attribute of
ebuild_src.package. If we include the stuff kept alive by those dicts
we're talking about a big chunk of the 100MB total here::

  >>> t = _
  >>> t[0].domisize
  61269552
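The "domisize" number is the amount of memory that would go away together with the set, which is different from the shallow size of the dicts themselves. As a rough illustration of that idea (and not of how heapy computes it), the Python 3 sketch below finds everything that is reachable from some roots only by going through a given set of objects. The names "dominated" and "dominated_size" are made up, the roots are passed in by hand, and sys.getsizeof stands in for heapy's size accounting.

```python
import gc
import sys


def dominated(targets, roots):
    """Return the objects that are only reachable from roots via targets."""
    def reachable(skip_ids):
        seen = {}
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) in seen or id(obj) in skip_ids:
                continue
            seen[id(obj)] = obj
            stack.extend(gc.get_referents(obj))
        return seen

    everything = reachable(frozenset())
    # Pretend the targets are gone and see what can still be reached.
    survivors = reachable(frozenset(id(obj) for obj in targets))
    return [obj for key, obj in everything.items() if key not in survivors]


def dominated_size(targets, roots):
    """Sum of the shallow sizes of everything that dies with targets."""
    return sum(sys.getsizeof(obj) for obj in dominated(targets, roots))


private = "only reachable through the inner dict"
shared = "also referenced from elsewhere"
inner = {"private": private, "shared": shared}
roots = [[inner], [shared]]
print(sys.getsizeof(inner), dominated_size([inner], roots))
```

The second number is larger than the first because it also counts the private string and the dict keys, but not the shared string, which stays alive through the other root.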

60M out of our 100M would be deallocated if we killed those dicts. So
let's ask heapy which dicts those are::

  >>> t[0].byvia
  Partition of a set of 24681 objects. Total size = 12834120 bytes.
   Index  Count   %     Size   % Cumulative   % Referred Via:
       0  24681 100 12834120 100   12834120 100 "['data']"

(It is easy to get confused by the "byrcs" view of our "t". t[0] is
*not* a bunch of "dict of ebuild_src.package". It is a bunch of dicts
with strings in them, namely those that are *referred to* by the dict
of ebuild_src.package, and not by anything else. So the byvia output
means those dicts with strings in them are all "data" attributes of
ebuild_src.package instances).

(Sidenote: earlier we saw byvia say ".key", now it says "['data']".
It's different because the previous type used __slots__ (so there was
no "dict of" involved) and this type does not (so there is a "dict of"
and our dicts are the "data" key in it).)
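The difference between the two notations comes straight from how the two kinds of classes store their attributes. A minimal Python 3 illustration, using two invented classes ("WithSlots" and "WithDict") rather than the real pkgcore types:

```python
class WithSlots(object):
    """Attributes live in fixed slots; there is no per-instance __dict__."""
    __slots__ = ("key",)

    def __init__(self, key):
        self.key = key


class WithDict(object):
    """Attributes live in a per-instance __dict__."""

    def __init__(self, data):
        self.data = data


slotted = WithSlots("some-key")
plain = WithDict({"DESCRIPTION": "example"})

# The slotted instance refers to its attribute directly (".key").
print(hasattr(slotted, "__dict__"))
# The plain instance refers to a dict, and that dict has a "data" entry
# ("['data']").
print(sorted(plain.__dict__))
```

So for the slotted type heapy sees one hop (instance to string), while for the other type it sees two (instance to its dict, dict to value), and byvia reports the last hop.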

So what is in the dicts::

  >>> t[0].referents
  Partition of a set of 605577 objects. Total size = 34289392 bytes.
   Index  Count   %     Size   % Cumulative   % Kind (class / dict of class)
       0 556215  92 27710068  81   27710068  81 str
       1  24681   4  6085704  18   33795772  99 dict (no owner)
       2  24681   4   493620   1   34289392 100 long
  >>> _.byvia
  Partition of a set of 605577 objects. Total size = 34289392 bytes.
   Index  Count   %     Size   % Cumulative   % Referred Via:
       0  24681   4  6085704  18    6085704  18 "['_eclasses_']"
       1  21954   4  3742976  11    9828680  29 "['DEPEND']"
       2  22511   4  3300052  10   13128732  38 "['RDEPEND']"
       3  24202   4  2631304   8   15760036  46 "['SRC_URI']"
       4  24681   4  1831668   5   17591704  51 "['DESCRIPTION']"
       5  24674   4  1476680   4   19068384  56 "['HOMEPAGE']"
       6  24681   4  1297680   4   20366064  59 "['KEYWORDS']"
       7  24681   4   888516   3   21254580  62 '.keys()[3]'
       8  24681   4   888516   3   22143096  65 '.keys()[9]'
       9  24681   4   810108   2   22953204  67 "['LICENSE']"
  <32 more rows. Type e.g. '_.more' to view.>

Strings, nested dicts and longs, with most of the size eaten up by the
"_eclasses_" values. There is also a significant amount eaten up by
dictionary keys (the '.keys()[3]' and '.keys()[9]' rows), which is a
bit odd, so let's investigate::

  >>> refs = t[0].referents
  >>> i=iter(refs.byvia[7].nodes)
  >>> i.next()
  'DESCRIPTION'
  >>> i.next()
  'DESCRIPTION'
  >>> i.next()
  'DESCRIPTION'
  >>> i.next()
  'DESCRIPTION'
  >>> i.next()
  'DESCRIPTION'

Eep!

::

  >>> refs.byvia[7].bysize
  Partition of a set of 24681 objects. Total size = 888516 bytes.
   Index  Count   %     Size   % Cumulative   % Individual Size
       0  24681 100   888516 100     888516 100              36

It looks like we have 24681 separate copies of the same string here,
using up almost 1M of memory. The other odd entry ('.keys()[9]') is
apparently the '_eclasses_' string.
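How do you end up with thousands of copies of one string? One way, shown here purely as an illustration and not as a description of what pkgcore does, is building the key from parsed input every time. The sketch is Python 3 (where interning is spelled sys.intern; the python used in the session above has it as the builtin intern), "parse_key" is a made-up helper, and the object counts assume CPython.

```python
import sys


def parse_key(line):
    """Return the part of a 'KEY=value' line before the first '='."""
    return line.split("=", 1)[0]


lines = ["DESCRIPTION=package number %d" % n for n in range(1000)]

# Every split() call builds a fresh string, so these are equal but
# separate objects, each with its own memory.
keys = [parse_key(line) for line in lines]
print(len(set(keys)), len(set(id(key) for key in keys)))

# Interning maps equal strings to one shared object.
shared_keys = [sys.intern(key) for key in keys]
print(len(set(id(key) for key in shared_keys)))
```

Whether interning (or reusing a constant) is the right fix depends on where the strings actually come from, but it shows that equal strings reported many times by heapy are a real cost and not a display quirk.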

Extra stuff for C extension developers
======================================

To provide accurate statistics if your code uses extension types you
must provide heapy with a way to get the following data for your
custom types:

- How large is a certain instance?
- What objects does an instance contain?
- How does the instance refer to a contained object?

You provide these through a NyHeapDef struct, defined in heapdef.h in
the guppy source. This header is not installed, so you should just
copy it into your source tree. It is a good idea to read this header
file side by side with the following descriptions, since it contains
details omitted here. The stdtypes.c file contains implementations for
the basic python types which you can read for inspiration.

The NyHeapDef struct provides heapy with three function pointers:

SizeGetter
----------

To answer "how large is an instance" you provide a
NyHeapDef_SizeGetter function that is called with a PyObject* and
returns an int: the number of bytes the object occupies. If you do not
provide this function heapy uses a default that looks at the
tp_basicsize and tp_itemsize fields of the type. This means that if
you do not allocate any extra memory for non-python objects (e.g. for
C strings) you do not need to provide this function.

Traverser
---------

To answer "What objects does an instance contain" you provide a
traversal function (NyHeapDef_Traverser). This is called with a
pointer to a "visit procedure", an instance of your extension type and
some other stuff. You should then call the visit procedure for every
python object contained in your object.

This might sound familiar: to support the python garbage collector you
provide a very similar function (tp_traverse). Actually heapy will use
tp_traverse if you do not provide a heapy-specific traverse function.
Providing a separate one makes sense if you do not support the garbage
collector for some reason, or if you contain objects that are
irrelevant to the garbage collector.

An example would be a type that contains a single python string
object (that no other code can get a reference to). If this object
does not have references to other python objects it cannot be involved
in cycles so supporting gc would be useless. However you do still want
heapy to know about the memory occupied by the contained string. You
could do that by adding that size in your NyHeapDef_SizeGetter
function but it is probably easier to tell heapy about the string
through the traversal function (so you do not have to calculate the
memory occupied by the string).

If the above type also had a reference to some arbitrary
(non-private) python object it should support gc, but it does not need
to tell gc about the contained string. So you would have two traversal
functions, one for heapy that visits the string and one for gc that
does not.

RelationGetter
--------------

The last function heapy wants tells it in what way your instance
refers to some contained object. It is used to provide the "byvia"
view. It calls a visit function once for each way your instance
refers to a target object, telling it what kind of reference it is.

Providing the heapdef struct to heapy
-------------------------------------

Once you have the needed function pointers in a struct you need to
pass this to heapy somehow. This is done through a standard CPython
mechanism called "cobjects". From python these look like rather stupid
objects you cannot do anything with, but from C you can pull out a
void* that was put in when the object was constructed. You can wrap an
arbitrary pointer in a CObject, make it available as an attribute of
your module, then import it from some other module, pull the void*
back out and cast it to the original type.

heapy looks for a _NyHeapDefs_ attribute on all loaded modules. If
this attribute exists and is a CObject the pointer in it is used as a
pointer to an array of NyHeapDef structs (terminated with a struct
with only nulls). Example code doing this is in sets.c in the guppy
source.