Fleshed out how-bloaty-works.md.

commit: 156582895e30fd3b1410b1ecc4aaf2a9c5309372 [log] [tgz]
author: Joshua Haberman <jhaberman@gmail.com> Sun May 24 14:15:32 2020 -0700
committer: Joshua Haberman <jhaberman@gmail.com> Sun May 24 14:15:32 2020 -0700
tree: 9bc5008194c85d56f9f24707afeeea4875db8fea
parent: cdd563b733f4cf579c5006167d96d54ebed8e77f [diff]
diff --git a/README.md b/README.md
index db9526b..c194eaa 100644
--- a/README.md
+++ b/README.md

@@ -7,6 +7,14 @@
 McBloatface will show you a size profile of the binary so
 you can understand what's taking up space inside.
 
+Bloaty performs a deep analysis of the binary. Using custom
+ELF, DWARF, and Mach-O parsers, Bloaty aims to accurately
+attribute every byte of the binary to the symbol or
+compileunit that produced it. It will even disassemble the
+binary looking for references to anonymous data. For more
+information about the analysis performed by Bloaty, please
+see [docs/how-bloaty-works.md](docs/how-bloaty-works.md).
+
 Bloaty works on binaries, shared objects, object files, and
 static libraries (`.a` files).  The following file formats
 are supported:
@@ -498,6 +506,15 @@
  100.0%  29.5Mi 100.0%  6.69Mi    TOTAL
 ```
 
+Sections are regions of the binary that are the linker
+treats as atomic when linking. The linker will never break
+apart or rearrange the data within a section. This is why it
+is necessary to compile with `-ffunction-sections` and
+`-fdata-sections` if you want the linker to strip out
+individual functions or variables that have no references.
+However the linker will often combine many input sections
+into a single output section.
+
 ## Symbols
 
 Symbols come from the symbol table, and represent individual

diff --git a/doc/how-bloaty-works.md b/doc/how-bloaty-works.md
new file mode 100644
index 0000000..9196f73
--- /dev/null
+++ b/doc/how-bloaty-works.md

@@ -0,0 +1,311 @@
+
+# How Bloaty Works
+
+At a high level, Bloaty's goal is to create a map of the binary where every
+byte has a label attached to it.  Every byte starts out as unknown
+(unattributed).  As we scan the binary we assign labels to different ranges of
+the file.  For example, if the user selected the "symbols" data source we scan
+the symbol table and use the symbol name as the label for each range.
+
+Ideally these labeled ranges will cover the entire file by the time we are
+done.  In practice we usually can't achieve perfect 100% coverage.  To
+compensate for this, we have various kinds of "fallback" labels we attach to
+mystery regions of the file.  This is how we guarantee an important invariant
+of Bloaty: the totals given in Bloaty's output will always match the total size
+of the file.  This ensures that we always account for the entire file, even if
+we don't have detailed information for every byte.
+
+The ELF/Mach-O/etc. data structures we are traversing were not designed to
+enable size profiling.  They were designed to assist linkers, loaders,
+debuggers, stack unwinders, etc. to run and debug the binary.  This means that
+Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O
+metadata.  Bloaty has to be clever about how to use the available information to
+achieve its goal.  This can pose a challenge, but also makes Bloaty fun to work
+on.  Getting the coverage close to 100% requires a lot of ingenuity (and some
+heuristics).
+
+## Range Map
+
+RangeMap (as defined in [range_map.h](https://github.com/google/bloaty/blob/master/src/range_map.h))
+is the core data structure of Bloaty.  It is a sparse map of
+`[start, end) -> std::string` that associates regions of VM or file space to
+a label.
+
+Sometimes we know a region's start but not its end.  For example, Mach-O
+symbols have an address but not a size (whereas ELF symbols have both).
+To support this case, `RangeMap` supports adding an address with `kUnknownSize`.
+A range with unknown size will automatically extend to the beginning of the
+next region, even if the next region is added later.
+
+If we try to add a label to a range of the binary that has already been assigned
+a label, the first label assigned takes precedence.  This means that the order
+in which we scan data structures is significant.  So our general strategy is to
+scan our most granular and detailed information first.  We scan generic
+information as a last resort, to give at least some information for parts of the
+binary that we couldn't find any more specific information about.
+
+## VM Space and File Space
+
+Loadable binaries have two fundamental domains of space we are trying to map:
+*VM space* and *file space*.  File space is the bytes of the input file.  VM
+space is the bytes of memory when the executable is loaded at runtime.  Some
+regions of the binary exist only in file space (like debug info) and some
+regions exist only in VM space (like `.bss`, zero-initialized data).  Even
+entities that exist in both spaces can have different sizes in each.
+
+We create two separate `RangeMap` structures for these two domains.  For
+convenience, we put them together into a single structure called `DualMap`:
+
+```cpp
+struct DualMap {
+  RangeMap vm_map;
+  RangeMap file_map;
+};
+```
+
+We populate these two maps simultaneously as we scan the file.  We must populate
+both maps even if we only care about one of them, because most of the metadata we're
+scanning gives us VM addresses *or* file offsets, not both.  For example,
+debug info always refers to VM addresses, because it's intended for debugging at
+runtime.  Even if we only care about file size, we still have to scan VM addresses
+and translate them to file offsets.
+
+Bloaty's overall analysis algorithm (in pseudo-code) is:
+
+```c++
+for (auto f : files) {
+  // Always start by creating the base map.
+  DualMap base_map = ScanBaseMap(f);
+
+  // Scan once for every data source the user selected with '-d'.
+  std::vector<DualMap> maps;
+  for (auto s : data_sources) {
+    maps.push_back(ScanDataSource(f, s));
+  }
+}
+```
+
+## Base Map
+
+To translate between VM and file space, we always begin by creating a "base
+map."  The base map is just a `DualMap` like any other, but we give it special
+meaning:
+
+* It defines what ranges of file and VM space constitute "the entire binary"
+  (ie. the "TOTALS" row of the final report).
+* We use it to translate between VM space and File space.
+
+This means that the base map must be exhaustive, and must also provide
+translation for any entity that exists in both VM and file space.  For example,
+suppose we are scanning the "symbols" data source and we see in the symbol
+table that address `0x12345` corresponds to symbol `foo`.  We will add that to
+VM map immediately, but we will also use the base map to translate address
+`0x12355` to a file offset so we can add that range to the file map.
+
+How does the base map store translation info?  I left one thing out about
+`RangeMap` above.  In addition to storing a label for every region, it can also
+(optionally) store a member called `other_start`.  This is the offset in the
+other space where this range maps to.
+
+We build the base map by scanning either the segments (program headers) or
+sections of the binary.  These give both VM address and file offset for regions
+of the binary that are loaded into memory.  To make sure we cover the entire
+file space, we use `[Unmapped]` as a last ditch fallback for any regions of the
+on-disk binary that didn't have any segment/section data associated with them.
+This ensures that Bloaty always accounts for the entire physical binary, even if
+we can't find any information about it.
+
+## Scanning Data Sources
+
+Once we have built the base map, we can get on to the meat of Bloaty's work.
+We can now scan the binary according to whatever data source(s) the user has
+selected.
+
+We must do more than just scan a single table though.  For example, if the user
+asked us to scan symbols we can't simply scan the symbol table, add all the
+symbol table entries to a `RangeMap`, and call it a day.  The first prerelease
+version of Bloaty did just that and I got very understandable complaints that
+Bloaty's coverage was low, and very large parts of the binary would show up as
+`[None]`.
+
+The problem is that the symbol table only describes the actual machine code of
+the function.  But a function can emit far more artifacts into a binary than
+just the machine code.  It can also emit:
+
+* unwind info into `.eh_frame`
+* relocations into `.rela.dyn`, etc.
+* debugging information into various `.debug_*` sections.
+* the symbol table entries themselves (yes, these do take up space!)
+
+If we want to achieve high coverage of the binary, we have to scan all of these
+different sections and attribute them as best we can to individual
+symbols/compileunits/etc.
+
+All of this means that we have to delve deep into ELF, DWARF, Mach-O, etc. to
+get good coverage. Bloaty includes custom ELF/DWARF/Mach-O parsers to not only
+extract every bit of data we can, but also to track where exactly in the files
+each bit of data lives.
+
+### Segments and Sections
+
+The `segments` and `sections` data sources are relatively straightforward. For
+the most part we can simply scan the segments/sections table and call it a day.
+
+For ELF, segments and sections have distinct tables in the binary that can be
+scanned independently. This means that technically a section could span multiple
+segments, but in practice segments/sections form a 1:many relationship, where
+each section is contained entirely within a single segment.
+
+Currently Bloaty only reports `PT_LOAD` and `PT_TLS` segments. We scan `PT_LOAD`
+segments first, so if there is overlap with `PT_TLS` the `PT_LOAD` label will
+win. In the future It may make sense to scan `PT_TLS` first, as this is more
+granular data that can give insight into the per-thread runtime overhead of TLS
+variables. It may also make sense to scan other segment types, to give more
+granular info.
+
+ELF segments do not have names. To distinguish between different `PT_LOAD`
+segments, we include both a segment offset and the segment flags in the label,
+eg. `LOAD #2 [R]`.
+
+For Mach-O, segments are contained within a file-level table of "load commands."
+Each load command has a type, and technically speaking, segments are a subset of
+all load commands. However Bloaty can parse non-segment load commands such as
+the symbol table (`LC_SYMTAB`, `LC_DYSYMTAB`), code signature
+(`LC_CODE_SIGNATURE`), and more. Segments can have zero or more sections, so in
+Mach-O files the 1:many nature of segments and sections is enforced by the file
+format.
+
+### Symbols
+
+The `symbols` data source requires a lot of deep analysis of the binary to give
+good coverage. For example, take the following data from running Bloaty on
+itself:
+
+```
+$ ./bloaty bloaty -d symbols,sections
+    FILE SIZE        VM SIZE
+ --------------  --------------
+[...]
+   0.2%   116Ki   1.6%   116Ki    AArch64_printInst
+    84.9%  98.8Ki  84.9%  98.8Ki    .text
+    14.9%  17.4Ki  14.9%  17.4Ki    .rodata
+     0.1%     156   0.1%     156    .eh_frame
+     0.0%      24   0.0%       0    .symtab
+     0.0%      18   0.0%       0    .strtab
+     0.0%       8   0.0%       8    .eh_frame_hdr
+[...]
+   0.1%  50.1Ki   0.7%  49.8Ki    reg_name_maps
+    59.6%  29.8Ki  59.8%  29.8Ki    .rela.dyn
+    40.0%  20.0Ki  40.2%  20.0Ki    .data.rel.ro
+     0.4%     216   0.0%       0    .symtab
+     0.0%      14   0.0%       0    .strtab
+```
+
+I excerpted two symbols from the report. Between these two symbols, Bloaty has
+found seven distinct kinds of sections that contributed to these two symbols. If
+you wrote a tool that naively just parsed the symbol table, you would only find
+one of those seven.
+
+Let's take a survey of what Bloaty has found here.
+
+1. `.text.`/`.data.rel.ro`: this is the data we obtain by simply following the
+   symbol table entry. This is the primary code or data emitted by the function
+   or variable.
+2. `.eh_frame`: this is the "unwind information" for a function. [It is used for
+   many things](https://stackoverflow.com/a/26302715/77070), including C++
+   exceptions and stack traces when no frame pointer is available.
+3. `.eh_frame_hdr`: this is metadata about the `.eh_frame` section.
+4. `.symtab`: this is the function/variable's symbol table entry itself. It is a
+   fixed size for every entry. The fact that `reg_name_maps` above has a
+   `.symtab` size of 216 indicates that there must actually be 9 different
+   symbols being represented by this entry. Bloaty has combined them because
+   they all have the same name. We can break them apart if we want using:
+
+   ```
+   $ ./bloaty bloaty -d compileunits,symbols --source-filter=reg_name_maps$
+       FILE SIZE        VM SIZE
+    --------------  --------------
+     20.3%  10.2Ki  20.3%  10.1Ki    ../third_party/capstone/arch/AArch64/AArch64Mapping.c
+      100.0%  10.2Ki 100.0%  10.1Ki    reg_name_maps
+     18.9%  9.45Ki  18.9%  9.43Ki    ../third_party/capstone/arch/X86/X86Mapping.c
+      100.0%  9.45Ki 100.0%  9.43Ki    reg_name_maps
+     16.4%  8.20Ki  16.4%  8.18Ki    ../third_party/capstone/arch/PowerPC/PPCMapping.c
+      100.0%  8.20Ki 100.0%  8.18Ki    reg_name_maps
+     10.7%  5.35Ki  10.7%  5.33Ki    ../third_party/capstone/arch/Mips/MipsMapping.c
+      100.0%  5.35Ki 100.0%  5.33Ki    reg_name_maps
+      9.1%  4.57Ki   9.1%  4.55Ki    ../third_party/capstone/arch/SystemZ/SystemZMapping.c
+      100.0%  4.57Ki 100.0%  4.55Ki    reg_name_maps
+      8.7%  4.35Ki   8.7%  4.31Ki    ../third_party/capstone/arch/ARM/ARMMapping.c
+      100.0%  4.35Ki 100.0%  4.31Ki    reg_name_maps
+      7.0%  3.52Ki   7.0%  3.49Ki    ../third_party/capstone/arch/TMS320C64x/TMS320C64xMapping.c
+      100.0%  3.52Ki 100.0%  3.49Ki    reg_name_maps
+      6.9%  3.44Ki   6.9%  3.41Ki    ../third_party/capstone/arch/Sparc/SparcMapping.c
+      100.0%  3.44Ki 100.0%  3.41Ki    reg_name_maps
+      2.0%  1.02Ki   2.0%    1016    ../third_party/capstone/arch/XCore/XCoreMapping.c
+      100.0%  1.02Ki 100.0%    1016    reg_name_maps
+    100.0%  50.1Ki 100.0%  49.8Ki    TOTAL
+   Filtering enabled (source_filter); omitted file = 46.7Mi, vm = 7.08Mi of entries
+   ```
+5. `.strtab`: this is the text of the function/variables's name itself in the
+   string table. Longer names take up more space in the binary, and Bloaty's
+   analysis here reflects that (though the symbol table is not loaded at
+   runtime, so it's not costing RAM).
+6. `.rela.dyn`: these are relocations embedded into the executable. Normally we
+   would associate relocations with `.o` files and not the final linked binary.
+   However shared objects and position-independent executables must also emit
+   relocations for any global variable that is initialized to an address of some
+   other data. These relocations can take up a significant amount of space,
+   indeed more space than the data itself in this case! Without this deep
+   analysis of the binary, this cost would be invisible. Bloaty scans all
+   relocation tables and "charges" each relocation entry to the function/data
+   that requires the relocation (*not* the function being pointed to).
+7. `.rodata`: Bloaty has found some data associated with this function.
+   Sometimes data doesn't get its own symbol table entry, for whatever reason.
+   Bloaty can attribute anonymous data to the function that uses it by
+   disassembling the binary looking for instructions that reference a different
+   part of the binary.
+
+Note that this is more granular information than you can get from a linker map
+file. A linker map file will break down some of these sections by compile unit,
+but the symbol-level granularity is limited to the primary code/data for each
+symbol (#1 in the list above).
+
+### Compile Units
+
+Like symbols, we can see that Bloaty is capable of breaking down lots of
+sections by compile unit:
+
+```
+$ ./bloaty bloaty -d compileunits,sections
+    FILE SIZE        VM SIZE
+ --------------  --------------
+  37.9%  17.7Mi  49.4%  3.52Mi    [160 Others]
+  15.0%  7.04Mi   3.4%   246Ki    ../third_party/protobuf/src/google/protobuf/descriptor.cc
+    33.9%  2.38Mi   0.0%       0    .debug_info
+    32.6%  2.29Mi   0.0%       0    .debug_loc
+    17.2%  1.21Mi   0.0%       0    .debug_str
+     6.5%   468Ki   0.0%       0    .debug_ranges
+     5.3%   381Ki   0.0%       0    .debug_line
+     2.8%   204Ki  83.1%   204Ki    .text
+     1.0%  70.9Ki   0.0%       0    .strtab
+     0.4%  25.7Ki  10.4%  25.7Ki    .eh_frame
+     0.2%  13.3Ki   0.0%       0    .symtab
+     0.1%  10.6Ki   4.3%  10.6Ki    .rodata
+     0.1%  3.97Ki   1.6%  3.97Ki    .eh_frame_hdr
+     0.0%  1.03Ki   0.4%  1.03Ki    .rela.dyn
+     0.0%     368   0.1%     368    .data.rel.ro
+     0.0%       0   0.0%      81    .bss
+[...]
+```
+
+To attribute all of the different `.debug_*` sections, Bloaty includes parsers
+for all of the different DWARF formats that live in these sections. We also use
+the DWARF data to find which symbols belong to which compile units.
+
+The `compileunits` data source contains much of the same data that you could get
+from a linker map. Since each compile unit generally comes from a separate `.o`
+file, a linker map can often give good data about which parts of the binary came
+from which translation units. However Bloaty is able to derive this data without
+needing a linker map file, which may be tricky to obtain. The `compileunits`
+data source is also useful when combined with other data sources in hierarchical
+profiles, as seen above.

diff --git a/how-bloaty-works.md b/how-bloaty-works.md
deleted file mode 100644
index f246a01..0000000
--- a/how-bloaty-works.md
+++ /dev/null

@@ -1,159 +0,0 @@
-
-# How Bloaty Works
-
-At a high level, Bloaty's goal is to create a map of the binary where every
-byte has a label attached to it.  Every byte starts out as unknown
-(unattributed).  As we scan the binary we assign labels to different ranges of
-the file.  For example, if the user selected the "symbols" data source we scan
-the symbol table and use the symbol name as the label for each range.
-
-Ideally these labeled ranges will cover the entire file by the time we are
-done.  In practice we usually can't achieve perfect 100% coverage.  To
-compensate for this, we have various kinds of "fallback" labels we attach to
-mystery regions of the file.  This is how we guarantee an important invariant
-of Bloaty: the totals given in Bloaty's output will always match the total size
-of the file.  This ensures that we always account for the entire file, even if
-we don't have detailed information for every byte.
-
-The ELF/Mach-O/etc. data structures we are traversing were not designed to
-enable size profiling.  They were designed to assist linkers, loaders,
-debuggers, stack unwinders, etc. to run and debug the binary.  This means that
-Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O
-metadata.  Bloaty has to be clever about how to use the available information to
-achieve its goal.  This can pose a challenge, but also makes Bloaty fun to work
-on.  Getting the coverage close to 100% requires a lot of ingenuity (and some
-heuristics).
-
-## Range Map
-
-RangeMap (as defined in [range_map.h](https://github.com/google/bloaty/blob/master/src/range_map.h))
-is the core data structure of Bloaty.  It is a sparse map of
-`[start, end) -> std::string` that associates regions of VM or file space to
-a label.
-
-Sometimes we know a region's start but not its end.  For example, Mach-O
-symbols have an address but not a size (whereas ELF symbols have both).
-To support this case, `RangeMap` supports adding an address with `kUnknownSize`.
-A range with unknown size will automatically extend to the beginning of the
-next region, even if the next region is added later.
-
-If we try to add a label to a range of the binary that has already been assigned
-a label, the first label assigned takes precedence.  This means that the order
-in which we scan data structures is significant.  So our general strategy is to
-scan our most granular and detailed information first.  We scan generic
-information as a last resort, to give at least some information for parts of the
-binary that we couldn't find any more specific information about.
-
-## VM Space and File Space
-
-Loadable binaries have two fundamental domains of space we are trying to map:
-*VM space* and *file space*.  File space is the bytes of the input file.  VM
-space is the bytes of memory when the executable is loaded at runtime.  Some
-regions of the binary exist only in file space (like debug info) and some
-regions exist only in VM space (like `.bss`, zero-initialized data).  Even
-entities that exist in both spaces can have different sizes in each.
-
-We create two separate `RangeMap` structures for these two domains.  For
-convenience, we put them together into a single structure called `DualMap`:
-
-```cpp
-struct DualMap {
-  RangeMap vm_map;
-  RangeMap file_map;
-};
-```
-
-We populate these two maps simultaneously as we scan the file.  We must populate
-both maps even if we only care about one of them, because most of the metadata we're
-scanning gives us VM addresses *or* file offsets, not both.  For example,
-debug info always refers to VM addresses, because it's intended for debugging at
-runtime.  Even if we only care about file size, we still have to scan VM addresses
-and translate them to file offsets.
-
-Bloaty's overall analysis algorithm (in pseudo-code) is:
-
-```c++
-for (auto f : files) {
-  // Always start by creating the base map.
-  DualMap base_map = ScanBaseMap(f);
-
-  // Scan once for every data source the user selected with '-d'.
-  std::vector<DualMap> maps;
-  for (auto s : data_sources) {
-    maps.push_back(ScanDataSource(f, s));
-  }
-}
-```
-
-## Base Map
-
-To translate between VM and file space, we always begin by creating a "base
-map."  The base map is just a `DualMap` like any other, but we give it special
-meaning:
-
-* It defines what ranges of file and VM space constitute "the entire binary"
-  (ie. the "TOTALS" row of the final report).
-* We use it to translate between VM space and File space.
-
-This means that the base map must be exhaustive, and must also provide
-translation for any entity that exists in both VM and file space.  For example,
-suppose we are scanning the "symbols" data source and we see in the symbol
-table that address `0x12345` corresponds to symbol `foo`.  We will add that to
-VM map immediately, but we will also use the base map to translate address
-`0x12355` to a file offset so we can add that range to the file map.
-
-How does the base map store translation info?  I left one thing out about
-`RangeMap` above.  In addition to storing a label for every region, it can also
-(optionally) store a member called `other_start`.  This is the offset in the
-other space where this range maps to.
-
-We build the base map by scanning either the segments (program headers) or
-sections of the binary.  These give both VM address and file offset for regions
-of the binary that are loaded into memory.  To make sure we cover the entire
-file space, we use `[Unmapped]` as a last ditch fallback for any regions of the
-on-disk binary that didn't have any segment/section data associated with them.
-This ensures that Bloaty always accounts for the entire physical binary, even if
-we can't find any information about it.
-
-## Scanning Data Sources
-
-Once we have built the base map, we can get on to the meat of Bloaty's work.
-We can now scan the binary according to whatever data source(s) the user has
-selected.
-
-We must do more than just scan a single table though.  For example, if the user
-asked us to scan symbols we can't simply scan the symbol table, add all the
-symbol table entries to a `RangeMap`, and call it a day.  The first prerelease
-version of Bloaty did just that and I got very understandable complaints that
-Bloaty's coverage was low, and very large parts of the binary would show up as
-`[None]`.
-
-The problem is that the symbol table only refers to the actual machine code of
-the function.  But a function can emit far more artifacts into a binary than
-just the machine code.  It can also emit:
-
-* unwind info into `.eh_frame`
-* relocations into `.rela.dyn`, etc.
-* debugging information into various `.debug_*` sections.
-* the symbol table entries themselves (yes, these do take up space!)
-
-If we want to achieve high coverage of the binary, we have to scan all of these
-different sections and attribute them as best we can to individual
-symbols/compileunits/etc.
-
-All of this means that we have to delve deep into ELF, DWARF, Mach-O, etc. to
-get good coverage.  We have to venture into dark arts like `.eh_frame` which is
-underspecified and underdocumented, and that few programs ever have any reason
-to parse.  For Bloaty to work well, we must go deeper.
-
-## ELF
-
-TODO
-
-## DWARF
-
-TODO
-
-## Mach-O
-
-TODO
commit	156582895e30fd3b1410b1ecc4aaf2a9c5309372	[log] [tgz]
author	Joshua Haberman <jhaberman@gmail.com>	Sun May 24 14:15:32 2020 -0700
committer	Joshua Haberman <jhaberman@gmail.com>	Sun May 24 14:15:32 2020 -0700
tree	9bc5008194c85d56f9f24707afeeea4875db8fea
parent	cdd563b733f4cf579c5006167d96d54ebed8e77f [diff]