(ToDo: document properly.)
It is possible to compile Haskell programs so that they will count lots and lots of interesting things, e.g., number of updates, number of data constructors entered, etc., etc. We call this “ticky-ticky” profiling, because that's the sound a CPU makes when it is running up all those counters (slowly).
Ticky-ticky profiling is mainly intended for implementors; it is quite separate from the main “cost-centre” profiling system, intended for all users everywhere.
      You don't need to build GHC, the libraries, or the RTS a special
      way in order to use ticky-ticky profiling.  You can decide on a
      module-by-module basis which parts of a program have the
      counters compiled in, using the
      compile-time -ticky option.  Those modules that
      were not compiled with -ticky won't contribute
      to the ticky-ticky profiling results, and that will normally
      include all the pre-compiled packages that your program links
      with.
    
To get your compiled program to spit out the ticky-ticky numbers:
            Link the program with -debug
            (-ticky is a synonym
            for -debug at link-time).  This links in
            the debug version of the RTS, which includes the code for
            aggregating and reporting the results of ticky-ticky
            profiling.
          
            Run the program with the -r RTS
            option.
            See Section 4.17, “Running a compiled program”.
          
Here is a sample ticky-ticky statistics file, generated by the invocation foo +RTS -rfoo.ticky.
 foo +RTS -rfoo.ticky
ALLOCATIONS: 3964631 (11330900 words total: 3999476 admin, 6098829 goods, 1232595 slop)
                                total words:        2     3     4     5    6+
  69647 (  1.8%) function values                 50.0  50.0   0.0   0.0   0.0
2382937 ( 60.1%) thunks                           0.0  83.9  16.1   0.0   0.0
1477218 ( 37.3%) data values                     66.8  33.2   0.0   0.0   0.0
      0 (  0.0%) big tuples
      2 (  0.0%) black holes                      0.0 100.0   0.0   0.0   0.0
      0 (  0.0%) prim things
  34825 (  0.9%) partial applications             0.0   0.0   0.0 100.0   0.0
      2 (  0.0%) thread state objects             0.0   0.0   0.0   0.0 100.0
Total storage-manager allocations: 3647137 (11882004 words)
        [551104 words lost to speculative heap-checks]
STACK USAGE:
ENTERS: 9400092  of which 2005772 (21.3%) direct to the entry code
                  [the rest indirected via Node's info ptr]
1860318 ( 19.8%) thunks
3733184 ( 39.7%) data values
3149544 ( 33.5%) function values
                  [of which 1999880 (63.5%) bypassed arg-satisfaction chk]
 348140 (  3.7%) partial applications
 308906 (  3.3%) normal indirections
      0 (  0.0%) permanent indirections
RETURNS: 5870443
2137257 ( 36.4%) from entering a new constructor
                  [the rest from entering an existing constructor]
2349219 ( 40.0%) vectored [the rest unvectored]
RET_NEW:         2137257:  32.5% 46.2% 21.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
RET_OLD:         3733184:   2.8% 67.9% 29.3%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
RET_UNBOXED_TUP:       2:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
RET_VEC_RETURN : 2349219:   0.0%  0.0%100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
UPDATE FRAMES: 2241725 (0 omitted from thunks)
SEQ FRAMES:    1
CATCH FRAMES:  1
UPDATES: 2241725
      0 (  0.0%) data values
  34827 (  1.6%) partial applications
                  [2 in place, 34825 allocated new space]
2206898 ( 98.4%) updates to existing heap objects (46 by squeezing)
UPD_CON_IN_NEW:         0:       0      0      0      0      0      0      0      0      0
UPD_PAP_IN_NEW:     34825:       0      0      0  34825      0      0      0      0      0
NEW GEN UPDATES: 2274700 ( 99.9%)
OLD GEN UPDATES: 1852 (  0.1%)
Total bytes copied during GC: 190096
**************************************************
3647137 ALLOC_HEAP_ctr
11882004 ALLOC_HEAP_tot
  69647 ALLOC_FUN_ctr
  69647 ALLOC_FUN_adm
  69644 ALLOC_FUN_gds
  34819 ALLOC_FUN_slp
  34831 ALLOC_FUN_hst_0
  34816 ALLOC_FUN_hst_1
      0 ALLOC_FUN_hst_2
      0 ALLOC_FUN_hst_3
      0 ALLOC_FUN_hst_4
2382937 ALLOC_UP_THK_ctr
      0 ALLOC_SE_THK_ctr
 308906 ENT_IND_ctr
      0 E!NT_PERM_IND_ctr requires +RTS -Z
[... lots more info omitted ...]
      0 GC_SEL_ABANDONED_ctr
      0 GC_SEL_MINOR_ctr
      0 GC_SEL_MAJOR_ctr
      0 GC_FAILED_PROMOTION_ctr
  47524 GC_WORDS_COPIED_ctr
The formatting of the information above the row of asterisks is subject to change, but hopefully provides a useful human-readable summary. Below the asterisks all counters maintained by the ticky-ticky system are dumped, in a format intended to be machine-readable: zero or more spaces, an integer, a space, the counter name, and a newline.
In fact, not all counters are
    necessarily dumped; compile- or run-time flags can render certain
    counters invalid.  In this case, either the counter will simply
    not appear, or it will appear with a modified counter name,
    possibly along with an explanation for the omission (notice
    ENT_PERM_IND_ctr appears
    with an inserted ! above).  Software analysing
    this output should always check that it has the counters it
    expects.  Also, beware: some of the counters can have
    large values!