Copy the examples to your home directory and build them:

 cd ~
 cp -r /srv/app/kcachegrind/kcg-examples .
 cd kcg-examples
 make


Getting Started

* What happens in "ls /usr/bin"?
  - Which functions execute the most instructions? What is their purpose?
  - Where is the main function?


Matrix Multiplication

 Run different versions of mm on size <x>:
 ./mm <x>

 1) Which version is the fastest for sizes 100, 300, 500, 800?
    Can you explain why?
 1a) Run the cache simulator to get hints.
     How well are the L1/L3 caches exploited?
 1b) Try to generate separate profile dumps for each
     version by using --dump-before=XXX
     (XXX being the name of a function).
 1c) Copy the header for the Callgrind macros:

  cp /srv/app/kcachegrind/include/valgrind/callgrind.h .

     Use the macro CALLGRIND_START_INSTRUMENTATION together with
     --instr-atstart=no to "fast-forward" to the last version,
     and only do cache simulation for that one.
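
 A minimal sketch of the pattern described in 1c), not the actual mm
 source: warm_up() and measured_kernel() are hypothetical stand-ins for
 the earlier versions and the last version, and USE_CALLGRIND is an
 assumed build switch. Without that switch the macros fall back to
 no-ops, so the sketch also builds where callgrind.h is not available.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_CALLGRIND                 /* assumed switch: build with -DUSE_CALLGRIND */
#include "callgrind.h"               /* the header copied in 1c) */
#else                                /* fallback: no-ops when the header is absent */
#define CALLGRIND_START_INSTRUMENTATION ((void)0)
#define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#endif

/* Hypothetical stand-in for the versions we want to skip. */
static long warm_up(size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)i;
    return sum;
}

/* Hypothetical stand-in for the one version we want to simulate. */
static long measured_kernel(size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)(i * i);
    return sum;
}

long run_profiled(size_t n)
{
    assert(n > 0);
    /* With --instr-atstart=no this part runs uninstrumented. */
    long skipped = warm_up(n);

    CALLGRIND_START_INSTRUMENTATION;  /* instrumentation is switched on here */
    long measured = measured_kernel(n);
    CALLGRIND_STOP_INSTRUMENTATION;   /* and switched off again */

    return skipped + measured;
}
```

 When run under valgrind --tool=callgrind with --instr-atstart=no, only
 the code between the two macros should show up in the profile. On
 recent Valgrind releases the cache simulator is enabled with
 --cache-sim=yes.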

 2) What is IJK-T? Why does it help with large sizes?
 3) Run with sizes 127, 128, 129. What happens with KJI? Why?
    How can it be improved?
 4) Can you come up with a version blocking in all 3
    directions?
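
 As background for the questions above, here is a sketch of two loop
 orders. It assumes that version names such as KJI give the loop nesting
 from outermost to innermost and that matrices are stored row-major; the
 real mm source may differ. The function names are made up for
 illustration.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B for row-major n x n matrices, loops nested i, j, k.
 * The innermost loop walks b with stride n. */
void mm_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Same computation, loops nested k, j, i.
 * The innermost loop walks a and c with stride n. */
void mm_kji(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < n; i++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Returns 1 if both orders give the same result on a small 3 x 3 case. */
int mm_orders_agree(void)
{
    enum { N = 3, LEN = N * N };
    double a[LEN], b[LEN], c1[LEN] = {0}, c2[LEN] = {0};

    for (size_t i = 0; i < LEN; i++) {
        a[i] = (double)(i + 1);      /* 1, 2, ..., 9 */
        b[i] = (double)(LEN - i);    /* 9, 8, ..., 1 */
    }
    mm_ijk(N, a, b, c1);
    mm_kji(N, a, b, c2);

    for (size_t i = 0; i < LEN; i++)
        if (c1[i] != c2[i])
            return 0;
    return 1;
}
```

 Both functions compute the same product; only the order in which memory
 is touched differs, which is what the cache simulator makes visible.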


Jacobi

 Run different versions of jc on size <x> for <y> iterations:
 ./jc <x> <y>

 1) Why is the first version so slow?
 2) Can you write a version weaving 4 iterations?
 2a) Also check the result with the cache simulator.
 2b) How good is the cache line utilization?
     Use --cacheuse=yes
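
 For orientation, a sketch of a plain (non-weaved) Jacobi iteration. It
 assumes that jc relaxes a square 2D grid with a 4-neighbour average and
 fixed boundary values; the real jc source may use a different stencil or
 layout. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep on an n x n row-major grid: every interior point of dst
 * becomes the average of its four neighbours in src; boundary points
 * are copied unchanged. */
void jacobi_sweep(size_t n, const double *src, double *dst)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1)
                dst[i * n + j] = src[i * n + j];
            else
                dst[i * n + j] = 0.25 * (src[(i - 1) * n + j] +
                                         src[(i + 1) * n + j] +
                                         src[i * n + j - 1] +
                                         src[i * n + j + 1]);
        }
    }
}

/* Runs iters sweeps, swapping the two buffers after each one.
 * Returns the buffer that holds the final result. */
double *jacobi(size_t n, size_t iters, double *u, double *tmp)
{
    for (size_t it = 0; it < iters; it++) {
        jacobi_sweep(n, u, tmp);
        double *t = u;
        u = tmp;
        tmp = t;
    }
    return u;
}

/* Small check: centre of a 3 x 3 grid after one sweep. */
double jacobi_demo(void)
{
    enum { N = 3 };
    double u[N * N] = { 4, 4, 4,
                        4, 0, 4,
                        4, 4, 4 };
    double tmp[N * N];
    return jacobi(N, 1, u, tmp)[N + 1];
}
```

 In this form every iteration streams the whole grid through the cache
 once, which is the access pattern the questions above ask you to
 examine.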


