CSE 512/CS 554 - Machine Problem 1

Due Apr 1, 2008

  1. Implement the agglomerated version (multiple grid points per coarse-grain task) of the 1-D Laplace example from Chapter 2 using MPI. You may use some fixed number of iterations rather than iterating until convergence.
  2. Discuss the effects of the various communication modes of MPI with respect to the ordering and placement of sends and receives in this program.
    1. Is there any choice that risks deadlock?
    2. Is there any choice that will certainly deadlock?
    3. Is there any choice that results in effectively serial execution (i.e., only one task actually executes at a time)?
    4. Which communication mode and placement of sends and receives enable the greatest overlap of communication and computation?
    Verify your conclusions with test runs that demonstrate the issues discussed above. Beware of deadlock, however: do not let your program hang indefinitely.
  3. Run your program for a range of values of n and p, then plot and discuss the speedup and efficiency you observe.
  4. Formulate a performance model for this parallel program in terms of the usual parameters n, p, tc, ts, tw (problem size, number of processors, computation time per grid point, message startup time, and transfer time per word). Compare the predictions of your model with the performance actually observed. You will need values for the computation and communication parameters. How can you obtain these?
  5. What is the isoefficiency function for this parallel program? Using the parameters determined in the previous question, plot isoefficiency curves given by your performance model and compare with your previous observations of actual efficiency. How would the isoefficiency change if a convergence test were required rather than taking a fixed number of iterations?
  6. Using a suitable choice of communication mode, implement the version of the Laplace example with overlapped communication and computation and compare its performance with that of the original version. How does overlapping affect your performance model?
  7. Generate trace files using SLOG or MPICL so that you can visualize the behavior and performance of your programs using Jumpshot or ParaGraph. Use such visualizations to illustrate some of the points made in answering the previous questions.
You should submit your program listings, output, and any written discussion either in hardcopy or via email (PDF format preferred).
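
As a starting point for item 1, the following is a minimal serial sketch of the agglomerated data layout, written in Python rather than MPI so that it runs without a parallel environment. It is not the Chapter 2 program. The function names jacobi_serial and jacobi_blocks are hypothetical, the update is assumed to be a plain Jacobi average of the two neighbors with fixed boundary values, and copying edge values into ghost cells stands in for the send/receive pair an MPI version would use.

```python
def jacobi_serial(u, iters):
    """Jacobi sweeps on the whole 1-D grid; u[0] and u[-1] are fixed boundary values."""
    u = list(u)
    for _ in range(iters):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def jacobi_blocks(u, p, iters):
    """Same sweeps with the interior points split into p contiguous blocks.

    Each block owns m points plus one ghost cell per side. Filling the ghost
    cells by copying is an assumption of this sketch: it stands in for the
    neighbor exchange that MPI sends and receives would perform.
    """
    left_bc, right_bc = u[0], u[-1]
    interior = list(u[1:-1])
    n = len(interior)
    if n % p != 0:
        raise ValueError("this sketch assumes p divides the number of interior points")
    m = n // p
    # blocks[r] = [ghost_left, m owned points, ghost_right]
    blocks = [[0.0] + interior[r * m:(r + 1) * m] + [0.0] for r in range(p)]
    for _ in range(iters):
        # "Communication" phase: ghost cells take neighbor edge values or boundary values.
        for r in range(p):
            blocks[r][0] = blocks[r - 1][m] if r > 0 else left_bc
            blocks[r][m + 1] = blocks[r + 1][1] if r < p - 1 else right_bc
        # Computation phase: each block updates only the points it owns.
        blocks = [
            [b[0]] + [0.5 * (b[i - 1] + b[i + 1]) for i in range(1, m + 1)] + [b[m + 1]]
            for b in blocks
        ]
    result = [left_bc]
    for b in blocks:
        result.extend(b[1:m + 1])
    result.append(right_bc)
    return result
```

Comparing the two functions on a small grid, for example jacobi_blocks(u0, 4, 25) against jacobi_serial(u0, 25), is a cheap way to check ghost-cell indexing before any message passing is introduced.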

For information on building an executable program and running it on Turing, see the tutorials and FAQ on the Turing website. To obtain MPICL tracing, you should use an appropriate compiler script (e.g., mpiclcc instead of mpicc) to build your executable.
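
For items 3 through 5, one possible form of the model can be sketched as follows. This is an illustration under stated assumptions, not the required answer: it assumes each task updates n/p points per iteration at cost tc each, exchanges one word with each neighbor at cost ts + tw per message, and does not overlap communication with computation. The function names (model_time, speedup, efficiency, n_for_efficiency) are hypothetical, and the parameter values must come from your own measurements.

```python
def model_time(n, p, tc, ts, tw, iters=1):
    """Modeled parallel time under the assumptions stated above.

    The slowest task has min(2, p - 1) neighbors, so a single task
    (p == 1) pays no communication cost at all.
    """
    neighbors = min(2, p - 1)
    return iters * (tc * n / p + neighbors * (ts + tw))


def speedup(t1, tp):
    """Speedup from a one-processor time t1 and a p-processor time tp."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency is speedup divided by the number of processors."""
    return t1 / (p * tp)


def n_for_efficiency(e, p, tc, ts, tw):
    """Problem size n at which the model above gives efficiency e on p tasks.

    Solving e = tc*n / (tc*n + p*c) for n, with c the per-iteration
    communication cost of the slowest task, gives n = e*p*c / ((1 - e)*tc).
    """
    c = min(2, p - 1) * (ts + tw)
    return e * p * c / ((1.0 - e) * tc)
```

The same speedup and efficiency functions apply to measured wall-clock times, which makes it straightforward to plot observed and modeled curves on the same axes. Under these assumptions n_for_efficiency grows linearly in p for p of at least 3; whether that matches your measurements is part of what item 5 asks you to examine.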