Homework 04: due Tuesday, November 7, 2017 (HW04)


Note: No class on that day (Friday schedule), but the HW is still due by close of business (COB).

From HW03 you now have a climatological data set for Durham, and we will now start to extract useful information from it. Before that, we need to deal with sorting algorithms.
  1. Sorting
    1. Write a simple bubble sort code in C, Fortran, and Perl. In each case, the code should read a file that has exactly one floating-point number on each line and write the sorted numbers in the same format (use STDIN/STDOUT for simplicity). To see how efficient the algorithm is, time it with the 'time' command for files of length N=1000, 2000, 4000, etc., until the sorting time becomes unacceptable (>5 minutes). The files to sort should contain the first N temperatures from your data file. Then use gnuplot to plot the times versus size and verify that bubble sort is an 'N^2' algorithm, that is, the execution time T is proportional to N*N.
    2. Repeat the above with the built-in Perl 'sort' function and with Quicksort code in C and Fortran, which you need to find on the web. Update your previous plot with the new timing data. Also make plots with the x-axis, the y-axis, or both logarithmic. Which plot makes the 'N^2' scaling most visible? Is there an obvious scaling for 'quicksort' and 'sort'?
    3. Given that the scaling of the fast algorithms is one of T ~ log(N)^2, log(N)^3, N*sqrt(N), N*log(N), N, or log(N), make plots that show which one is true.


  2. Statistics and superposed epoch analysis
    1. Calculate the percentiles of temperature T and wind speed V based on all available 1-minute data, and plot them, i.e., 0-100 in steps of 1 on the x-axis and the corresponding T, V on the y-axis. Obviously, this requires sorting fairly big data sets.
    2. Using the timing results above, estimate how long the sorting would take using bubble sort in Perl, C, and Fortran.
    3. Now produce an occurrence rate histogram (one bar per 1 F for T and per 0.1 m/s for V) using the sorted data. One could instead sort the data directly into bins, but that might take much longer, and one would not know the min/max values beforehand.
    4. Superposed epoch: For each day of the year (DOY, 0-364, ignore the leap days), calculate for both T and V the daily mean, min, max, and the 25th, 50th (median), and 75th percentiles. Make a really neat plot with a shade between min/max, another shade for 25th/75th, and lines for median and mean. Note that for this analysis the 'epoch' is January 1.
    5. (GS only) Produce a figure with one compass-rose-type plot per month (3x4 panels), depicting the occurrence rate of wind directions (30-degree bins, centered on 0, 30, 60, ...) properly averaged over all available data. Whenever a month has less than 90% data coverage, don't use it. Be aware that you cannot average angles (why not?).
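For task 1.1, a minimal C sketch of the read-and-bubble-sort part, assuming the input holds one number per line with no missing-value markers. The names `read_values` and `bubble_sort` are hypothetical; a `main` would call `read_values(stdin, &n)`, then `bubble_sort(v, n)`, then print each value with `printf("%g\n", v[i])`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers for task 1.1 (names are illustrative). */

/* Read one floating-point number per line from fp into a growing array.
   fscanf's "%lf" skips the newlines. Returns NULL on allocation failure;
   *count receives the number of values read. */
double *read_values(FILE *fp, size_t *count)
{
    size_t cap = 1024, n = 0;
    double x;
    double *v = malloc(cap * sizeof *v);
    if (v == NULL) return NULL;
    while (fscanf(fp, "%lf", &x) == 1) {
        if (n == cap) {                       /* double the capacity */
            double *tmp = realloc(v, 2 * cap * sizeof *v);
            if (tmp == NULL) { free(v); return NULL; }
            v = tmp;
            cap *= 2;
        }
        v[n++] = x;
    }
    *count = n;
    return v;
}

/* Bubble sort, ascending, in place. Each pass swaps neighbours that are
   out of order, so after pass i the i+1 largest values sit in their final
   places at the end. Up to n*(n-1)/2 comparisons, hence T ~ N^2. */
void bubble_sort(double *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        int swapped = 0;
        for (size_t j = 0; j + 1 < n - i; j++) {
            if (a[j] > a[j + 1]) {
                double tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped) break;                  /* already sorted: stop early */
    }
}
```

Timing would then look like `time ./bubble < t1000.txt > s1000.txt`, where the program and file names are placeholders. The early exit only pays off on nearly sorted input, so it should not change the scaling much for temperatures in time order.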
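For task 1.2, one possible C quicksort (written here as a sketch, not taken from a particular web source) using Hoare partitioning around the middle element; the function names are hypothetical. Hoare partitioning is a deliberate choice: 1-minute temperatures contain many repeated values, and the simpler Lomuto scheme degrades towards N^2 when many keys are equal. The sketch assumes the data contain no NaN values. The C standard library's `qsort` from `<stdlib.h>` is a ready-made alternative, although the standard does not require it to be a quicksort.

```c
#include <stddef.h>

/* Hypothetical quicksort helpers for task 1.2 (names are illustrative). */

static void swap_d(double *x, double *y)
{
    double t = *x;
    *x = *y;
    *y = t;
}

/* Sort a[lo..hi] (inclusive) with Hoare partitioning. The smaller part is
   handled by recursion and the larger one by the loop, which keeps the
   recursion depth O(log N). */
static void quicksort_range(double *a, ptrdiff_t lo, ptrdiff_t hi)
{
    while (lo < hi) {
        double pivot = a[lo + (hi - lo) / 2];
        ptrdiff_t i = lo - 1, j = hi + 1;
        for (;;) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) break;
            swap_d(&a[i], &a[j]);
        }
        /* Now every value in a[lo..j] <= pivot <= every value in a[j+1..hi]. */
        if (j - lo < hi - j) {
            quicksort_range(a, lo, j);
            lo = j + 1;
        } else {
            quicksort_range(a, j + 1, hi);
            hi = j;
        }
    }
}

/* Sort n doubles ascending, in place. */
void quicksort(double *a, size_t n)
{
    if (n > 1) quicksort_range(a, 0, (ptrdiff_t)n - 1);
}
```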
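For task 2.1, once the data are sorted each percentile is a single array lookup. This C sketch uses the nearest-rank definition (the smallest value with at least p percent of the data at or below it); that choice is an assumption, and other common definitions interpolate between neighbouring values, which should make little difference for data sets this large. The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers for task 2.1 (names are illustrative). */

/* Nearest-rank percentile of n >= 1 ascending-sorted values:
   the value of rank ceil(p*n/100), with the rank clamped to 1..n,
   so p = 0 gives the minimum and p = 100 the maximum. */
double percentile_sorted(const double *sorted, size_t n, unsigned p)
{
    size_t rank = ((size_t)p * n + 99) / 100;   /* integer ceil(p*n/100) */
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;
    return sorted[rank - 1];
}

/* Fill out[0..100] with the 0th to 100th percentiles. */
void percentile_table(const double *sorted, size_t n, double out[101])
{
    for (unsigned p = 0; p <= 100; p++)
        out[p] = percentile_sorted(sorted, n, p);
}
```

Writing `p` and `out[p]` as two columns, one line per percentile, gives a file gnuplot can plot directly.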
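For task 2.2, one way to use all bubble-sort timings at once is a least-squares fit of T = c*N^2, which is a straight line through the origin in x = N^2, giving c = sum(T_i*x_i) / sum(x_i^2). The extrapolated time is then c*N_total^2, where N_total is the number of 1-minute samples sorted in task 2.1. A C sketch with hypothetical names, to be fitted separately for the Perl, C, and Fortran timings:

```c
#include <stddef.h>

/* Hypothetical helpers for task 2.2 (names are illustrative). */

/* Least-squares c for T = c * N^2 over k measurements (n[i], t[i]).
   The largest N dominate the fit, where start-up overhead matters least. */
double fit_n2_coefficient(const double *n, const double *t, size_t k)
{
    double sxt = 0.0, sxx = 0.0;
    for (size_t i = 0; i < k; i++) {
        double x = n[i] * n[i];
        sxt += x * t[i];
        sxx += x * x;
    }
    return sxt / sxx;
}

/* Extrapolated run time (same unit as the fitted times) for size n_total. */
double estimate_n2_time(double c, double n_total)
{
    return c * n_total * n_total;
}
```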
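For task 2.3, the sorted array supplies min and max directly (first and last element), so the number of bins is known before counting and a single pass fills them. This C sketch assumes bins centred on multiples of the bin width w (1 for T in F, 0.1 for V in m/s). Centring keeps values recorded at that resolution away from the bin edges, where rounding in x/w could otherwise push them into the wrong bin (in double precision 0.3/0.1 is slightly less than 3). The names are hypothetical; dividing each count by n turns it into an occurrence rate.

```c
#include <stdlib.h>

/* Hypothetical helpers for task 2.3 (names are illustrative). */

/* Index k of the bin centred on k*w that contains x, i.e. floor(x/w + 0.5).
   The floor is written out by hand so no math library is needed;
   valid while x/w fits in a long. */
static long bin_index(double x, double w)
{
    double y = x / w + 0.5;
    long k = (long)y;              /* truncates toward zero */
    if ((double)k > y) k--;        /* round negative non-integers down */
    return k;
}

/* Histogram of n >= 1 ascending-sorted values with bin width w.
   Returns a calloc'd array of *nbins counts (NULL on failure);
   bin i is centred on (*first + i) * w. */
long *histogram_sorted(const double *sorted, size_t n, double w,
                       long *first, size_t *nbins)
{
    if (n == 0) return NULL;
    long k0 = bin_index(sorted[0], w);       /* bin of the minimum */
    long k1 = bin_index(sorted[n - 1], w);   /* bin of the maximum */
    size_t nb = (size_t)(k1 - k0 + 1);
    long *count = calloc(nb, sizeof *count);
    if (count == NULL) return NULL;
    /* bin_index is non-decreasing in x, so every index lies in [k0, k1]. */
    for (size_t i = 0; i < n; i++)
        count[bin_index(sorted[i], w) - k0]++;
    *first = k0;
    *nbins = nb;
    return count;
}
```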
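For task 2.4, two pieces are needed: a DOY index that skips February 29 so that March 1 onward lines up across years, and statistics over all samples that fall on a given DOY in any year. A C sketch with hypothetical names, using the standard `qsort`; the percentiles use the nearest-rank definition, which is an assumption (with it, the median of an even count is the lower of the two middle values).

```c
#include <stdlib.h>

/* Hypothetical helpers for task 2.4 (names are illustrative). */

/* DOY 0..364 ignoring leap days; assumes a valid calendar date.
   Feb 29 returns -1 so the caller can skip it. */
int doy_noleap(int month, int day)
{
    static const int start[12] = {0, 31, 59, 90, 120, 151,
                                  181, 212, 243, 273, 304, 334};
    if (month < 1 || month > 12 || day < 1) return -1;
    if (month == 2 && day == 29) return -1;
    return start[month - 1] + day - 1;
}

typedef struct {
    double mean, min, max, p25, p50, p75;
} day_stats;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile of sorted v[0..n-1], n >= 1. */
static double pct(const double *v, size_t n, unsigned p)
{
    size_t rank = ((size_t)p * n + 99) / 100;
    if (rank < 1) rank = 1;
    return v[rank - 1];
}

/* Statistics of the n >= 1 samples collected for one DOY; sorts v in place. */
day_stats summarize_day(double *v, size_t n)
{
    day_stats s;
    double sum = 0.0;
    qsort(v, n, sizeof *v, cmp_double);
    for (size_t i = 0; i < n; i++) sum += v[i];
    s.mean = sum / (double)n;
    s.min = v[0];
    s.max = v[n - 1];
    s.p25 = pct(v, n, 25);
    s.p50 = pct(v, n, 50);
    s.p75 = pct(v, n, 75);
    return s;
}
```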