Homework 04: due Tuesday, November 7, 2017 (HW04)
Note: There is no class on that day (Friday schedule), but the HW is still due by COB.
From HW03, you now have a climatological data set for Durham.
We will now start to extract some useful information out of it.
Before that, we need to deal with sorting algorithms.
- Sorting
- Write a simple bubble sort code in C, Fortran, and Perl.
In each case, the code should read a file that has exactly one floating-point number
on each line and write the sorted numbers in the same format (use STDIN/STDOUT for
simplicity). To see how efficient the algorithm is, time it using
the 'time' command for files of length N=1000, 2000, 4000, etc., until the sorting time
becomes unacceptable (>5 minutes). The files to sort should contain the first N temperatures
from your data file. Then use gnuplot to plot the times versus size and verify that
bubble sort is an 'N^2' algorithm, that is, the execution time T is proportional to N*N.
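As a reference for the logic only, here is a minimal bubble sort sketch in Python (not one
of the assigned languages). The function names bubble_sort and read_numbers are made up for
this illustration, and the input is assumed to be one number per line, as described above.

```python
import sys


def bubble_sort(values):
    """Return a sorted copy of values using plain bubble sort.

    Each pass moves the largest remaining value to the end, so the
    work is about N*(N-1)/2 comparisons regardless of the input order.
    """
    data = list(values)
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


def read_numbers(lines):
    """Parse one floating-point number per line, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def sort_stream(infile=sys.stdin, outfile=sys.stdout):
    """Read numbers from infile, write them sorted to outfile, one per line."""
    for x in bubble_sort(read_numbers(infile)):
        outfile.write(f"{x}\n")

# To use this as a filter, call sort_stream() at the end of the script.
```

With sort_stream() called at the end of a script (here hypothetically named bubble.py),
a timing run would look like: time python bubble.py < first_1000.txt > sorted_1000.txt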
- Repeat the above with the built-in Perl 'sort' function and with Quicksort code
in C and Fortran, which you need to find on the web.
Update your previous plot with the new timing data. Also make plots that
have the x-axis, the y-axis, or both logarithmic. Which plot shows the 'N^2' scaling
most clearly? Is there an obvious scaling for 'quicksort' and 'sort'?
- Given the knowledge that the scaling of the fast algorithms is either
T ~ log(N)^2, log(N)^3, N*sqrt(N), N*log(N), N, or log(N),
make plots that show which one is true.
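One possible way to compare the candidate scalings, sketched in Python: if T ~ f(N) holds,
then the ratio T/f(N) is roughly constant over all N, so plotting T/f(N) versus N gives a
flat line for the right f. The code below computes that ratio for each candidate. The
timing values are synthetic, generated from a known law purely to show that the method
recovers it; they are not measurements, and the names used are made up for this sketch.

```python
import math

# The candidate scalings listed in the assignment.
CANDIDATES = {
    "log(N)^2": lambda n: math.log(n) ** 2,
    "log(N)^3": lambda n: math.log(n) ** 3,
    "N*sqrt(N)": lambda n: n * math.sqrt(n),
    "N*log(N)": lambda n: n * math.log(n),
    "N": lambda n: float(n),
    "log(N)": lambda n: math.log(n),
}


def ratio_spread(sizes, times, f):
    """max/min of T/f(N); a value near 1.0 means the ratio is nearly constant."""
    ratios = [t / f(n) for n, t in zip(sizes, times)]
    return max(ratios) / min(ratios)


def best_scaling(sizes, times):
    """Return the candidate with the flattest T/f(N), plus all spreads."""
    spreads = {name: ratio_spread(sizes, times, f)
               for name, f in CANDIDATES.items()}
    return min(spreads, key=spreads.get), spreads


# SYNTHETIC illustration only: times generated from T = c*N*log(N),
# with an arbitrary constant c. Replace with your measured (N, T) pairs.
sizes = [1000 * 2 ** k for k in range(8)]
times = [2.0e-7 * n * math.log(n) for n in sizes]

name, spreads = best_scaling(sizes, times)
print(name)
```

With real timings the ratios will be noisy, especially for small N where the run time is
dominated by program start-up and file I/O, so the plot is still worth looking at.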
- Statistics and superposed epoch analysis
- Calculate the percentiles for temperature T and wind speed V,
based on all available 1-minute data,
and plot them, i.e., 0-100 in steps of 1 on the x-axis and the corresponding T, V on the y-axis.
Obviously, this requires sorting of fairly big data sets.
- Using the timing results above, estimate how long the sorting would take using bubble sort
in Perl, C, and Fortran.
- Now produce an occurrence rate histogram (one bar per 1 F for T, one bar per 0.1 m/s for V)
using the sorted data. One could do this by sorting the data into bins, but that might take
much longer. Also, one would not know the min/max values beforehand.
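Once the data are sorted, both the percentiles and the histogram fall out of a single pass.
The Python sketch below illustrates one way to do it, assuming the nearest-rank definition
of a percentile (other definitions interpolate between neighbors) and bins that start at
integer multiples of the bin width. The function names are made up for this sketch.

```python
import math


def percentile(sorted_data, p):
    """Nearest-rank percentile for integer p in 0..100 of a sorted list."""
    n = len(sorted_data)
    if n == 0:
        raise ValueError("no data")
    rank = (p * n + 99) // 100          # integer form of ceil(p*n/100)
    return sorted_data[max(rank, 1) - 1]


def histogram_from_sorted(sorted_data, width):
    """Count values per bin in one pass over already sorted data.

    Bin k covers [k*width, (k+1)*width). Returns (lower_edge, count)
    pairs in ascending order; empty bins are omitted. Because the data
    are sorted, each bin's values are contiguous and min/max need not
    be known in advance.
    """
    bins = []
    current, count = None, 0
    for x in sorted_data:
        # The small offset guards against round-off such as 0.3/0.1 = 2.999...
        k = math.floor(x / width + 1e-9)
        if k != current:
            if current is not None:
                bins.append((current * width, count))
            current, count = k, 0
        count += 1
    if current is not None:
        bins.append((current * width, count))
    return bins
```

Dividing each count by the total number of samples turns the counts into occurrence rates.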
- Superposed epoch: For each day of the year (DOY, 0-364, ignore the leap days), calculate
for both T and V the daily mean, min, max, 25th, 50th (median), and 75th percentile.
Make a really neat plot, with a shade between min/max, another shade for 25th/75th,
and lines for median and mean.
Note that for this analysis the 'Epoch' is January 1.
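The grouping step of the superposed epoch analysis could look like the Python sketch below.
It assumes the data are available as (datetime, value) pairs, which may not match your HW03
file layout, and it uses nearest-rank percentiles. All names are made up for this sketch.

```python
from collections import defaultdict
from datetime import datetime


def day_of_year(ts):
    """DOY in 0..364 with Feb 29 dropped (None), so all years line up."""
    if ts.month == 2 and ts.day == 29:
        return None
    doy = ts.timetuple().tm_yday - 1      # 0-based day of year
    leap = ts.year % 4 == 0 and (ts.year % 100 != 0 or ts.year % 400 == 0)
    if leap and ts.month > 2:
        doy -= 1                          # close the gap left by Feb 29
    return doy


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile for integer p in 0..100."""
    rank = (p * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def superposed_epoch(samples):
    """samples: iterable of (datetime, value). Returns {doy: stats dict}."""
    by_doy = defaultdict(list)
    for ts, value in samples:
        doy = day_of_year(ts)
        if doy is not None:
            by_doy[doy].append(value)
    result = {}
    for doy, values in by_doy.items():
        values.sort()
        result[doy] = {
            "mean": sum(values) / len(values),
            "min": values[0],
            "max": values[-1],
            "p25": nearest_rank(values, 25),
            "p50": nearest_rank(values, 50),
            "p75": nearest_rank(values, 75),
        }
    return result
```

Each DOY collects the samples of that calendar day from all years, so the statistics are
taken over the full multi-year record rather than over a single day.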
- (GS only) Produce a plot that, for every month (3x4 panels), shows a compass rose type plot
depicting the occurrence rate of wind directions (30 degree bins, centered on 0, 30, 60, ..)
properly averaged over all available data. Do not use any month with less than
90% data coverage.
Be aware that you cannot average angles (why not?).
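The binning for the compass rose can be sketched in Python as follows. Because the bins are
centered on 0, 30, 60, .., the bin around north spans 345 to 15 degrees and wraps across 360.
The sketch assumes directions in degrees and 1440 samples per day for complete 1-minute
data; the function names are made up for this illustration.

```python
def direction_bin(degrees, width=30):
    """Index of the bin centered on 0, width, 2*width, .. containing degrees.

    Shifting by half a bin width before dividing makes bin 0 cover
    [360 - width/2, width/2), so 350 and 10 degrees land in the same bin.
    """
    nbins = 360 // width
    return int(((degrees % 360) + width / 2.0) // width) % nbins


def occurrence_rates(directions, width=30):
    """Fraction of samples falling into each direction bin."""
    nbins = 360 // width
    counts = [0] * nbins
    for d in directions:
        counts[direction_bin(d, width)] += 1
    total = len(directions)
    if total == 0:
        return [0.0] * nbins
    return [c / total for c in counts]


def coverage(n_samples, days_in_month, samples_per_day=1440):
    """Fraction of the expected 1-minute samples present in a month."""
    return n_samples / (days_in_month * samples_per_day)
```

A month would then be kept only if coverage(...) is at least 0.9 before its rates enter the
average.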