Note: There is no class on that day (Friday schedule), but the HW is still due at COB.
From HW03, you now have a climatological data set for Durham.
We will now start to extract some useful information out of it.
Before that, we need to deal with sorting algorithms.
Sorting
Write a simple bubble sort code in C, Fortran, and Perl.
In each case, the code should read a file that has exactly one floating point number
on each line, and output a file (STDIN/STDOUT for simplicity) of the sorted numbers
in the same way. In order to see how efficient that algorithm is, time it using
the 'time' command for files of length N=1000, 2000, 4000, etc. until the sorting time
becomes unacceptable (>5 minutes). The files to sort should be the first N temperatures
from your data file. Then use gnuplot to plot the times versus size and verify that
bubble sort is an 'N^2' algorithm, that is, the execution time T is proportional to N*N.
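One way to check the exponent numerically, in addition to the gnuplot figure, is to fit a straight line to log(T) versus log(N): a power law T = c*N^k appears as a line of slope k. The sketch below is only an illustration in Python. The function name `loglog_slope` is made up here, and the timing values are synthetic (computed from an exact N^2 formula), not measurements. Replace them with the numbers reported by 'time'.

```python
import math

def loglog_slope(sizes, times):
    """Least-squares slope of log(T) versus log(N)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic, noise-free stand-in for measured timings: T = c * N^2.
# The constant c is arbitrary; real timings go here instead.
sizes = [1000, 2000, 4000, 8000, 16000]
times = [2.5e-7 * n * n for n in sizes]

slope = loglog_slope(sizes, times)
print(f"fitted exponent: {slope:.3f}")
```

With real timings the fitted exponent will only be close to 2, because small N is dominated by start-up and file I/O overhead.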
Repeat the above with the built-in Perl 'sort' function, and with Quicksort code in C and Fortran that you need to find on the web.
Update your previous plot with the new timing data. Also make plots that
have the x-axis, y-axis, or both logarithmic. Which plot shows the 'N^2'
scaling most clearly? Is there an obvious scaling for 'quicksort' and 'sort'?
Given the knowledge that the scaling of the fast algorithms is either
T ~ log(N)^2, log(N)^3, N*sqrt(N), N*log(N), N, or log(N),
make plots that show which one is true.
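A possible way to discriminate between the candidates: if T ~ f(N) holds, then the ratio T/f(N) is roughly constant over all N, so plotting T/f(N) against N gives a flat line only for the right f. The Python sketch below computes the relative spread of that ratio for each candidate. The names `CANDIDATES`, `relative_spread`, and `best_scaling` are illustrative, and the timings are synthetic (generated from an exact N*log(N) formula), not measured.

```python
import math

# The six candidate scalings listed above.
CANDIDATES = {
    "log(N)^2": lambda n: math.log(n) ** 2,
    "log(N)^3": lambda n: math.log(n) ** 3,
    "N*sqrt(N)": lambda n: n * math.sqrt(n),
    "N*log(N)": lambda n: n * math.log(n),
    "N": lambda n: float(n),
    "log(N)": lambda n: math.log(n),
}

def relative_spread(sizes, times, f):
    """Standard deviation of T/f(N) divided by its mean (0 means perfectly flat)."""
    ratios = [t / f(n) for n, t in zip(sizes, times)]
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return math.sqrt(var) / mean

def best_scaling(sizes, times):
    """Return the candidate with the flattest T/f(N), plus all spreads."""
    spreads = {name: relative_spread(sizes, times, f)
               for name, f in CANDIDATES.items()}
    return min(spreads, key=spreads.get), spreads

# Synthetic stand-in for measured timings; replace with real data.
sizes = [1000, 4000, 16000, 64000, 256000]
times = [1.0e-7 * n * math.log(n) for n in sizes]

best, spreads = best_scaling(sizes, times)
print("flattest ratio:", best)
```

Real timings need a wide range of N, since N and N*log(N) are hard to tell apart over a single decade.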
Statistics and superposed epoch analysis
Calculate the percentiles for temperature T and wind speed V, based on all available 1-minute data,
and plot them, i.e., 0-100 in steps of 1 on the x-axis and the corresponding T, V on the y-axis.
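Once the data are sorted, a percentile is just a position in the sorted array. The Python sketch below uses linear interpolation between neighboring ranks, which is one of several common conventions (nearest-rank is another). The function name `percentile` and the five sample temperatures are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0-100) of an already sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    pos = (len(sorted_values) - 1) * p / 100.0   # fractional rank
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

# Hypothetical sample; the real input is the full sorted T or V column.
data = sorted([71.2, 65.0, 80.5, 68.3, 75.1])
curve = [percentile(data, p) for p in range(101)]  # y-values for p = 0..100
```

Under this convention the 0th and 100th percentiles are the minimum and maximum of the data.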
Obviously, this requires sorting of fairly big data sets.
Using the timing results above, estimate how long the sorting would take using bubble sort
in Perl, C, and Fortran.
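If the N^2 scaling holds, a measured time at one size can be extrapolated as T(N) = T(N_ref) * (N/N_ref)^2. A minimal Python sketch follows. All three numbers are hypothetical placeholders, to be replaced with your own measured timing and the actual number of 1-minute records.

```python
def extrapolate_quadratic(t_ref, n_ref, n_target):
    """Scale a measured time t_ref at size n_ref to size n_target, assuming T ~ N^2."""
    return t_ref * (n_target / n_ref) ** 2

# Hypothetical placeholders, not measurements.
t_ref = 10.0          # seconds measured for n_ref values
n_ref = 16000
n_target = 1_000_000  # stand-in for the number of 1-minute records

estimate = extrapolate_quadratic(t_ref, n_ref, n_target)
print(f"estimated bubble sort time: {estimate:.0f} s = {estimate / 3600:.1f} h")
```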
Now produce an occurrence rate histogram (one bar for each 1 F, 0.1 m/s) using the sorted
data. One could do this by sorting the data into bins, but that might take much longer.
Also, one would not know the min/max values beforehand.
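With sorted data the min and max are simply the first and last elements, so the bin range is known before counting starts and one pass suffices. The Python sketch below shows one way to do it. The function names, the rounding guard, and the sample values are assumptions for illustration.

```python
import math

def bin_index(value, bin_width):
    """Integer bin number; round() guards against results like 2.9999999999999996."""
    return math.floor(round(value / bin_width, 9))

def histogram_from_sorted(sorted_values, bin_width):
    """Return a list of (bin lower edge, occurrence rate) from sorted data."""
    if not sorted_values:
        return []
    first = bin_index(sorted_values[0], bin_width)   # bin of the minimum
    last = bin_index(sorted_values[-1], bin_width)   # bin of the maximum
    counts = [0] * (last - first + 1)
    for v in sorted_values:
        counts[bin_index(v, bin_width) - first] += 1
    n = len(sorted_values)
    return [((first + i) * bin_width, c / n) for i, c in enumerate(counts)]

# Hypothetical sorted temperatures in F, 1 F bins.
hist = histogram_from_sorted([64.2, 64.9, 65.0, 65.5, 67.1], 1.0)
for edge, rate in hist:
    print(edge, rate)
```

Empty bins between min and max are kept with a rate of zero, so the bars stay evenly spaced in the plot.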
Superposed epoch: For each day of the year (DOY, 0-364, ignore the leap days), calculate
for both T and V the daily mean, min, max, 25th, 50th (median), and 75th percentile.
Make a really neat plot, with a shade between min/max, another shade for 25th/75th,
and lines for median and mean.
Note that for this analysis the 'Epoch' is January 1.
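The grouping step can be sketched as follows, in Python. It assumes that all samples falling on the same calendar day are pooled across years before the statistics are taken, which is one reading of the task. The function names and the four sample records are hypothetical. Leap days are dropped, and the DOY is computed against a non-leap reference year so that March 1 has the same index in every year.

```python
from collections import defaultdict
from datetime import date

def doy_no_leap(d):
    """Day of year 0-364, identical in leap and non-leap years; None for Feb 29."""
    if d.month == 2 and d.day == 29:
        return None
    # 2001 is a non-leap year, used only as a reference calendar.
    return (date(2001, d.month, d.day) - date(2001, 1, 1)).days

def superpose(records):
    """Group (date, value) pairs by DOY; returns {doy: sorted list of values}."""
    groups = defaultdict(list)
    for d, v in records:
        k = doy_no_leap(d)
        if k is not None:
            groups[k].append(v)
    return {k: sorted(vs) for k, vs in groups.items()}

def daily_stats(sorted_vals):
    """Mean, min, max, and quartiles (linear interpolation) of a sorted list."""
    n = len(sorted_vals)
    def pct(p):
        pos = (n - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])
    return {"mean": sum(sorted_vals) / n, "min": sorted_vals[0],
            "max": sorted_vals[-1], "p25": pct(25), "p50": pct(50), "p75": pct(75)}

# Hypothetical records: (date, temperature in F).
records = [(date(2019, 7, 4), 88.0), (date(2020, 7, 4), 92.0),
           (date(2020, 2, 29), 50.0), (date(2021, 7, 4), 90.0)]
groups = superpose(records)
stats = {k: daily_stats(v) for k, v in groups.items()}
```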
(GS only) Produce a figure with one panel per month (3x4 panels), each a compass rose type plot
depicting the occurrence rate of wind directions (30 degree bins, centered on 0, 30, 60, ..)
properly averaged over all available data. Whenever there is a month with less than
90% data coverage, don't use it.
Be aware that you cannot average angles (why not?).
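The binning and the angle problem can be illustrated with a short Python sketch. The function names are made up for this example. Bins are centered on 0, 30, 60, ..., so bin 0 spans 345 to 15 degrees and wraps around north. The last lines contrast the arithmetic mean of 350 and 10 degrees with a vector (sine/cosine) mean, which is one common way to handle directions.

```python
import math

def direction_bin(deg, width=30.0):
    """Index of the bin centered on 0, width, 2*width, ...; 345..15 maps to bin 0."""
    return int(((deg + width / 2.0) % 360.0) // width)

def occurrence_rates(directions, width=30.0):
    """Fraction of samples falling in each direction bin."""
    nbins = int(round(360.0 / width))
    counts = [0] * nbins
    for d in directions:
        counts[direction_bin(d, width)] += 1
    total = len(directions)
    return [c / total for c in counts] if total else counts

def vector_mean_direction(directions):
    """Direction of the summed unit vectors, in degrees."""
    s = sum(math.sin(math.radians(d)) for d in directions)
    c = sum(math.cos(math.radians(d)) for d in directions)
    return math.degrees(math.atan2(s, c)) % 360.0

winds = [350.0, 10.0]                      # two directions close to north
arithmetic = sum(winds) / len(winds)       # 180.0, i.e. due south
vector = vector_mean_direction(winds)      # close to north (0 or 360)
rates = occurrence_rates(winds)
```

Both samples land in bin 0 here, which is why the bins have to be centered on 0 rather than start at 0.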