Program: trending
Author: Asif Shahidullah


Usage:
trending startfile endfile


Prequisites:
(1) Both files must exist;
(2) Words in the files are separated by newlines
(3) A newline character exists at the end of the file

(2) and (3) were consistent through all test files, therefore these conditions
have been assumed as true for all test files.


Description:

Given two files, 'trending' calculates the difference in frequences of individual
words in each files; it reports the top five trending up and trending down words.

"Trending up" is defined as an increase (postive delta) in word frequency, while
"trending down" is defined as a decrease (negative delta) in word frequency. A delta
of zero is defined as neither trending up or down.


Notes:

(1) This program was written on Microsoft Windows 7 Home Premium using Visual Studio
2012 Professional. Testing environments included the ECE codon server as well and an
Intel i5-2450M (2.5 Ghz, Sandy Bridge, w/ average power consumption of 15 W) system.

(2) Regarding optimizations: using the VS compiler (release build, with flag /O2 and
favored for speed); the resulting code took and average of 90 milliseconds to process
the extra_testfile(s) with the median existing in the mid-high 80 millisecond range.
On codon, the same code takes an average of 550 milliseconds to process the same files.
With no optimizations (and debug build), the code takes roughtly 210 milliseconds on
Windows.

(3) Checking for null pointers for insertion, deletion, trend analysis (et cetera)
were ommitted in the final code. The code, inhrently, is stable enough to operate with-
out these checks. However, should another programmer utilize this code, checks will have
to be made to ensure null pointers are not being passed to the appropriate functions.

(4) The binary tree utilized in this program is doubly linked. This scheme optimizes
operations performed on the list (such as insertion and deletion), since we can use iter-
ative alogrithms without the use of a stack. The memory-performance tradeoff seems
reasonable.

(5) In order to optimize the code, the binary tree destruction function was modified
to pass parameters to the trend finding unit. While not strictly portable (i.e., three
lines of code across the entire program need to be changed to make it fully portable)
it saves the program having to traverse the list four times. The algorithm developed to
utilize the doubly-linked property (and essentially perform a post order traversal:)

function destruct (BinaryTree: tree) begin
BinaryTreeNode: node
BinaryTreeNode: parent

if tree.left exists
node = tree.left
else
node = tree.right

while node exists begin
if node.left exists
node = node.left
else if node.right exists
node = node.right
else begin
parent = node
if parent does not exist
return
free node
node = parent
end
end
end

(6) The trend finding algorithm operates as follows:

1) Populate both the trending up and down lists;
2) Keep track of the edge of both lists when both lists are populated;
3) If a value is greater (up) or less than (down) the edge case, replace
the edge with it;
4) Recalculate the edge.

(7) The following flags are utilized for the gcc compilation

-O2 Optimzation level 2
-s Strip all unecessary symbols from the binary
-pipe Pipe input for faster compilation
-mcpu=ultrasparc CPU type, though this probably doesn't matter much
-ansi ANSI C standard
-Wall All warnings turned on
-Wextra Extra warnings (omitted from -Wall) turned on
-Wconversion Warn about bad converions (omitted from -Wall and -Wextra)
-Werror Turn all warnings into compile time errors
-pedantic-errors Turn on -pedantic and treat warnings as errors

Last edited Oct 27, 2012 at 5:54 PM by asifshahidullah, version 2