<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Kevin Dolan</title>
	<atom:link href="http://thekevindolan.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thekevindolan.com</link>
	<description>Putting the Kev in Dolan since 2009!</description>
	<lastBuildDate>Fri, 26 Feb 2010 23:12:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using the News!</title>
		<link>http://thekevindolan.com/2010/02/using-the-news/</link>
		<comments>http://thekevindolan.com/2010/02/using-the-news/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 23:12:01 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[test results]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=797</guid>
		<description><![CDATA[Last time, I tried out a couple of different methods to assign sentiment to news articles, and found that the best performance seemed to come from using my Temporal Interference method initialized by zeroes.  Well, there&#8217;s a little more information available to us, and that&#8217;s the content of the news articles themselves!
So the basic idea here is to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/using-the-news/"><img class="aligncenter size-medium wp-image-798" title="newspaper" src="http://thekevindolan.com/wp-content/uploads/2010/02/newspaper-600x421.jpg" alt="newspaper" width="600" height="421" /></a><a href="http://thekevindolan.com/2010/02/perturbation-modeling/">Last time</a>, I tried out a couple of different methods to assign sentiment to news articles, and found that the best performance seemed to come from using my Temporal Interference method initialized by zeroes.  Well, there&#8217;s a little more information available to us, and that&#8217;s the content of the news articles themselves!<span id="more-797"></span></p>
<p>So the basic idea here is to train a model using Temporal Interference, and then use that model to score each news article, and use the scores as the new Perturbation model.  This would potentially lead to an iterative process, but should eventually converge.  Of course, there wasn&#8217;t a particularly clear way to do this, so I tried several.</p>
<p>For this first set of results, I tried various orders of modeling on the data at its actual duration value.  For time&#8217;s sake I skipped data set 5 in most runs due to its size, so the rows with five values cover data sets 1-4 and 6 only; the two rows with six values also include data set 5:</p>
<p>Z,TI: 0.944, 0.902, 0.887, 0.852, 0.889, 0.708<br />
Z,TI,NS: 0.858, 0.880, 0.810, 0.850, 0.854, 0.749<br />
Z,TI,NS,TI: 0.951, 0.921, 0.907, 0.858, 0.744<br />
Z,TI,NS,NS: 0.821, 0.853, 0.524, 0.761, 0.513<br />
Z,TI,NS,NS,TI: 0.947, 0.906, 0.890, 0.853, 0.716<br />
Z,TI,NS,TI,NS: 0.859, 0.880, 0.813, 0.848, 0.763<br />
Z,TI,(NS,TI)x2: 0.951, 0.921, 0.907, 0.858, 0.747</p>
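<p>Each label above reads left to right as a pipeline: start from zeroes (Z), then apply Temporal Interference (TI) and News (NS) steps in the order given, each step taking the previous step&#8217;s influences as its input.  A minimal sketch of that chaining is below.  The class, the method names, and especially the stage bodies are hypothetical placeholders (simple arithmetic standing in for the real TI and NS modelers), so only the ordering logic is meant to be illustrative.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical sketch: chains named modeling stages in the order given by a label
// such as "Z,TI,NS,TI". The stage implementations are placeholders, not the real modelers.
final class StagePipelineSketch {

    /** Applies the named stages in order, feeding each stage's output into the next. */
    static double[] run(String order, Map<String, UnaryOperator<double[]>> stages, int articleCount) {
        double[] influences = new double[articleCount];
        for (String name : order.split(",")) {
            UnaryOperator<double[]> stage = stages.get(name.trim());
            if (stage == null) {
                throw new IllegalArgumentException("unknown stage: " + name);
            }
            influences = stage.apply(influences);
        }
        return influences;
    }

    /** Lifts a per-value function to a stage that transforms every influence. */
    private static UnaryOperator<double[]> mapEach(DoubleUnaryOperator f) {
        return values -> Arrays.stream(values).map(f).toArray();
    }

    /** Placeholder stages: arbitrary arithmetic so the ordering is visible in the output. */
    static Map<String, UnaryOperator<double[]>> placeholderStages() {
        Map<String, UnaryOperator<double[]>> stages = new LinkedHashMap<>();
        stages.put("Z", mapEach(v -> 0.0));        // reset every influence to zero
        stages.put("TI", mapEach(v -> v + 1.0));   // stand-in for a Temporal Interference pass
        stages.put("NS", mapEach(v -> v * 0.5));   // stand-in for a News scoring pass
        return stages;
    }

    public static void main(String[] args) {
        double[] result = run("Z,TI,NS,TI", placeholderStages(), 3);
        System.out.println(Arrays.toString(result));
    }
}
```

<p>A label such as Z,TI,(NS,TI)x2 would have to be expanded by hand to Z,TI,NS,TI,NS,TI before being passed to this sketch.</p>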
<p>What we learn here is that incorporating news information into the model does have an advantage.  For the most part, we saw optimum performance when alternating News and Temporal Interference and ending with Temporal Interference.  In the case of data set 6, however, we saw the best performance when we ended with the News Analysis.  This was somewhat counter-intuitive.</p>
<p>Also, with most data sets, the progress stabilized after just one iteration, but this was not the case with data set 6, so I tried adding two more steps.  Adding another News step gave us an accuracy of 0.765, and adding another Temporal Interference step brought us back to 0.747.  Considering the kind of information made available to the system, this is somewhat impressive, but it really goes to show why my initial attempts at this would never have worked.</p>
<p>I then considered what would happen if we did not know ahead of time what kind of duration to expect for perturbations.  I did these studies on data set 3 only.  If we reduce the duration to just 49, we see a significant drop in accuracy with Temporal Interference alone to 0.692.  If we keep alternately adding NS/TI steps, we see the accuracy rise to 0.739, 0.702, 0.745, 0.703, 0.747, 0.703.  Interestingly, this one also performs best when we end with the News analysis.</p>
<p>If we reduce the duration to 40, we see the accuracy decrease to 0.642.  Running 2 NS/TI loops and ending with NS, we find that the accuracy actually decreases to 0.624.  Odd indeed!</p>
<p>For duration 51, we see a drop to 0.712, and then an improvement to 0.776.</p>
<p>For duration 60, we see a drop to 0.658, and then an improvement to 0.726.</p>
<p>Clearly something odd is going on when we do not know the duration, and unfortunately, this is the case in real life.  So we need to do some further thinking about Perturbation modeling if we want to move forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/using-the-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perturbation Modeling, Initial Approach</title>
		<link>http://thekevindolan.com/2010/02/perturbation-modeling/</link>
		<comments>http://thekevindolan.com/2010/02/perturbation-modeling/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 17:45:53 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>
		<category><![CDATA[test results]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=790</guid>
		<description><![CDATA[
Last time I introduced some test data, and before that I formalized the Perturbation Model for Price Moves a bit further.  Well this required me to rewrite the code I had written before for Sentiment analysis.  I took advantage of interval trees to make my code fairly efficient, and also changed the way I initialize [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/perturbation-modeling/"><img class="aligncenter size-full wp-image-791" title="1235935919" src="http://thekevindolan.com/wp-content/uploads/2010/02/1235935919.jpg" alt="1235935919" width="320" height="312" /></a></p>
<p>Last time I introduced some <a href="http://thekevindolan.com/2010/02/more-test-data/">test data</a>, and before that I formalized the <a href="http://thekevindolan.com/2010/02/perturbation-model/">Perturbation Model</a> for Price Moves a bit further.  Well this required me to rewrite the code I had written before for Sentiment analysis.  I took advantage of <a href="http://thekevindolan.com/2010/02/interval-tree/">interval trees</a> to make my code fairly efficient, and also changed the way I initialize the price movements, yielding minor improvements over the naive methods.<span id="more-790"></span></p>
<p>The basic idea was that I created an object Perturbation, which has fields for influence, duration, and start time.  Start time was a controversial decision for inclusion, but I ultimately decided it was best.</p>
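<p>As a rough illustration, a Perturbation along these lines might look like the sketch below.  The constructor order (influence, start time, duration) mirrors the new Perturbation(0, story.getTime(), forecast) call in the initializer code later in this post, and getInfluence/setInfluence appear there too.  The remaining accessors and the overlap helper are my assumptions, added only to show why carrying the start time around is convenient.</p>

```java
// Hypothetical sketch of the Perturbation object. Only the constructor order and
// getInfluence/setInfluence are taken from the post; everything else is assumed.
final class Perturbation {
    private double influence;      // estimated effect of the article on the price
    private final long startTime;  // time at which the article takes effect
    private final long duration;   // how long the effect is assumed to last

    Perturbation(double influence, long startTime, long duration) {
        if (duration <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        this.influence = influence;
        this.startTime = startTime;
        this.duration = duration;
    }

    double getInfluence() { return influence; }
    void setInfluence(double influence) { this.influence = influence; }
    long getStartTime() { return startTime; }
    long getDuration() { return duration; }
    long getEndTime() { return startTime + duration; }

    /** Length of the overlap between this perturbation's interval and [from, to). */
    long overlap(long from, long to) {
        long lo = Math.max(startTime, from);
        long hi = Math.min(getEndTime(), to);
        return Math.max(0L, hi - lo);
    }

    public static void main(String[] args) {
        Perturbation p = new Perturbation(0.0, 100L, 50L);
        System.out.println("ends at " + p.getEndTime());
        System.out.println("overlap with [120, 200): " + p.overlap(120L, 200L));
    }
}
```

<p>With the start time stored on the object, the overlap between two perturbations can be computed without going back to the news story that produced them.</p>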
<p>The Sentiment interface has been replaced by Modeler and contains one method:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> Perturbation<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>As for the initialization algorithm, I had previously initialized all sentiments to the price move observed in the interval of the news article.  The biggest problem I had with this was that a corpus of just two news articles at exactly the same time would initialize to the price move observed in that time for both, when in reality, it should be half that.</p>
<p>So basically the new method works by splitting the data into elementary intervals by the distribution of news articles.  For each elementary interval, the price move observed is distributed amongst all relevant perturbations, and then the resulting assignment is a weighted average of all of these observed price moves.  I called this Elementary Average.</p>
<p>The code is below:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> Perturbation<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	NewsCorpus corpus <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getCorpus</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	DataHistory data <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getDataHistory</span><span style="color: #009900;">&#40;</span>index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	List<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> newsList <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	Perturbation<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> perturbations <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Perturbation<span style="color: #009900;">&#91;</span>newsList.<span style="color: #006633;">size</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	IntervalTree<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> intervalTree <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IntervalTree<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	SortedSet<span style="color: #339933;">&lt;</span>Long<span style="color: #339933;">&gt;</span> endpoints <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TreeSet<span style="color: #339933;">&lt;</span>Long<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory story <span style="color: #339933;">:</span> newsList<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		perturbations<span style="color: #009900;">&#91;</span>story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Perturbation<span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span>,story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>,forecast<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		intervalTree.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> forecast, story<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		endpoints.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		endpoints.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> forecast<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #003399;">Long</span> last <span style="color: #339933;">=</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Long</span> next <span style="color: #339933;">:</span> endpoints<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>last <span style="color: #339933;">!=</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			List<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> stories <span style="color: #339933;">=</span> intervalTree.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>last, next<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #000066; font-weight: bold;">double</span> price0 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>last<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> price1 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>next<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> priceMove <span style="color: #339933;">=</span> price1 <span style="color: #339933;">-</span> price0<span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #666666; font-style: italic;">//System.out.println(priceMove);</span>
&nbsp;
			<span style="color: #000066; font-weight: bold;">double</span> distributed <span style="color: #339933;">=</span> priceMove <span style="color: #339933;">/</span> stories.<span style="color: #006633;">size</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #000066; font-weight: bold;">double</span> length <span style="color: #339933;">=</span> next <span style="color: #339933;">-</span> last<span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> proportion <span style="color: #339933;">=</span> length <span style="color: #339933;">/</span> forecast <span style="color: #339933;">/</span> forecast<span style="color: #339933;">;</span>
&nbsp;
			<span style="color: #666666; font-style: italic;">//System.out.println(stories.size());</span>
&nbsp;
			<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory story <span style="color: #339933;">:</span> stories<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
				<span style="color: #000066; font-weight: bold;">double</span> current <span style="color: #339933;">=</span> perturbations<span style="color: #009900;">&#91;</span>story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>.<span style="color: #006633;">getInfluence</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
				perturbations<span style="color: #009900;">&#91;</span>story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>.<span style="color: #006633;">setInfluence</span><span style="color: #009900;">&#40;</span>current <span style="color: #339933;">+</span> proportion <span style="color: #339933;">*</span> distributed<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #009900;">&#125;</span>
		<span style="color: #009900;">&#125;</span>
		last <span style="color: #339933;">=</span> next<span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">return</span> perturbations<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
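<p>To make the arithmetic concrete, here is a small self-contained version of the same calculation that runs on plain arrays.  It keeps the scaling used above (price move divided by the number of stories, times length / forecast / forecast), but it replaces the interval tree with a linear scan and takes the price history as a function, so the class name, the signature, and the rule that a story counts for an elementary interval only when it fully covers it are all assumptions of this sketch.</p>

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.function.LongToDoubleFunction;

// Hypothetical, simplified Elementary Average: linear scan instead of an interval tree.
final class ElementaryAverageSketch {

    /** Returns one initial influence per story, indexed like startTimes. */
    static double[] initialize(long[] startTimes, long forecast, LongToDoubleFunction price) {
        double[] influence = new double[startTimes.length];

        // Every story start and end is a boundary between elementary intervals.
        TreeSet<Long> endpoints = new TreeSet<>();
        for (long start : startTimes) {
            endpoints.add(start);
            endpoints.add(start + forecast);
        }

        Long last = null;
        for (Long next : endpoints) {
            if (last != null) {
                // Count the stories whose window covers the elementary interval [last, next].
                int covering = 0;
                for (long start : startTimes) {
                    if (start <= last && next <= start + forecast) {
                        covering++;
                    }
                }
                if (covering > 0) {
                    double priceMove = price.applyAsDouble(next) - price.applyAsDouble(last);
                    double distributed = priceMove / covering;
                    double proportion = (double) (next - last) / forecast / forecast;
                    for (int i = 0; i < startTimes.length; i++) {
                        if (startTimes[i] <= last && next <= startTimes[i] + forecast) {
                            influence[i] += proportion * distributed;
                        }
                    }
                }
            }
            last = next;
        }
        return influence;
    }

    public static void main(String[] args) {
        // Two stories at the same instant; the price rises by 1.0 over the 10-unit window.
        double[] result = initialize(new long[] {0L, 0L}, 10L, t -> 100.0 + 0.1 * t);
        System.out.println(Arrays.toString(result));
    }
}
```

<p>In the two-story case each story ends up with an influence of 0.05 per unit of time, so the pair together accounts for 2 x 0.05 x 10 = 1.0, the whole observed move, rather than each story claiming the full move for itself.</p>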

<p>Temporal interference works essentially the same as before, but now takes advantage of interval trees, making it significantly faster.  I also added a parameter for learning rate.  The step size decays exponentially with the iteration count, so the estimates can change a lot initially but progressively less later on.  This guarantees convergence, which was not necessarily the case before.</p>
<p>The code is below:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> Perturbation<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	NewsCorpus corpus <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getCorpus</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	DataHistory data <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getDataHistory</span><span style="color: #009900;">&#40;</span>index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	List<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> newsList <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	IntervalTree<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> intervalTree <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IntervalTree<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	Perturbation<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> perturbations <span style="color: #339933;">=</span> initializer.<span style="color: #006633;">getSentiment</span><span style="color: #009900;">&#40;</span>ticker, index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory story <span style="color: #339933;">:</span> newsList<span style="color: #009900;">&#41;</span>
		intervalTree.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> forecast, story<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000066; font-weight: bold;">double</span> maxError <span style="color: #339933;">=</span> <span style="color: #003399;">Double</span>.<span style="color: #006633;">MAX_VALUE</span><span style="color: #339933;">;</span>
	<span style="color: #000066; font-weight: bold;">int</span> iteration <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span>maxError <span style="color: #339933;">&gt;</span> epsilon<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066; font-weight: bold;">double</span> rate <span style="color: #339933;">=</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">exp</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">-</span>iteration<span style="color: #339933;">*</span>learningRate<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		maxError <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
		<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory story <span style="color: #339933;">:</span> newsList<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>				
			<span style="color: #000066; font-weight: bold;">double</span> price0 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> price1 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> forecast<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> priceMove <span style="color: #339933;">=</span> price1 <span style="color: #339933;">-</span> price0<span style="color: #339933;">;</span>
&nbsp;
&nbsp;
			List<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> interferons <span style="color: #339933;">=</span> intervalTree.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> forecast<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> sumInterference <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
			<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory interferon <span style="color: #339933;">:</span> interferons<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
				<span style="color: #000066; font-weight: bold;">long</span> intersection <span style="color: #339933;">=</span> forecast <span style="color: #339933;">-</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">abs</span><span style="color: #009900;">&#40;</span>interferon.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
				<span style="color: #000066; font-weight: bold;">double</span> interference <span style="color: #339933;">=</span> perturbations<span style="color: #009900;">&#91;</span>interferon.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>.<span style="color: #006633;">getInfluence</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> intersection<span style="color: #339933;">;</span>
				sumInterference <span style="color: #339933;">+=</span> interference<span style="color: #339933;">;</span>
			<span style="color: #009900;">&#125;</span>
&nbsp;
			<span style="color: #000066; font-weight: bold;">double</span> old <span style="color: #339933;">=</span> perturbations<span style="color: #009900;">&#91;</span>story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>.<span style="color: #006633;">getInfluence</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">double</span> error <span style="color: #339933;">=</span> priceMove <span style="color: #339933;">-</span> sumInterference<span style="color: #339933;">;</span>
			perturbations<span style="color: #009900;">&#91;</span>story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>.<span style="color: #006633;">setInfluence</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>old<span style="color: #339933;">*</span>forecast<span style="color: #339933;">+</span>rate<span style="color: #339933;">*</span>error<span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span>forecast<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
			maxError <span style="color: #339933;">=</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">max</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">abs</span><span style="color: #009900;">&#40;</span>rate<span style="color: #339933;">*</span>error<span style="color: #009900;">&#41;</span>, maxError<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
		iteration<span style="color: #339933;">++;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">return</span> perturbations<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
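<p>The decay schedule in the loop above, rate = exp(-iteration * learningRate), is worth looking at on its own.  The sketch below prints a few values and computes the first iteration at which the largest possible step falls to epsilon or below, assuming the raw error never exceeds some bound.  That bound, the class, and the method names are assumptions of the sketch; the real error depends on the data.</p>

```java
// Hypothetical sketch of the exponential step-size decay used by the loop above.
final class LearningRateDecaySketch {

    /** Step-size multiplier at a given iteration, as in the Temporal Interference loop. */
    static double rate(int iteration, double learningRate) {
        return Math.exp(-iteration * learningRate);
    }

    /**
     * Smallest iteration index at which rate * errorBound is at most epsilon,
     * assuming the absolute error never exceeds errorBound.
     */
    static int firstIterationWithStepAtMost(double learningRate, double epsilon, double errorBound) {
        if (learningRate <= 0 || epsilon <= 0 || errorBound <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        if (errorBound <= epsilon) {
            return 0;
        }
        return (int) Math.ceil(Math.log(errorBound / epsilon) / learningRate);
    }

    public static void main(String[] args) {
        double learningRate = 0.5;
        for (int iteration = 0; iteration <= 14; iteration += 2) {
            System.out.println(iteration + " -> " + rate(iteration, learningRate));
        }
        System.out.println("stops by iteration "
                + firstIterationWithStepAtMost(learningRate, 1e-3, 1.0));
    }
}
```

<p>Because the loop exits once the largest scaled error is no greater than epsilon, a bounded error means the loop has to stop by that iteration, whatever the initial values were.</p>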

<p>So how well does this new method perform compared with the old method?  For a baseline, I wrote a new NaivePriceMove Modeler, and analyzed the results for the <a href="http://thekevindolan.com/2010/02/more-test-data/">5 data sets</a>.  I used the correct duration for all of these tests.</p>
<p>Accuracy from Naive Price Move alone was: 0.637, 0.614, 0.566, 0.553, 0.568 respectively.  Error was: 1.42, 10.83, 15.51, 6.32, 9.60!</p>
<p>Accuracy from Elementary Average alone was: 0.637, 0.613, 0.569, 0.553, 0.568, almost identical.  Error was: 0.312, 0.335, 0.336, 0.322, 0.328.</p>
<p>Note that although accuracy was almost identical, error (mean squared) was significantly lower with Elementary Average, and also more consistent.</p>
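<p>For reference, here is one way the two scores could be computed.  The error follows the description above (the average of the squared differences).  The accuracy definition is an assumption on my part for this sketch: the fraction of perturbations whose estimated influence has the same sign as the true one, with an estimate of exactly zero counting as a miss.  The class and method names are hypothetical.</p>

```java
// Hypothetical scoring helpers; the accuracy definition is an assumption of this sketch.
final class ModelScoreSketch {

    /** Fraction of estimates that share a non-zero sign with the true influence. */
    static double accuracy(double[] estimated, double[] actual) {
        checkLengths(estimated, actual);
        int hits = 0;
        for (int i = 0; i < estimated.length; i++) {
            double sign = Math.signum(estimated[i]);
            if (sign != 0.0 && sign == Math.signum(actual[i])) {
                hits++;
            }
        }
        return (double) hits / estimated.length;
    }

    /** Average of the squared differences between estimated and true influences. */
    static double meanSquaredError(double[] estimated, double[] actual) {
        checkLengths(estimated, actual);
        double sum = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            double diff = estimated[i] - actual[i];
            sum += diff * diff;
        }
        return sum / estimated.length;
    }

    private static void checkLengths(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        double[] estimated = {0.5, -0.2, 0.0, 0.3};
        double[] actual = {1.0, -1.0, 1.0, -1.0};
        System.out.println("accuracy = " + accuracy(estimated, actual));
        System.out.println("error    = " + meanSquaredError(estimated, actual));
    }
}
```

<p>Under this definition a model can get the direction right for most articles (high accuracy) while still being far off in magnitude (high error), which is the pattern Naive Price Move shows above.</p>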
<p>Let&#8217;s see how temporal interference performs as initialized by either.</p>
<p>Temporal Interference initialized by Naive Price Move was: 0.944, 0.902, 0.887, 0.852, 0.889.  Error was: 0.032, 0.065, 0.114, 0.097, 0.070.</p>
<p>Temporal Interference initialized by Elementary Average was: 0.943, 0.902, 0.887, 0.852, 0.889.  Error was: 0.033, 0.065, 0.114, 0.098, 0.071.</p>
<p>As we can see, the results are almost identical.  This was interesting to me, and made me wonder whether or not the initial values even matter.  So I tried two more experiments: one in which I initialized the influences to 0, and another in which I initialized them to random values between -1 and 1.</p>
<p>In the case of zeroes, the accuracy was obviously 0, and the error floated around 0.33 pretty closely.  When we initialize T.I. with zeroes, there is no change in accuracy or error at all compared with the results above.</p>
<p>In the case of random initialization, the accuracy was around 0.5, and the error floated around 0.66 pretty closely, as expected.  When we initialize T.I. with random values, we did see some change.  Accuracies exhibited were: 0.889, 0.864, 0.829, 0.799, 0.837.  Errors were: 0.064, 0.108, 0.179, 0.156, 0.117.</p>
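<p>The 0.33 and 0.66 baselines are what one would see if the true influences in the test data were spread uniformly between -1 and 1.  That is an assumption about the data sets, not something stated in this post.  Under it, guessing zero has an expected squared error of 1/3, and an independent uniform guess has an expected squared error of 1/3 + 1/3 = 2/3.  The quick simulation below checks this; the class and method names are hypothetical.</p>

```java
import java.util.Random;

// Hypothetical simulation: assumes the true influences are uniform on [-1, 1).
final class BaselineErrorSketch {

    /** Mean squared error of a zero or uniform-random initializer against synthetic truths. */
    static double simulate(boolean randomInitializer, int samples, long seed) {
        Random random = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < samples; i++) {
            double actual = 2.0 * random.nextDouble() - 1.0;
            double estimate = randomInitializer ? 2.0 * random.nextDouble() - 1.0 : 0.0;
            double diff = estimate - actual;
            sum += diff * diff;
        }
        return sum / samples;
    }

    public static void main(String[] args) {
        System.out.println("zero initializer:   " + simulate(false, 200_000, 42L));
        System.out.println("random initializer: " + simulate(true, 200_000, 42L));
    }
}
```

<p>If the real test data used a different distribution for the true influences, the two baselines would shift accordingly.</p>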
<p>So it doesn&#8217;t converge to some value inherent to the system regardless of what we initialize to; the result does depend on the seed, but only to some degree.  I think the zeroes case reduces to initializing with Naive Price Move, but I may be wrong.  Also, Naive Price Move and Elementary Average both tend to result in the same direction of price move, so maybe that&#8217;s important.</p>
<p>There is one more test to run, and that&#8217;s a throwback to whether concurrent news articles introduce any major problems.  I generated another data set that creates two news articles at each time point.  Because there is no way to tease apart which one is good and which one is bad here, we expect low performance.  But we want to find out if one method performs better than the other on this particular test.</p>
<p>I used T.I. for all tests, and varied the initializer.  Initialized with zeroes, we saw an accuracy of 0.709, error of 0.280.  Initialized with random, we saw an accuracy of 0.609, error of 0.548.  Initialized with N.P.M., accuracy 0.517, error 12.619.  Initialized with E.A., accuracy 0.708, error 0.280.</p>
<p>Interesting!  Initializing with N.P.M. appears, as expected, to have disastrous results when there are news articles at the same time.  It actually performs worse than random initialization!  Also, note that zero-initialization seems to reduce to E.A., rather than N.P.M. as I had initially anticipated.</p>
<p>Now there is one final question to this whole analysis: efficiency.  Perhaps one method converges more quickly than another.  I counted the number of iterations taken to solve data set 4, initialized by different methods.  Zeroes: 388.  Random: 379.  N.P.M.: 385.  E.A.: 388.</p>
<p>It appears that there were no major differences in runtime.  Perhaps it is more strongly related to the learning rate than anything else.  Again, we see zeroes equaling E.A., indicating that perhaps the best idea is to simply initialize with zeroes strictly for code simplicity.</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/perturbation-modeling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More Test Data</title>
		<link>http://thekevindolan.com/2010/02/more-test-data/</link>
		<comments>http://thekevindolan.com/2010/02/more-test-data/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 01:30:40 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=786</guid>
		<description><![CDATA[
I got back to working on the automated news analysis algorithm again, and thought that it would be wise to generate some new test data that will have some more context to it.  I wrote a simple algorithm that I discuss here, and I generated some data sets.
The basic idea was simple.  I want to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/more-test-data"><img class="aligncenter size-full wp-image-787" title="tumblr_kvk45k0Nam1qznckp" src="http://thekevindolan.com/wp-content/uploads/2010/02/tumblr_kvk45k0Nam1qznckp.jpg" alt="tumblr_kvk45k0Nam1qznckp" width="487" height="500" /></a></p>
<p>I got back to working on the automated news analysis algorithm again, and thought that it would be wise to generate some new test data that will have some more context to it.  I wrote a simple algorithm that I discuss here, and I generated some data sets.<span id="more-786"></span></p>
<p>The basic idea was simple.  In the future, I want to use some measure of similarity between documents that is smarter than the traditional <a href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">tf.idf approach</a>, but at the moment I don&#8217;t know which methods to use (as this is a large part of the project as a whole).  That being said, I still need to build a foundation for assigning sentiment to news stories for which we know what happens afterwards!</p>
<p>So a reasonable solution, I think, would be to generate some test data sets that will perform well using tf.idf, and then use them to isolate the other aspects of the problem.</p>
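<p>As a point of reference, tf.idf similarity between two documents can be sketched as follows.  This is a minimal illustration using one common weighting (raw term count times ln(N / document frequency)) and cosine similarity; the class and method names are assumptions, not code from this project.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TfIdfSketch {
    // Builds one tf.idf vector per document:
    // weight = term count * ln(number of documents / document frequency).
    static List<Map<String, Double>> vectors(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String word : doc.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tf.merge(word, 1, Integer::sum);
                }
            }
            counts.add(tf);
            for (String word : tf.keySet()) {
                docFreq.merge(word, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            result.add(vec);
        }
        return result;
    }

    // Cosine similarity of two sparse vectors; 0 if either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("up gain the");
        docs.add("gain up the");
        docs.add("down loss the");
        List<Map<String, Double>> v = vectors(docs);
        System.out.println("doc0 vs doc1: " + cosine(v.get(0), v.get(1)));
        System.out.println("doc0 vs doc2: " + cosine(v.get(0), v.get(2)));
    }
}
```

<p>A word that appears in every document gets weight 0 here, which is why a shared filler word contributes nothing to the similarity.</p>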
<p>To generate these news corpuses, I used a method similar to the first one, except I now set the content of the news stories to be a nonsense string of several words.  I built 3 sets of 50 random words: good, bad, and neutral.  Neutral words have some constant probability of appearing in any document, and the good/bad words are selected in proportion to the influence of the generated news article.</p>
<p>The code I used specifically is reproduced below:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> NewsCorpus generate<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	NewsCorpus corpus <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> NewsCorpus<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000066; font-weight: bold;">long</span> time <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> numStories<span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		<span style="color: #000066; font-weight: bold;">double</span> positive <span style="color: #339933;">=</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">double</span> influence <span style="color: #339933;">=</span> positive <span style="color: #339933;">*</span> <span style="color: #cc66cc;">2</span> <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
&nbsp;
		<span style="color: #003399;">StringBuffer</span> sb <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">StringBuffer</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">int</span> numWords <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> maxWords<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> j <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> j <span style="color: #339933;">&lt;</span> numWords<span style="color: #339933;">;</span> j<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> wordList<span style="color: #339933;">;</span>
			<span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;</span> neutralProportion<span style="color: #009900;">&#41;</span>
				wordList <span style="color: #339933;">=</span> neutralWords<span style="color: #339933;">;</span>
			<span style="color: #000000; font-weight: bold;">else</span> <span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&lt;</span> positive<span style="color: #009900;">&#41;</span>
				wordList <span style="color: #339933;">=</span> goodWords<span style="color: #339933;">;</span>
			<span style="color: #000000; font-weight: bold;">else</span>
				wordList <span style="color: #339933;">=</span> badWords<span style="color: #339933;">;</span>
			sb.<span style="color: #006633;">append</span><span style="color: #009900;">&#40;</span>wordList<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">Math</span>.<span style="color: #006633;">random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> <span style="color: #cc66cc;">50</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; &quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
&nbsp;
		NewsStory story <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> NewsStory<span style="color: #009900;">&#40;</span>time, <span style="color: #0000ff;">&quot;Context Corpus Generator&quot;</span>,
				<span style="color: #0000ff;">&quot;News Story #&quot;</span><span style="color: #339933;">+</span>i, <span style="color: #0000ff;">&quot;LINEAR;&quot;</span><span style="color: #339933;">+</span>influence<span style="color: #339933;">+</span><span style="color: #0000ff;">&quot;;&quot;</span><span style="color: #339933;">+</span>timeFrame, sb.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
		corpus.<span style="color: #006633;">addNews</span><span style="color: #009900;">&#40;</span>story<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		time <span style="color: #339933;">+=</span> timeStep<span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #000000; font-weight: bold;">return</span> corpus<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
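<p>The listing above relies on three arrays, goodWords, badWords and neutralWords, whose construction is not shown.  One way such lists might be built is sketched here; the helper name, the word lengths, and the rule that the three lists never share a word are all assumptions made for illustration.</p>

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class WordListSketch {
    // Returns `count` distinct nonsense words of 3 to 8 lowercase letters.
    // Words already present in `taken` are skipped, so lists built with the
    // same `taken` set never overlap.
    static String[] randomWords(int count, Set<String> taken, Random rng) {
        String[] words = new String[count];
        int filled = 0;
        while (filled < count) {
            int length = 3 + rng.nextInt(6);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < length; i++) {
                sb.append((char) ('a' + rng.nextInt(26)));
            }
            String word = sb.toString();
            if (taken.add(word)) {
                words[filled++] = word;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        Set<String> taken = new HashSet<>();
        String[] goodWords = randomWords(50, taken, rng);
        String[] badWords = randomWords(50, taken, rng);
        String[] neutralWords = randomWords(50, taken, rng);
        System.out.println(goodWords[0] + " " + badWords[0] + " " + neutralWords[0]);
    }
}
```

<p>Keeping the lists disjoint means a word is unambiguous evidence of good, bad, or neutral content, which is the easy case for a tf.idf-style comparison.</p>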

<p>I also generated several data sets using this method.  For all data sets, I used a word limit of 50 and a timestep of 1.  The data set summaries are below:</p>
<p><strong>Data Set 1: </strong>1000 articles, timeframe of 10, 0.33 neutral proportion</p>
<p><strong>Data Set 2: </strong>1000 articles, timeframe of 30, 0.33 neutral proportion</p>
<p><strong>Data Set 3: </strong>1000 articles, timeframe of 50, 0.33 neutral proportion</p>
<p><strong>Data Set 4: </strong>1000 articles, timeframe of 50, 0.50 neutral proportion</p>
<p><strong>Data Set 5: </strong>3000 articles, timeframe of 50, 0.50 neutral proportion</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/more-test-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Interval Tree Java Implementation</title>
		<link>http://thekevindolan.com/2010/02/interval-tree/</link>
		<comments>http://thekevindolan.com/2010/02/interval-tree/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 00:49:07 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[data structure]]></category>
		<category><![CDATA[open-source]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=781</guid>
		<description><![CDATA[
In a recent Java project, I found myself needing to store several intervals of time which I could access readily and efficiently.  I only needed to build the tree once, so a static data structure would work fine, but queries needed to be as efficient as possible.
I found a data structure that accomplishes just this, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/interval-tree"><img class="aligncenter size-full wp-image-783" title="15_19_1---Tree--Sunrise--Northumberland_web" src="http://thekevindolan.com/wp-content/uploads/2010/02/15_19_1-Tree-Sunrise-Northumberland_web.jpg" alt="15_19_1---Tree--Sunrise--Northumberland_web" width="600" height="400" /></a></p>
<p>In a recent Java project, I found myself needing to store several intervals of time which I could access readily and efficiently.  I only needed to build the tree once, so a static data structure would work fine, but queries needed to be as efficient as possible.<span id="more-781"></span></p>
<p>I found a data structure that accomplishes just this, an <a href="http://en.wikipedia.org/wiki/Interval_tree" target="_blank">interval tree</a>.</p>
<p>It&#8217;s a simple enough data structure, but I couldn&#8217;t find any Java implementations for it online.</p>
<p>I then went to coding the data structure at the airport last week, and just finished unit testing it to convince myself everything was good to go.  I&#8217;m making it available if you are interested, because it&#8217;s really a waste of time to hand-code a well-known data structure.</p>
<p>You can download <a href="http://thekevindolan.com/wp-content/uploads/2010/02/IntervalTree.jar">IntervalTree.jar</a>, and view the <a href="http://www.thekevindolan.com/code/intervalTree/IntervalTree.html">JavaDoc</a>.</p>
<p>It uses generic typing for the data object, but requires all the intervals to be expressed in terms of longs.  There are probably some obvious problems with catching programmer error.  For instance, if you search for an interval, but reverse start and end, I don&#8217;t know what will happen, nor do I care.</p>
<p>It is a static data structure, meaning it must be rebuilt anytime a change is made to the underlying data.  Rebuilds happen automatically if you try to make a query and it is out-of-sync, but you can build manually by calling .build() if you want to, and you can find out if it is currently in-sync by calling .inSync().</p>
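<p>To make the static, rebuild-on-demand behaviour concrete, here is a minimal sketch of the idea.  It is not the code in IntervalTree.jar: the class name LazyIntervalIndex, its methods, the choice of a centered interval tree, and the treatment of intervals as closed on both ends are assumptions for illustration only.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LazyIntervalIndex<T> {
    private static final class Entry<T> {
        final long start, end;
        final T data;
        Entry(long start, long end, T data) {
            this.start = start;
            this.end = end;
            this.data = data;
        }
    }

    private static final class Node<T> {
        long center;
        final List<Entry<T>> overlapping = new ArrayList<>();
        Node<T> left, right;
    }

    private final List<Entry<T>> entries = new ArrayList<>();
    private Node<T> root;
    private boolean inSync = true;

    void addInterval(long start, long end, T data) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        entries.add(new Entry<>(start, end, data));
        inSync = false; // the tree no longer reflects the stored intervals
    }

    boolean inSync() {
        return inSync;
    }

    void build() {
        root = buildNode(entries);
        inSync = true;
    }

    // Returns the data of every interval containing the point (closed ends).
    List<T> get(long point) {
        if (!inSync) {
            build(); // rebuild lazily on the first query after a change
        }
        List<T> result = new ArrayList<>();
        Node<T> node = root;
        while (node != null) {
            for (Entry<T> e : node.overlapping) {
                if (e.start <= point && point <= e.end) {
                    result.add(e.data);
                }
            }
            node = point < node.center ? node.left : node.right;
        }
        return result;
    }

    // Splits intervals around the median endpoint: intervals entirely below
    // go left, entirely above go right, the rest stay at this node.
    private Node<T> buildNode(List<Entry<T>> list) {
        if (list.isEmpty()) {
            return null;
        }
        List<Long> endpoints = new ArrayList<>();
        for (Entry<T> e : list) {
            endpoints.add(e.start);
            endpoints.add(e.end);
        }
        Collections.sort(endpoints);
        Node<T> node = new Node<>();
        node.center = endpoints.get(endpoints.size() / 2);
        List<Entry<T>> below = new ArrayList<>();
        List<Entry<T>> above = new ArrayList<>();
        for (Entry<T> e : list) {
            if (e.end < node.center) {
                below.add(e);
            } else if (e.start > node.center) {
                above.add(e);
            } else {
                node.overlapping.add(e);
            }
        }
        node.left = buildNode(below);
        node.right = buildNode(above);
        return node;
    }

    public static void main(String[] args) {
        LazyIntervalIndex<Integer> index = new LazyIntervalIndex<>();
        index.addInterval(0L, 10L, 1);
        index.addInterval(20L, 30L, 2);
        System.out.println(index.inSync() + " " + index.get(5L) + " " + index.inSync());
    }
}
```

<p>Rejecting reversed intervals up front is a design choice of this sketch; it also keeps the recursive build from looping on an interval that contains no point.</p>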
<p>The following code snippet shows you how to use the library:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">IntervalTree<span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> it <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IntervalTree<span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
it.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>0L,10L,<span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
it.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>20L,30L,<span style="color: #cc66cc;">2</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
it.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>15L,17L,<span style="color: #cc66cc;">3</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
it.<span style="color: #006633;">addInterval</span><span style="color: #009900;">&#40;</span>25L,35L,<span style="color: #cc66cc;">4</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003399;">List</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> result1 <span style="color: #339933;">=</span> it.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>5L<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">List</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> result2 <span style="color: #339933;">=</span> it.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>10L<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">List</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> result3 <span style="color: #339933;">=</span> it.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>29L<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">List</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> result4 <span style="color: #339933;">=</span> it.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>5L,15L<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Intervals that contain 5L:&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> r <span style="color: #339933;">:</span> result1<span style="color: #009900;">&#41;</span>
	<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>r<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Intervals that contain 10L:&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> r <span style="color: #339933;">:</span> result2<span style="color: #009900;">&#41;</span>
	<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>r<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Intervals that contain 29L:&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> r <span style="color: #339933;">:</span> result3<span style="color: #009900;">&#41;</span>
	<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>r<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Intervals that intersect (5L,15L):&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> r <span style="color: #339933;">:</span> result4<span style="color: #009900;">&#41;</span>
	<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>r<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This code will output:</p>
<pre>Intervals that contain 5L:
1
Intervals that contain 10L:
1
Intervals that contain 29L:
2
4
Intervals that intersect (5L,15L):
3
1</pre>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/interval-tree/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Power Hower is Number One!</title>
		<link>http://thekevindolan.com/2010/02/power-hower-is-number-one/</link>
		<comments>http://thekevindolan.com/2010/02/power-hower-is-number-one/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 22:53:32 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[pagerank]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=778</guid>
		<description><![CDATA[
You may remember some time ago when I created the Power Hower: Power Hour Timer, well I was recently browsing my Google Analytics when I found a bizarre increase in traffic to the website, far above normal.  The cause, as investigation led me to discover, was somewhat surprising!
Power Hower is now the number one result [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/power-hower-is-number-one/"><img class="aligncenter size-full wp-image-779" title="number1" src="http://thekevindolan.com/wp-content/uploads/2010/02/number1.jpg" alt="number1" width="585" height="184" /></a></p>
<p>You may remember that some time ago I created the <a href="http://www.powerhower.com">Power Hower: Power Hour Timer</a>.  Well, I was recently browsing my Google Analytics when I found a bizarre increase in traffic to the website, far above normal.  The cause, as investigation led me to discover, was somewhat surprising!<span id="more-778"></span></p>
<p>Power Hower is now the number one result on Google for a search of Power Hour Timer.  For the longest time, it appeared near the bottom of the first page, but it&#8217;s now the very first.</p>
<p>Interestingly, I&#8217;ve been getting about 4 times as much traffic since this happened.</p>
<p>Just another demonstration of the power of good SERPs.</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/power-hower-is-number-one/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Never To See Any Other Way</title>
		<link>http://thekevindolan.com/2010/02/never-to-see-any-other-way/</link>
		<comments>http://thekevindolan.com/2010/02/never-to-see-any-other-way/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 22:43:49 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Potpourri]]></category>
		<category><![CDATA[illusion]]></category>
		<category><![CDATA[painting]]></category>
		<category><![CDATA[perspective]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=775</guid>
		<description><![CDATA[
I haven&#8217;t done any painting in a while, and frankly, I missed it a little.  So a couple weeks ago, when I visited NYC with Natalie, I saw a painting at the Metropolitan Museum of Art that I just fell in love with.
The painting looked like something along the lines of what you see here.  [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/never-to-see-any-other-way/"><img class="aligncenter size-medium wp-image-776" title="nevertosee" src="http://thekevindolan.com/wp-content/uploads/2010/02/nevertosee-437x500.jpg" alt="nevertosee" width="437" height="500" /></a></p>
<p>I haven&#8217;t done any painting in a while, and frankly, I missed it a little.  So a couple weeks ago, when I visited NYC with Natalie, I saw a painting at the Metropolitan Museum of Art that I just fell in love with.<span id="more-775"></span></p>
<p>The painting looked like something along the lines of what you see here.  The artist made a repetitive design of lines that formed an imaginary object boundary where the lines change direction.  The effect is that you see a diamond, when there really is none.</p>
<p>Another thing I found neat was that the artist made no real attempt at making lines that were particularly straight or even parallel.  The overall design of the painting sort-of just comes out as an emergent property.</p>
<p>The painting I did behaves similarly, but is quite different in its own right.  I wish I knew the name of the artist who did the original, but I can&#8217;t seem to remember.  If anyone recognizes the style, please let me know.</p>
<p>I have plans for embarking on a second painting in the same style, with a much more complex subject matter.  Get ready!</p>
<p><a href="http://thekevindolan.com/wp-content/uploads/2010/02/nevertosee.jpg">Click here for a full size.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/never-to-see-any-other-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Macro-Like Facebook GreaseMonkey Script</title>
		<link>http://thekevindolan.com/2010/02/macro-like/</link>
		<comments>http://thekevindolan.com/2010/02/macro-like/#comments</comments>
		<pubDate>Sun, 07 Feb 2010 21:44:10 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[annoying]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[greasemonkey]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=717</guid>
		<description><![CDATA[
Are you tired of micro-managing your likes?  We all know that a well-placed like on Facebook can be hilarious. But clicking like over and over is so difficult! Now you can just like everything with one easy click&#8230;
This is a GreaseMonkey script for clicking like on all stories on a page.  It works on any [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/macro-like/"><img class="aligncenter size-full wp-image-718" title="mattdamon" src="http://thekevindolan.com/wp-content/uploads/2010/02/mattdamon.jpg" alt="mattdamon" width="459" height="189" /></a></p>
<p>Are you tired of micro-managing your likes?  We all know that a well-placed like on Facebook can be hilarious. But clicking like over and over is so difficult! Now you can just like everything with one easy click&#8230;<span id="more-717"></span></p>
<p>This is a GreaseMonkey script for clicking like on all stories on a page.  It works on any Facebook page.</p>
<p>If you don&#8217;t have GreaseMonkey installed, you can find instructions from my blog on the <a href="http://thekevindolan.com/2009/07/follow-all-on-page/">Twitter Follow All script</a>.</p>
<p>Once you have it installed, you can install my Macro-Like script by clicking <a href="http://userscripts.org/scripts/source/68301.user.js">install</a>!</p>
<p>Now go get liking!</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/macro-like/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Perturbation Model of Price Movement</title>
		<link>http://thekevindolan.com/2010/02/perturbation-model/</link>
		<comments>http://thekevindolan.com/2010/02/perturbation-model/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 05:20:01 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>
		<category><![CDATA[perturbation model]]></category>
		<category><![CDATA[price history]]></category>
		<category><![CDATA[theory]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=706</guid>
		<description><![CDATA[
I was sitting in my networks class today, thinking of how it would be possible to implement an algorithm for taking into consideration the similarity of documents for teasing apart temporal interference, when I started coming to a more coherent model of what I&#8217;ve been trying to do in general.  This article will set up [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/perturbation-model/"><img class="aligncenter size-full wp-image-707" title="shwayze" src="http://thekevindolan.com/wp-content/uploads/2010/02/shwayze.jpg" alt="shwayze" width="399" height="399" /></a></p>
<p>I was sitting in my networks class today, thinking of how it would be possible to implement an algorithm for taking into consideration the similarity of documents for teasing apart temporal interference, when I started coming to a more coherent model of what I&#8217;ve been trying to do in general.  This article will set up some early ideas for a model of what&#8217;s going on, what we&#8217;re attempting to accomplish, and possible general procedures for doing so.  It also sets up some terminology.<span id="more-706"></span></p>
<p>Essentially, at this stage we have a data history.  This data history is made of two parts, a set of price point information and a set of several relevant news articles.</p>
<p>We will call our price point information, the <strong>Time-Sensitive Response Variable</strong>, or TSRV.  Let us explore what we are assuming about the TSRV.</p>
<p>I began to think about the idea of making analogies to physics, because that&#8217;s something I understand a little better than economics.  I think the way a lot of people approach the stock market for investing is to think about price as position.  This gives way to the idea that the market may often find itself trending one way or the other.  The idea behind a trend is that the price has a certain velocity, which is resistant to change (inertia), until some outside influence (force/acceleration) causes a reversal or something of that nature.</p>
<p>Having looked at a lot of stock graphs, I am not so sure this is the case.  I understand many successful traders would disagree with me, but for the sake of this project, we are going to think of the TSRV for price as velocity; that is, it is resistant to change without outside influence, so it experiences inertia.  In this concept the price over time is a derivative of some unknown value behind the scenes, which I intuitively feel might exist, that behaves more like the traditional concept of price.</p>
<p>You might say that the price over time is constantly gyrating madly about, so thinking that the TSRV is resistant to change is ridiculous, but keep in mind I said it was resistant to change&#8230;undisturbed.  There is a constant barrage of outside influence coming in to affect the price.  I consider these outside influences the analogue of a force in physics, which is proportional to acceleration.</p>
<p>We will be calling these outside influences <strong>perturbations</strong>.  Perturbations could inevitably take many forms, but for simplicity we will be thinking of individual perturbations as being discrete chunks of constant force with finite lengths.  In physics, we know that the acceleration observed is due to the sum of forces acting on an object.  So too is the effect of the perturbations additive.</p>
<p>We understand that there are a great number of perturbations affecting any TSRV, some of which we know about, many of which we do not.  Furthermore, we generally only know the existence of some perturbations, not any details of their strength, direction, or duration.  The perturbations we do not know about, will generally seem to manifest themselves as noise, but it should be known that under this model, there is no random TSRV movement, only movement due to unconsidered perturbations.</p>
<p>For our purposes with regard to automated news analysis, we have a set of several relevant news articles that we assume have some effect on the movement of the price, in this fashion.  Our end-goal is to approximate the effect that news articles have.  According to our definition, the effect of a perturbation has two dimensions: the strength/direction of the acceleration caused, the <strong>influence</strong>, and the length of time it affects the price, the <strong>duration</strong>.</p>
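<p>To make the model concrete, here is a minimal sketch of how a TSRV could be generated from a set of perturbations under these assumptions.  The unit time step, the unit proportionality between force and acceleration, and every class and method name are illustrative assumptions, not part of the project code.</p>

```java
class PerturbationSketch {
    // A constant push of strength `influence` lasting `duration` time steps.
    static final class Perturbation {
        final long start;
        final long duration;
        final double influence;

        Perturbation(long start, long duration, double influence) {
            this.start = start;
            this.duration = duration;
            this.influence = influence;
        }

        boolean activeAt(long t) {
            return t >= start && t < start + duration;
        }
    }

    // Perturbations are additive: the acceleration at time t is the sum of
    // the influences of all perturbations active at t.
    static double acceleration(Perturbation[] perturbations, long t) {
        double sum = 0;
        for (Perturbation p : perturbations) {
            if (p.activeAt(t)) {
                sum += p.influence;
            }
        }
        return sum;
    }

    // With no active perturbation the TSRV stays where it is (inertia);
    // otherwise it moves by the summed influence each step.
    static double[] simulate(double initial, Perturbation[] perturbations, int steps) {
        double[] tsrv = new double[steps + 1];
        tsrv[0] = initial;
        for (int t = 0; t < steps; t++) {
            tsrv[t + 1] = tsrv[t] + acceleration(perturbations, t);
        }
        return tsrv;
    }

    public static void main(String[] args) {
        Perturbation[] news = {
            new Perturbation(0, 3, 0.5),  // positive story, active for t = 0..2
            new Perturbation(2, 2, -1.0)  // negative story, active for t = 2..3
        };
        for (double value : simulate(10.0, news, 5)) {
            System.out.println(value);
        }
    }
}
```

<p>Where the two stories overlap, only their sum is visible in the TSRV, which is the temporal interference problem in miniature.</p>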
<p>Given a set of perturbations, we want to find some way to determine their characteristics, so that we can hopefully use some similarity metric to indicate possible future price moves.</p>
<p>It is well understood that the duration will vary for most news articles, but the computational difficulty of determining both influence and duration seems at this juncture beyond the scope of what I can hope to accomplish.  Perhaps sometime in the future we can devise more complex algorithms, but for now we will focus on determining the influence of perturbations given some predetermined duration.</p>
<p>Now that we have a more formal understanding of our basic assumptions, we can consider possible means of accomplishing the approximation of the characteristics of perturbations, but I&#8217;ll save that for next time.</p>
<p>And yes, that&#8217;s a picture of Shwayze.</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/perturbation-model/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Two Approaches to News Rating</title>
		<link>http://thekevindolan.com/2010/02/two-approaches-to-news-rating/</link>
		<comments>http://thekevindolan.com/2010/02/two-approaches-to-news-rating/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 23:03:39 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>
		<category><![CDATA[test results]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=697</guid>
		<description><![CDATA[
Last time, I generated a few data sets for testing different methods of rating the training news articles.  This time, I actually implemented two of them, the naive approach I had used before, and the new-and-improved version taking into account temporal interference.
Let us first examine the architecture I am using.  I created an interface Sentiment, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/two-approaches-to-news-rating/"><img class="aligncenter size-full wp-image-698" title="flan" src="http://thekevindolan.com/wp-content/uploads/2010/02/flan.jpg" alt="flan" width="500" height="333" /></a></p>
<p><a href="http://thekevindolan.com/2010/02/test-data-sets/">Last time</a>, I generated a few data sets for testing different methods of rating the training news articles.  This time, I actually implemented two of them, the naive approach I had used before, and the new-and-improved version taking into account temporal interference.<span id="more-697"></span></p>
<p>Let us first examine the architecture I am using.  I created an interface, Sentiment, to be implemented by classes that take a ticker and produce sentiment estimates for its news articles.  The signature for the method is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #008000; font-style: italic; font-weight: bold;">/**
* Get the sentiment for the news articles in a ticker
* @param ticker   the ticker to inspect
* @param forecast the forecast parameter
* @param index	   the data index to inspect
* @return		 an array where
* 					array[0][i] = influence of i
* 					array[1][i] = time-frame of i, in terms of multiples of forecast
*/</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">long</span> forecast, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The idea behind returning a two-dimensional array is that I may eventually want to implement a more complex learning method that can also tease apart the time scale over which a news article has an effect, since I imagine some news articles have more long-term impact while others have more short-term impact.  But that&#8217;s for the future.</p>
<p>The implementation for the naive approach is reproduced below:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">long</span> forecast, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	DataHistory data <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getDataHistory</span><span style="color: #009900;">&#40;</span>index<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	NewsCorpus corpus <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getCorpus</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">2</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>corpus.<span style="color: #006633;">getCount</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> corpus.<span style="color: #006633;">getCount</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		NewsStory story <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span>i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">long</span> time <span style="color: #339933;">=</span> story.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">double</span> price1 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>time<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		time <span style="color: #339933;">+=</span> forecast<span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">double</span> price2 <span style="color: #339933;">=</span> data.<span style="color: #006633;">getData</span><span style="color: #009900;">&#40;</span>time<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
		result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>price2 <span style="color: #339933;">-</span> price1<span style="color: #009900;">&#41;</span> <span style="color: #339933;">/</span> forecast<span style="color: #339933;">;</span>
		result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #000000; font-weight: bold;">return</span> result<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The essential idea is that you just look ahead and see how the price moved some time after the news article.  This was the approach I had used on my original project.  Having now had the chance to test this method on my theoretically generated data, I can see just how ineffective this is.</p>
<p>For data set 1, the performance (measured as the proportion of sentiments for which the correct sign was estimated) was 0.84.  That value is not entirely bad, and would be acceptable, except that this data set is actually quite simple.  For the other sets, the performance was 0.72, 0.62, 0.56, and 0.70 respectively.  On the hard data set (#4), the value was only slightly better than random guessing.</p>
<p>I also tried to see what would happen on a larger data set, a newly generated one, which can be found <a href="http://thekevindolan.com/wp-content/uploads/2010/02/Corpus6.zip">here</a>.  This corpus had a time-frame of 50 and a time-step of 1, making it similar to data set 4.  Performance on data set 6 was poor, again at about 0.56, suggesting that performance depends heavily on the ratio of time-frame to time-step.</p>
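<p>The performance measure is simple enough to sketch.  This is a minimal version, assuming the true and estimated influences are available as parallel arrays; the class and method names are hypothetical, and counting an exact zero as matching only another zero is my assumption rather than something the tests above specify.</p>

```java
/** Illustrative sketch only: names are hypothetical, not from the project. */
final class SignAccuracy {

    /** Fraction of entries whose estimated sign matches the sign of the true value. */
    static double signAccuracy(double[] actual, double[] estimated) {
        if (actual.length != estimated.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            // Math.signum returns -1.0, 0.0 or 1.0, so equal signs compare equal.
            if (Math.signum(actual[i]) == Math.signum(estimated[i])) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }
}
```

<p>Note that this measure ignores magnitude entirely, so an estimate can score well here while still being far off in strength.</p>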
<p>The summary of results for the naive approach can be found <a href="http://thekevindolan.com/wp-content/uploads/2010/02/Result1.zip">here</a>.</p>
<p>It&#8217;s no wonder my initial attempts failed so horribly: at the very base of the methodology was a very poorly thought-out concept.</p>
<p>My next attempt was to try the method that teased apart temporal influences.  The general approach was to observe a price change, and then subtract off the effect from all news articles before it, within the forecast parameters.  The value we used to subtract off was an assumed value, initialized to the naive price move.  This approach is then repeated several times, until we come to some steady state.</p>
<p>My initial findings were that this method performed with almost identical accuracy to the previous test.  There had to be something wrong.</p>
<p>It was then that I realized that I also needed to be subtracting off the influence from news articles ahead of the target news article.  This increased the necessary number of iterations, but the results amazed me.</p>
<p>Keep in mind, there is a parameter here, the epsilon value, the acceptable error.  Lower values of epsilon meant longer run-times, but often more accurate results.</p>
<p>With an epsilon value of 1E-4, the number of iterations required for the 5 main data sets were: 18, 40, 60, 125, 33.</p>
<p>At that epsilon, the accuracy values were: 0.96, 0.93, 0.92, 0.87, 0.99.</p>
<p>Detailed results for this test can be found <a href="http://thekevindolan.com/wp-content/uploads/2010/02/Result2.zip">here</a>.</p>
<p>At that value of epsilon, I was not able to run the test on data set 6.  Early anecdotal notes on performance are that the accuracy also seems to be correlated with the ratio of time-frame to time-step, but does not seem to drop off as quickly.  The number of iterations seems to be almost linear in the same ratio.</p>
<p>Wanting to do some more experiments on performance, I tried setting the value of epsilon to 1E-8.</p>
<p>The first data set required 1387 iterations, but had accuracy 0.97.  The second data set took too long to finish.  The fifth data set took only 189 iterations, and only got a single sentiment wrong.</p>
<p>Loosening epsilon to 1E-2, we needed 4, 11, 23, 27, and 13 iterations and got accuracies of 0.94, 0.90, 0.88, 0.82, and 0.94.  I gave the algorithm 3 minutes to do data set 6, and it was still unable to finish.</p>
<p>At 1E-1, or 0.1, we found the iteration counts to be 3, 8, 20, 21, and 11, and the accuracies to be 0.93, 0.90, 0.88, 0.80, and 0.94.  Here the drop-off becomes less significant.  Keep in mind that each iteration probably causes the values to move less and less, and that the values at each iteration are deterministic and not dependent on epsilon.  At this epsilon, data set 6 could be processed in 120 iterations (which took about 6 minutes), and got an accuracy of 0.89, which is actually greater than the accuracy for data set 4 at an epsilon of 1E-4, but not by much.</p>
<p>Seeing how this one performed, however, led me to discover a wildly inefficient chunk of code that would make this algorithm painfully slow as the number of news articles increases: the nested for loops that check every single article against every other at each iteration.  I did it this way because the news articles might not necessarily be sorted; however, this check really only needs to be done once, and we can cache the results.</p>
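<p>Because the values at each iteration do not depend on epsilon, a single run at the tightest epsilon already contains the iteration counts for every looser one.  Here is a small sketch of that idea, assuming the error from each iteration has been recorded in an array; the class and method names are hypothetical.</p>

```java
/** Illustrative sketch only: names are hypothetical, not from the project. */
final class EpsilonSweep {

    /**
     * For each epsilon, returns the 1-based iteration at which a loop of the form
     * "while (error > epsilon)" would have stopped, or -1 if it never would have
     * within the recorded iterations.
     */
    static int[] iterationsNeeded(double[] errorPerIteration, double[] epsilons) {
        int[] needed = new int[epsilons.length];
        for (int e = 0; e < epsilons.length; e++) {
            needed[e] = -1;
            for (int i = 0; i < errorPerIteration.length; i++) {
                if (errorPerIteration[i] <= epsilons[e]) {
                    needed[e] = i + 1;
                    break;
                }
            }
        }
        return needed;
    }
}
```

<p>The same trick would work for accuracy: snapshot the estimates whenever the error first drops below each threshold, and score each snapshot afterwards.</p>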
<p>Making this change did not change the results obtained at all, but did make the runtime seriously faster.</p>
<p>At 1E-1, data set 6 was now complete in under 30 seconds.  I then ran data set 6 with different values of epsilon.  At 1E-2, we required 142 iterations, with an accuracy of 0.89.  At 1E-4, we required 221 iterations, with accuracy 0.91.  Feeling good, I tried 1E-8, but lost patience.  At 1E-5, 313 iterations with accuracy 0.92.  At 1E-6: 1173 iterations (a sharp jump!), but the accuracy too increased to 0.94.</p>
<p>You can draw your own conclusions about what&#8217;s really going on here from these several data points.  I&#8217;m not making any real claims yet, but I can imagine there is still a lot to learn.</p>
<p>Here is the code, as it currently exists:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000066; font-weight: bold;">double</span> epsilon <span style="color: #339933;">=</span> 1E<span style="color: #339933;">-</span>8<span style="color: #339933;">;</span>
&nbsp;
@SuppressWarnings<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;unchecked&quot;</span><span style="color: #009900;">&#41;</span>
@Override
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> getSentiment<span style="color: #009900;">&#40;</span>Ticker ticker, <span style="color: #000066; font-weight: bold;">long</span> forecast, <span style="color: #000066; font-weight: bold;">int</span> index<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #666666; font-style: italic;">//Get the observed price moves</span>
	<span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> priceMove <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> NaivePriceMove<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getSentiment</span><span style="color: #009900;">&#40;</span>ticker, forecast, index<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000066; font-weight: bold;">int</span> count <span style="color: #339933;">=</span> priceMove.<span style="color: #006633;">length</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #666666; font-style: italic;">//Create the empty result table</span>
	<span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">2</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>count<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> count<span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> priceMove<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
		result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	NewsCorpus corpus <span style="color: #339933;">=</span> ticker.<span style="color: #006633;">getCorpus</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #666666; font-style: italic;">//Cache the proximities</span>
	List<span style="color: #339933;">&lt;</span>Integer<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> proximities <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>List<span style="color: #339933;">&lt;</span>Integer<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">new</span> List<span style="color: #339933;">&lt;?&gt;</span><span style="color: #009900;">&#91;</span>count<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> current <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> current <span style="color: #339933;">&lt;</span> count<span style="color: #339933;">;</span> current<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		proximities<span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ArrayList<span style="color: #339933;">&lt;</span>Integer<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000066; font-weight: bold;">long</span> time <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span>current<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		List<span style="color: #339933;">&lt;</span>NewsStory<span style="color: #339933;">&gt;</span> range <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span>time <span style="color: #339933;">-</span> forecast <span style="color: #339933;">+</span> <span style="color: #cc66cc;">1</span>, time <span style="color: #339933;">+</span> forecast <span style="color: #339933;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #666666; font-style: italic;">//Go through the other news events</span>
		<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span>NewsStory story <span style="color: #339933;">:</span> range<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #000066; font-weight: bold;">int</span> other <span style="color: #339933;">=</span> story.<span style="color: #006633;">getId</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>other <span style="color: #339933;">!=</span> current<span style="color: #009900;">&#41;</span> 
				proximities<span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>other<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #666666; font-style: italic;">//Keep going until we reach an acceptable error</span>
	<span style="color: #000066; font-weight: bold;">double</span> error <span style="color: #339933;">=</span> <span style="color: #003399;">Double</span>.<span style="color: #006633;">MAX_VALUE</span><span style="color: #339933;">;</span>
	<span style="color: #000066; font-weight: bold;">int</span> iter <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">while</span><span style="color: #009900;">&#40;</span>error <span style="color: #339933;">&gt;</span> epsilon<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		iter<span style="color: #339933;">++;</span>
		error <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
		<span style="color: #666666; font-style: italic;">//Go through each news event</span>
		<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> current <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> current <span style="color: #339933;">&lt;</span> count<span style="color: #339933;">;</span> current<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #000066; font-weight: bold;">double</span> observed <span style="color: #339933;">=</span> forecast <span style="color: #339933;">*</span> priceMove<span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000066; font-weight: bold;">long</span> time <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span>current<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #666666; font-style: italic;">//Go through the other news events</span>
			<span style="color: #000000; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> other <span style="color: #339933;">:</span> proximities<span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
				<span style="color: #000066; font-weight: bold;">long</span> otherTime <span style="color: #339933;">=</span> corpus.<span style="color: #006633;">getNews</span><span style="color: #009900;">&#40;</span>other<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getTime</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
				<span style="color: #000066; font-weight: bold;">long</span> difference <span style="color: #339933;">=</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">abs</span><span style="color: #009900;">&#40;</span>time <span style="color: #339933;">-</span> otherTime<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
				<span style="color: #666666; font-style: italic;">//Depending on how much overlap, discount the influence of the other</span>
				<span style="color: #000066; font-weight: bold;">long</span> effect <span style="color: #339933;">=</span> forecast <span style="color: #339933;">-</span> difference<span style="color: #339933;">;</span>
				observed <span style="color: #339933;">-=</span> result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>other<span style="color: #009900;">&#93;</span> <span style="color: #339933;">*</span> effect<span style="color: #339933;">;</span>
			<span style="color: #009900;">&#125;</span>
			observed <span style="color: #339933;">/=</span> forecast<span style="color: #339933;">;</span>
			<span style="color: #666666; font-style: italic;">//Error is max error</span>
			<span style="color: #000066; font-weight: bold;">double</span> difference <span style="color: #339933;">=</span> observed <span style="color: #339933;">-</span> result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			difference <span style="color: #339933;">*=</span> difference<span style="color: #339933;">;</span>
			error <span style="color: #339933;">=</span> <span style="color: #003399;">Math</span>.<span style="color: #006633;">max</span><span style="color: #009900;">&#40;</span>difference, error<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			result<span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span>current<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> observed<span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>iter <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; iterations.&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">return</span> result<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
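<p>The listing above leans on the project&#8217;s Ticker, NewsCorpus, and NewsStory classes, so it cannot be run on its own.  Below is a stripped-down sketch of the same update working on plain arrays.  The class name, the <code>maxIter</code> safety cap, and the <code>naiveMoves</code> helper (which builds a toy input from known influences) are assumptions added for illustration, and it scans all pairs instead of caching proximities to keep it short.</p>

```java
/** Illustrative sketch only: a plain-array variant, not the project's implementation. */
final class TemporalInterferenceSketch {

    /**
     * Repeatedly re-estimates each article's influence by subtracting the current
     * estimates of every overlapping article (earlier or later) from its observed move.
     *
     * @param times     time of each article
     * @param naiveMove naive per-step price move observed after each article
     * @param forecast  duration shared by all articles
     * @param epsilon   stop once the largest squared change in a sweep is at most this
     * @param maxIter   safety cap on sweeps (an addition, not in the original listing)
     */
    static double[] solve(long[] times, double[] naiveMove, long forecast,
                          double epsilon, int maxIter) {
        int count = times.length;
        double[] influence = naiveMove.clone(); // initialized to the naive price move
        double error = Double.MAX_VALUE;
        for (int iter = 0; iter < maxIter && error > epsilon; iter++) {
            error = 0.0;
            for (int current = 0; current < count; current++) {
                double observed = forecast * naiveMove[current];
                for (int other = 0; other < count; other++) {
                    if (other == current) continue;
                    // Number of steps the two articles' windows share.
                    long overlap = forecast - Math.abs(times[current] - times[other]);
                    if (overlap > 0) {
                        observed -= influence[other] * overlap;
                    }
                }
                double updated = observed / forecast;
                double change = updated - influence[current];
                error = Math.max(error, change * change);
                influence[current] = updated;
            }
        }
        return influence;
    }

    /** Toy forward model: the naive move implied by known, additive influences. */
    static double[] naiveMoves(long[] times, double[] influence, long forecast) {
        double[] moves = new double[times.length];
        for (int i = 0; i < times.length; i++) {
            double total = 0.0;
            for (int j = 0; j < times.length; j++) {
                long overlap = forecast - Math.abs(times[i] - times[j]);
                if (overlap > 0) {
                    total += influence[j] * overlap;
                }
            }
            moves[i] = total / forecast;
        }
        return moves;
    }

    public static void main(String[] args) {
        long[] times = {0, 1, 2, 3, 4};
        double[] truth = {0.8, -0.5, 0.3, -0.9, 0.6};
        double[] moves = naiveMoves(times, truth, 2);
        double[] estimate = solve(times, moves, 2, 1e-16, 10_000);
        for (int i = 0; i < truth.length; i++) {
            System.out.println(truth[i] + " vs " + estimate[i]);
        }
    }
}
```

<p>Each estimate is overwritten in place, so later articles in a sweep already see the refreshed values of earlier ones, just as in the listing above.</p>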

]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/two-approaches-to-news-rating/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Test Data Sets</title>
		<link>http://thekevindolan.com/2010/02/test-data-sets/</link>
		<comments>http://thekevindolan.com/2010/02/test-data-sets/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 23:57:04 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[automated news analysis]]></category>
		<category><![CDATA[computational finance]]></category>
		<category><![CDATA[data set]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[theory]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://thekevindolan.com/?p=690</guid>
		<description><![CDATA[
From my last post, I introduced the idea of creating test data sets for the purpose of finding an algorithm to tease apart the influence of individual news articles.  I have done just that and am posting the data sets for further analysis.
My method for generating these test files was as the following pseudocode describes:
-Take [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thekevindolan.com/2010/02/test-data-sets"><img class="aligncenter size-full wp-image-691" title="test-data" src="http://thekevindolan.com/wp-content/uploads/2010/02/test-data.jpg" alt="test-data" width="400" height="300" /></a></p>
<p>In my last post, I introduced the idea of creating test data sets for the purpose of finding an algorithm to tease apart the influence of individual news articles.  I have done just that and am posting the data sets for further analysis.<span id="more-690"></span></p>
<p>My method for generating these test files is described by the following pseudocode:</p>
<p>-Take 3 parameters, TIME-STEP,  TIME-FRAME, and COUNT.</p>
<p>-Create COUNT news articles, each with the following encoded in their summary field:</p>
<p style="padding-left: 30px;">-Time-frame equal to TIME-FRAME</p>
<p style="padding-left: 30px;">-Influence randomly set between [-1,1]</p>
<p>-For each timestep 0 through (TIME-STEP * COUNT)</p>
<p style="padding-left: 30px;">-Find all news articles before current time, within their Time-frame value of now</p>
<p style="padding-left: 30px;">-Add the sum of those news articles&#8217; Influence values to the current price</p>
<p style="padding-left: 30px;">-Record the current price</p>
<p>Because we defined a constant TIME-FRAME ahead of time, a simpler algorithm could have been used, but I am planning on attempting experiments with variable time-frames at a later date, so this was a sensible solution to save myself some work in the future.</p>
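<p>The pseudocode above can be sketched in Java roughly as follows.  Several details are assumptions on my part for the sake of a runnable example: article <em>i</em> is placed at time <em>i</em> * TIME-STEP, an article counts as active when it was published strictly before the current time and no more than TIME-FRAME steps ago, the starting price is a parameter, and the influences are drawn uniformly from [-1, 1).  The class and method names are hypothetical.</p>

```java
import java.util.Random;

/** Illustrative sketch only: names and boundary conventions are assumptions. */
final class TestDataGenerator {

    /** Draws one influence per article, uniformly from [-1, 1). */
    static double[] randomInfluences(int count, long seed) {
        Random random = new Random(seed);
        double[] influences = new double[count];
        for (int i = 0; i < count; i++) {
            influences[i] = random.nextDouble() * 2.0 - 1.0;
        }
        return influences;
    }

    /**
     * Walks time steps 0 through timeStep * count and, at each step, adds the
     * influence of every active article to the running price.
     */
    static double[] generatePrices(int timeStep, int timeFrame,
                                   double[] influences, double startPrice) {
        int count = influences.length;
        int steps = timeStep * count;
        double[] prices = new double[steps + 1];
        double price = startPrice;
        for (int now = 0; now <= steps; now++) {
            for (int i = 0; i < count; i++) {
                long published = (long) i * timeStep; // assumed article spacing
                boolean active = published < now && now - published <= timeFrame;
                if (active) {
                    price += influences[i];
                }
            }
            prices[now] = price;
        }
        return prices;
    }
}
```

<p>Checking each article against the current time, instead of hard-coding a sliding window, is what leaves room for per-article time-frames later.</p>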
<p>I created 6 data sets, each with 500 data points, as follows:</p>
<p><strong>Data set 0</strong></p>
<p>TIME-STEP: 1</p>
<p>TIME-FRAME: 1</p>
<p><strong>Data set 1</strong></p>
<p>TIME-STEP: 1</p>
<p>TIME-FRAME: 2</p>
<p><strong>Data set 2</strong></p>
<p>TIME-STEP: 1</p>
<p>TIME-FRAME: 5</p>
<p><strong>Data set 3</strong></p>
<p>TIME-STEP: 1</p>
<p>TIME-FRAME: 10</p>
<p><strong>Data set 4</strong></p>
<p>TIME-STEP: 1</p>
<p>TIME-FRAME: 50</p>
<p><strong>Data set 5</strong></p>
<p>TIME-STEP: 3</p>
<p>TIME-FRAME: 17</p>
<p>The motivation for choosing the values for data sets 1-4 is simple: to see the effects of using longer and longer time-frames relative to time-steps.  Data set 5 exists for the sole purpose of seeing whether weird offsets cause any problems.  If we see anything unexpected there, further research may be necessary.</p>
<p>I have attached a zip file of the corpus, if you are interested: <a href="http://thekevindolan.com/wp-content/uploads/2010/02/data1.zip">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thekevindolan.com/2010/02/test-data-sets/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
