<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.9.2 (http://www.squarespace.com/) on Wed, 10 Mar 2010 10:06:28 GMT--><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>Jig's Blog</title><link>http://www.jijnes.com/blog/</link><description></description><lastBuildDate>Tue, 09 Mar 2010 09:50:23 +0000</lastBuildDate><copyright></copyright><language>en-US</language><generator>Squarespace Site Server v5.9.2 (http://www.squarespace.com/)</generator><item><title>Word of Tweets</title><category>twitter</category><dc:creator>JIjnes</dc:creator><pubDate>Tue, 09 Mar 2010 05:38:49 +0000</pubDate><link>http://www.jijnes.com/blog/2010/3/9/word-of-tweets.html</link><guid isPermaLink="false">442154:4931149:6951511</guid><description><![CDATA[<p>Discovering information on the internet has evolved throughout the years.  As we've hit a point where information is pushed on us in near real-time, discovery and curation have become more important than ever.</p>

<p>At one point in the history of the internet, websites were discovered through word of mouth.  A friend, family member, or coworker told you about some great thing at "http://" that he or she had heard about.  Word of mouth then expanded its reach and its medium: links were passed along through email, and websites began curating links for their users.</p>

<p>Then along came RSS, allowing you to subscribe to your favorite or most trusted sources.  This allowed information to flow to you automatically, and gave the website author's perspective greater influence on your understanding of trending topics.</p>

<p><a href="http://delicious.com/">Del.icio.us</a> offered an interesting view on information, providing the ability to see what everyone is bookmarking, share your own bookmarks, or browse the bookmarks available on a given topic.  For me, Delicious has become an invaluable research tool.</p>

<p><a href="http://digg.com/">Digg</a> changed the game on how information is curated.  Digg allowed its users to submit, vote on, and comment on articles, making it a fully community-driven news source.  Voting and commenting added an "interesting" factor to any story.</p>

<p>Lately, <a href="http://twitter.com/">Twitter</a> has become my source of trending topics, alongside Digg and RSS subscriptions to a small list of blogs I enjoy reading.  With Twitter, you have several streams of tweets: public tweets, which are a sample of what the Twitter community is chirping about; lists, which let you curate by topic; and of course your following list, the individuals whose tweets you always want to see.</p>

<p>Like many services, Twitter publishes an API for developers to leverage in their own applications.  One such API allows an application to sample tweets using an algorithm Twitter has decided on.  Over a day, I managed to capture 1,331,214 tweets, a small percentage of overall tweets per day, but enough to run basic analysis against.</p>

<p>Of those 1.3 million tweets, 21% contained links to websites.</p>

<p><img src="http://www.jijnes.com/resource/-?fileId=6062052" alt="t1.png" border="0" width="214" height="281" align="center" />   </p>

<p>With 21% of tweets containing links, you stand a fair chance of discovering some tidbit of information out there.  The more important questions are whether a link is interesting, and whether the Twitter community shows that it finds the link interesting by retweeting it.</p>
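
<p>A share like the 21% above can be computed with a few lines of Python.  This is only a rough sketch in Python 3, not the tool I used: the regular expression and the four sample tweets are made up for illustration.</p>

```python
import re

# Rough pattern for http/https links; real tweets may need a stricter matcher.
URL_PATTERN = re.compile(r"https?://\S+")


def link_share(tweets):
    """Return the fraction of tweets that contain at least one URL."""
    if not tweets:
        return 0.0
    with_links = sum(1 for text in tweets if URL_PATTERN.search(text))
    return with_links / len(tweets)


# Made-up sample data, for illustration only.
sample = [
    "reading about word of mouth http://bit.ly/abc123",
    "good morning everyone",
    "no links here either",
    "new photo https://example.com/p/1 check it out",
]
print("%.0f%% of tweets contain a link" % (100 * link_share(sample)))
```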

<p>As expected, the majority of the links were shortened using a service like <a href="http://bit.ly/">bit.ly</a>.</p>

<p><img src="http://www.jijnes.com/resource/-?fileId=6062609" alt="t2.png" border="0" width="363" height="165" align="center" /></p>

<p>Expanding the URLs took a bit of time, but made it possible to compare links and check for uniqueness across the sampled set.  Pictures, videos, and locations were the top kinds of links.  URL4.eu appears in the results because of a flaw in the analysis tool: URL4 proxies content through its own service instead of simply redirecting like other shorteners.</p>
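
<p>One way to expand shortened URLs is to follow the HTTP redirects and keep the final address.  The Python 3 sketch below assumes that approach; the function names and the redirect table in the demonstration are hypothetical, and the demonstration uses a stand-in resolver so it runs without network access.</p>

```python
import urllib.request


def resolve_with_http(short_url, timeout=10):
    """Follow HTTP redirects and return the final URL (needs network access)."""
    request = urllib.request.Request(short_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def expand_all(urls, resolve=resolve_with_http):
    """Map each URL to its expanded form, resolving every distinct URL once."""
    cache = {}
    for url in urls:
        if url not in cache:
            try:
                cache[url] = resolve(url)
            except OSError:
                # Dead or unreachable link: keep the short form.
                cache[url] = url
    return cache


# Offline demonstration with a stand-in resolver (hypothetical redirect table).
fake_redirects = {"http://bit.ly/abc123": "http://www.example.com/story"}
expanded = expand_all(
    ["http://bit.ly/abc123", "http://bit.ly/abc123", "http://www.example.org/"],
    resolve=lambda url: fake_redirects.get(url, url),
)
print(expanded)
```

<p>Caching each distinct URL means a link that is tweeted many times is only resolved once.  A service that proxies content rather than redirecting never issues a redirect, so an approach like this one would leave its short URL unchanged.</p>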

<p><img src="http://www.jijnes.com/resource/-?fileId=6062643" alt="t3.png" border="0" width="391" height="171" align="center" /></p>

<p>In the sampled data set, the majority of the links, about 99%, occurred only once.  Some occurred 20 to 40 times, and a handful more often than that.  One link, to LilTwistTV on UStream.com, occurred 1,500 times.  These tweets did not all occur at once, but over a fourteen-hour period, slowly tailing off at the end.</p>
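
<p>Counting how often each expanded link occurs is a natural fit for a counter.  The Python 3 sketch below shows the idea on a handful of made-up links; it is an illustration of the counting step, not the analysis tool itself.</p>

```python
from collections import Counter


def occurrence_summary(links):
    """Count how often each link appears and what share appears exactly once."""
    counts = Counter(links)
    singles = sum(1 for n in counts.values() if n == 1)
    share_single = singles / len(counts) if counts else 0.0
    return counts, share_single


# Made-up sample data, for illustration only.
links = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
    "http://example.com/c",
    "http://example.com/a",
]
counts, share_single = occurrence_summary(links)
print(counts.most_common(1))
print("share of links seen once: %.2f" % share_single)
```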

<p><img src="http://www.jijnes.com/resource/-?fileId=6062800" alt="t4.png" border="0" width="300" height="233" align="center" /></p>

<p>This behavior is no different from that of any hot topic in a community, only without the barriers of location.  A similar trend can be seen at a smaller scale for a MacHeist promotion.</p>

<p><img src="http://www.jijnes.com/resource/-?fileId=6062855" alt="t5.png" border="0" width="288" height="237" align="center" /></p>
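
<p>Curves like the two above come from bucketing the tweets for a single link by time.  A minimal Python 3 sketch, assuming one-hour buckets and using made-up timestamps:</p>

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Bucket tweet timestamps by the hour and return (hour, count) pairs in order."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(buckets.items())


# Made-up timestamps, for illustration only.
stamps = [
    datetime(2010, 3, 8, 14, 5),
    datetime(2010, 3, 8, 14, 40),
    datetime(2010, 3, 8, 15, 10),
    datetime(2010, 3, 8, 17, 55),
]
for hour, count in tweets_per_hour(stamps):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```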

<p>Although I was reviewing a sampled data set, the full stream of tweets would have either amplified this trend or, at a minimum, shown the same counts as the sample.  Whether you are a fan of LilTwistTV or not, the frequent retweeting increases its reach and its visibility to you.</p>

<p>Twitter is now word of mouth.</p>
]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-6951511.xml</wfw:commentRss></item><item><title>Mosaic Sumo</title><category>processing</category><dc:creator>JIjnes</dc:creator><pubDate>Tue, 26 Jan 2010 05:30:39 +0000</pubDate><link>http://www.jijnes.com/blog/2010/1/26/mosaic-sumo.html</link><guid isPermaLink="false">442154:4931149:6431208</guid><description><![CDATA[<p><a title="View 'Mosaic Sumo' on Flickr.com" href="http://www.flickr.com/photos/37341483@N00/4305201931"><img src="http://farm5.static.flickr.com/4046/4305201931_d806d2be96.jpg" border="0" alt="Mosaic Sumo" width="343" height="500" align="left" /></a></p><BR CLEAR=ALL>
<p>I've always enjoyed photographic and illustrative mosaic compositions.  One of my favorites is a poster of Darth Vader composed of Star Wars film stills.  Using <a href="http://processing.org">Processing</a>, I've created my own mosaic of one of my pictures, composed from my <a href="http://www.flickr.com/photos/jig/">Flickr</a> images.</p>
<p>Processing's brightness() function was used to find the image closest in brightness to each 10x10 tile, and lerpColor() was used to blend pixels from my Flickr images into the sumo image.</p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-6431208.xml</wfw:commentRss></item><item><title>Found Elsewhere on 01/21/2010</title><dc:creator>JIjnes</dc:creator><pubDate>Fri, 22 Jan 2010 03:06:01 +0000</pubDate><link>http://www.jijnes.com/blog/2010/1/21/found-elsewhere-on-01212010.html</link><guid isPermaLink="false">442154:4931149:6395400</guid><description><![CDATA[<ul>
<li><p><a href="http://www.good.is/post/transparency-is-new-airport-security-causing-flight-delays/" title="Transparency: Is New Airport Security Causing Flight Delays?">Transparency: Is New Airport Security Causing Flight Delays?</a> - GOOD magazine looks at the increase in flight delays during the holidays over the past two years.</p></li>
<li><p><a href="http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/">How to Make a Heatmap – a Quick and Easy Solution</a> - Nathan Yau of FlowingData shows how to create a heatmap in one of my favorite tools, R.</p></li>
<li><p><a href="http://developer.yahoo.net/blogs/hadoop/2010/01/hadoop_bay_area_january_2010_u.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+YDNHadoop+%28Hadoop+and+Distributed+Computing+at+Yahoo%21%29">Hadoop Bay Area January 2010 User Group - Recap</a> - Slides from the recent Hadoop Bay Area Group meeting.</p></li>
</ul>
]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-6395400.xml</wfw:commentRss></item><item><title>Delicious Curation</title><category>delicious</category><category>information</category><category>productivity</category><category>productivity</category><category>python</category><category>rss</category><category>tools</category><dc:creator>JIjnes</dc:creator><pubDate>Mon, 18 Jan 2010 21:11:30 +0000</pubDate><link>http://www.jijnes.com/blog/2010/1/18/delicious-curation.html</link><guid isPermaLink="false">442154:4931149:6361910</guid><description><![CDATA[<p>In the first decade of the 2000s, effort in information retrieval was focused on discovering good sources.  As people found information sources they enjoyed and respected, data formats such as RSS came into use and tools such as <a href="http://delicious.com">Delicious</a> came to be.  Towards the end of the decade, information sources were plentiful, and content was widely available from traditional sources such as news media companies as well as from blogs.  Twitter and Facebook also came to be, providing a mechanism to exchange information within trusted circles: your friends.</p>
<p>The challenge I face today consists of 200 RSS feeds, 50-odd Delicious subscriptions, Twitter links, Facebook links, and, on rare occasions, links via email.  Any post that I think I will want to read gets sent to Instapaper, which has become my list of good reads.  Inundated with information, I need a way to curate these data sources so that I am only looking at a handful of hot items on a given topic.</p>
<p>The largest time sink would have to be the feeds from Delicious.  Within any given tag there can be redundant posts, and the same post can reappear every day.  Delicious does not track what you have seen, as it focuses on what's newly bookmarked and what's popular.  For these reasons, using Google Reader or NetNewsWire to consume Delicious feeds becomes an endless battle.</p>
<p>As an initial step in improving my information gathering process, I've created a very basic Python script to grab Delicious feeds, track posts that have been downloaded before, and remove duplicates.  My hope is that this will greatly reduce the time spent scanning through Delicious feeds.  The script tracks the tags I'm interested in, persists posts to a local SQLite3 database, and exports an RSS file that NetNewsWire consumes.</p>
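<p>The core of the "have I seen this before?" check can be shown in isolation.  The sketch below is in Python 3 and is a compact variant, not the script itself: it assumes a hypothetical <code>links</code> table in an in-memory database and relies on SQLite's <code>INSERT OR IGNORE</code>, so it does not keep a hit counter.</p>

```python
import sqlite3


def keep_unseen(conn, posts):
    """Return only the (url, title) pairs that were not stored before."""
    fresh = []
    for url, title in posts:
        cur = conn.execute(
            "INSERT OR IGNORE INTO links (url, title) VALUES (?, ?)", (url, title)
        )
        # rowcount is 1 when the row was inserted, 0 when the URL already existed.
        if cur.rowcount == 1:
            fresh.append((url, title))
    conn.commit()
    return fresh


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE links (url TEXT PRIMARY KEY, title TEXT)")

first = keep_unseen(conn, [("http://example.com/a", "A"), ("http://example.com/b", "B")])
second = keep_unseen(conn, [("http://example.com/b", "B"), ("http://example.com/c", "C")])
print(len(first), len(second))
```

<p>Letting the primary key reject duplicates keeps the check and the insert in a single statement.</p>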
<p>The next step is to do the same for my existing RSS feeds, turn the curation process into a Django application, and look into having Shaun Inman's <a href="http://feedafever.com/">Fever</a> application add a layer on top that shows which posts are hot.</p>
<p>I've pasted the Python code below; feel free to take it.</p>
<pre>#!/usr/bin/env python
import urllib2
import datetime
import sqlite3
import PyRSS2Gen
import time

feed = 'http://feeds.delicious.com/v2/json/tag/%s?plain&amp;count=25'

# Tags to fetch from Delicious.
tags = ['python']

class CurateDB:
	""" Database wrapper of feeds """
	
	def __init__(self):
		""" Establish connection to db and setup tables if need be """
		self.conn = sqlite3.connect('curate.db',isolation_level=None,
			detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
		self.cur = self.conn.cursor()
		
		# TAGS maps each tag to the time it was last updated;
		# LINKS stores every URL seen so far and how often it appeared.
		self.cur.execute('CREATE TABLE IF NOT EXISTS TAGS (TAG VARCHAR(200) NOT NULL PRIMARY KEY, LAST_UPDATED TIMESTAMP)')
		self.cur.execute('CREATE TABLE IF NOT EXISTS LINKS (URL TEXT NOT NULL PRIMARY KEY, TITLE TEXT, HITS INT NOT NULL)')
	
	def get_last_updated(self, tag):
		""" Retrieve the last time a tag was updated """
		self.cur.execute('SELECT LAST_UPDATED [timestamp] FROM TAGS WHERE TAG = ?', (tag,))
		row = self.cur.fetchone()
		if row:
			return row[0]
		else:
			return None
			
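	def set_last_updated(self, tag, updated=None):
		""" Update last updated timestamp """
		# A default of datetime.now() in the signature would be evaluated only
		# once, at definition time. Use UTC to match Delicious' 'Z' timestamps.
		if updated is None:
			updated = datetime.datetime.utcnow()
		if self.get_last_updated(tag):
			self.cur.execute('UPDATE TAGS SET LAST_UPDATED = ? WHERE TAG = ?', (updated, tag))
		else:
			self.cur.execute('INSERT INTO TAGS(TAG,LAST_UPDATED) VALUES (?,?)', (tag, updated))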
	
	def add_post(self, post):
		""" Add a post entry if one does not already exist """
		self.cur.execute('SELECT URL FROM LINKS WHERE URL = ?', (post.link,))
		row = self.cur.fetchone()
		if row:
			print 'Already saved link %s' % post.link
			self.cur.execute('UPDATE LINKS SET HITS = HITS + 1 WHERE URL = ?', (post.link,))
			return False
		else:
			self.cur.execute('INSERT INTO LINKS(URL,TITLE,HITS) VALUES (?,?,1)', (post.link, post.title))
			print 'Saved link %s' % post.link
			return True
	
	def shutdown(self):
		""" Clean up resources of sqlite3 db """
		self.cur.close()
		self.conn.close()

class Post:
	""" Represents a Delicious post 
		
		link - The URL to the post
		title - Title of the post
		date - Date the post was bookmarked
	"""
	def __init__(self, item):
		""" Unmarshall a JSON encoded item """
		self.link = item['u'].replace('\\','')
		self.title = item['d'].replace('\\','')
		self.date = datetime.datetime.strptime(item['dt'], "%Y-%m-%dT%H:%M:%SZ")
		
	def is_newer(self, adate):
		""" True if the post was bookmarked after adate """
		return self.date &gt; adate
	
	def __repr__(self):
		return self.link
		
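def get_posts(tag):
	""" Retrieve recent posts from Delicious """
	# Parse the response as JSON; eval() would execute whatever the server returned.
	import json
	url = feed % tag
	req = urllib2.Request(url)
	res = urllib2.urlopen(req)
	try:
		return map(Post, json.loads(res.read()))
	finally:
		res.close()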

db = CurateDB()
items = []

# Delicious timestamps are UTC, so compare against UTC rather than local time.
min_date = datetime.datetime.utcnow() - datetime.timedelta(days=2)
try:	
	for tag in tags:
		print 'Finding recent posts for %s' % tag
		last_updated = db.get_last_updated(tag)
		if not last_updated:
			last_updated = min_date
		posts = get_posts(tag)
		for post in posts:
			# Only review posts bookmarked after the last update
			if post.is_newer(last_updated) and db.add_post(post):
				items.append(PyRSS2Gen.RSSItem(title=post.title,link=post.link,guid=PyRSS2Gen.Guid(post.link),pubDate=post.date))
		# Record when this tag was last processed
		db.set_last_updated(tag)
		# Play nice with Delicious feeds, don't hammer the service
		time.sleep(1)
finally:
	db.shutdown()

if items:
	print 'Found %d new items' % len(items)
	# PyRSS2Gen writes dates as GMT, so pass UTC times
	rss = PyRSS2Gen.RSS2(title='Delicious Feed',link='http://localhost/',
		description='Recent posts from Delicious',lastBuildDate=datetime.datetime.utcnow(),items=items)
	# Close the output file once the feed is written
	with open('delicious.xml','w') as out:
		rss.write_xml(out)
</pre>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-6361910.xml</wfw:commentRss></item><item><title>2009</title><category>processing.org</category><category>python</category><category>visualization</category><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Thu, 31 Dec 2009 16:07:11 +0000</pubDate><link>http://www.jijnes.com/blog/2009/12/31/2009.html</link><guid isPermaLink="false">442154:4931149:6179769</guid><description><![CDATA[<p><span class="full-image-block ssNonEditable"><span><img src="http://farm5.static.flickr.com/4066/4231521460_3f10f8aee9.jpg?__SQUARESPACE_CACHEVERSION=1262275697951" alt="" /></span></span></p>
<p>Last day of 2009. &nbsp;I couldn't help but look back at the photographs I took this past year, most of which are of family. &nbsp;The montage was put together using Python and Processing.</p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-6179769.xml</wfw:commentRss></item><item><title>Found: Jobless Rate Visualization by NY Times</title><category>infographic</category><category>interactive</category><category>links</category><category>visualization</category><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Sun, 08 Nov 2009 14:44:06 +0000</pubDate><link>http://www.jijnes.com/blog/2009/11/8/found-jobless-rate-visualization-by-ny-times.html</link><guid isPermaLink="false">442154:4931149:5735421</guid><description><![CDATA[<p><span class="full-image-block ssNonEditable"><span><img src="http://farm3.static.flickr.com/2545/4086189678_012b29fe74.jpg?__SQUARESPACE_CACHEVERSION=1257691663491" alt="" /></span></span></p>
<p>The <a href="http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html?hp">NY Times</a> has created an interactive visualization of the jobless rate and its impact on the population, allowing you to filter by demographic.  It's definitely eye-opening to see how much harder some of the US workforce has been hit compared to others.</p>
<p>&nbsp;</p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-5735421.xml</wfw:commentRss></item><item><title>Gravity</title><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Mon, 28 Sep 2009 08:15:09 +0000</pubDate><link>http://www.jijnes.com/blog/2009/9/28/gravity.html</link><guid isPermaLink="false">442154:4931149:5427058</guid><description><![CDATA[<p></p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-5427058.xml</wfw:commentRss></item><item><title>Flexing My Processing Muscles</title><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Mon, 21 Sep 2009 16:06:52 +0000</pubDate><link>http://www.jijnes.com/blog/2009/9/21/flexing-my-processing-muscles.html</link><guid isPermaLink="false">442154:4931149:5427059</guid><description><![CDATA[<p></p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-5427059.xml</wfw:commentRss></item><item><title>Nintendo Thumb</title><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Wed, 07 Jan 2009 09:18:01 +0000</pubDate><link>http://www.jijnes.com/blog/2009/1/7/nintendo-thumb.html</link><guid isPermaLink="false">442154:4931149:5427070</guid><description><![CDATA[<p></p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-5427070.xml</wfw:commentRss></item><item><title>NY Times Visualization Lab</title><category>design</category><category>links</category><category>technology</category><category>tools</category><category>visualization</category><dc:creator>JIjnes</dc:creator><pubDate>Mon, 27 Oct 2008 18:46:53 +0000</pubDate><link>http://www.jijnes.com/blog/2008/10/27/ny-times-visualization-lab.html</link><guid isPermaLink="false">442154:4931149:5427081</guid><description><![CDATA[<p></p>]]></description><wfw:commentRss>http://www.jijnes.com/blog/rss-comments-entry-5427081.xml</wfw:commentRss></item></channel></rss>