Friday, April 29, 2011

Diving into R

I've wanted to learn R for a long time. A new project at work is providing an ideal opportunity to finally use it. So far, it's been a great experience. R is an incredibly powerful tool for data analysis. It's allowed me to dive deep into the project's data and automate much of the analysis process.

Programming in R has been easier than expected. I've previously programmed in MATLAB, which has helped greatly. Some of the concepts are still foreign, but I'm confident that they will become less so with time.

The greatest joy has been getting "lost" for hours writing R functions to analyze the data and produce reports. R's interactive interface has made it easy to build up code in an exploratory manner. This is my preferred way of programming, and I find it allows me to stay in a flow state for long periods of time. The experience has been very similar to programming in Lisp dialects, which I also deeply enjoy.

Although there is a lot of good information about R available for free on the web, I've found O'Reilly's books to be the best resource for coming up to speed quickly.

A particularly powerful library is ggplot2 by Hadley Wickham. With it, I've been able to create very complex graphs and charts with minimal code. ggplot2 builds graphics in layers using a grammar that can be challenging to learn at first. The website is informative, but the book has been the best resource and is well worth the money.
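To give a flavor of the layered grammar, here is a minimal sketch. It is not from my project; it assumes the ggplot2 package is installed and uses R's built-in mtcars data set, and the labels are made up for illustration.

```r
library(ggplot2)

# Start from the data and the aesthetic mapping (which columns go on which axes).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                  # layer 1: one point per car
  geom_smooth(method = "lm") +    # layer 2: a fitted linear trend line
  labs(x = "Weight (1000 lbs)",   # illustrative axis labels and title
       y = "Miles per gallon",
       title = "Fuel economy vs. weight")

# Printing the plot object renders it on the active graphics device.
print(p)
```

Each `+` adds another layer or setting to the same plot object, so a chart can be built up step by step at the interactive prompt.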

Another useful library is brew, which I am using to auto-generate pleasant-looking reports in PDF via LaTeX.
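As a rough idea of how brew templating works, here is a small sketch. It is not my actual report; it assumes the brew package is installed, and the template text and the `report_file` name are hypothetical. brew evaluates the R expression inside each `<%= ... %>` tag and substitutes its value into the surrounding text.

```r
library(brew)

# Hypothetical LaTeX fragment with embedded R expressions.
template <- paste(
  "\\section{Summary}",
  "The data set contains <%= nrow(mtcars) %> cars.",
  "The mean fuel economy is <%= round(mean(mtcars$mpg), 1) %> mpg.",
  sep = "\n"
)

# Write the filled-in template to a temporary .tex file.
report_file <- tempfile(fileext = ".tex")
brew(text = template, output = report_file)

# Show the generated LaTeX source.
cat(readLines(report_file, warn = FALSE), sep = "\n")
```

The output here is only a fragment. A real report would wrap it in a complete LaTeX document before compiling it to PDF.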

I look forward to working more with R. Data science is a growing interest of mine and this opportunity to use R is adding to the momentum.

Monday, April 11, 2011

Book Review: Final Jeopardy

Final Jeopardy: Man vs Machine and the Quest to Know Everything by Stephen Baker

I found the Watson exhibition very exciting. I was therefore eager to read Baker's new book, Final Jeopardy, which recounts the inception of IBM's Jeopardy Grand Challenge and the work of the software team that completed it by creating Watson. Although light on technical details, the book provides a good overview of the primary challenges. It also discusses the non-technical issues that the Watson and Jeopardy teams struggled with in staging the man-machine competition. Overall, a very good and enjoyable book. If you enjoyed Baker's Numerati, you'll probably enjoy this book too.

The next challenge is to create a computer that can write Jeopardy questions rather than just answering them.

Monday, April 4, 2011

Seymour Cray Videos

I've long admired Seymour Cray as the genius behind early supercomputers such as the CDC 6600, Cray-1, and later Cray systems. However, I knew little about Cray himself, so I was happy to discover two YouTube videos of Cray speaking about his career and systems.

In this 1976 talk, Cray describes the design of the Cray-1. Among other topics, he describes the factors that gave rise to the Cray-1's iconic shape.

In this talk thirteen years later, Cray discusses the design of the Cray-3 and Cray-4 systems and his decision to use gallium arsenide, then a leading-edge material. I wasn't aware of the three-dimensional modules used in the Cray-3. Cool stuff.

I enjoyed both talks. Cray was much more personable than I expected. He was very humble and claimed ignorance in a number of areas related to computing. It was refreshing to see someone of Cray's caliber display these characteristics.

It was amusing to see that the fundamental problems of building computing systems have remained the same for decades: speed, size, and power. The more things change, the more they stay the same.