Thursday, December 31, 2009

Recovering Deleted JPEGs from a FAT File System - Part 6

Part 6 in a series of posts on recovering deleted JPEG files from a FAT file system.

In this post we'll tackle the last major FAT file system data structure - the file allocation table. Compared to the previous post on the root directory, this post is much less complex. Follow the "read more" link for an overview of the file allocation table and the code required to process it.

Tuesday, December 29, 2009

Book Review: The Planiverse

The Planiverse: Computer Contact with a Two Dimensional World by A.K. Dewdney

Written in 1984, this is a fictional story of a computer science professor and his students making contact with a two-dimensional universe - The Planiverse - through a simulation program developed as a class project. More specifically, a telepathic connection is established with one of the universe's inhabitants, YNDRD, otherwise known as "Yendred". The professor and students accompany Yendred on a journey and in the process discover the workings of the two-dimensional world.

Essentially, the book can be decomposed into two concurrent streams:

  • The story of Yendred's journey across his world in search of a mystic - Drabk - said to possess "knowledge of the beyond".
  • A series of expositions on the scientific, technological, and social aspects of Yendred's two-dimensional universe.

Basically, Yendred's journey provides the context and segues for the expositions. This structure reflects the book's derivation from two prior publications - the monograph Two-Dimensional Science and Technology and the book Symposium on Two-Dimensional Science and Technology. Material from these prior works formed the basis for the expositions, around which the fictional Yendred plot was wrapped to create a coherent, entertaining, and intellectually stimulating story.

Although I thought the Yendred story was only "OK", I really enjoyed the expositions - clearly a lot of thought was given to how a two dimensional world would work. For example, some of the scientific exposition topics were:

  • Thermodynamics and the extremely low melting points that two dimensions lead to.
  • The mechanics of two-dimensional turbulence.
  • Electromagnetism and the lack of magnetic forces.
  • The "Global" weather patterns of a two-dimensional "planet"
  • Energy diminishing according to the inverse of distance (instead of the inverse-square as in three dimensions).
  • The biological structure of two-dimensional organisms.
  • Two-dimensional chemistry and the limited number of possible elements.

The technical expositions discussed the design of two-dimensional objects such as:

  • Buildings that avoided obstructing surface travel.
  • Doors that allowed privacy and structural integrity while remaining passable.
  • Functional steam engines, foundries, space rockets, and fishing boats.
  • Balloons for travel and the shipment of freight.
  • Gears capable of varying torque in the absence of axles.
  • Subterranean elevators.
  • Electronic circuits suitable for constructing rudimentary computers.
  • A wind-powered generator for recharging batteries - the only viable source of electricity, as wires create impassable obstacles.

The social expositions discussed topics such as:

  • How the world's inhabitants "passed" each other while traveling (they walk over each other according to social conventions).
  • How travelers deal with surface events like a sports game (they wait until it's done).
  • Two dimensional warfare and its influence on politics.

In summary, if you enjoy thinking differently about science and technology, then I think you'll enjoy reading this book. The ability to think about problems from different perspectives is an important and powerful skill - I think reading books like The Planiverse is one way to help develop it.

Thursday, December 24, 2009

Recovering Deleted JPEGs from a FAT File System - Part 5

Part 5 in a series of posts on recovering deleted JPEG files from a FAT file system.

UPDATE (2010/1/28): it's come to my attention that the long file name support code described in this post is broken - it only works for files that utilize a single LFN entry. For the time being I plan to work around this bug for the FATRecover series, but I do intend to post a fix eventually. I'll be sure to add another update with a link to the fix when it is available.

The topic of this installment is the root directory and the code required to process it. Although strictly speaking the file allocation table is the next major on-disk area, I think covering the root directory first will make the overall discussion clearer.

Note that this is a watershed post as the "hack" level of the example code increases dramatically. This was unfortunate but necessary to accommodate certain FAT functionality while limiting the discussion's length. That said, readers with sufficient skill should be able to use the example code to create more robust implementations.

Follow the "read more" link for a detailed description of how to read and interpret the root directory.

Monday, December 21, 2009

Oh Christmas Tree, Oh Christmas Tree

The other day my son, age four, noticed that a string of lights was out on our Christmas tree. Last year, we switched to the new LED lights, so I wasn't sure how long they should last or how to fix them.

I assumed that, like incandescent tree lights, a single failing LED would prevent the others from lighting. Unfortunately, we discarded the original box and therefore did not have any replacement bulbs (assuming that they came with some).

I picked up a new set of lights but apparently all "white" LEDs are not exactly the same - the new set had a bluer tint that didn't match the remaining sets on the tree. Oh bother, suddenly this became a "project". With the beauty of our tree at stake, failure was not an option.

A Google search led me to this extremely informative web page. It turns out these LED lights use a much simpler circuit than I first imagined. No rectifiers, no smoothing capacitors - just 30 LEDs in series with a current-limiting resistor. That explains the blinking that I sometimes find irritating.

My plan was to replace the failed LED with one from the new set. First, however, I had to find it. I used a 9V battery and a resistor to build a test rig, removed each LED, and tested it. Eventually, I found the broken one and of course it was the first one in the string - the one with the current-limiting resistor soldered to it. So much for a quick replacement.
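
(For anyone building a similar rig, the arithmetic is simple: assuming a white LED drops roughly 3V at about 20mA - check the specs for your bulbs - Ohm's law gives R = (9V - 3V) / 0.02A = 300 ohms, so anything in the 300-470 ohm range makes a safe test resistor.)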

After a quick trip to the workshop to solder the resistor onto the new LED, we once again had a working set of really-white lights - except for one bluish one. Oh well, good enough is good enough.

An unexpected bonus from having one bluish LED is that my son now considers it a game to "find" the different one - it doesn't move but that doesn't appear to matter to a four year old :-).

Saturday, December 19, 2009

Recovering Deleted JPEGs from a FAT File System - Part 4

Part 4 in a series of posts on recovering deleted JPEG files from a FAT file system.

Now that we've created a test data set and covered the high-level structure of a FAT file system, it's time to get our hands dirty with some code. Enough talk, more action!!!

Follow the "read more" link for a detailed description of how to read and interpret a FAT file system boot sector.

Tuesday, December 15, 2009

An Incredible Library

I have never been more jealous in my life than I am of Jay Walker's personal library.

If I had copious amounts of money, this is what I would do with it - books, interesting gadgets, and lots of wood. I don't know if I would ever leave the house :-).

Even just working in such an aesthetic environment would be wonderful.

Recovering Deleted JPEGs from a FAT File System - Part 3

Part 3 in a series of posts on recovering deleted JPEG files from a FAT file system.

The FAT file system was invented in 1980 by Tim Paterson while developing 86-DOS, the precursor of MS-DOS. Since its creation, FAT has gone through a number of evolutions to accommodate growing disk sizes and provide more capabilities. Although FAT is nearly 30 years old, it remains in common use due to its ubiquitous OS support and simplicity - both important factors for embedded consumer devices like digital cameras.

FAT's history is complex; a retelling could consume an entire series of posts by itself. For this project, it is sufficient to know that there are three main variants - FAT12, FAT16, and FAT32 - each successively supporting larger maximum disk sizes. Generally speaking, the on-disk format of all three variants is very similar - so much so that code developed to grok one format can, with reasonable effort, be hacked to grok another1. In this project, we'll be working with a FAT16 file system, which hdiutil can be explicitly requested to create by using the argument -fs "MS-DOS FAT16"2.

The following diagram depicts the high-level, on-disk structure of a FAT16 file system (not drawn to scale).

+---------+-------+--------+-------+---------------------+
|   BOOT  | RESV  |   FAT  | ROOT  |  DATA ............. |
|  SECTOR | (OPT) |        | DIR   |  AREA ............. |
+---------+-------+--------+-------+---------------------+
0                                                        N

In computer storage, capacity is divided into fixed-size units called sectors - typically 512 bytes long3. The sector is the fundamental unit for addressing storage and the smallest amount of data that can be transferred to/from a disk drive. While older disk drives used an awkward addressing scheme based on physical geometry, newer drives use a linear series of sector addresses.

The first sector (offset 0) of a FAT16 file system contains the BOOT SECTOR which provides critical information about the file system's organization. Optionally, a number of sectors may be reserved after the boot sector (RESV).
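
As a preview of what's coming in the next few posts, here's a minimal C sketch of the boot sector values this series will rely on. The struct and field names are my own; the byte offsets follow the standard FAT16 layout.

#include <stdint.h>

/* Key FAT16 boot sector values and their byte offsets within
   the sector. A robust reader extracts each field at its
   explicit offset (little-endian) rather than overlaying a
   struct on the raw bytes, which risks compiler padding. */
struct bpb16 {
    uint16_t bytes_per_sector;    /* offset 11, typically 512  */
    uint8_t  sectors_per_cluster; /* offset 13, a power of two */
    uint16_t reserved_sectors;    /* offset 14, includes boot  */
    uint8_t  num_fats;            /* offset 16, usually 2      */
    uint16_t root_entries;        /* offset 17, e.g. 512       */
    uint16_t total_sectors;       /* offset 19, 0 if >= 64K    */
    uint16_t sectors_per_fat;     /* offset 22                 */
};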

The next major data structure is the File Allocation Table (FAT) from which the file system gets its name. The FAT is essentially an array of 16-bit elements4, each representing a fixed-size portion of the DATA AREA. To minimize the FAT's relative size, each element represents a cluster5 - a power-of-two number of sectors. In addition to tracking the allocation status of each DATA AREA cluster, FAT elements are also used to store pointers forming the linked lists that describe the on-disk location of files and sub-directories spanning multiple clusters.
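
To make the linked-list idea concrete, here's a minimal sketch of walking a FAT16 cluster chain, assuming the table has already been read into memory as an array (the function and variable names are mine, not from the final program):

#include <stdint.h>
#include <stdio.h>

/* Print a file's clusters by following its FAT16 chain. Valid
   data clusters are 0x0002-0xFFEF; 0x0000 means free, 0xFFF7
   marks a bad cluster, and 0xFFF8-0xFFFF mark end-of-chain. */
void walk_chain(const uint16_t *fat, uint16_t first_cluster)
{
    uint16_t c = first_cluster;
    while (c >= 0x0002 && c <= 0xFFEF) { /* still a data cluster */
        printf("cluster %u\n", (unsigned)c);
        c = fat[c];                      /* follow the pointer   */
    }
}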

After the FAT comes the root directory area (ROOT DIR) used to hold the information describing the files and sub-directories at the top of the file system namespace tree. In FAT12 and FAT16, the root directory area is a fixed size, which limits the number of files and sub-directories that can be stored in it.

The remainder of the disk capacity (DATA AREA) is essentially a heap allocated as needed to store file and sub-directory data. As mentioned previously, use of the DATA AREA is managed by the FAT.
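
Putting it all together, the starting sector of each area can be computed directly from the boot sector values. A sketch, reusing the hypothetical bpb16 struct from above:

#include <stdint.h>
#include <stdio.h>

/* Locate the diagram's areas from the boot sector values.
   Note that reserved_sectors already counts the boot sector
   itself, and each root directory entry occupies 32 bytes. */
void print_layout(const struct bpb16 *b)
{
    uint32_t fat_start  = b->reserved_sectors;
    uint32_t root_start = fat_start +
                          (uint32_t)b->num_fats * b->sectors_per_fat;
    uint32_t root_secs  = ((uint32_t)b->root_entries * 32 +
                           b->bytes_per_sector - 1) / b->bytes_per_sector;
    uint32_t data_start = root_start + root_secs;

    printf("FAT @ sector %u, root dir @ %u, data area @ %u\n",
           (unsigned)fat_start, (unsigned)root_start,
           (unsigned)data_start);
}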

In the next few posts, I'll describe each of these major areas in greater detail and present the code needed to interpret them. By the end, we should essentially have a read-only FAT file system implementation that will serve as the basis for the remainder of the project.

Footnotes:

1 Here I am ignoring many, many subtle differences and complications. A thorough discussion of the technical details for all of FAT's variants is outside the scope of this series. The curious reader is encouraged to explore the bountiful information available via a Google search. Brian Carrier's book, File System Forensic Analysis, is another valuable resource (which sadly wasn't available when I first did this project in 2004).

2 Note that FAT16 file systems have a minimum supported size - usually 16MB but some tools may allow smaller sizes.

3 Some enterprise-level disk drives use a 520-byte sector - the additional 8 bytes are often used to store error checking and recovery information for RAID systems.

4 The size of the FAT elements is one of the principal differences between the file system variants. FAT12 uses 12-bit entries while FAT32 uses 32-bit entries. The FAT element size is one of the factors that determines each variant's maximum supported disk size.

5 Other file systems may use the term block instead of cluster.

Monday, December 14, 2009

Recovering Deleted JPEGs from a FAT File System - Part 2

Part 2 in a series of posts on recovering deleted JPEG files from a FAT file system.

Whenever I develop an analysis program I first create a dataset to test and experiment against. In part 1, I mentioned that I used my digital camera to create a test data set for the original project in 2004. For this series of posts, I thought it would be better to use a data set reproducible by others. Two things are required to accomplish this goal:

  • A freely available collection of images.
  • A way to create FAT disk images without a physical device.

A Google search for test images led me to this Wikipedia page, from which I selected the University of Southern California Signal & Image Processing Institute's data set - specifically the miscellaneous corpus. This collection consists of 44 TIFF images of various resolutions ranging in size from 64KB to 1MB (17MB in total). Since JPEG images are needed for this project, I used the following ImageMagick command to do the conversion.

mogrify -format jpg -quality 90 *.tiff

After the conversion, the file sizes ranged from 4.8KB to 350KB (3.4MB in total) - a conveniently sized collection for this project.

In the 2004 project, I used the UNIX dd command to make a disk image of my camera's memory card. However, for this project, I wanted a way to create disk images without requiring a physical device. One option considered was to use Linux loop devices, but I wanted to use my machine-of-choice, an Apple MacBook Pro, for this project - an artificial but important constraint since this is for fun.

After some research, I discovered that Apple's hdiutil utility can create a suitable image using the following command:

hdiutil create \
        -fs "MS-DOS" \
        -megabytes <SIZE> \
        -layout NONE \
        -volname <VOLNAME> <IMAGENAME>

Executing this command creates the file <IMAGENAME>.dmg that is an image of a FAT file system called <VOLNAME> with a total capacity of <SIZE> megabytes. The remaining argument, -layout NONE, prevents a partition table from being added to the image - an unnecessary complication for this project.

The following example illustrates the OS X terminal commands needed to create a 16MB disk image with a single JPEG file on it1.

$ hdiutil create \
          -fs "MS-DOS" \
          -megabytes 16 \
          -layout NONE \
          -volname FRTEST1 frtest1
..................................................
created: /<PATH>/frtest1.dmg

$ hdiutil attach frtest1.dmg 
/dev/disk1             /Volumes/FRTEST1

$ cp images/4.2.04.jpg /Volumes/FRTEST1/

$ hdiutil detach /Volumes/FRTEST1

Using the selected image corpus and the above hdiutil terminal commands, a wide variety of test disk images can be created programmatically via shell scripts - just what is needed for this project.

In the next post, we'll start down the path of grokking a FAT file system layout.

Footnotes:

1 Minor changes were made to the example output to remove machine specific information (i.e. paths) and fit it within the limited size of the post area.

Friday, December 11, 2009

Recovering Deleted JPEGs from a FAT File System - Part 1

Part 1 in a series of posts on recovering deleted JPEG files from a FAT file system.

In 2004, my in-laws returned from a vacation and had all of the pictures accidentally deleted from their digital camera. When I heard the news my first thought was that the data might still be there!

Often, deleting a file only erases its associated file system metadata (i.e. name, owner, etc) - the file's data is left unchanged. As a result, a deleted file can sometimes be recovered by finding the unchanged data fragments and recombining them in the correct order. I reasoned that if their camera took this metadata-only approach then there was a good chance that some of the pictures could be recovered.
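
In FAT's case - the file system we'll be working with - deleting a file simply overwrites the first byte of its 32-byte directory entry with the marker value 0xE5, leaving the rest of the entry and the file's data untouched. A minimal sketch of the corresponding test (the function name is mine):

#include <stdint.h>

/* A FAT directory entry is 32 bytes long. A first byte of
   0xE5 marks the entry as deleted; 0x00 marks the end of the
   directory (the slot has never been used). */
static int entry_is_deleted(const uint8_t entry[32])
{
    return entry[0] == 0xE5;
}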

Unfortunately, my in-laws lived in another state so I couldn't examine their camera's memory card right away. I suggested that they remove the card from the camera and bring it with them on their next visit so that I could analyze it then.

My initial plan was to try one of the many file recovery tools that already existed. However, I soon began to wonder how hard it would be to write a program to recover the pictures myself. Finding the challenge exciting, I immediately began spending all of my spare time coding up a recovery program.

After one week, and a lot of caffeine, I had a C program capable of recovering deleted JPEG files from a FAT file system - the typical file system used by digital cameras. Using my own camera as a test bed, the program could reliably recover pictures after performing an "erase all" operation. Cool!

Unfortunately, the story didn't end as well for my in-laws. When I finally received their memory card I found that all of the file system data blocks had the value 0xFF - a clear sign that the entire FLASH memory had been erased1. The data was gone.
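
A check along these lines makes that diagnosis quick - scan the image and see whether anything other than 0xFF survives. A minimal sketch (the function name is mine):

#include <stddef.h>
#include <stdint.h>

/* Return 1 if every byte in the buffer is 0xFF, i.e. the
   underlying FLASH region appears to be fully erased. */
static int looks_erased(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0xFF)
            return 0;
    return 1;
}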

Despite the unfortunate ending, this project was one of my favorite spare-time hacks. So much so in fact that I thought it would be fun to recreate it as a series of blog posts. Over the next few weeks I plan to write a series of posts under the label "FATrecover" describing how to develop such a program. By the end of the series, hopefully anyone with sufficient coding experience will be able to write their own FAT file system recovery program.

Footnotes:

1 FLASH memory, being a form of EEPROM, cannot be re-written directly. Instead, the FLASH memory cells must first be returned to an unwritten state - an operation typically called an "erase". After being erased, the affected memory cells have a value of 0xFF.

Tuesday, December 1, 2009

Behind the Code: Jim Gray

One of my engineering role models is the late Jim Gray. I happened across a Microsoft Channel9 interview with him and decided to watch it.

I most appreciated Jim's comments on the following three points:

  • The importance of first principles
  • The "risk" that research presents to established products
  • The role writing played in his getting recognition for what was actually joint work with others

One of the most influential papers that I've read was Jim's (and Shenoy's) Rules of Thumb in Data Engineering. This relatively short paper is packed with insights derived from simple trend data and the application of first principles. Since that time, I've tried to rely on first principles while researching new technologies and communicating the results. I've found this approach to be both highly effective and differentiating as, unfortunately, the practice doesn't seem to be as common as it should be.

Although my current position doesn't compare to the one Jim held at Microsoft, I do run a small research team inside of a large company. As a result, I found his comments on this topic interesting. In particular, I was pleased to hear that I have come to a point of view in common with Jim's - research innovations represent increased risk to an established line of business that is trying to minimize risk. It's a classic example of the "change vs. control" conflict that Kotter describes in his seminal paper, What Leaders Really Do.

In realizing this, I have come to the conclusion that it is my group's responsibility to reduce the risk of research innovations to a level that allows the main-line managers to predict with confidence the cost/time/effort required to bring a new product or feature to market. As a result, creating new technologies is not sufficient - my team must also make those technologies consumable by an existing line of business. This has been perhaps one of the most important realizations of my career and it was encouraging to hear Jim make similar comments.

Towards the end of the interview, Jim made it a point to say that much of "his" work was actually done in collaboration with others. However, much of the credit was given to Jim because he took the additional effort to document the work in the form of papers, articles, books, etc. This statement matches my own observations of authors/presenters earning outsized credit, and it highlights the importance of effective communication to career growth.