Sunday, September 30, 2007

PPM File Format

I often generate images to make the output of my research easier to understand. It is rightly said that 'a picture is worth a thousand words'. In my case, a picture is worth more than 1,000,000 numbers. I work in the area of data mining and generate very large numerical matrices as output. I use images to view these matrices and interpret them better.

My code usually ends with an image-writing function that takes the matrix as input and generates an image. Since I only need a very primitive image format, I use the PPM file format. PPM has no compression or other optimization to save space, but it is very easy to code. So I generate PPM files and later use the 'convert' command (from ImageMagick) on Linux to change them to JPEG format.
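As a rough sketch of the first half of such a function, here is one way to turn a numerical matrix into pixel values by linear scaling. The name scale_to_bytes and the choice of min-max scaling are my assumptions for illustration; any other mapping from numbers to intensities would do, and the matrix is assumed to be a non-empty list of rows.

```python
def scale_to_bytes(matrix, maxval=255):
    """Linearly map the entries of a matrix onto the integers 0..maxval.

    `matrix` is assumed to be a non-empty list of rows of numbers.
    The smallest entry becomes 0 and the largest becomes maxval.
    """
    flat = [value for row in matrix for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant matrix has no contrast; render it as all zeros (black).
        return [[0 for _ in row] for row in matrix]
    return [[round((value - lo) / span * maxval) for value in row]
            for row in matrix]
```

Each resulting intensity can then be repeated three times, as (v, v, v), to get a gray pixel in a color image.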

I often forget the PPM file format, despite having written code to save PPM files a couple of times. So I thought I would write it down here, for myself and for others who want to know the PPM file format.

Here's the order in which we need to put in the information to generate a valid PPM image.

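1. A "magic number" for identifying the file type. The magic number is a two-character code "P*" (i.e. P followed by a digit). For PPM (color) images it is P3 for the plain ASCII variant or P6 for the raw binary variant described here. The other digits belong to the sibling formats: P1 and P4 are PBM (black-and-white bitmaps), P2 and P5 are PGM (grayscale).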
2. Whitespace (blanks, TABs, Carriage returns etc.).
3. Width of the image, formatted as ASCII characters in decimal.
4. Whitespace.
5. Height of the image, again in ASCII decimal.
6. Whitespace.
7. The maximum color value (Maxval), again in ASCII decimal. It must be less than 65536 and more than zero. In grayscale images (strictly speaking the PGM format) there is only one value for each pixel. In color images, three values (for the red, green and blue components of the color) are written into the file for each pixel.
8. Newline or other single whitespace character.
9. A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. For a color image, each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented in pure binary by either 1 or 2 bytes: 1 byte if the Maxval is less than 256, otherwise 2 bytes with the most significant byte first.
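
The nine steps above can be put together roughly as follows. This is only a minimal sketch of a raw (P6) writer; the function name encode_ppm and the layout of its input (a list of rows, each row a list of (r, g, b) tuples) are my own choices for illustration.

```python
import struct


def encode_ppm(pixels, maxval=255):
    """Return the bytes of a raw (P6) PPM image.

    `pixels` is assumed to be a list of rows (top to bottom), each row
    a list of (r, g, b) integer tuples (left to right).
    """
    if not 0 < maxval < 65536:
        raise ValueError("Maxval must be more than zero and less than 65536")
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    # Steps 1-8: magic number, width, height and Maxval in ASCII decimal,
    # separated by whitespace, ending with a single newline.
    data = bytearray(f"P6\n{width} {height}\n{maxval}\n".encode("ascii"))

    # Step 9: one byte per sample if Maxval < 256, otherwise two bytes,
    # most significant byte first (big-endian).
    sample_format = ">3B" if maxval < 256 else ">3H"
    for row in pixels:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        for r, g, b in row:
            data += struct.pack(sample_format, r, g, b)
    return bytes(data)
```

To save the image, open the output file in binary mode ("wb") and write the returned bytes to it. Text mode would not accept bytes in Python, and on some platforms it translates newlines, which would corrupt the raster.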

To all ye IP (image processing) and Graphics experts, let me know if there are any corrections.

Now that I have brushed up on the PPM format, I have finished coding the image-writing part and now...

I mine. :-)