Zooomr Problems Resolved

February 20th, 2008

Looks like images are now being served up by zooomr on my blog. I’m pretty happy about that!

Zooomr.com Problems

February 6th, 2008

I host the images for this blog on zooomr.com and you may have noticed that they aren’t displaying properly. The reason for that is zooomr.com is doing an update and some of the hardware that is used to host images isn’t fully operational yet. Hopefully the images will be back up soon.

When you use active_record_store instead of the cookie-based default, you need to uncomment the line in controllers/application.rb that says protect_from_forgery :secret => 'blah'. This makes sure all your HTML and JavaScript requests are coming from your web application. It essentially protects you from something called “Cross-site request forgery” by embedding a token into your web forms.

As a side note, it’s really not giving you much security at all, but that might be better left for another blog post.

I was banging my head against the wall yesterday trying to figure out why a custom Ajax.Updater wasn’t working and I kept getting an ActionController::InvalidAuthenticityToken exception. I decided to dig into the request_forgery_protection.rb file in actionpack-*/lib/action_controller and found that for custom requests, you need to include the authenticity_token yourself by taking advantage of the form_authenticity_token helper. When building the updater’s request url I just added “&authenticity_token=<%= form_authenticity_token %>” to the end and everything was fine.

Another way would be to not use the forgery protection at all for that action by including this in your controller: protect_from_forgery :except => :updater

You can also completely remove forgery protection from a controller by doing skip_before_filter :verify_authenticity_token

Back to the vulnerability of your web forms: I imagine this does protect your web application from someone hosting a form on their site that posts to your site. However, if someone really wants to spam some stuff they’ll scrape your page with cookies enabled on their scraping software, scrape your form’s fields (which include the authenticity_token hidden field) and POST to their heart’s desire.

I really wanted to finish this article last week, but I just didn’t get around to it. This post is going to be on the things you can do with the Python Imaging Library when you implement your own kernels. Remember that depending on the kernel arguments you supply, you may get radically different results. The results may not even be a smoothed image; they could be embossed images, edge maps, etc. With that out of the way, let’s get started.

There are a couple different ways you can implement your own image filters using PIL. The first way is easy for one-off filters that you probably won’t use throughout your code. I mean, you could if you wanted to keep writing the same line of code over and over, but that’s up to you. The way this is done is by making use of PIL’s ImageFilter.Kernel class. You can create an object from this class that has all the common image filter arguments (size, kernel, scale, and offset).
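A minimal sketch of that one-off approach, assuming PIL (or its Pillow fork) is installed. The weights and scale are the ones PIL ships for ImageFilter.SMOOTH; SMOOTH_WEIGHTS and make_smooth_kernel are illustrative names, and the import sits inside the function only so the snippet loads on a machine without PIL.

```python
# Weights PIL uses for ImageFilter.SMOOTH, laid out as a 3x3 grid.
SMOOTH_WEIGHTS = (
    1, 1, 1,
    1, 5, 1,
    1, 1, 1,
)


def make_smooth_kernel():
    """Build a one-off filter object from the four common filter arguments."""
    from PIL import ImageFilter  # deferred so the module imports without PIL

    return ImageFilter.Kernel(
        (3, 3),          # size: width and height of the kernel
        SMOOTH_WEIGHTS,  # kernel: nine weights, row by row
        scale=13,        # each result is divided by this (the sum of the weights)
        offset=0,        # added after dividing by the scale
    )


# Usage, given a PIL image `im`:
#     smoothed = im.filter(make_smooth_kernel())
```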

That’s the quick and dirty way to implement your own image filters. It uses the same filter arguments as ImageFilter.SMOOTH so it yields an identical result.

The second way to implement your own image filters is to do it just like PIL does. For each image filter class that PIL provides, you’ll see that they inherit from the ImageFilter.BuiltinFilter class which in turn inherits from the ImageFilter.Kernel class. The stock implementation for ImageFilter.SMOOTH is as follows:

Where filterargs is the size of the kernel (3, 3), the scale factor (13), the offset (0), and the kernel itself. Now since the stock PIL filters are boring and I don’t want this post to be about how the stock filters are implemented, I implemented my own image filter.

The result of running this filter on my picture is this:

Pretty cool, eh? I highly encourage you to implement your own filters and dig into the stock PIL image filters to see how they achieve certain effects. With your own image filters and the stock PIL image filters at your command, you may never have to write your own image processing algorithms at all.

Smoothing an image is helpful for a few different reasons. The big one that I’ve been using image smoothing for is to remove noise from an image. Laplacian edge detection is highly susceptible to noise due to the Laplacian operator being a second derivative operator. The Python Imaging Library (PIL) provides a few different ways to produce smooth images. The most obvious would be to use the ImageFilter.SMOOTH class.

The image I’ll be working on throughout this post is this greyscale picture of me:

Standard smoothing in PIL is super easy. For example:
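A sketch of what that can look like, assuming PIL (or Pillow) is installed. The file names are placeholders and smooth_file is just an illustrative wrapper around the stock filter:

```python
def smooth_file(in_path, out_path):
    """Open an image, run PIL's stock smoothing filter, and save the result."""
    from PIL import Image, ImageFilter  # deferred so the snippet loads without PIL

    im = Image.open(in_path)
    smoothed = im.filter(ImageFilter.SMOOTH)
    smoothed.save(out_path)
    return smoothed


# Example call with placeholder file names:
#     smooth_file("me.png", "me_smooth.png")
```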

Using the image of me above, here’s the result of this smoothing operation:

That’s a pretty good result for just using stock PIL image filters. The Python Imaging Library also offers a SMOOTH_MORE filter. Replacing the ImageFilter.SMOOTH above with ImageFilter.SMOOTH_MORE, we get:

Digging into the PIL source gives a really good indication of how it produces these results. For example, the BuiltinFilter classes (such as ImageFilter.SMOOTH) use filter arguments to produce different results. These filter arguments are a size tuple, which is the width and height of the kernel, the convolution kernel itself as a sequence containing weighted values, the scale which is used to divide the result of each pixel, and finally the offset which is added to the result after it has been divided by the scale factor.

For ImageFilter.SMOOTH, these filter arguments are:
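Written out as plain Python values (as recalled from PIL's ImageFilter module, so it's worth checking against your installed copy), they look something like this:

```python
# filterargs for ImageFilter.SMOOTH, in PIL's order: size, scale, offset, kernel
size, scale, offset, kernel = (3, 3), 13, 0, (
    1, 1, 1,
    1, 5, 1,
    1, 1, 1,
)
```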

I’m going to present a way to implement image smoothing so you can have a better idea of what’s really going on when you smooth an image. One small note: the scale part of the process defaults to the sum of the weights in the kernel. So if it isn’t present, you can calculate the default scale by doing this:
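A small sketch of that default; default_scale is an illustrative helper name, not something PIL exposes:

```python
def default_scale(kernel):
    """Return the scale used when none is given: the sum of the kernel weights."""
    return sum(kernel)


smooth_kernel = (1, 1, 1,
                 1, 5, 1,
                 1, 1, 1)
print(default_scale(smooth_kernel))  # prints 13
```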

With that out of the way, here’s an implementation of image smoothing that is equivalent to calling ImageFilter.SMOOTH like the code above:
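One way such an implementation might look, as a pure-Python sketch that works on a flat, row-major list of greyscale values instead of a PIL image (so it runs without PIL). The function name and signature are illustrative, and rounding at exact halves may differ slightly from PIL's own filter.

```python
SMOOTH_KERNEL = (1, 1, 1,
                 1, 5, 1,
                 1, 1, 1)


def smooth(pixels, width, height, kernel=SMOOTH_KERNEL, scale=None, offset=0):
    """Apply a 3x3 kernel to a flat, row-major list of greyscale pixels."""
    if scale is None:
        scale = sum(kernel)  # default scale: the sum of the weights

    # Start from a copy so the border pixels keep their original values.
    out = list(pixels)

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            k = 0
            # Walk the 3x3 neighbourhood in the same order as the kernel.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += pixels[(y + dy) * width + (x + dx)] * kernel[k]
                    k += 1
            value = total / scale + offset
            # Clamp to the valid greyscale range.
            out[y * width + x] = max(0, min(255, int(round(value))))
    return out
```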

Notice how it loops through 1 <= y < height-1 and 1 <= x < width-1? That’s because running the kernel right up to the image edges produces weird dark borders, since the edge pixels don’t have a full 3x3 neighbourhood. The code above compensates for this by copying the original border pixels to the borders of the output image.

So what happens when I run the code above on the original image? Check it out:

Being able to code this kind of stuff is really cool, but for these really basic examples of standard smoothing, using PIL’s built-in filters is definitely the way to go. You’re also not limited to using PIL’s built-in filters. If you require different kernels and even different scaling and offset attributes, PIL provides ways for you to do that.

I was hoping to cover the ways you can implement your own filters in PIL, but it’s getting late so I will try covering them tomorrow if I have time.

Basic Edge Detection in Python

December 5th, 2007

Detecting edges in images is being actively researched for many different applications. The most notable of these applications is computer vision. The reason I began studying edge detection algorithms, aside from them being really cool, is that I’ve been noticing that I can use edge detection as part of my toolbox for optical character recognition.

So, how do we define edges in any given image? Edges are really just areas where the pixel intensities contrast: basically, where you have a bunch of light pixels touching a bunch of dark pixels. There are a couple different methods used for detecting edges: gradient and Laplacian. I’m going to be covering a basic gradient edge detection technique and will cover Laplacian techniques in future posts.

Gradient edge detection approximates the first derivative of the image, looking for minimum and maximum intensities in the magnitude of the gradient. Locating edge pixels can be done by setting a threshold of some value and testing if the gradient is greater than that threshold.

The gradient of the image function I is given by the vector:

∇I = [∂I / ∂x, ∂I / ∂y]

To approximate the first derivative of the image, we use convolution masks. The method I’m going to present is the Prewitt method. It uses two masks to approximate ∂I / ∂x and ∂I / ∂y, giving us a gradient of the image’s pixels. ∂I / ∂x and ∂I / ∂y detect vertical and horizontal edges, respectively. The masks that define ∂I / ∂x and ∂I / ∂y for the Prewitt operator are:

∂I / ∂x:
[-1, 0, 1]
[-1, 0, 1]
[-1, 0, 1]

∂I / ∂y:
[1, 1, 1]
[0, 0, 0]
[-1, -1, -1]

The resulting outputs of convolving the image with these masks, Gx and Gy, are then combined to get the magnitude of the gradient. The magnitude of the gradient is given by:

|G| = sqrt(Gx^2 + Gy^2)


To approximate the magnitude of the gradient, we use:

|G| = |Gx| + |Gy|

After getting the magnitude of the gradient, we want to check if it’s larger than our threshold. All the methods I’ve seen use a threshold of 255. What this means is that when the magnitude of the gradient reaches 255 or more, we’ve found an edge. We cap the magnitude at 255 and set the output pixel to 255 - magnitude. A capped magnitude of 255 gives a pixel value of 0, which is black, and a magnitude of 0 gives 255 - 0, which is white. After capping, the magnitude can be any value between 0 and 255, inclusive.

The Prewitt masks in Python are given by the function get_prewitt_masks():
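A sketch of what that function might look like, using the two masks listed earlier. Storing each mask as a flat, row-major tuple is an assumption made here for simplicity:

```python
def get_prewitt_masks():
    """Return the two 3x3 Prewitt masks as flat, row-major tuples."""
    x_mask = (-1, 0, 1,
              -1, 0, 1,
              -1, 0, 1)   # approximates dI/dx, responds to vertical edges
    y_mask = ( 1,  1,  1,
               0,  0,  0,
              -1, -1, -1)  # approximates dI/dy, responds to horizontal edges
    return x_mask, y_mask
```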

Now on to the meat of the entire operation. The prewitt() function takes a 1-d array of pixels and the width and height of the input image. It returns a greyscale edge map image.
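A pure-Python sketch of such a function, assuming the pixels are greyscale values in row-major order. Unlike the description above, this version returns a flat list of output pixels rather than an image, so it runs without PIL; turning the list back into an image is left out. Border pixels are simply left white.

```python
PREWITT_X = (-1, 0, 1, -1, 0, 1, -1, 0, 1)
PREWITT_Y = (1, 1, 1, 0, 0, 0, -1, -1, -1)


def prewitt(pixels, width, height):
    """Return a flat edge map: 0 (black) is a strong edge, 255 (white) is flat."""
    out = [255] * (width * height)  # borders stay white

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            k = 0
            # Apply both masks to the 3x3 neighbourhood in one pass.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = pixels[(y + dy) * width + (x + dx)]
                    gx += p * PREWITT_X[k]
                    gy += p * PREWITT_Y[k]
                    k += 1
            # |G| = |Gx| + |Gy|, capped at the threshold of 255.
            magnitude = min(255, abs(gx) + abs(gy))
            out[y * width + x] = 255 - magnitude
    return out
```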

You can store this code all in one file so when you run it, you can pass the program arguments for the input and output image filenames on the command line. To do so, add this code to the Python file with the edge detection code from earlier:

I called my file prewitt.py, so with all that code in the same file, you can call it from the command line:

$ python prewitt.py input_image.gif output_image.gif

Note that it will work for pretty much any image type you give it. Here are some results of me running the code above:

I suppose that concludes this article on basic edge detection using the Prewitt method. Hope you enjoyed reading it as much as I enjoyed writing it!

Python Meta Programming: Update

November 29th, 2007

I thought I was being really clever with the meta programming from the previous post but I ran into some roadblocks when using similar techniques in live code. I pretty much scrapped the meta programming stuff for now until I can do some more research and develop something solid. At least the problems showed themselves early on before I really became attached to the code. Here’s to failing early and often! ;)

Python Meta Programming

November 27th, 2007

For an ORM (Object Relational Mapper) I’m working on, I was trying to figure out how I can make it connect to a database without manually calling any functions. Using MySQLdb, you connect to a database by calling MySQLdb.connect(). I wanted this to happen automatically so I’m not always calling MySQLdb.connect() or the equivalent from my own ORM.

The technique I used was to initialize the class when it’s first defined, by having a metaclass call a __classinit__ method on the new class. In that __classinit__ method I set up the database connection. The metaclass I used is pretty straightforward:

The class above is used as the __metaclass__ of another class which has a __classinit__ function that sets up the database connection.
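Both pieces can be sketched together as follows. This is a reconstruction under assumptions, not the original code: it uses Python 3's metaclass= keyword where Python 2 would set __metaclass__ inside the class body, ClassInitMeta is a made-up name, and fake_connect stands in for MySQLdb.connect() so the example runs without a database.

```python
CONNECT_LOG = []  # records each (pretend) connection attempt


def fake_connect():
    """Stand-in for MySQLdb.connect(); returns a dummy connection object."""
    CONNECT_LOG.append("connect")
    return object()


class ClassInitMeta(type):
    """Metaclass that calls __classinit__ on each class as soon as it is defined."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        classinit = getattr(cls, "__classinit__", None)
        if classinit is not None:
            classinit()


class MySQLRecord(metaclass=ClassInitMeta):
    connection = None

    @classmethod
    def __classinit__(cls):
        # Subclasses see the inherited connection, so only the first class connects.
        if cls.connection is None:
            cls.connection = fake_connect()


class User(MySQLRecord):
    pass


class Post(MySQLRecord):
    pass
```

Defining User and Post triggers __classinit__ again for each, but the shared connection already exists, so fake_connect() runs only once.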

So now you can create your model classes that inherit from MySQLRecord and it will create a database connection if one doesn’t already exist.

I highly encourage anyone interested in Python metaprogramming to rip apart these techniques and others to get a better feel for what’s going on under the hood. A good place to start would be to set up your own metaclasses, print out the various variables, and follow what happens in what order. I’ll be doing follow-up posts on Python metaprogramming that will clear up any fuzzy details.

Image processing and specifically OCR (Optical Character Recognition) has become an obsession of mine lately. A lot of research is being done in OCR for handwriting, digitizing books, cursive writing, and even CAPTCHA cracking. For those of you who may not know what a CAPTCHA is, it stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It’s those little images with letters and numbers in them that are used when registering on websites and even posting comments to blogs and forums.

The idea is that using a CAPTCHA will prevent computer programs from automatically registering or submitting comments on a given website. Breaking a CAPTCHA by using OCR renders these systems irrelevant. It’s definitely a game of cat and mouse. When a CAPTCHA is cracked, the intelligent thing to do is replace it with a stronger CAPTCHA.

I’m kind of reluctant to post very much information on how to crack CAPTCHAs and I’m sure it’s obvious why. I probably won’t be posting full source code for any given CAPTCHA and the code I do give out will either be crippled or just be snippets of a larger OCR program. The techniques used for cracking CAPTCHAs are really just image processing algorithms that have been applied for this specific use.

In the future I will be posting techniques on how to crack specific CAPTCHAs. For example, in my next article I’ll present algorithms for cracking the CAPTCHA at Bumpzee.com. Generally, if I post an article on how to crack a specific CAPTCHA it will probably be a site that isn’t worth spamming.

Each CAPTCHA is unique and the techniques used to crack a specific CAPTCHA have to be altered slightly, but generally all CAPTCHAs are cracked using similar techniques. For example, you read the image into memory, eliminate any noise, separate each character into its own image, then perform some kind of pixel matching to determine what each character is. With most CAPTCHAs in the wild today you can train your OCR software to recognize characters by doing pixel matching against each letter in the CAPTCHA.

This approach is really brute force and doesn’t work very well on the more advanced CAPTCHAs. For now I will be focusing on the brute force pixel matching techniques and maybe in later posts I will go into advanced techniques.

Using Python and PIL (Python Imaging Library), loading a CAPTCHA (or any image) is as simple as:

Sometimes a CAPTCHA will have noise in the background. Since each site’s CAPTCHA is unique, you have to come up with techniques to eliminate that noise. One of my favorite techniques is to convert the CAPTCHA to a greyscale image:

I like to use (.4, .4, .4, 0) for my conversion matrix when converting from ‘RGB’ to ‘L’ (greyscale). Past experience has shown this to be a decent conversion matrix but like I said earlier, all CAPTCHAs are different and some might not do well with that conversion matrix. You may even be able to get away without using a conversion matrix at all.
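A sketch of the load-and-convert step, assuming PIL (or Pillow) is installed for load_greyscale. The grey_value helper is illustrative only: it shows roughly what a four-value matrix does to a single RGB pixel, and PIL's exact rounding may differ.

```python
GREY_MATRIX = (.4, .4, .4, 0)


def grey_value(r, g, b, matrix=GREY_MATRIX):
    """Roughly what the 4-value matrix does to one RGB pixel."""
    value = r * matrix[0] + g * matrix[1] + b * matrix[2] + matrix[3]
    return max(0, min(255, int(round(value))))  # clamp to the greyscale range


def load_greyscale(path, matrix=GREY_MATRIX):
    """Open an image and convert it to mode 'L' using the conversion matrix."""
    from PIL import Image  # deferred so the snippet loads without PIL

    im = Image.open(path).convert("RGB")  # the matrix form expects RGB input
    return im.convert("L", matrix)


# Example call with a placeholder file name:
#     captcha = load_greyscale("captcha.png")
```

Because the three weights add up to 1.2 rather than 1, this matrix brightens the image a little, which pushes light background pixels toward white.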

After converting the CAPTCHA to greyscale, another technique I use is to eliminate pixels that aren’t part of the letters. A lot of times this means the letters have darker shades of grey in them and the background noise has lighter shades of grey. You can determine which pixels to eliminate by trial and error. PIL provides a method that will give you all the colors in an image:
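In PIL that method is getcolors(), which returns a list of (count, color) pairs, or None when the image has more colors than its maxcolors argument allows. To show the shape of that output without needing an image file, the sketch below builds the same kind of list from a flat list of greyscale values. count_colors is an illustrative helper, not part of PIL, and the sorting by color is only there for readability.

```python
from collections import Counter


def count_colors(pixels):
    """Return (count, color) pairs for a flat list of pixel values."""
    counts = Counter(pixels)
    return [(n, color) for color, n in sorted(counts.items())]


sample = [255, 255, 30, 140, 30, 255]
print(count_colors(sample))  # [(2, 30), (1, 140), (3, 255)]
```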

Using the output of getcolors() and modifying pixels until you determine the best colors to eliminate is all trial and error. Here’s a function you can use to play with for eliminating lighter-colored pixels:
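A sketch of such a function, working on a flat list of greyscale values so it runs without PIL. The name is illustrative, and the threshold of 140 is only a starting point to tweak:

```python
def eliminate_light_pixels(pixels, threshold=140, white=255):
    """Set every pixel lighter than the threshold to white; leave the rest alone."""
    return [white if p > threshold else p for p in pixels]
```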

The function is straightforward: iterate through each pixel, check if its color is greater than 140 and set it to white if the check passes. The idea is that this eliminates the lighter background noise while leaving the darker character pixels.

After eliminating the basic noise, there’s another thing I like to do called ‘skeletonization’. There are a few different ways of achieving similar, but different, results. To put it plainly, skeletonization is a technique that takes an image and reduces the number of edge pixels in it. For some CAPTCHAs it’s good enough to check the pixels surrounding each dark pixel and eliminate the dark pixel if too many of its neighbours are white. Another skeletonization technique is more advanced and is used for trimming edges to one-pixel widths in some cases. The skeletonization technique I’m going to cover here is the simpler version for getting rid of some noise in the CAPTCHA.
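A sketch of the simpler version, under a few assumptions: pixels are a flat, row-major list of greyscale values, 255 is white, positions outside the image count as white, and the cutoff of 5 white neighbours is a made-up default to tune per CAPTCHA.

```python
WHITE = 255


def strip_isolated_pixels(pixels, width, height, max_white_neighbours=5):
    """Whiten dark pixels that have more white neighbours than allowed."""
    out = list(pixels)
    for y in range(height):
        for x in range(width):
            if pixels[y * width + x] == WHITE:
                continue
            white = 0
            # Count white pixels among the eight surrounding positions.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == 0 and dy == 0:
                        continue
                    nx, ny = x + dx, y + dy
                    outside = not (0 <= nx < width and 0 <= ny < height)
                    if outside or pixels[ny * width + nx] == WHITE:
                        white += 1
            if white > max_white_neighbours:
                out[y * width + x] = WHITE
    return out
```

Neighbours are always read from the input list, not the output, so the result doesn't depend on the order the pixels are visited in.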

Now that the CAPTCHA is clean and noise is removed, the next step is to separate the characters from the CAPTCHA. There are a bunch of techniques for splitting a CAPTCHA into its letters. One that I’ve seen and even used is very brute force. The algorithm iterates over the CAPTCHA’s pixels and looks for non-white pixels. When it finds one, it records the x,y coordinates. It also stores values for the min and max x,y coordinates. Those coordinates allow you to crop the CAPTCHA and pull out the letter. The way it determines a letter’s bounding box is by finding a column that only has white pixels. A column that has zero black pixels contains no letter pixels, so the letter’s bounding box is complete. This brute-force approach is problematic when a CAPTCHA has letters that share the same X coordinates at different Y coordinates. As you can imagine, using this algorithm to split a CAPTCHA’s letters will pull out two or more letters as one whenever their X coordinates overlap.

I’ll cover the brute force algorithm for now and in a later post I will go over the more elegant flood-fill algorithm that doesn’t fail on overlapping X coordinates.
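A sketch of the brute force column scan, assuming a flat, row-major list of greyscale values with 255 as white. It returns crop boxes instead of cropped images so it runs without PIL. The bounding-box variables are spelled first_x, top_y, last_x and bottom_y here, and the function name is illustrative.

```python
WHITE = 255


def find_letter_boxes(pixels, width, height):
    """Return one (left, upper, right, lower) crop box per run of inked columns."""
    boxes = []
    first_x = last_x = top_y = bottom_y = None

    for x in range(width):
        # Rows in this column that hold a non-white pixel.
        rows = [y for y in range(height) if pixels[y * width + x] != WHITE]
        if rows:
            if first_x is None:
                # First inked column of a new letter.
                first_x = x
                top_y, bottom_y = rows[0], rows[-1]
            last_x = x
            top_y = min(top_y, rows[0])
            bottom_y = max(bottom_y, rows[-1])
        elif first_x is not None:
            # An all-white column closes the current letter's bounding box.
            boxes.append((first_x, top_y, last_x + 1, bottom_y + 1))
            first_x = None

    if first_x is not None:
        # A letter touching the right-hand edge never meets a white column.
        boxes.append((first_x, top_y, last_x + 1, bottom_y + 1))
    return boxes
```

The right and lower values are one past the last inked pixel, which matches the box convention PIL's crop() uses, so each box could be passed to im.crop(box) to pull out a letter.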

The function above iterates over all the pixels in the CAPTCHA looking for pixels that aren’t white. If it’s the first non-white pixel found, it records that pixel’s X coordinate in firstX and sets the initial value for lastX. It then checks the minimums and maximums for the top and bottom Y coordinates and the lastX coordinate, overwriting the variables with new values if necessary.

As long as there is a black pixel in each column, we know we’re looking at a letter in the CAPTCHA, so we only crop the CAPTCHA when we hit a column without any non-white pixels. Those bounding box variables (firstX, topY, lastX, bottomY) now come into play when setting up a crop box for the CAPTCHA.

Append this cropped image (a letter) to the letters list, reset the algorithm’s bounding box variables and resume scanning the CAPTCHA for more letters.

The final step in brute force CAPTCHA cracking is pixel matching. I’ll be exploring more advanced methods of OCRing CAPTCHAs, but for now the simplest method is doing a pixel-by-pixel match.

There is one thing I’ve left out until this point: OCR software has to be trained. For example, when you first run a CAPTCHA cracker you have to tell it which characters it’s reading. You basically have to solve CAPTCHAs for all letters and numbers until the OCR can successfully match a significant portion of all CAPTCHAs on a site. Training it is just a matter of letting the OCR software split the CAPTCHA into letters and then you manually input which letter it is. The software then saves that letter either in a directory named after the letter you input or in some other way that it’s easily identified as being the correct letter.

This is where the pixel matching comes into play. It splits the live CAPTCHA into its letters, iterates over all saved letters that you ‘trained’ the software with, and then finds the best match by counting the number of pixels that are matched. Since it knows where the letter came from, such as a directory, it knows that the directory name of the best-matched letter is the correct value for that character.

For now I’ll leave out this pixel matching function and I may post it at a later date.