Posts Tagged With “python”

Samson: Lightweight pycurl Wrapper

Samson is a little project I started a long time ago. I decided to refocus on it and put it on GitHub. Some of its uses include automating APIs, automating web forms, and scraping webpages. It basically hides away the pycurl stuff and provides a simple interface for achieving things you'd normally have to hand-code with pycurl.

Last night I updated it to version 0.2 and pushed the result to GitHub. Version 0.2 can also be downloaded here: samson-0.2.tar.gz. If you just want to install the package, easy_install samson should work nicely for you.

Tagged with: samson, python, pycurl

Introducing Blueberry Web Framework

This post is a long time coming. I've been meaning to publicly release my Python web framework for months. There are lots of Python web frameworks already, but I think the whole point of WSGI was to make it easy for anyone to implement their own web framework. Blueberry is hosted on Google Code, GitHub, and PyPI. It's my first GitHub project, so I'm expecting to make a few mistakes committing and pushing, but that's how it goes when learning a new source control manager.

Here are some facts about Blueberry:

  • It uses CherryPy's WSGI web server.
  • It was inspired by web.py and Google AppEngine's web framework.
  • I've been using it in production for months. This blog uses an old version of blueberry.
  • It's database ORM agnostic.
  • nginx is my preferred web server in front of blueberry, but it works with Apache and mod_python or mod_wsgi.

I still have a ton of work ahead of me to document blueberry; it just takes time. For now, what little information there is on blueberry can be found here on Google Code.

Distributed Database Systems

One of the things that's been keeping me up at night lately is this new project I've been researching and developing. SimpleStore is a way to store Python objects (in the future that will change to JSON objects) in sharded MySQL databases. The idea is that it lets developers scale their web applications, but honestly it hasn't been used on a site of any real scale yet since it's still very new.

SimpleStore is a cool beta project to use with your own database servers. What would be even better is to take the ideas from SimpleStore and create a service that web developers can use for remote database storage. This is what's been really keeping me up at night. I've been doing research on distributed database systems and I think with my experience from writing SimpleStore I will be able to develop something pretty neat.

I won't turn SimpleStore into a cloud-based service, but the ideas from SimpleStore will help me create a truly distributed database system. If this service turns into something I can put online, I hope it creates another option for web developers looking for easy-to-use cloud database systems.

Also, I updated SimpleStore to version 0.3. That update adds MySQL sharding support. You can grab it here: simplestore-0.3.tar.gz

Schema-less Python Data Storage in MySQL using SimpleStore

I've been wanting to write this entry for a long time now. Since I recently put simplestore on Google Code I feel that it's finally time to write my thoughts about storing schema-less data. I wrote simplestore because I was fascinated by Bret Taylor's blog entry How FriendFeed uses MySQL to store schema-less data. My blog actually uses a predecessor to simplestore and it's worked out pretty well so far.

Here are a couple pages of documentation that demonstrate some of what simplestore is capable of. There's even a page on using simplestore with Pylons. To get a better look at simplestore's functionality I highly recommend reading (and running) the unit tests and reading over the actual source code. The latest official release can be downloaded here: simplestore-0.2.1.tar.gz

SimpleStore is still very new and there is still a lot of work to be done on it. Please email me at david@alwaysmovefast.com with any questions or requests of features/docs/tests you'd like to see regarding simplestore.

Tagged with: python, simplestore, mysql

Suds for Consuming SOAP Services

I started using suds recently and I have to say that it's exactly what I was looking for. It really opened my eyes to how I should've been consuming SOAP web services all along.

I needed a lightweight SOAP client for a project I'm working on, so I started off implementing my own. About 500 lines of code into a partial solution, I felt like what I was doing was just plain wrong. I did a quick Google search and came across suds. With suds, I trimmed that 500-line partial implementation down to just 40 lines.

I still have a lot more work to do before my solution is complete, but I'm definitely sold on suds.

Tagged with: python, soap, web services, suds

Descriptors, Properties, and Metaprogramming

While reading through Google's AppEngine code for the datastore API, I came across a new (to me) technique for building classes with properties. The idea is that you can create a class for a Person and attach properties like so:

class Person(db.Model):
    firstname = db.StringProperty()
    lastname = db.StringProperty()

There are a lot of things you can do with those properties. For example, you can make sure they're required to exist and you can even validate their values by passing in a custom validator. When it's time to store a person object in the datastore, it will iterate over the properties, make sure they're valid, and finally convert each value to one suitable for storage. I've been using similar techniques for a few different projects at Based on Content, Inc.

For one of these projects, I wanted to build classes with properties that could be automatically turned into XML elements. The object itself would be the container XML element for its properties. Basically, create a class like this:

class Person(XMLElement):
    firstname = StringProperty()
    lastname = StringProperty()

and have it generate XML like this:

<person>
    <firstname>David</firstname>
    <lastname>Reynolds</lastname>
</person>

To accomplish this, I make use of some metaprogramming and descriptors. For a refresher on descriptors in Python, check out How-To Guide for Descriptors.

First off, I need to implement my property class. It's going to be very basic and not have validators or anything that makes it more useful. This is just a simplified example.

class Property(object):

    def __init__(self, name=None, default=None):
        self.name = name
        self.default = default

    def _configure(self, property_name):
        if self.name is None:
            self.name = property_name

    def __get__(self, instance, klass=None):
        if instance is None:
            return self
        try:
            return getattr(instance, self._attr_name())
        except AttributeError:
            return None

    def __set__(self, instance, value):
        setattr(instance, self._attr_name(), value)

    def _attr_name(self):
        return '_' + self.name

    def default_value(self):
        return self.default

    def get_value(self, instance):
        return self.__get__(instance, instance.__class__)

The Property class is called a "descriptor" since it implements the __get__ and __set__ methods.
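To see the descriptor protocol at work in isolation, here's a minimal sketch. It's written for Python 3 and trims the class down to just __get__ and __set__. The name is passed in by hand because nothing has called _configure yet, and the Person class is only a throwaway for the demo.

```python
class Property:
    """Stripped-down descriptor: stores its value on the instance as '_<name>'."""

    def __init__(self, name=None, default=None):
        self.name = name
        self.default = default

    def __get__(self, instance, owner=None):
        if instance is None:
            # accessed on the class itself, so hand back the descriptor
            return self
        return getattr(instance, '_' + self.name, None)

    def __set__(self, instance, value):
        setattr(instance, '_' + self.name, value)


class Person:
    # throwaway class for the demo; the name is given explicitly
    firstname = Property(name='firstname')


p = Person()
before = p.firstname        # nothing stored yet, so __get__ returns None
p.firstname = 'David'       # routed through Property.__set__
after = p.firstname         # routed through Property.__get__
stored = vars(p)            # the value lives on the instance as '_firstname'
on_class = Person.firstname # class access returns the Property object
```

The value ends up on the instance under _firstname, while the Property object itself stays on the class and is shared by every Person.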

The next piece is the metaclass for XML elements. It's going to do a couple things: initialize and configure the properties contained in an XML element class.

It looks like this:

class MetaXMLElement(type):

    def __init__(cls, name, bases, dct):
        super(MetaXMLElement, cls).__init__(name, bases, dct)
        _initialize_properties(cls, name, bases, dct)

The bulk of the work when using this technique is done by _initialize_properties, which I haven't defined yet. It goes through a class that contains properties and creates a _properties attribute for that class. All of the properties defined in that class are then added to the _properties dict. Another thing I want to be careful about is duplicate properties: you don't want two properties with the same name, which can happen through multiple inheritance. The _initialize_properties function is almost identical to the one in google.appengine.ext.db.

def _initialize_properties(cls, name, bases, dct):
    cls._properties = {}
    defined = set()

    # add properties from base classes to cls._properties
    # and make sure there are no duplicates
    for base in bases:
        if hasattr(base, '_properties'):
            property_keys = base._properties.keys()
            dupes = defined.intersection(property_keys)
            if dupes:
                raise Exception(
                    "Duplicate properties in base class %s already defined: %s" %
                    (base.__name__, list(dupes)))
            defined.update(property_keys)
            cls._properties.update(base._properties)

    for name, attr in dct.iteritems():
        if isinstance(attr, Property):
            if name in defined:
                raise Exception("Duplicate property: %s" % name)
            defined.add(name)
            cls._properties[name] = attr
            attr._configure(name)
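To watch the duplicate check fire, here's a condensed sketch of the same idea. It's written for Python 3, so it uses dct.items() and the metaclass= keyword. The class names (MetaElement, Element, Named, Labeled, Both) are made up for the demo, and I raise TypeError instead of a bare Exception purely as a matter of taste.

```python
class Property:
    def __init__(self, name=None):
        self.name = name

    def _configure(self, property_name):
        if self.name is None:
            self.name = property_name


class MetaElement(type):
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        cls._properties = {}
        defined = set()

        # inherit properties from the bases, refusing name clashes
        for base in bases:
            if hasattr(base, '_properties'):
                keys = set(base._properties)
                dupes = defined & keys
                if dupes:
                    raise TypeError(
                        "Duplicate properties in base class %s already defined: %s"
                        % (base.__name__, sorted(dupes)))
                defined |= keys
                cls._properties.update(base._properties)

        # then register the properties declared directly on this class
        for attr_name, attr in dct.items():
            if isinstance(attr, Property):
                if attr_name in defined:
                    raise TypeError("Duplicate property: %s" % attr_name)
                defined.add(attr_name)
                cls._properties[attr_name] = attr
                attr._configure(attr_name)


class Element(metaclass=MetaElement):
    pass


class Named(Element):
    name = Property()


class Labeled(Element):
    name = Property()


# both bases define 'name', so creating this class should be rejected
try:
    class Both(Named, Labeled):
        pass
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

Note that the check also rejects a subclass that redeclares a property it inherited, since the name is already in the defined set by the time the class body is scanned.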

Okay, so what does the base XMLElement class look like? The whole reason for having an XMLElement base class is so we can inherit core functionality and don't have to write so much specific code for individual XML elements that we actually use.

All of the XML elements we define from here on out will inherit from XMLElement.

from xml.dom import minidom

class XMLElement(object):

    __metaclass__ = MetaXMLElement
    __node_name__ = None

    def __init__(self, **kwargs):
        for prop in self._properties.values():
            if prop.name in kwargs:
                value = kwargs.pop(prop.name)
            else:
                value = prop.default_value()
            prop.__set__(self, value)

    def _get_node(self):
        # build the container element, then add one child per property
        self._node = minidom.Element(self.__node_name__)
        for name, attr in self._properties.iteritems():
            value = attr.get_value(self)
            if value:
                # TextNode is a helper that isn't shown in this post; it's
                # assumed to return a <name>value</name> element
                self._node.appendChild(TextNode(name, value))
        return self._node
    node = property(_get_node)

__node_name__ is a string used as the containing node's name for an XMLElement's properties. The __metaclass__ attribute allows MetaXMLElement access to XMLElement's properties for configuration.
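The XMLElement class calls TextNode(name, value), which isn't defined anywhere in this post. As a guess at what such a helper might look like, here's a small sketch using xml.dom.minidom from the standard library. It's written for Python 3, and the body is my assumption, not the original helper.

```python
from xml.dom import minidom


def TextNode(name, value):
    """Hypothetical helper: build <name>value</name> as a minidom element."""
    doc = minidom.Document()
    element = doc.createElement(name)
    element.appendChild(doc.createTextNode(str(value)))
    return element


# attach it to a container element the same way _get_node does
person = minidom.Element('person')
person.appendChild(TextNode('firstname', 'David'))
xml = person.toxml()
```

Going through createTextNode means minidom takes care of escaping characters like < and & in the value when the element is serialized.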

With all of the code defined above, that leaves us free to actually use it and create XML elements in Python.

Here's an example:

class Person(XMLElement):

    firstname = Property()
    lastname = Property()

    __node_name__ = 'person'

p = Person(firstname='david', lastname='reynolds')
print p.node.toxml()

The output of p.node.toxml() on my machine is:

<person><lastname>reynolds</lastname><firstname>david</firstname></person>

Notice how the lastname property is output first? This has to do with how dicts are handled. Keys are unordered. This generally isn't a problem when you're just sending XML over the wire (SOAP, for example), but if it bothers you, you can order the properties according to a static counter that keeps track of all the properties. You can then sort properties from low-to-high based on their counter value.
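Here's a rough sketch of that counter idea, written for Python 3. The creation_order attribute and the in_definition_order helper are names I made up for illustration; the only real machinery is itertools.count and sorted.

```python
import itertools


class Property:
    # class-wide counter; every new Property takes the next number
    _creation_counter = itertools.count()

    def __init__(self, name=None, default=None):
        self.name = name
        self.default = default
        self.creation_order = next(Property._creation_counter)


def in_definition_order(properties):
    """Return (name, property) pairs sorted by when each property was created."""
    return sorted(properties.items(), key=lambda item: item[1].creation_order)


firstname = Property()
lastname = Property()

# deliberately insert in the "wrong" order, as an unordered dict might
properties = {'lastname': lastname, 'firstname': firstname}
names = [name for name, prop in in_definition_order(properties)]
```

Because the class body runs top to bottom, the counter values follow the order the properties were written in, so iterating with in_definition_order would emit firstname before lastname.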

All the code for this article is available for download.

Tagged with: python, meta programming, xml

In Place Link Editing

I've been working on a new project that has some content management features lately. It's a pretty big project and will take months to complete. I was messing with in-place editing with prototype and scriptaculous and found that I couldn't use the default editor for managing links. Normally when you're doing in-place editing you have a single piece of text/content that you want to edit. You click on it and it brings up the form. For editing links I need to edit the URL and the anchor text. I needed to have two separate fields for this and Ajax.InPlaceEditor from controls.js wasn't entirely up to the task.

I ended up inheriting from Ajax.InPlaceEditor and altering some of the behavior to handle link editing. I feel like it's pretty much a hack job but it does what I need it to do, so it seems good enough. I felt that it would be somewhat beneficial to upload the code for in-place link editing.

I only uploaded the JavaScript and an HTML demo. The server-side stuff isn't included since this will work across different web languages. I use Pylons and Mako, for example.

You can grab the demo here: http://alwaysmovefast.com/public_svn/in_place_link_editor.

To give a short example of what I did for the server-side stuff, this could be my controller/action:

class LinksController(BaseController):
    def create(self):
        c.link = {
            'url': request.params['url'],
            'anchor': request.params['anchor'],
            'id': 'some_id'
        }
        return render('/links/create.mako')

And this is what '/links/create.mako' could look like to update the page:

<a href="${c.link['url']}">${c.link['anchor']}</a>

This is a very basic example. My create.mako actually looks something like this:

<script src="/javascripts/prototype.js"></script>
<script src="/javascripts/scriptaculous.js"></script>
<script src="/javascripts/application.js"></script>

<script>
    li = $('newlink');
    li.id = '${c.link['id']}';

    li.innerHTML = '<a href="${c.link['url']}">${c.link['anchor']}</a>';

    new_link = document.createElement('li');
    new_link.id = 'newlink';
    new_link.innerHTML = 'Insert Link';

    li.parentNode.appendChild(new_link);

    new InPlaceLinkEditor('newlink', '/links/create');
</script>

Basically what it does is update 'newlink' by changing its id to the id of the newly-created link on the server side and update the innerHTML of that object. It then creates a brand new link object and appends it to the old link's parent node. I then create a new InPlaceLinkEditor for the new 'newlink'. That's about it.

Hopefully you can find this stuff useful.

FormEncode Usage

There's a lack of documentation on how to use FormEncode in real projects so I've been meaning to write this article for some time now. I use FormEncode in my Pylons apps and I created a small app just for this article. You can grab the source here: http://alwaysmovefast.com/public_svn/formencode_tutorial. To make using FormEncode easier, I also created a few form helper methods that can be found in formencode_tutorial/lib/form_helpers.py. I also used a little CSS to make the form somewhat pretty.

FormEncode makes it pretty easy to do form validations and to then display any errors to your users. The way it works is that you pass a string of HTML to htmlfill.render(), along with some other options, and it returns parsed HTML for use in your pages. htmlfill.render() uses an errors option where you pass in a dict consisting of field names and error values. A typical error dict might look like: {'firstname': Invalid(u'Please enter your firstname',), 'lastname': Invalid(u'Please enter your lastname',)}.

htmlfill.render() will parse those errors and inject them into your <form:error> tag for that particular element. As well as accepting an error dict, htmlfill.render() can accept a default value dict that will inject default values into your form elements.

When injecting errors into your HTML template, it will also use an error formatter. The default error formatter looks like this:

def default_formatter(error):
    return '<span class="error-message">%s</span><br />' % html_quote(error)

You can also use custom error formatters for a little more control over the look and feel of your displayed errors. I like to use this as my formatter:

def p_error_formatter(error):
    return '<p>%s</p>' % htmlfill.html_quote(error)

The reason I use this as my formatter is because I display label elements below my form elements and above the errors. It's mostly personal preference.

I created a few helpers to abstract away some of the finer details of using formencode. One of them helps me create text fields with the label and form:error tags prepackaged and ready to pass to htmlfill.render():

# return HTML string that can be passed to htmlfill.render()
def tfield(name, label=None, **options):
    if label is None: label = name

    s = text_field(name, **options)

    # the label's "for" attribute has to match the field's id;
    # use the field name when no id option is given
    id = options.get('id', name)

    # append a label to this field for good measure
    s += '<label for="%s">%s</label>' % (id, label)

    # use a custom error formatter. when htmlfill.render() parses this html
    # it looks for the key 'p_error_formatter' in the error_formatters dict
    # (defined below) and uses that formatter.
    s += '<form:error name="%s" format="p_error_formatter"></form:error>' % name
    return s

Notice the form:error tag? That's where htmlfill.render() inserts error text (if there is any) and formats it according to the 'p_error_formatter'. So how does htmlfill.render() know how to handle 'p_error_formatter'? Easy. You create a dict with your error formatter(s) and pass it to htmlfill.render(). When htmlfill.render() parses your HTML, it will take 'p_error_formatter' and do a lookup in your error formatter dict and call that function with the error text.

Here's my error_formatters dict:

error_formatters = {
    'default': htmlfill.default_formatter,
    'p_error_formatter': p_error_formatter
}

error_formatters['p_error_formatter'] is the p_error_formatter() function I defined above.
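To make the lookup concrete, here's a tiny stand-in that mimics it with nothing but the standard library. This is not FormEncode's implementation: format_error is a hypothetical function, html.escape stands in for htmlfill.html_quote, and the code is written for Python 3.

```python
from html import escape


def default_formatter(error):
    return '<span class="error-message">%s</span><br />' % escape(error)


def p_error_formatter(error):
    return '<p>%s</p>' % escape(error)


error_formatters = {
    'default': default_formatter,
    'p_error_formatter': p_error_formatter,
}


def format_error(format_name, error_text):
    # hypothetical stand-in: take the name from the tag's format attribute,
    # look up the matching function, and call it with the error text
    formatter = error_formatters[format_name]
    return formatter(error_text)


formatted = format_error('p_error_formatter', 'Please enter your firstname')
```

The point is only that the format attribute is a plain dict key, so adding another style of error markup is a matter of writing one more function and registering it under a new name.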

I also created my own render() function that just calls htmlfill.render() with some predetermined args:

def render(html, defaults=None, errors=None):
    return htmlfill.render(
        html,
        defaults=defaults,
        errors=errors,
        error_formatters=error_formatters,
        auto_insert_errors=False
    )

That's about it for the core htmlfill stuff. I have a couple other helper methods in form_helpers.py and some more CSS that makes the background of form elements red when there's an error.

If you're running Pylons, just go into the root formencode_tutorial directory and run 'paster serve --reload development.ini' and check out how it's all tied together.

Feel free to leave comments if you have any questions or if I missed something. Happy hacking!

Pylons

I've been working on some Pylons projects lately because I've been wanting to move into using Python for web apps instead of Ruby. It took a while to find the right Python web framework for me. I checked out Django, TurboGears, web.py, and a couple others. I finally landed on Pylons and it was an instant hit with me.

There's still a lot to be done on the Pylons core but it's very exciting to use and I never feel like the framework is holding me back.

This is only going to be a short post until I come up with something interesting to write an article about. Just thought I'd share that Pylons is pretty amazing.

Tagged with: python, pylons, web development

Follow Up: Fun with Image Smoothing in Python

I really wanted to finish this article last week, but I just didn't get around to it. This post is going to be on the things you can do with the Python Imaging Library when you implement your own kernels. Remember that depending on the kernel arguments you supply, you may get radically different results. The results may not even be a smoothed image; they could be embossed images, edge maps, etc. With that out of the way, let's get started.

There are a couple different ways you can implement your own image filters using PIL. The first way is easy for one-off filters that you probably won't use throughout your code. I mean, you could if you wanted to keep writing the same line of code over and over, but that's up to you. The way this is done is by making use of PIL's ImageFilter.Kernel class. You can create an object from this class that has all the common image filter arguments (size, kernel, scale, and offset).

# This is a custom image filter that produces exactly the same result as using ImageFilter.SMOOTH
import sys, Image, ImageFilter
def with_kernel_object(filename, outfile):
    kernel = ImageFilter.Kernel((3,3), (1, 1, 1, 1, 5, 1, 1, 1, 1), 13, 0)
    img = Image.open(filename)
    img = img.filter(kernel)
    img.save(outfile)

if __name__ == '__main__':
    with_kernel_object(sys.argv[1], sys.argv[2])

That's the quick and dirty way to implement your own image filters. It uses the same filter arguments as ImageFilter.SMOOTH so it yields an identical result.

The second way to implement your own image filters is to do it just like PIL does. For each image filter class that PIL provides, you'll see that they inherit from the ImageFilter.BuiltinFilter class which in turn inherits from the ImageFilter.Kernel class. The stock implementation for ImageFilter.SMOOTH is as follows:

# This is inside of the ImageFilter module (ImageFilter.py)
class SMOOTH(BuiltinFilter):
    name = "Smooth"
    filterargs = (3, 3), 13, 0, (
        1, 1, 1,
        1, 5, 1,
        1, 1, 1
    )

Here, filterargs holds the size of the kernel (3, 3), the scale factor (13), the offset (0), and the kernel itself. Now since the stock PIL filters are boring and I don't want this post to be about how the stock filters are implemented, I implemented my own image filter.

import sys, Image, ImageFilter
class MYFILTER(ImageFilter.BuiltinFilter):
    name = "My filter"
    filterargs = (3, 3), 4, 0, (
        1, 1, 1,
        1, 10, 1,
        1, 1, 1
    )

def with_my_filter(filename, outfile):
    img = Image.open(filename)
    img = img.filter(MYFILTER)
    img.save(outfile)

if __name__ == '__main__':
    with_my_filter(sys.argv[1], sys.argv[2])

The result of running this filter on my picture is this:

Pretty cool, eh? I highly encourage you to implement your own filters and dig into the stock PIL image filters to see how they achieve certain effects. With your own image filters and the stock PIL image filters at your command, you may never have to write your own image processing algorithms at all.
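One way to predict what a kernel will do before running it is to look at how it treats a perfectly flat region, where every pixel has the same value. The sketch below does that arithmetic in plain Python 3 with no PIL involved. The flat_region_output helper is made up for illustration, and it uses integer division, so PIL's own rounding may differ slightly.

```python
def flat_region_output(value, kernel, scale, offset=0):
    """Output for a pixel whose whole neighborhood has the same value."""
    result = value * sum(kernel) // scale + offset
    # clamp to the 0-255 range of an 8-bit greyscale pixel
    return max(0, min(255, result))


smooth_kernel = (1, 1, 1, 1, 5, 1, 1, 1, 1)   # weights sum to 13
my_kernel = (1, 1, 1, 1, 10, 1, 1, 1, 1)      # weights sum to 18

# SMOOTH divides by the sum of its weights, so flat areas are unchanged
unchanged = flat_region_output(40, smooth_kernel, 13)

# MYFILTER divides 18 by only 4, so flat areas get 4.5 times brighter
brighter = flat_region_output(40, my_kernel, 4)

# anything much above 56 saturates to white
saturated = flat_region_output(100, my_kernel, 4)
```

So with a scale of 4 against weights that sum to 18, MYFILTER mostly acts as a strong brightener: only the darker parts of the picture keep any detail.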

Tagged with: python, image processing, pil

Fun with Image Smoothing in Python

Smoothing an image is helpful for a few different reasons. The big one that I've been using image smoothing for is to remove noise from an image. Laplacian edge detection is highly susceptible to noise due to the Laplacian operator being a second derivative operator. The Python Imaging Library (PIL) provides a few different ways to produce smooth images. The most obvious would be to use the ImageFilter.SMOOTH class.

The image I'll be working on throughout this post is this greyscale picture of me:

Standard smoothing in PIL is super easy. For example:

import sys, Image, ImageFilter

def pil_smooth(filename, outfile):
      img = Image.open(filename)
      img = img.filter(ImageFilter.SMOOTH)
      img.save(outfile)

if __name__ == '__main__':
    pil_smooth(sys.argv[1], sys.argv[2])

Using the image of me above, here's the result of this smoothing operation:

That's a pretty good result for just using stock PIL image filters. The Python Imaging Library also offers a SMOOTH_MORE filter. Replacing the ImageFilter.SMOOTH above with ImageFilter.SMOOTH_MORE, we get:

Digging into the PIL source gives a really good indication of how it produces these results. For example, the BuiltinFilter classes (such as ImageFilter.SMOOTH) use filter arguments to produce different results. These filter arguments are a size tuple, which is the width and height of the kernel, the convolution kernel itself as a sequence containing weighted values, the scale which is used to divide the result of each pixel, and finally the offset which is added to the result after it has been divided by the scale factor.

For ImageFilter.SMOOTH, these filter arguments are:

size = (3,3)
scale = 13
offset = 0
kernel = (
    1, 1, 1,
    1, 5, 1,
    1, 1, 1
)
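To see how these four arguments combine for a single pixel, here's a small worked example in plain Python 3 with no PIL involved. The apply_kernel helper and the neighborhood values are made up for illustration, and it uses integer division, so PIL's C code may round a little differently.

```python
def apply_kernel(neighborhood, kernel, scale, offset):
    """Weighted sum of a 3x3 neighborhood, divided by scale, plus offset."""
    total = sum(pixel * weight for pixel, weight in zip(neighborhood, kernel))
    result = total // scale + offset
    # clamp to the 0-255 range of an 8-bit greyscale pixel
    return max(0, min(255, result))


kernel = (
    1, 1, 1,
    1, 5, 1,
    1, 1, 1,
)

# a single bright pixel (200) surrounded by dark ones (10)
noisy = (
    10, 10, 10,
    10, 200, 10,
    10, 10, 10,
)

# (8 * 10 + 5 * 200) // 13 = 1080 // 13
smoothed = apply_kernel(noisy, kernel, 13, 0)

# a flat patch passes through unchanged because scale equals the weight sum
flat = apply_kernel((100,) * 9, kernel, 13, 0)
```

The bright speck drops from 200 to 83, which is the noise-removing effect of smoothing in miniature, while flat areas are left alone.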

I'm going to present a way to implement image smoothing so you can have a better idea of what's really going on when you smooth an image. One small note: the scale part of the process defaults to the sum of the weights in the kernel. So if it isn't present, you can calculate the default scale by doing this:

# assumes size = (kernel_width, kernel_height)
scale = 0
for i in xrange(size[0]):
    for j in xrange(size[1]):
        scale += kernel[j+i*size[0]]

With that out of the way, here's an implementation of image smoothing that is equivalent to calling ImageFilter.SMOOTH like the code above:

import Image
import sys

# img is an Image object
# size is width, height tuple of kernel
# scale is the value by which the sum of the convolution operation is divided
# offset is added to the result of sum / scale
# This smooth() function can only operate on 3x3 kernels for now. Adding support for 5x5
# is similar to 3x3, but it's something I'll cover in a later post.
def smooth(img, size, kernel, scale=0, offset=0):
      # note that converting an image to greyscale isn't required.
      # it's just something that I do and you can leave it out if you want.
      if img.mode != 'L': img = img.convert('L')
      pixels = list(img.getdata())
      width, height = img.size

      outimg = Image.new('L', (width, height))
      outpixels = list(outimg.getdata())

      if scale == 0:
          # calculate it from the sum of the weights in the kernel
          for i in xrange(size[0]):
              for j in xrange(size[1]):
                  scale += kernel[j+i*size[0]]

      for x in xrange(width):
          # copy top row of pixels to eliminate top border
          outpixels[x] = pixels[x]

      for y in xrange(1, height-1):
          # copy left column of pixels to eliminate left border
          outpixels[y*width] = pixels[y*width]
          for x in xrange(1, width-1):
              result = 0
              # PIL's C implementation unrolls these loops, but the idea is the same.
              for i in xrange(-1, 2):
                  for j in xrange(-1, 2):
                      result += pixels[(y+j)*width+x+i] * kernel[(j+1)*size[0]+i+1]

              result = result / scale + offset
              if result <= 0: outpixels[y*width+x] = 0
              elif result >= 255: outpixels[y*width+x] = 255
              else: outpixels[y*width+x] = result

          # copy right column of pixels to eliminate right border.
          # note the x+1: x at this point is width-2, so increment it once
          # to get the far right column.
          outpixels[y*width+x+1] = pixels[y*width+x+1]

      for x in xrange(width):
          # copy bottom row of pixels to eliminate bottom border.
          # note the y+1: same reason as above for x+1.
          outpixels[(y+1)*width+x] = pixels[(y+1)*width+x]

      outimg.putdata(outpixels)
      return outimg

if __name__ == '__main__':
    img = Image.open(sys.argv[1])

    kernel = (
        1, 1, 1,
        1, 5, 1,
        1, 1, 1
    )

    outimg = smooth(img, (3,3), kernel, 13, 0)
    outimg.save(sys.argv[2])

Notice how it loops through 1 <= y < height-1 and 1 <= x < width-1? That's because the 3x3 kernel needs a full ring of neighbors around each pixel, and running it all the way to the edges of the image produces weird borders. The code above compensates for this and eliminates the dark borders by copying the original border pixels to the borders of the output image.

So what happens when I run the code above on the original image? Check it out:

Being able to code this kind of stuff is really cool, but for these really basic examples of standard smoothing, using PIL's built-in filters is definitely the way to go. You're also not limited to using PIL's built-in filters. If you require different kernels and even different scaling and offset attributes, PIL provides ways for you to do that.

I was hoping to cover the ways you can implement your own filters in PIL, but it's getting late so I will try covering them tomorrow if I have time.

Tagged with: python, image processing, pil

Basic Edge Detection in Python

Detecting edges in images is being actively researched for many different applications. The most notable of these applications is computer vision. The reason I began studying edge detection algorithms, aside from them being really cool, is that I’ve been noticing that I can use edge detection as part of my toolbox for optical character recognition.

So, how do we define edges in any given image? Edges are really just areas where the pixel intensities contrast: basically, where you have a bunch of light pixels touching a bunch of dark pixels. There are a couple of different methods used for detecting edges: gradient and Laplacian. I’m going to be covering a basic gradient edge detection technique and will cover Laplacian techniques in future posts.

Gradient edge detection approximates the first derivative of the image, looking for minimum and maximum intensities in the magnitude of the gradient. Locating edge pixels can be done by setting a threshold of some value and testing if the gradient is greater than that threshold.

The gradient of the image function I is given by the vector:

∇I = [∂I/∂x, ∂I/∂y]

To approximate the first derivative of the image, we use convolution masks. The method I'm going to present is the Prewitt method. It uses two masks to approximate ∂I/∂x and ∂I/∂y, giving us a gradient of the image's pixels. ∂I/∂x and ∂I/∂y detect vertical and horizontal edges, respectively. The masks that define ∂I/∂x and ∂I/∂y for the Prewitt operator are:

∂I/∂x:
[-1, 0, 1]
[-1, 0, 1]
[-1, 0, 1]

∂I/∂y:
[1, 1, 1]
[0, 0, 0]
[-1, -1, -1]

Convolving the image with these masks gives Gx and Gy, which are then combined to get the magnitude of the gradient. The magnitude of the gradient is given by:

|G| = sqrt(Gx^2 + Gy^2)

To approximate the magnitude of the gradient, we use:

|G| = |Gx| + |Gy|

After getting the magnitude of the gradient, we want to check if it's larger than our threshold. All the methods I've seen use a threshold of 255. What this means is that when the magnitude of the gradient is larger than 255, we've found an edge. We cap the magnitude to 255 if it's larger than 255 and mark the pixel in the output image as a 0, which is black. This is done implicitly by setting the pixel value to 255 - magnitude, meaning if the magnitude is 255, the pixel value is black. Magnitudes of 0 will set the pixel to 255 - 0, which is white. The magnitudes can be any value between 0 and 255, inclusive.
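Here's that arithmetic worked through for a single pixel, as a sketch in plain Python 3 with no PIL involved. The prewitt_pixel helper and the sample 3x3 patches are made up for illustration; the masks are the two Prewitt masks from above.

```python
XMASK = (
    (-1, 0, 1),
    (-1, 0, 1),
    (-1, 0, 1),
)
YMASK = (
    (1, 1, 1),
    (0, 0, 0),
    (-1, -1, -1),
)


def prewitt_pixel(patch):
    """Output value for the center pixel of a 3x3 patch of intensities."""
    gx = sum(patch[r][c] * XMASK[r][c] for r in range(3) for c in range(3))
    gy = sum(patch[r][c] * YMASK[r][c] for r in range(3) for c in range(3))
    # approximate |G| as |Gx| + |Gy| and cap it at 255
    magnitude = min(255, abs(gx) + abs(gy))
    # strong edges come out black (0), flat areas come out white (255)
    return 255 - magnitude


# dark on the left, bright on the right: a hard vertical edge
hard_edge = ((0, 0, 200),) * 3
# every pixel identical: no edge at all
flat = ((90, 90, 90),) * 3
# a slight brightening toward the right: a faint edge
soft_edge = ((100, 100, 120),) * 3

hard_value = prewitt_pixel(hard_edge)   # Gx = 600, capped to 255
flat_value = prewitt_pixel(flat)        # Gx = Gy = 0
soft_value = prewitt_pixel(soft_edge)   # Gx = 60, Gy = 0
```

The hard edge maps to 0 (black), the flat patch to 255 (white), and the faint edge to 195 (light grey), which is why the edge map looks like dark lines on a white background.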

The Prewitt masks in Python are given by the function get_prewitt_masks():

# Uses hashes of tuples to simulate 2-d arrays for the masks.
def get_prewitt_masks():
    xmask = {}
    ymask = {}

    xmask[(0,0)] = -1
    xmask[(0,1)] = 0
    xmask[(0,2)] = 1
    xmask[(1,0)] = -1
    xmask[(1,1)] = 0
    xmask[(1,2)] = 1
    xmask[(2,0)] = -1
    xmask[(2,1)] = 0
    xmask[(2,2)] = 1

    ymask[(0,0)] = 1
    ymask[(0,1)] = 1
    ymask[(0,2)] = 1
    ymask[(1,0)] = 0
    ymask[(1,1)] = 0
    ymask[(1,2)] = 0
    ymask[(2,0)] = -1
    ymask[(2,1)] = -1
    ymask[(2,2)] = -1
    return (xmask, ymask)

Now on to the meat of the entire operation. The prewitt() function takes a 1-d array of pixels and the width and height of the input image. It returns a greyscale edge map image.

import Image

def prewitt(pixels, width, height):
    xmask, ymask = get_prewitt_masks()

    # create a new greyscale image for the output
    outimg = Image.new('L', (width, height))
    outpixels = list(outimg.getdata())

    for y in xrange(height):
        for x in xrange(width):
            sumX, sumY, magnitude = 0, 0, 0

            # border pixels don't have a full 3x3 neighborhood, so skip them
            if y == 0 or y == height-1: magnitude = 0
            elif x == 0 or x == width-1: magnitude = 0
            else:
                for i in xrange(-1, 2):
                    for j in xrange(-1, 2):
                        # convolve the image pixels with the Prewitt mask, approximating dI/dx
                        # (the masks are keyed by (row, column), so the y offset j comes first)
                        sumX += (pixels[x+i+(y+j)*width]) * xmask[j+1, i+1]

                for i in xrange(-1, 2):
                    for j in xrange(-1, 2):
                        # convolve the image pixels with the Prewitt mask, approximating dI/dy
                        sumY += (pixels[x+i+(y+j)*width]) * ymask[j+1, i+1]

                # approximate the magnitude of the gradient
                magnitude = abs(sumX) + abs(sumY)

            if magnitude > 255: magnitude = 255
            if magnitude < 0: magnitude = 0

            outpixels[x+y*width] = 255 - magnitude

    outimg.putdata(outpixels)
    return outimg

You can store this code all in one file so when you run it, you can pass the program arguments for the input and output image filenames on the command line. To do so, add this code to the Python file with the edge detection code from earlier:

import sys
if __name__ == '__main__':
    img = Image.open(sys.argv[1])
    # only operates on greyscale images
    if img.mode != 'L': img = img.convert('L')
    pixels = list(img.getdata())
    w, h = img.size
    outimg = prewitt(pixels, w, h)
    outimg.save(sys.argv[2])

I called my file prewitt.py, so with all that code in the same file, you can call it from the command line:

$ python prewitt.py input_image.gif output_image.gif

Note that it will work for pretty much any image type you give it. Here are some results of me running the code above:

I suppose that concludes this article on basic edge detection using the Prewitt method. Hope you enjoyed reading it as much as I enjoyed writing it!

Tagged with: python, image processing, pil

Python Meta Programming: Update

I thought I was being really clever with the meta programming from the previous post but I ran into some roadblocks when using similar techniques in live code. I pretty much scrapped the meta programming stuff for now until I can do some more research and develop something solid. At least the problems showed themselves early on before I really became attached to the code. Here's to failing early and often! ;)

Tagged with: python, meta programming

Python Meta Programming

For an ORM (Object Relational Mapper) I'm working on, I was trying to figure out how I can make it connect to a database without manually calling any functions. Using MySQLdb, you connect to a database by calling MySQLdb.connect(). I wanted this to happen automatically so I'm not always calling MySQLdb.connect() or the equivalent from my own ORM.

The technique I used was to initialize the class when it's first defined, by having a metaclass call a __classinit__ method on the new class. In that __classinit__ method I set up the database connection. The metaclass I used is pretty straightforward:

class MetaRecord(type):
    def __new__(cls, name, bases, dct):
        klass = type.__new__(cls, name, bases, dct)
        klass.__classinit__.im_func(klass)
        return klass

The class above is used as the __metaclass__ of another class which has a __classinit__ function that sets up the database connection.

import MySQLdb

class MySQLRecord(dict):
    __metaclass__ = MetaRecord
    __CONN = None

    def __classinit__(cls):
        if not cls.__CONN:
            cls.__CONN = MySQLdb.connect(host='localhost', user='foo', passwd='password', db='some_database')

    # ... Rest of the code goes here.

So now you can create your model classes that inherit from MySQLRecord and it will create a database connection if one doesn't already exist.

class User(MySQLRecord):
    pass

I highly encourage anyone interested in python metaprogramming to rip apart these techniques and others to get a better feel of what's going on under the hood. A good place to start would be to set up your own metaclasses and print out the various variables and follow what happens in what order. I'll be doing follow up posts on python metaprogramming that will clear up any fuzzy details.
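If you want to try the "print things out and watch the order" exercise without a database, here's a rough sketch. It's written in Python 3 syntax (the metaclass= keyword instead of __metaclass__, and a plain function call instead of im_func), the class names are stand-ins, and the "connection" is just a string where MySQLdb.connect() would normally go.

```python
events = []

class MetaRecord(type):
    def __new__(mcs, name, bases, dct):
        events.append('MetaRecord.__new__ for ' + name)
        klass = super().__new__(mcs, name, bases, dct)
        # in Python 3 a function looked up on the class is a plain function,
        # so it can be called directly with the class as its argument
        klass.__classinit__(klass)
        return klass

class Record(dict, metaclass=MetaRecord):
    conn = None  # stand-in for a real database connection

    def __classinit__(cls):
        events.append('__classinit__ for ' + cls.__name__)
        if cls.conn is None:
            cls.conn = 'fake connection'  # where MySQLdb.connect() would go

class User(Record):
    pass

for event in events:
    print(event)
print(User.conn is Record.conn)
```

The metaclass runs once for every class statement, subclasses included, but the connection is only created the first time because User finds the attribute already set on its parent.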

Tagged with: python, meta programming

Cracking CAPTCHAs for Fun and Profit

Image processing and specifically OCR (Optical Character Recognition) has become an obsession of mine lately. A lot of research is being done in OCR for handwriting, digitizing books, cursive writing, and even CAPTCHA cracking. For those of you who may not know what a CAPTCHA is, it stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It's those little images with letters and numbers in them that are used when registering on websites and even posting comments to blogs and forums.

The idea is that using a CAPTCHA will prevent computer programs from automatically registering or submitting comments on a given website. Breaking a CAPTCHA by using OCR renders these systems irrelevant. It's definitely a game of cat and mouse. When a CAPTCHA is cracked, the intelligent thing to do is replace it with a stronger CAPTCHA.

I'm kind of reluctant to post very much information on how to crack CAPTCHAs and I'm sure it's obvious why. I probably won't be posting full source code for any given CAPTCHA and the code I do give out will either be crippled or just be snippets of a larger OCR program. The techniques used for cracking CAPTCHAs are really just image processing algorithms that have been applied for this specific use.

In the future I will be posting techniques on how to crack specific CAPTCHAs. For example, in my next article I'll present algorithms for cracking the CAPTCHA at Bumpzee.com. Generally, if I post an article on how to crack a specific CAPTCHA it will probably be a site that isn't worth spamming.

Each CAPTCHA is unique and the techniques used to crack a specific CAPTCHA have to be altered slightly, but generally all CAPTCHAs are cracked using similar techniques. For example, you read the image into memory, eliminate any noise, separate each character into its own image, then perform some kind of pixel matching to determine what each character is. With most CAPTCHAs in the wild today you can train your OCR software to recognize characters by doing pixel matching against each letter in the CAPTCHA.

This approach is really brute force and doesn't work very well on the more advanced CAPTCHAs. For now I will be focusing on the brute force pixel matching techniques and maybe in later posts I will go into advanced techniques.

Using Python and PIL (Python Imaging Library), loading a CAPTCHA (or any image) is as simple as:

img = Image.open(filename)

Sometimes a CAPTCHA will have noise in the background. Since each site's CAPTCHA is unique, you have to come up with techniques to eliminate that noise. One of my favorite techniques is to convert the CAPTCHA to a greyscale image:

def captcha_to_greyscale(captcha):
    if captcha.mode == 'L': return captcha
    captcha = captcha.convert('L', (.4, .4, .4, 0))
    return captcha

I like to use (.4, .4, .4, 0) for my conversion matrix when converting from 'RGB' to 'L' (greyscale). Past experience has shown this to be a decent conversion matrix but like I said earlier, all CAPTCHAs are different and some might not do well with that conversion matrix. You may even be able to get away without using a conversion matrix at all.
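To see what that matrix does to a single pixel, here's a rough sketch of the arithmetic. It assumes PIL treats the 4-tuple as three channel weights plus a constant offset and clips the result to 0..255, which is my understanding of how convert() handles an 'RGB' to 'L' matrix; the function name matrix_to_grey is made up for the example.

```python
def matrix_to_grey(r, g, b, matrix=(.4, .4, .4, 0)):
    # weighted sum of the three channels plus a constant offset
    value = matrix[0] * r + matrix[1] * g + matrix[2] * b + matrix[3]
    # round to the nearest integer and clip to the valid 8-bit range
    return max(0, min(255, int(value + 0.5)))

for rgb in ((0, 0, 0), (100, 100, 100), (220, 220, 220), (255, 255, 255)):
    print('%s -> %d' % (rgb, matrix_to_grey(*rgb)))
```

Since the three weights add up to 1.2, light greys get pushed all the way to white, which already knocks out some of the lighter background noise before any thresholding happens.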

After converting the CAPTCHA to greyscale, another technique I use is to eliminate pixels that aren't part of the letters. A lot of times this means the letters have darker shades of grey in them and the background noise has lighter shades of grey. You can determine which pixels to eliminate by trial and error. PIL provides a method that will give you all the colors in an image:

print captcha.getcolors()

Using the output of getcolors() and modifying pixels until you determine the best colors to eliminate is all trial and error. Here's a function you can use to play with for eliminating lighter-colored pixels:

# pixels is gotten from the image with: pixels = captcha.load()
# w and h is gotten with: w, h = captcha.size
# Note: captcha.load() returns a pixel access object. if you alter a pixel using this object
#       then you alter the captcha itself. there are other ways to load pixels but I like using
#       the pixel access objects.
#       captcha.size returns a (width, height) tuple for the image.
def light_pixels_to_white_pixels(pixels, w, h):
    for x in xrange(w):
        for y in xrange(h):
            if pixels[x, y] > 140: pixels[x, y] = 255

    return pixels

The function is straightforward: iterate through each pixel, check if its color is greater than 140 and set it to white if the check passes. The idea is that this eliminates the lighter background noise while leaving the darker character pixels.
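The trial and error goes faster if you look at how much of the image a given cutoff would wipe out. Here's a small sketch that works on a list of (count, value) pairs, which is the shape getcolors() returns for a greyscale image. The sample numbers are invented and whitened_fraction is a hypothetical helper.

```python
def whitened_fraction(colors, threshold=140):
    # colors is a list of (count, value) pairs
    total = sum(count for count, value in colors)
    whitened = sum(count for count, value in colors if value > threshold)
    return whitened / float(total)

# invented histogram: lots of white background, some light noise, dark letters
sample = [(500, 255), (120, 200), (80, 150), (60, 90), (40, 30)]

for t in (100, 140, 180):
    print('threshold %d whitens %.3f of the image' % (t, whitened_fraction(sample, t)))
```

If two cutoffs whiten the same fraction, there are no pixel values between them, so either one gives the same result.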

After eliminating the basic noise, there's another thing I like to do called 'skeletonization'. There are a few different ways of doing it and they give slightly different results. To put it plainly, skeletonization is a technique that takes an image and reduces the number of edge pixels in it. For some CAPTCHAs it's good enough to look at each dark pixel's neighbors and eliminate the pixel if too many of them are white. Another skeletonization technique is more advanced and is used for trimming edges to one-pixel widths in some cases. The skeletonization technique I'm going to cover here is the simpler version for getting rid of some noise in the CAPTCHA.

# This function uses two passes over the pixels, once for marking black pixels for removal
# and one for actually removing the black pixels. The two-pass approach is generally ideal
# because you don't want to be flipping black pixels to white pixels while you're still
# iterating over them looking for neighbors. That would artificially inflate the number of
# white pixels and we don't want that.
def skeletonize(pixels, w, h):
    for x in xrange(w):
        for y in xrange(h):
            # no point in processing white pixels since we only want to remove black pixels
            if pixels[x, y] == 255: continue

            count = 0
            # Using a try/except block here is a weak solution. The proper way to do this
            # would be to test that each pixel is within the image's borders and then check
            # if it's not white. Using try/except means that when an exception is raised,
            # code execution resumes after the except clause and no other if-statements
            # are executed. This results in the variable 'count' not getting a correct value
            # and may result in a pixel getting set for removal.
            try:
                if pixels[x-1, y-1] != 255: count += 1
                if pixels[x-1, y  ] != 255: count += 1
                if pixels[x-1, y+1] != 255: count += 1
                if pixels[x, y+1  ] != 255: count += 1
                if pixels[x+1, y+1] != 255: count += 1
                if pixels[x+1, y  ] != 255: count += 1
                if pixels[x+1, y-1] != 255: count += 1
                if pixels[x, y-1  ] != 255: count += 1
            except IndexError: pass

            # not enough neighbors are dark pixels so mark this pixel
            # to be changed to white
            if count < 4:
                pixels[x, y] = 1

    # second pass: this time set all 1's to 255 (white)
    for x in xrange(w):
        for y in xrange(h):
            if pixels[x, y] == 1: pixels[x, y] = 255

    return pixels
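For comparison, here's a sketch of the bounds-checked version that the comment in the function calls the proper way to do it. To keep it runnable without PIL it works on a plain list of rows (so a pixel is grid[y][x]) instead of a pixel access object, and it collects the marked pixels in a list rather than writing a marker value into the image. Both of those are my choices for the sketch, and the function names are made up.

```python
WHITE = 255

def count_dark_neighbors(grid, x, y):
    # grid is a list of rows, so a pixel is grid[y][x]
    h, w = len(grid), len(grid[0])
    count = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the pixel itself
            nx, ny = x + dx, y + dy
            # only look at neighbors that are inside the image
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] != WHITE:
                count += 1
    return count

def skeletonize_grid(grid, min_neighbors=4):
    h, w = len(grid), len(grid[0])
    # first pass: collect the dark pixels that have too few dark neighbors
    doomed = [(x, y) for y in range(h) for x in range(w)
              if grid[y][x] != WHITE
              and count_dark_neighbors(grid, x, y) < min_neighbors]
    # second pass: whiten them only after all the counting is done
    for x, y in doomed:
        grid[y][x] = WHITE
    return grid

# a solid 3x3 dark block in the middle of a 5x5 white image
grid = [[WHITE] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        grid[y][x] = 0

for row in skeletonize_grid(grid):
    print(' '.join('#' if value != WHITE else '.' for value in row))
```

The four corners of the block only have three dark neighbors each, so they get removed and the block turns into a plus shape.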

Now that the CAPTCHA is clean and the noise is removed, the next step is to separate the characters from the CAPTCHA. There are a bunch of techniques for splitting a CAPTCHA into its letters. One that I've seen and even used is very brute force. The algorithm iterates over the CAPTCHA's pixels column by column and looks for non-white pixels. When it finds one, it records the x,y coordinates and keeps track of the min and max x,y coordinates seen so far. Those coordinates allow you to crop the CAPTCHA and pull out the letter. It decides a letter's bounding box is complete when it reaches a column that only has white pixels, since a column with zero black pixels has no letter pixels in it. This brute-force approach is problematic when a CAPTCHA has letters that share X coordinates but sit at different Y coordinates: the algorithm never sees a blank column between them, so it pulls out two or more letters as one.

I'll cover the brute force algorithm for now and in a later post I will go over the more elegant flood-fill algorithm that doesn't fail on overlapping X coordinates.

def split_captcha_letters(captcha):
    started = False
    letters = []
    width, height = captcha.size
    bottomY, topY = 0, height
    pixels = captcha.load()
  
    for x in xrange(width):
        black_pixel_in_col = False
        for y in xrange(height):
            if pixels[x, y] != 255:
                if started == False:
                    started = True
                    firstX = x
                    lastX = x
   
                if y > bottomY: bottomY = y
                if y < topY: topY = y
                if x > lastX: lastX = x
   
                black_pixel_in_col = True
   
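        if not black_pixel_in_col and started:
            # crop() treats the right and lower edges as exclusive, so add 1
            rect = (firstX, topY, lastX + 1, bottomY + 1)
            new_captcha = captcha.crop(rect)

            letters.append(new_captcha)

            started = False
            bottomY, topY = 0, height

    # a letter that touches the right edge never hits a blank column
    if started:
        letters.append(captcha.crop((firstX, topY, lastX + 1, bottomY + 1)))

    return letters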

The function above iterates over all the pixels in the CAPTCHA looking for pixels that aren't white. When it finds the first non-white pixel of a letter, it records that pixel's X coordinate in firstX and uses it as the initial value for lastX. For every non-white pixel it then compares the coordinates against the current top and bottom Y values and lastX, overwriting them whenever the new pixel extends the bounding box.

As long as there is a black pixel in each column, we know we're looking at a letter in the CAPTCHA, so we only crop the CAPTCHA when we hit a column without any non-white pixels. Those bounding box variables (firstX, topY, lastX, bottomY) now come into play when setting up a crop box for the CAPTCHA.

The cropped image (a letter) is appended to the letters list, the bounding box variables are reset, and scanning resumes for more letters.
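The column scan is easy to play with on a hand-made grid before pointing it at a real image. This sketch only finds the X range of each letter and leaves out the Y bounds and the cropping. It works on a plain list of rows rather than a PIL image, and letter_column_ranges is a made-up name.

```python
def letter_column_ranges(grid, white=255):
    # grid is a list of rows; returns (first_x, last_x) for each run of
    # columns that contains at least one non-white pixel
    width = len(grid[0])
    ranges = []
    first_x = None
    for x in range(width):
        has_dark = any(row[x] != white for row in grid)
        if has_dark and first_x is None:
            first_x = x                      # a letter starts here
        elif not has_dark and first_x is not None:
            ranges.append((first_x, x - 1))  # a blank column closes the letter
            first_x = None
    if first_x is not None:
        ranges.append((first_x, width - 1))  # the letter runs to the right edge
    return ranges

W, B = 255, 0
grid = [[W, B, B, W, W, B, W, B],
        [W, B, W, W, W, B, B, B],
        [W, B, B, W, W, W, W, B]]
print(letter_column_ranges(grid))
```

The second shape in the grid touches the right edge, so it's only picked up by the check after the loop. It's also really two shapes that happen to share a column, which is the overlapping X coordinate problem in miniature.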

The final step in brute force CAPTCHA cracking is pixel matching. I'll be exploring more advanced methods of OCRing CAPTCHAs, but for now the simplest method is doing a pixel-by-pixel match.

There is one thing I've left out until this point: OCR software has to be trained. For example, when you first run a CAPTCHA cracker you have to tell it which characters it's reading. You basically have to solve CAPTCHAs for all letters and numbers until the OCR can successfully match a significant portion of all CAPTCHAs on a site. Training it is just a matter of letting the OCR software split the CAPTCHA into letters and then you manually input which letter it is. The software then saves that letter either in a directory named after the letter you input or in some other way that it's easily identified as being the correct letter.

This is where the pixel matching comes into play. It splits the live CAPTCHA into its letters, iterates over all saved letters that you 'trained' the software with, and then finds the best match by counting the number of pixels that are matched. Since it knows where the letter came from, such as a directory, it knows that the directory name of the best-matched letter is the correct value for that character.

For now I'll leave out this pixel matching function and I may post it at a later date.
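In the meantime, the counting idea itself can at least be sketched on toy data. This is a hypothetical helper working on tiny hand-made grids, not the matcher from my OCR code, and it assumes the letter and the saved samples have already been scaled to the same size.

```python
def match_score(letter, template, white=255):
    # count the positions where both grids agree on dark vs. white
    if len(letter) != len(template) or len(letter[0]) != len(template[0]):
        return 0  # this sketch only compares grids of identical size
    return sum(1
               for row_a, row_b in zip(letter, template)
               for a, b in zip(row_a, row_b)
               if (a == white) == (b == white))

def best_match(letter, trained):
    # trained maps a character to the list of sample grids saved for it
    best_char, best_score = None, -1
    for char, templates in trained.items():
        for template in templates:
            score = match_score(letter, template)
            if score > best_score:
                best_char, best_score = char, score
    return best_char

W, B = 255, 0
trained = {
    'I': [[[W, B, W], [W, B, W], [W, B, W]]],
    'L': [[[B, W, W], [B, W, W], [B, B, B]]],
}
# an 'I' with one stray dark pixel in the bottom right corner
unknown = [[W, B, W], [W, B, W], [W, B, B]]
print(best_match(unknown, trained))
```

The dictionary plays the role of the directories named after each letter: whichever key owns the best-scoring sample is the answer.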
