Raspberry Pi USB Webcam Motion Detection

I did it. I got a Raspberry Pi.

My first project is to create a home monitoring device. Not a simple one that just records video to the file system; I want it to be cool. I want it to have motion detection to trigger video capture, a web service to forward the video to after a recording is complete, a website that can monitor multiple devices, an auto-updater so I don’t have to manually update 10 devices, and an iOS/Android app that connects to the device via Bluetooth to configure it in the event that it loses wifi connection. I don’t want to ever have to plug an HDMI cable into the device again after it is deployed.

Naturally, I started with the flashy camera stuff. While waiting for the official Pi camera to come in the mail, I plugged in a USB webcam, and after evaluating several libraries for the task, I had it displaying pretty quickly. The approach I took was Pygame: it offers a really simple way to scan the available cameras, capture a frame, and access the RGB values of each frame. As a challenge, I wanted to see if I could come up with a motion detection algorithm without doing any research. I tried two different approaches.

These algorithms probably already have names, but I haven’t checked; I just gave them techy-sounding ones. Neither of them is very fast, and I’m sure both can be optimized further, but I am able to get the results I want at about 30% CPU usage on a 760×480 image.

Mean Pixel Aggregate Delta

I won’t spend too much time on this because it didn’t work very well. This method involves iterating through each pixel in a 3D array and adding each channel to a running total, then dividing by the total number of pixels at the end to find the mean RGB of the image. Remember that we can’t simply add the RGB values up into one integer; we must keep the channels separate and compare each one independently. We then compare the mean difference between two frames, and if it is greater than a certain threshold, we have determined there is movement in the frame.
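
In code, the idea looks roughly like this (a minimal sketch; the threshold of 2 is a placeholder, not a tuned value):

import pygame.surfarray

def meanRGB(img):
    # Convert the surface to a 3D array indexed as [x][y][channel]
    # and accumulate a running total per channel.
    rgb = pygame.surfarray.array3d(img)
    totals = [0, 0, 0]  # running sums for R, G, B
    for x in range(len(rgb)):
        for y in range(len(rgb[0])):
            for c in range(3):
                totals[c] += int(rgb[x][y][c])
    pixels = len(rgb) * len(rgb[0])
    return [t / pixels for t in totals]

def meanDeltaShowsMovement(newImg, oldImg, threshold=2):
    newMean = meanRGB(newImg)
    oldMean = meanRGB(oldImg)
    # Compare each channel independently; call it movement if any
    # channel's mean shifted by more than the threshold.
    return any(abs(n - o) > threshold for n, o in zip(newMean, oldMean))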

The problem with this is that it’s not sensitive enough. Slight movements are massively underrepresented in this algorithm. While you could set the movement threshold very low, the natural artifacts in the camera would then also register as movement. I wasn’t able to find a configuration of values that made it sensitive enough to overcome that.

Lower-Bounded Iterative Delta

This method seems to work well enough for what I want it to do, and I’m happy with the results. The basic idea is that we have two images, and for each pixel we take the absolute difference between the RGB channel values of the new image and the old one. If the change is greater than some threshold in one or more channels, we increment a counter outside the loop to indicate that we have found a changed pixel. The threshold is important here because it lets us adjust the lower bound to ignore artifacts, which tend to have very small changes.

The last step is comparing the total number of changed pixels to the total number of pixels. If it is over a certain surprisingly small coefficient, we have detected movement.

The problem with this algorithm is that it requires the Pi to iterate through every single pixel. I was getting one frame every 5 seconds or so, which is quite slow, and the lag is enough to completely miss any movement that happens between frames. The solution was to add a step to the loop so that we only sample every xth pixel. This is a balancing act: if the step is too small, we still have the speed issue, but if it is too large, we can miss smaller actions that we should definitely treat as movement, since smaller objects become less represented the larger the step. The step should obviously scale with the size of the image. Since my camera is fairly low resolution at 760×480, a step of 5 was a good compromise, but if you’re running 1080p video through the algorithm, a larger step size is appropriate.
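
One possible way to scale it, assuming you want a roughly constant number of samples per row at any resolution (the 152 figure below is just reverse-engineered from my step of 5 at 760 wide, not a tuned number):

def sampleStepFor(width, samplesPerRow=152):
    # 760 // 152 = 5, and 1920 // 152 = 12, so the sample density
    # stays about the same as the resolution grows.
    return max(1, width // samplesPerRow)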


If you wanted to go further, and I will be doing so in the future when I come back to optimizing this, you could cache the results of the past x number of frames (say, 3) and decide to record based on the results of all of them combined. This would increase the accuracy of the decision to record and ensure we are only capturing motion that matters, not a one- or two-frame camera focus adjustment. This depends on your sample rate, of course; if you are sampling 10 frames per second, you’ll want to analyze more than 3 frames.
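
A sketch of that idea, building on the shouldRecord() function in the code below (the window size and vote count are placeholders to tune against your sample rate):

from collections import deque

FRAME_WINDOW = 3   # how many recent frame comparisons to remember
VOTES_NEEDED = 2   # how many must report movement before recording

recentResults = deque(maxlen=FRAME_WINDOW)

def shouldRecordSmoothed(newImg, oldImg):
    # Record only when most of the recent comparisons agree,
    # filtering out one-frame blips like a focus adjustment.
    recentResults.append(shouldRecord(newImg, oldImg))
    return sum(recentResults) >= VOTES_NEEDED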


The code:


import sys
import pygame
import pygame.camera

# Minimum per-channel difference for a pixel to count as changed;
# anything smaller is treated as camera noise.
PIXEL_CHANGE_LOWER_BOUND = 10
# Fraction of total pixels that must register as changed before we
# decide there is movement.
TOTAL_CHANGE_COEFFICIENT_THRESHOLD = .002
# Only sample every Nth pixel in each dimension to keep the loop fast.
PIXEL_SAMPLE_SKIP_SIZE = 5


def shouldRecord(newImg, oldImg):
    if newImg is None or oldImg is None:
        return False

    # Convert each surface to a 3D array indexed as [x][y][channel].
    oldImgRGB = pygame.surfarray.array3d(oldImg)
    newImgRGB = pygame.surfarray.array3d(newImg)

    changedPixels = 0

    for x in range(0, len(oldImgRGB), PIXEL_SAMPLE_SKIP_SIZE):
        for y in range(0, len(oldImgRGB[0]), PIXEL_SAMPLE_SKIP_SIZE):
            r = abs(int(oldImgRGB[x][y][0]) - int(newImgRGB[x][y][0]))
            g = abs(int(oldImgRGB[x][y][1]) - int(newImgRGB[x][y][1]))
            b = abs(int(oldImgRGB[x][y][2]) - int(newImgRGB[x][y][2]))
            # Count the pixel only if at least one channel moved past
            # the lower bound; smaller deltas are likely sensor noise.
            if (r > PIXEL_CHANGE_LOWER_BOUND
                    or g > PIXEL_CHANGE_LOWER_BOUND
                    or b > PIXEL_CHANGE_LOWER_BOUND):
                changedPixels += 1

    # Note: changedPixels only counts sampled pixels while totalPixels
    # counts every pixel, so this ratio is deflated by the skip size.
    # The surprisingly small threshold above is tuned with that in mind.
    totalPixels = len(oldImgRGB) * len(oldImgRGB[0])
    percentChanged = float(changedPixels) / totalPixels

    percentChangedDisplay = str(round(percentChanged * 100, 3)) + "%"

    if percentChanged >= TOTAL_CHANGE_COEFFICIENT_THRESHOLD:
        print("Recording now! Threshold percent changed: " + percentChangedDisplay)
        return True

    print("... (Threshold percent changed: " + percentChangedDisplay + ")")
    return False


pygame.init()
pygame.camera.init()

camlist = pygame.camera.list_cameras()
if not camlist:
    print("No camera found.")
    sys.exit(1)

cam = pygame.camera.Camera(camlist[0])
cam.start()
img = cam.get_image()

width = img.get_width()
height = img.get_height()

screen = pygame.display.set_mode((width, height))
pygame.display.set_caption("USB Webcam")

while True:
    for e in pygame.event.get():
        if e.type == pygame.KEYDOWN and e.key == pygame.K_ESCAPE:
            cam.stop()
            pygame.display.quit()
            pygame.quit()
            sys.exit()

    oldImg = img
    img = cam.get_image()

    # New image first, old image second, matching the signature above.
    shouldRecord(img, oldImg)

    screen.blit(img, (0, 0))
    pygame.display.flip()


MMO Server Design Part 1

MMO design is hard. Really hard. There’s no sugar-coating it–if you decide to roll your own server code, you will spend a lot of hours doing so, and most of that time will be spent debugging.

That said, since you are here to begin with, you probably have an interest in the topic, and me telling you it is hard should not dissuade you. It can be really rewarding during those eureka moments.

First you should have an understanding of what the heck we’re trying to accomplish with all this. In the early days of multiplayer games, developers took the obvious approach to networking: when your character moves, propagate that change to the other machines connected to the game. It’s simple and dirty, but it is not used anymore because of how easy it is to hack. While the client can do error checking, it’s just not secure enough. There’s also the problem of the game state being out of sync on each machine: if your machine says you hit someone, but your opponent’s machine disagrees, who is right? Technically, they’re both right, and we can’t have that.


Visual Studio 2015 Preview

Seeing the new toys in Visual Studio 2015 causes me to ogle like a kid in a candy store. (Yes, the definition of ogle is “stare at in a lecherous manner.” I stand by my words.) Here’s a short list of the things I’m most excited about:

Linq Support in the Debugger

This feature is just short of being utterly life-changing for my company. It allows you to mouse over each function in a chain of Linq statements and see the values at each stage. Previously, if you had two or three Linq statements chained together and you weren’t sure which one was causing the error, it was necessary to stop the execution, unchain them all into individual variables, and then run the program again. This doesn’t work well for us, as our software takes upwards of five minutes to initialize. What would really be life-changing, however, is Edit and Continue support; we would literally double our development speed, as 99% of our database layer uses Entity Framework and Linq.

XAML Debugging Tool

There’s now a tool for viewing the live properties of each component in a WPF form. While we still use WinForms at work, I use WPF in my personal projects, so I’m quite happy.

ASP.NET 5

The new version of ASP.NET combines the previously distinct MVC and Web API libraries, providing a unified library for modern web development. This used to be a big source of confusion to me–which Controller class am I using? Should I be using System.Web or System.Http?

Miscellaneous

After using the preview for a bit, I noticed the IntelliSense coloring is just better. When mousing over a function, a box used to come up with all-gray text showing the details of the declaration, such as the parameter and return types, which could be very long and hard to read once you got into Funcs. Now they are nicely color-coded and much easier to understand at a glance.

Lastly, Microsoft seems to really be pushing Xamarin development. Mobile development is the future, and it’s good to see Microsoft catching up to Apple in this respect. As a .NET developer and fanboy, I’d much rather develop with C# than Swift.

Positive changes all around. Can’t wait to adopt it at work.

SQL: The Apply Keyword

The other day I learned something about SQL. In particular, MS SQL. I would never call myself a master, not by a long shot; however, I do have a solid grasp of how to put together rather complex queries, so I was surprised to learn of a keyword that is just as useful as a simple join.

First, a little rant as to why this is necessary in the first place. I do tend to find the language archaic in parts. For instance, we have all seen the error “Column ‘name’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.” It occurs when we try to select a column that has not been included in a GROUP BY clause or wrapped in an aggregate function, leaving SQL Server confused as to what you actually want (explanation for why here). After learning why, it does make perfect sense, but this leads to a whole class of frustrations that would just go away if this was not the case. If only we could bend the rules of logic, right? (For the record, I say the language is archaic for the syntax, not for the logical issue, which is not unique to SQL and is not easily fixed. But syntactically, why not allow the “*” operator in GROUP BY, and have it include all the items in the select statement that are not aggregate functions, and do the same for HAVING? I digress.)
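
To make the error concrete, here is a quick illustration with hypothetical table and column names. The first query triggers the error because NAME is neither aggregated nor grouped; the second compiles fine:

-- Fails: NAME is not in an aggregate function or the GROUP BY clause
SELECT NAME, COUNT(*) FROM EMPLOYEES GROUP BY DEPARTMENT

-- Works: every selected column is either grouped or aggregated
SELECT DEPARTMENT, COUNT(*) AS HEADCOUNT FROM EMPLOYEES GROUP BY DEPARTMENT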

While trying to solve this problem for a particularly large query, one with multiple nested queries each pulling data from the query below it, where the inner layers needed the grouping one way and the outer layers needed it another, I discovered a keyword that is not present in MySQL and is unique to SQL Server: Apply.

To be clear, this is treading into hackland, but I will say that the performance of using Outer Apply was far better than any other solution we came up with for our particular problem.

The purpose of the keyword is to let you use table-valued functions, which join statements do not allow. This means you can dynamically create a table, organize the data however you’d like, make whichever columns you need accessible, give that table an alias, and then reference the data in the outer query. But here’s the kicker: you can reference tables that already exist in the outer query from within the apply statement. Have a key or value from an outer table that you need to use within the apply? No problem; just reference it with the alias given in the outer query.

For example, the following code works great:

SELECT OT1.NAME, APPLIEDTABLE.NAME FROM OUTERTABLE1 AS OT1
OUTER APPLY (
    SELECT * FROM INNERTABLE1 AS IT1
    WHERE OT1.KEY = IT1.KEY
) AS APPLIEDTABLE
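
And a sketch of the table-valued function case, which a plain JOIN cannot handle, since a JOIN cannot pass a per-row argument into a function. Here dbo.TopOrdersForCustomer is a hypothetical function returning the top N orders for a given customer:

SELECT C.NAME, T.ORDERDATE, T.TOTAL
FROM CUSTOMERS AS C
CROSS APPLY dbo.TopOrdersForCustomer(C.ID, 3) AS T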

Func Funkiness

Func<> and Action<> are the fire beneath the almighty Linq, the savior of inline collection querying in C#. They allow the programmer to pass a function as a parameter to a method, which can then use the behavior as an object; in other words, functions can be objects. It’s quite a powerful concept, and it is a fundamental building block of JavaScript. Used in the right manner, it greatly enhances the usability of the language. Used in the wrong manner, it becomes a debugging nightmare.

Ok, nightmare is a little strong. But I do feel a bit of rage when I come across a class that is full of Funcs used as properties. Here’s a short list of why:

1) Debugging. When I want to find out why something behaves the way it does, I want to be able to F12 (or right click -> Go to Declaration) my way to the parent function or object until I can say “oh, this chain of ambiguously named variables (which is an inevitability in our trade) comes from a function called MultiplyTheseThingsTogether(). Now I know what the value means: a composite of the values from TableA and TableB,” and so on and so forth. When I see a Func, I know that I instead have to Shift-F12 (or right click -> Find Usages) on the setter to find all the possible places where the behavior is being defined and make my best guess at which one is relevant. I can use the call stack to trace my way back up, of course, but what if the program isn’t running, or I want to go up a different path in the stack?

2) Runtime ambiguities. On a related note, since the Func is a property, it can change at runtime. This leads to situations where it is difficult to determine where a value comes from if a Func has been redefined since the breakpoint was hit. And if it hasn’t been redefined multiple times, it brings up a question as to why Func is necessary in the first place.

3) General OOP ideological issues. In general, an object should not contain behavior which it does not control. Using Func as a property is directly at odds with that, as the sketch below illustrates.
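
A minimal sketch of the antipattern, with all names invented for illustration:

using System;

public class InvoiceCalculator
{
    // Behavior stored as data: any caller can reassign this at runtime,
    // so Go to Declaration dead-ends here instead of at the real logic.
    public Func<decimal, decimal> ApplyDiscount { get; set; }

    public decimal Total(decimal subtotal)
    {
        return ApplyDiscount(subtotal);
    }
}

// A more traceable alternative: the behavior lives behind a named
// contract, so Find Usages leads straight to concrete implementations.
public interface IDiscountPolicy
{
    decimal Apply(decimal subtotal);
}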

While Func can be a valid tool in the developer’s toolbox, the vast majority of situations in which I have seen it used in the wild (aside from Linq, which tends to make it invisible anyway, as I believe it should be) use it as a crutch for poor design or a band-aid for legacy systems. The latter can be a valid excuse, as massive refactors can be expensive and entirely unnecessary given adequate performance of existing code, but the former is where this post is directed.

For the sanity of developers that come after you, please do not use Func unless you have a good reason to do so. There are few.