Raspberry Pi USB Webcam Motion Detection

I did it. I got a Raspberry Pi.

My first project is to create a home monitoring device. Not a simple one that just records video on the file system; I want it to be cool. I want it to have motion detection to trigger video capture, a web service to forward each recording to once it is complete, a website that can monitor multiple devices, an auto-updater so I don’t have to manually update 10 devices, and an iOS/Android app that connects to the device via Bluetooth to configure it in the event that it loses its Wi-Fi connection. I don’t want to ever have to plug an HDMI cable into the device again after it is deployed.

Naturally, I started with the flashy camera stuff. While waiting for the official Pi camera to come in the mail, I plugged in a USB webcam and got it displaying pretty quickly after choosing between several libraries to complete the task. The approach I took was using Pygame. It offers a really simple way to scan the cameras available, capture a frame, and access the RGB values of each frame. As a challenge, I wanted to see if I could come up with a motion detection algorithm without doing any research. I tried two different approaches.

These algorithms probably already have names, but I haven’t checked. I just gave them techy-sounding ones. Neither of these algorithms is very fast, and I’m sure they can be optimized further, but I am able to get the results I want at about 30% CPU usage on a 760×480 image.

Mean Pixel Aggregate Delta

I won’t spend too much time on this because it didn’t work very well. This method involves iterating through each pixel in a 3D array and adding each channel to its own running long, then dividing by the total number of pixels at the end to find the mean RGB of the image. Remember that we can’t simply add the RGB values up into one integer, so we must keep them separate and compare each channel independently. We compare the means of two frames, and if the difference is greater than a certain threshold, we have determined there is movement in the frame.

The problem with this is that it’s not sensitive enough. Slight movements are massively underrepresented in this algorithm. While you could set the movement threshold very low, the natural artifacts in the camera would then also register as movement. I wasn’t able to find a configuration of values that would sensitize it enough to overcome that.
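A minimal sketch of the mean-based method and its weakness, assuming frames are indexed as frame[x][y][channel] (the same layout pygame.surfarray.array3d returns). Plain nested lists stand in for camera frames here, and the helper names and threshold values are made up for illustration.

```python
def mean_rgb(frame):
    """Return the mean (r, g, b) of a frame indexed as frame[x][y][channel]."""
    totals = [0, 0, 0]  # one running total per channel
    count = 0
    for column in frame:
        for pixel in column:
            for c in range(3):
                totals[c] += pixel[c]
            count += 1
    return tuple(t / float(count) for t in totals)


def mean_delta_detects_motion(old_frame, new_frame, threshold):
    """True if the mean of any channel moved by more than threshold."""
    old_mean = mean_rgb(old_frame)
    new_mean = mean_rgb(new_frame)
    return any(abs(o - n) > threshold for o, n in zip(old_mean, new_mean))


def grey_frame(width, height, value=100):
    """Hypothetical stand-in for a camera frame: a flat grey image."""
    return [[[value, value, value] for _ in range(height)] for _ in range(width)]


# A 40x30 frame in which a 4x4 patch jumps from 100 to 200 on every channel.
old = grey_frame(40, 30)
new = grey_frame(40, 30)
for x in range(4):
    for y in range(4):
        new[x][y] = [200, 200, 200]

print(mean_rgb(new)[0] - mean_rgb(old)[0])        # roughly 1.33
print(mean_delta_detects_motion(old, new, 2.0))   # False
print(mean_delta_detects_motion(old, new, 1.0))   # True
```

A patch that doubles in brightness only shifts the mean by about 1.3 out of 255, so the threshold has to sit low enough that sensor noise would plausibly trip it as well.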

Lower-Bounded Iterative Delta

This method seems to work well enough for what I want it to do, and I’m happy with the results. The basic idea is that we have two images, and we iterate through each pixel and compare the RGB channel values to the older image and find the absolute value of the difference. If the change is greater than some threshold in one or more channels, increment a counter outside the loop to indicate that we have found a changed pixel. The threshold is important here because we are able to adjust the lower bound to ignore artifacts, which tend to have very small changes.

The last step is comparing the total number of changed pixels to the total number of pixels. If that ratio is over a certain, surprisingly small, threshold, we have detected movement.

The problem with this algorithm is that it requires the Pi to iterate through every single pixel. I was getting one frame every 5 seconds or so, which is quite slow, and the lag is enough to completely miss any movement that happens between frames. The solution was to implement a step in the loop so that we only sample every xth pixel. This is a balancing act; if the step is too small, we still have the speed issue, but if the step is too large, we can miss smaller actions that we should definitely treat as movement. Smaller items become less represented the larger the step. This step should obviously scale to the size of the image. Since my camera is fairly low resolution at 760×480, a step of 5 was a good compromise, but if you’re running 1080p video through the algorithm then a larger step size is appropriate.
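The step trade-off can be illustrated with a small sketch. It assumes the same frame[x][y][channel] layout and uses nested lists as stand-in frames; count_changed, grey_frame, and the lower bound of 30 are hypothetical names and values chosen for the example.

```python
def count_changed(old_frame, new_frame, step, lower_bound):
    """Count sampled pixels where any channel changed by more than lower_bound."""
    changed = 0
    for x in range(0, len(old_frame), step):
        for y in range(0, len(old_frame[0]), step):
            old_px = old_frame[x][y]
            new_px = new_frame[x][y]
            if any(abs(int(o) - int(n)) > lower_bound
                   for o, n in zip(old_px, new_px)):
                changed += 1
    return changed


def grey_frame(width, height, value=100):
    """Hypothetical stand-in for a camera frame: a flat grey image."""
    return [[[value, value, value] for _ in range(height)] for _ in range(width)]


# A 20x20 frame where a small 3x3 object appears at x = 6..8, y = 6..8.
old = grey_frame(20, 20)
new = grey_frame(20, 20)
for x in range(6, 9):
    for y in range(6, 9):
        new[x][y] = [200, 200, 200]

print(count_changed(old, new, step=1, lower_bound=30))  # 9
print(count_changed(old, new, step=2, lower_bound=30))  # 4
print(count_changed(old, new, step=5, lower_bound=30))  # 0
```

With a step of 5 the sampled coordinates are 0, 5, 10, and 15, so the 3×3 object falls entirely between samples and is never seen.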


If you wanted to go further–and I will be doing so in the future when I come back to optimizing this–you could cache the results of the past x number of frames (say, 3) and decide to record based on the results of all of them combined. This would increase the accuracy of the decision to record and ensure we are only getting motion that matters, and not a one or two frame camera focus adjustment. This depends on your sample rate, of course; if you are sampling 10 frames per second, you’ll want to analyze more than 3 frames.
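One possible shape for that frame cache is sketched below. It is only an illustration under assumptions: the MotionVote class, the window of 3, and the rule that 2 of the last 3 frames must show motion are all hypothetical choices, not something taken from the code in this post.

```python
from collections import deque


class MotionVote(object):
    """Hypothetical helper that remembers the last `window` per-frame results."""

    def __init__(self, window=3, required=2):
        # deque(maxlen=...) silently drops the oldest result once it is full.
        self.results = deque(maxlen=window)
        self.required = required

    def update(self, frame_has_motion):
        """Record one frame's result; return True if we should record."""
        self.results.append(bool(frame_has_motion))
        return sum(self.results) >= self.required


vote = MotionVote(window=3, required=2)
per_frame = [False, True, False, False, True, True]
decisions = [vote.update(m) for m in per_frame]
print(decisions)  # [False, False, False, False, False, True]
```

The isolated hit on the second frame never triggers a recording, while two hits close together do. At a higher sample rate, a larger window and required count would cover the same span of time.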


The code:

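import time
import pygame
import pygame.camera
from pygame.locals import *

# Sample every Nth pixel in each direction; 5 was a good compromise at 760x480.
PIXEL_SAMPLE_SKIP_SIZE = 5
# The two thresholds below are assumed values; tune them for your camera.
# Lower bound a channel must change by before the pixel counts as changed.
CHANNEL_CHANGE_THRESHOLD = 30
# Fraction of all pixels that must change before we call it movement.
MOVEMENT_THRESHOLD = 0.001


def shouldRecord(newImg, oldImg):
    # Nothing to compare until we have two frames.
    if newImg is None or oldImg is None:
        return False

    # 3D arrays indexed as [x][y][channel].
    oldImgRGB = pygame.surfarray.array3d(oldImg)
    newImgRGB = pygame.surfarray.array3d(newImg)

    changedPixels = 0        # pixels that changed by more than the lower bound
    actualChangedPixels = 0  # pixels that changed at all (useful when tuning)

    for x in range(0, len(oldImgRGB), PIXEL_SAMPLE_SKIP_SIZE):
        for y in range(0, len(oldImgRGB[0]), PIXEL_SAMPLE_SKIP_SIZE):
            # Cast to int first: the arrays hold unsigned 8-bit values,
            # so subtracting them directly would wrap around.
            r = abs(int(oldImgRGB[x][y][0]) - int(newImgRGB[x][y][0]))
            g = abs(int(oldImgRGB[x][y][1]) - int(newImgRGB[x][y][1]))
            b = abs(int(oldImgRGB[x][y][2]) - int(newImgRGB[x][y][2]))

            if (r > CHANNEL_CHANGE_THRESHOLD or g > CHANNEL_CHANGE_THRESHOLD
                    or b > CHANNEL_CHANGE_THRESHOLD):
                changedPixels += 1

            if (r > 0 or g > 0 or b > 0):
                actualChangedPixels += 1

    # Only every Nth pixel is sampled, but we divide by the full pixel count,
    # which is part of why the movement threshold is so small.
    totalPixels = len(oldImgRGB) * len(oldImgRGB[0])
    percentChanged = float(changedPixels) / totalPixels

    percentChangedDisplay = str(round(percentChanged * 100, 3)) + "%"
    if percentChanged > MOVEMENT_THRESHOLD:
        print("Recording now!" + " Threshold percent changed: " + percentChangedDisplay)
        return True
    print("..." + " (Threshold percent changed: " + percentChangedDisplay + ")")
    return False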


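pygame.init()
pygame.camera.init()

camlist = pygame.camera.list_cameras()
if not camlist:
    raise SystemExit("No camera found")
cam = pygame.camera.Camera(camlist[0])
cam.start()  # the camera must be started before frames can be read

img = cam.get_image()

width = img.get_width()
height = img.get_height()

screen = pygame.display.set_mode((width, height))
pygame.display.set_caption("USB Webcam")

oldImg = None

running = True
while running:
    for e in pygame.event.get():
        if e.type == pygame.QUIT:
            running = False
        if e.type == pygame.KEYDOWN:
            if e.key == pygame.K_ESCAPE:
                # Assumed behavior: Escape ends the loop.
                running = False
    oldImg = img
    img = cam.get_image()

    shouldRecord(img, oldImg)

    screen.blit(img, (0,0))
    pygame.display.flip()  # push the blitted frame to the window

    if oldImg is None or img is None:
        # Nothing to compare against yet, so go straight to the next frame.
        continue

cam.stop()
pygame.quit()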