A Real-Time Implementation of Moving Object Action Recognition System Based on Motion Analysis

This paper proposes a PixelStreams-based FPGA implementation of a real-time system that can detect and recognize human activity using Handel-C. In the first part of our work, we propose a GUI programmed using Visual C++ to facilitate the implementation for novice users. Using this GUI, the user can program/erase the FPGA or change the parameters of different algorithms and filters. The second part of this work details the hardware implementation of a real-time video surveillance system on an FPGA, including all the stages, i.e., capture, processing, and display, using DK IDE. The targeted circuit is an XC2V1000 FPGA embedded on Agility’s RC200E board. The PixelStreams-based implementation was successfully realized and validated for real-time motion detection and recognition.


Introduction
In modern society, there is a growing need for technologies such as video surveillance and access control to detect and identify human and vehicle motion in various situations.Intelligent video surveillance attempts to assist human operators when the number of cameras exceeds the operators' capability to monitor them and alerts the operators when abnormal activity is detected.Most intelligent video surveillance systems are designed to detect and recognize human activity.It is difficult to define abnormal activity because there are many behaviors that can represent such activity.Examples include a person entering a subway channel, abandonment of a package, a car running in the opposite direction, and people fighting or rioting.However, it is possible not only to set criteria to detect abnormal activity but also to zoom in on the relevant area to facilitate the work of the operator.
In general, an intelligent video surveillance system has three major stages: detection, classification, and activity recognition [1].Over the years, various methods have been developed to deal with issues in each stage.

Related Work
Many methods for motion detection have already been proposed.They have been classified [1]- [3] into three major categories: background subtraction, [4], [5] temporal differencing [6] , [7] and optical flow [8], [9].Further, motion detection methods have been recently classified into matching methods, energy-based methods, and gradient methods.The aim of the motion detection stage is to detect regions corresponding to moving objects such as vehicles and human beings.It is usually linked to the classification stage in order to identify moving objects.There are two main types of approaches for moving object classification: [1], [2], [10] shape-based identification and motion-based classification.Different descriptions of shape information of motion regions such as representations of points, boxes, silhouettes, and blobs are available for classifying moving objects.For example, Lipton et al. [11] used the dispersedness and area of image blobs as classification metrics to classify all moving object blobs into human beings, vehicles, and clutter.Further, Ekinci et al. [12] used silhouettebased shape representation to distinguish humans from other moving objects, and the skeletonization method to recognize actions.In motion-based identification, we are more interested in detecting periodic, non-rigid, articulated human motion .For example, Ran et al. [13] examined the periodic gait of pedestrians in order to track and classify it.The final stage of surveillance involves behavior understanding and activity recognition.Various techniques for this purpose have been categorized into seven types: dynamic time warping algorithms, finite state machines, hidden Markov models, time-delay neural networks, syntactic techniques, nondeterministic finite automata, and self-organizing neural networks.Such a wide variety of techniques is attributable to the complexity of the problems and the extensive research conducted in this field.The computational complexity of these methods and the massive amount of information obtained from video streams makes it difficult to achieve real-time performance on a general-purpose CPU or DSP.There are four main architectural approaches for overcoming this challenge: application-specific integrated circuits (ASICs) and fieldprogrammable gate arrays (FPGAs), parallel computing, GPUs, and multiprocessor architectures.Evolving high-density FPGA architectures, such as those with embedded multipliers, memory blocks, and high I/O (input/output) pin counts, are ideal solutions for video processing applications [14].In the field of image and video processing, there are many FPGA implementations for motion segmentation and tracking.For example, Menezes et al. [5] used background subtraction to detect vehicles in motion, targeting Altera's Cyclone II FPGA with Quartus II software.Another similar study on road traffic detection [15] adopted the sum of absolute differences (SAD) algorithm, implemented on Agility's RC300E board using an XC2V6000 FPGA with Handel-C and the PixelStreams library of Agility's DK Design Suite.
Other methods for motion detection such as optical flow have been successfully implemented [8], [9] on an FPGA.For example, Ishii et al. [8] optimized an optical flow algorithm to process 1000 frames per second.The algorithm was implemented on a Virtex-II Pro FPGA.Many video surveillance systems have been developed for behavior change detection.For example, in the framework of ADVISOR, a video surveillance system for metro stations, a finite state machine (with scenarios) [16] is used to define suspicious behavior (jumping over a barrier, overcrowding, fighting, etc.).The W4 system [17] is a system for human activity recognition that has been implemented on parallel processors with a resolution of 320×240.This system can detect objects carried by people and track body parts using background detection and silhouettes.Bremond and Morioni [18] extracted the features of moving vehicles to detect their behaviors by setting various scenario states (toward an endpoint, stop point, change in direction, etc.); the application employs aerial grayscale images.
The objective of this study is to implement different applications of behavior change detection and moving object recognition based on motion analysis and the parameters of moving objects.Such applications include velocity change detection, direction change detection, and posture change detection.The results can be displayed in the RGB format using chains of parallelized sub-blocks.We used Handel-C and the PixelStreams library of Agility's DK Design Suite to simplify the acquisition and display stages.An RC200E board with an embedded Virtex-II XC2V1000 FPGA was employed for the implementation.

Mixed Software-Hardware Design
To make our implementation more flexible, we use the software-hardware platform approach.This approach simplifies not only the use of the hardware but also the change between soft data and hard data, especially for image processing applications that need many parameters to be changed, for example, the parameters of convolution filters and threshold levels.In our implementation, we use Handel-C for the hardware part.Handel-C is a behaviororiented programming language for FPGA hardware synthesis, and it is adapted to the codesign concept [19].
The software part is developed using Visual C++.After generating the bit file using Agility's DK Design Suite, [20] we use our software interface to load this bit file via the parallel port (with a frequency of 50 MHz) on the RC200E board in order to configure the FPGA.The algorithm parameters are transferred through this port as 8-bit data at the same frequency.For the user, these operations are hidden.The graphical user interface allows the user to configure/erase the FPGA and change the algorithm parameters.For example, in our case, we can change the threshold level according to the brightness of the scene or the velocity level according to the object in motion (human, vehicle).

. Pixel Streams Library
Before we detail and explain our algorithm and the method used to achieve our goals, we should discuss the tools used for our implementation.We used an RC200E board with an embedded XC2V1000 FPGA [21].This board has multiple video inputs (S-video, camera video, and composite video), multiple video outputs (VGA, S-video, and composite video), and two ZBT SRAMs, each with a capacity of 2 MB.The language used is Handel-C [22] and the integrated development environment (IDE) is Agility's DK5.This environment is equipped with different platform development kits (PDKs) that include the PixelStreams library [23].
The PixelStreams library is used to develop systems for image and video processing.It includes many blocks (referred to as filters) that perform primary video processing tasks such as acquisition, stream conversion, and filtering.The user has to associate these blocks carefully by indicating the type of the stream (pixel type, coordinate type, and synchronization type).Then, the user can generate the algorithm in Handel-C.Thereafter, the user has to add or modify blocks to program his/her method, and finally, he/she must merge the results.It is worth mentioning that these blocks are parameterizable, i.e., we can modify the image processing parameters, such as the size of the acquired image or the threshold.These blocks are fully optimized and parallelized.Figure 1 shows the GUI of PixelStreams.

Detection Algorithm
We choose to implement the delta frame method for three reasons: its adaptability to changes in luminance, its simplicity, and its low consumption of hardware resources.This method determines the absolute difference between two successive images, and it is executed in two stages: temporal difference and segmentation.

Temporal difference
In this stage, we determine the absolute difference between the previous frame and the current frame as follows.

t t t t dI x y x y x y I x y I x y dt
is the difference between It(x,y) (i.e., the intensity of pixel (x,y) at moment t) and It-1(x,y) (i.e., the intensity of pixel (x,y) at moment t-1).

Segmentation
In this stage, significant temporal changes are detected by means of thresholding: This operation yields a binary card that indicates zones of significant variations in brightness from one image to the other.

Feature Extraction and Behavior Change Detection
In this study, simple behavior change detection refers to motion that can be caused by abrupt movements that might represent suspicious actions.To define these actions, we use the parameters of the objects in motion, such as the center of gravity, width, and length.In general, the actions detected by this method are simple yet useful in video surveillance.For example, velocity change detection is useful for detecting a criminal who is being chased by the police or a car that exceeds the speed limit; direction change detection is useful for detecting a car that is moving in the wrong direction; and posture change detection is useful for detecting a person who bends to place or pick up an object.
Our implementation involves the following stages: acquisition of the video signal, elimination of noise from the input video signal, detection of moving regions, segmentation for separating the moving objects, extraction of the object parameters, classification of the moving objects, and determining whether movements are suspicious.

Velocity change detection
We can detect suspicious behavior of a person from his/her gait as well as his/her change in velocity near sensitive locations such as banks, airports, and shopping centers.In such cases, we can calculate the speed (in pixels/s) or acceleration (in pixels/s2) of the suspect in the image space in real time.There are several ways of representing this anomaly: the most widely adopted method in the literature is the use of a bounding box (a rectangle around the suspect).
It is easy to calculate the speed of a moving object.As soon as the speed or acceleration of the object exceeds a certain threshold of normality (predetermined experimentally or on the basis of statistical studies), a bounding box appears around the suspect.However, the issue that needs to be addressed is the calculation of the speed in realtime circuits owing to the absence of mathematical functions (such as square root), types of data (integer or real values), and the object parameters on which we base our calculation.
In general, the speed and acceleration are calculated as follows:

Direction change detection
Changes in direction or motion in the wrong direction can represent abnormal behavior depending on the situation.For example, roaming around a building or car can be considered as an abrupt cyclic change in direction, possibly indicating the intention of burglary or car theft.Other examples include detection of a car that is moving in the wrong direction or a person who is moving in the opposite direction of a queue at an exit gate or exit corridor in an airport.To determine the direction, we select parameters that distinguish the object of interest, such as its center of gravity, width, and length.In general, the co-ordinates of the center of gravity can be used to determine whether the object has changed its direction, i.e., whether it has moved rightward or leftward depending on the position of the camera.
The change in direction along the x-axis is given by     gg 0 The object did not change direction x t x t dt (5) 0 The object changed direction The change in direction along the y-axis is given by     gg 0 The object did not change direction y t y t dt (6) 0 The object changed direction These techniques, which are based on the object parameters, can be improved by integrating them with advanced models such as finite state machines (FSMs).

Hardware Implementation
Figure 2 shows the general outline of our FPGA implementation.\This general outline consists of four blocks: an acquisition block, an analysis block, a display block, and an intermediate block between the display block and the analysis block.

Acquisition Block
Acquisition is achieved using the standard camera associated with the RC200E board.The video input processor, Philips SAA7113H, acquires the frames in the PAL format at a rate of 25 fps.The pixels are in the YCbCr format.Using the PixelStreams library of Agility's DK Design Suite, we split the input video signal into two identical streams (see Figure 3).The first stream is fed to the display block and it is converted into the RGB format to display the results on a VGA display.The second stream is fed to the analysis block and it is converted into the grayscale format to reduce (by one-third) the amount of data to be processed.
We can choose to perform the conversion into the RGB format before splitting the input signal and then convert the second stream for the analysis block into the grayscale format.However, this method is not preferable because the conversion from the YCbCr format to the RGB format is approximate.Moreover, in the conversion from the YCbCr format to the grayscale format, the brightness is simply represented by the Y component of the YCbCr format.Further, it is not preferable to approximate the input stream that is fed to the analysis block.

Analysis Block
The analysis block consists of several stages.In the first stage, we use inter-image subtraction (delta frames) and apply thresholding to detect moving regions.
To obtain the delta frames, we start by splitting the video signal in three channels (see Figure 4).The first and second channels are used to save the acquired image, creating a delay cell.The image I(t-1) is recorded in the memory.The third channel is used to acquire the actual frame at moment t.Then, the two image streams are synchronized and fed to the subtraction block.The subtraction block is a modified block that takes the absolute result of subtraction and compares it with a threshold.This function is realized using a macro.The threshold value Th is fixed according to the luminosity of the scene.The second stage of the analysis block involves statistical analysis.In this stage, we search for the min and max values along the x-and y-axes of the mobile regions (Figure 5).In general, this stage must be preceded by a filter for noise reduction.We employed a morphological filter (e.g., alternating sequential filter, opening/closing filter) using the PixelStreams library.After calculating the min and max values along the two axes, we determine the center of gravity of the detected object.We calculate the sum of the pixel co-ordinates that have nonzero values along the x-and y-axes, and we divide these coordinate values by their sum.However, for our implementation, it is better to avoid this division.Therefore, we use the direct method.We subtract the max from the min and divide the result by 2. Division by 2 is achieved by a simple bit shift (right shift).Once the values minX, MaxX, minY, MaxY, and XG, YG are obtained, we copy these values into the behavior change detection block.Then, we reset these values to zero.

Behavior Change Detection Block
As stated in the previous section, the analysis block provides the behavior change detection block with the parameters of the moving objects.In this stage, we save the values extracted from the first delta frame (xg(t-1), yg(t-1), minx(t-1), MaxX(t-1), miny(t-1), MaxY(t-1)), and from the second delta frame, we obtain the current values xg(t), yg(t), minx(t), MaxX(t), miny(t), and MaxY(t).From these latter results, we can calculate the width and length of the moving object to classify the object as human, vehicle, or others, as in our previous work [24].Using the values extracted in two different instants (t-1, t), we define the changes in behavior.
For velocity change detection, the speed and acceleration are calculated using the two equations presented in Sec.4.4.1.However, we simplify these equations by calculating the absolute differences between two moments (the previous and current values).If the absolute difference exceeds a certain threshold Vth, we assume that the velocity has changed, and we copy the values of the center of gravity in the display block in order to draw a rectangle around the object.Then, the current values are saved as previous values.
Consider a practical problem that involves the values of the center of gravity.In our algorithm, we need to reset all the variables to zero.Consequently, the coordinates of the center of gravity will be zero.If an object enters the scene, the coordinates of the center of gravity change from 0 to xg, yg, and this will cause false detection.
To overcome this problem, we have to ensure that the object has entered the scene entirely.For this purpose, we set a condition on the coordinates of the bounding box for two consecutive instants; if this condition is met (|minXt1 -minXt2 | > S AND |MaxXt1 -MaxXt2 | > S), we can guarantee that the object has entered the scene entirely, either from the right or from the left.Direction change detection: To implement this application, we follow the same stages as those used in velocity change detection, except that the condition changes.We use the same parameters, minX, MaxX, minY, and MaxY, in order to avoid the center of gravity problem.We calculate the difference between minX1 and minX2, and MaxX1 and MaxX2.If there is a change in sign, we assume that the object has changed its direction.Otherwise, we assume that the object has not changed its direction.
We can easily determine the direction of motion of an object by applying the same concept as that described above.However, in this case, it is impractical to compare the differences between the previous values and the current values with zero because the presence of a small or non-significant movement (such as that of the arms) can cause false detection.Therefore, to overcome this problem, we compare the difference with a threshold Thd, which should not be very large.Then, the values minX, MaxX, minY, and MaxY are copied to the block that draws the bounding box.
We use two blocks for detection in two directions (a different color for each direction of motion).In order to minimize resource consumption, we used only one block for drawing the bounding box by changing the parameters of entry in our macro.In this macro, we added a parameter that changes the color according to the direction of detected motion (Figure 8).
Posture change detection: We are interested in such an application to detect a person who leans (bends) to place or pick up something, especially in sensitive locations (e.g.,the subways).In this case, we are interested in movements along the y-axis of the image (up/down motion), and we use the same architecture as that used in velocity change detection.We calculate the difference between the previous and current values of miny(t-1), MaxY(t-1), miny(t), MaxY(t).
If the difference between the previous and current values is positive, we assume that the person leans, and we copy the values minX, MaxX, minY, and MaxY to the block that draws the bounding box and fix the color parameter of the rectangle.We can add a warning message using the PxsConsole filter of PixelStreams.In the opposite case, we assume that the person rises, and we copy the values to the block that draws the rectangle, which uses a different color in this case.As in the case of direction change detection, it is better to use a threshold Thp to reduce the occurrence of false detection due to small movements along the y-axis.For such detections, we require a camera whose front sight faces the scene.Motion analysis: Here, we tried to collect all the above-mentioned behaviors using a single program in order to practically validate the system.To minimize resource consumption, we considered our problem as a finite state machine with several scenarios.The thresholds of detection for each case were used to define and manage these various scenarios.The differences between the values of minX, MaxX, minY, and MaxY at moments t and t-1 are denoted by ∆minx, ∆MaxX, ∆miny, and ∆MaxY, respectively.
In the first state, all the values are initialized (State 0); they represent the initial state of each new inter-image difference.In the second state (State 1), if the absolute values of ∆minx and ∆MaxX are higher than VTh, we assume that the velocity changes and we return to the initial state after copying the values of the block to the bounding box filter.In the opposite case, we go to the third state (State 2) and compare ∆minx and ∆MaxX with the threshold Thd.According to the result of this comparison, we assume that a leftward or rightward movement has occurred.Then, we return to the initial state.Starting from this state, if the moving object accelerates, we return to the second state of velocity change.For posture change detection, the condition is related to the values of ∆miny and ∆MaxY (State 3).We can detect this behavior from any state (e.g., a person runs and leans to collect something).The following figure summarizes these states and the possible scenarios.

Display Block
In this block, we call the macro PxsAnalyseAwaitUpdate, which allows us to pause the display until an update occurs in the analysis block.We obtain the values minX, MaxX, minY, and MaxY; if there is a motion, we copy these values to the bounding box filter to draw the rectangle.The values of the center of gravity, Xg,Yg, are also copied to the PxsCursor filter in order to draw a cross at the center of the moving object.We can add a warning message, e.g., "Warning: velocity change detection", by using the PxsConsole filter of the PixelStreams library.Finally, the results are displayed in the RGB format on a VGA display.

Experimental Results
An RC200E board with an embedded Virtex-II XC2V1000 FPGA was used for our implementation.The language used was Handel-C.The results for each behavior are summarized in Tables 2-5: Table 2. Resource consumption and maximum frequency of implementation for velocity change detection.
Table 3. Resource consumption and maximum frequency of implementation for direction change detection.
Table 4. Resource consumption and maximum frequency of implementation for up/down motion detection.These tables specify the resource consumption and maximal frequency of each implemented detection case for PAL video with a resolution of 720×576.
In all these implementations, the results show that the two main constraints, i.e., the resource limit of our FPGA and the real-time aspect (40 ms/image), are well respected.We note that the consumption of the CLB blocks increases in the case of detection of multiple objects; this is caused by the algorithm used to identify the number of objects in the scene.We also note that the algorithm for motion analysis that collects all the previous behaviors can be implemented on our FPGA in real time, but it consumes nearly all of the CLB resources (88%).
The following figures show the results of all these implementations.Each behavior is represented by a different color, and a warning message is added below the scenes.Figure 11 shows the results of velocity change detection in the case of two objects.As soon as the objects start running, a rectangle appears.We note that in the case of occlusion, the algorithm considers both objects as a single object.After the objects separate, two rectangles with different colors appear on them.
Figure 12 shows the results of direction change detection.Figure 15 shows our graphical user interface (GUI), which is divided into four sections.Three of these sections are used to detect just one simple behavior each, whereas the fourth section detects all the behaviors.Using this GUI, we can send the bit-file for configuring or erasing our FPGA, or directly changing the filter parameters without the need to use the IDE.

Conclusions
We presented a mixed software-hardware approach that simplifies the use of the hardware part by enabling us to communicate with it using the graphical interface.In addition, it simplifies the choice of the algorithm to be implemented and modifies the parameters of this algorithm.We adopted the proposed approach for object detection and behavior recognition based on motion analysis and sudden movements.We exploited the hardware part, which offers the possibility of handling large amounts of data and performing calculations for image processing via parallel processing, guaranteed by the use of the PixelStreams library of Agility's DK Design Suite.Further, we tried to improve our architecture by collecting all the different behaviors using a single program.In addition, we added warning messages using the PxsConsole filter.Thus, we successfully implemented different algorithms that can recognize objects in motion and detect changes in velocity, direction, and posture in real time.The results showed that our approach achieves good recognition and detection of these behaviors, especially in indoor areas.However, in outdoor areas, the results are less promising owing to the simple motion detection algorithm used; this problem is aggravated by occlusion due to overlapping movements of different persons.Therefore, in the future, we will try to use multiple cameras (stereoscopic, Kinect) with improved motion detection and learning methods to detect behavior changes in crowded environments.

Figure 2 .
Figure 2. General outline of behavior change detection.

Figure 5 .
Figure 5. Inter-image difference and calculation of min and max values.

Figure 6 .
Figure 6.False velocity change detection (the object enters the scene).

Table 1 . 51 Figure 7
Figure7shows this implementation and represents all the stages realized.

Figure 7 .
Figure 7. Hardware architecture for velocity change detection.

Figure 8 .
Figure 8. Hardware architecture for direction change detection.

Figure 10 .
Figure 10.Results of velocity change detection in the case of one object.

Figure 10 55 Figure 11 .
Figure10shows the results of velocity change detection in the case of one object.In Figure10(a), as soon as the object decreases its speed, the rectangle disappears.In Figure10(b), as soon as the object starts to run, a rectangle appears around it.
Figure11shows the results of velocity change detection in the case of two objects.As soon as the objects start running, a rectangle appears.We note that in the case of occlusion, the algorithm considers both objects as a single object.After the objects separate, two rectangles with different colors appear on them.Figure12shows the results of direction change detection.Figure 12(a) shows direction detection for two directions: right to left movement, represented by the blue rectangle, and left to right movement, represented by the red rectangle.The figures also show warning messages below the images.Figure 12(b) shows the results of direction change detection in one direction for two objects.

Figure 13 .
Figure 13.Posture change detection: a) for one object, b) for two objects.

Figure 14
Figure14shows the results of collecting all the behaviors using a single program.Motion to the right and left are represented by red and blue rectangles, respectively.Further, up/down and down/up motion are represented by turquoise and yellow rectangles, respectively.Finally, velocity change is represented by a black rectangle.In every case, a warning message is displayed.

Table 5 .
Resource consumption and maximum frequency of implementation for motion analysis.