Human action categorization in unconstrained videos

November 20, 2008

Collaborators: Lorenzo Seidenari, Giuseppe Serra, Marco Bertini, Alberto Del Bimbo

Human action categorization (an example of a walking action from the Weizmann dataset)

Building a general system for human activity recognition and classification is challenging because of variations in environment, people, and actions. Environmental variation can be caused by cluttered or moving backgrounds, camera motion, and illumination changes, while people differ in size, shape, and posture. Recently, interest-point-based models have been successfully applied to human action classification because they overcome some limitations of holistic models, such as the need for background subtraction and tracking. We are working on a novel method based on the visual bag-of-words model and a new spatio-temporal descriptor.

The method improves on previous contributions by defining a novel local descriptor that uses image gradients and optical flow to model, respectively, the appearance and motion of human actions at interest-point regions. To form the codebook, we employ radius-based clustering with soft assignment in order to create a rich vocabulary that can account for the high variability of human actions. We show that our solution achieves very good performance without parameter tuning. We also show that reducing the codebook size with Deep Belief Networks substantially reduces computation time with little loss of accuracy.
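The codebook step can be illustrated with a minimal Python sketch. It is not the implementation from the publications below: the greedy clustering rule, the Gaussian soft-assignment weights, the function names `radius_based_codebook` and `soft_assign_histogram`, and the parameters `radius` and `sigma` are all assumptions chosen for illustration, and random 4-dimensional vectors stand in for real spatio-temporal descriptors.

```python
import numpy as np


def radius_based_codebook(descriptors, radius):
    """Greedy radius-based clustering (illustrative simplification).

    Repeatedly picks the unassigned descriptor with the most unassigned
    neighbours within `radius`, stores the mean of that neighbourhood as a
    codeword, and removes the neighbourhood from the pool.
    """
    remaining = np.ones(len(descriptors), dtype=bool)
    # Pairwise Euclidean distances; acceptable for a small sample.
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    within = np.sqrt((diff ** 2).sum(axis=-1)) <= radius
    codewords = []
    while remaining.any():
        counts = (within & remaining[None, :]).sum(axis=1)
        counts[~remaining] = -1  # already assigned points cannot be centres
        center = int(np.argmax(counts))
        members = within[center] & remaining
        codewords.append(descriptors[members].mean(axis=0))
        remaining &= ~members
    return np.array(codewords)


def soft_assign_histogram(descriptors, codebook, sigma):
    """Bag-of-words histogram in which every descriptor votes for all
    codewords, weighted by a Gaussian of its distance to each codeword."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Subtract the per-row minimum so at least one weight equals 1
    # (avoids division by zero when all distances are large).
    sq_dist = sq_dist - sq_dist.min(axis=1, keepdims=True)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # one unit of mass per descriptor
    hist = weights.sum(axis=0)
    return hist / hist.sum()


# Toy data: a dense group and a sparse group of hypothetical descriptors.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(50, 4))
sparse = rng.normal(loc=5.0, scale=0.1, size=(10, 4))
descriptors = np.vstack([dense, sparse])

codebook = radius_based_codebook(descriptors, radius=1.0)
hist = soft_assign_histogram(descriptors, codebook, sigma=0.5)
print(codebook.shape, hist)
```

In this sketch the sparse group still receives its own codeword even though it holds far fewer points, which is the property that motivates radius-based clustering over methods that concentrate codewords in dense regions. Soft assignment then spreads each descriptor's vote across nearby codewords instead of committing to a single one.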

Our method has obtained very competitive performance on several popular action-recognition datasets such as KTH (accuracy = 92.7%), Weizmann (accuracy = 95.4%) and Hollywood-2 (mAP = 0.451).

Related publications:

  • L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, "Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos," IEEE Transactions on Multimedia, vol. 14, iss. 4, pp. 1234-1245, 2012. [Imp Fact 2.303]
    @article{tmm12,
      author = {Ballan, Lamberto and Bertini, Marco and Del Bimbo, Alberto and Seidenari, Lorenzo and Serra, Giuseppe},
      title = {Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos},
      journal = {IEEE Transactions on Multimedia},
      publisher = {IEEE Signal Processing Society},
      volume = {14},
      number = {4},
      pages = {1234--1245},
      month = {Aug.},
      doi = {10.1109/TMM.2012.2191268},
      url = {http://www.lambertoballan.net/downloads/2012_tmm_preprint.pdf},
      data = {http://imagelab.ing.unimore.it/visor/video_videosInCategory.asp?iStartFrom=0&idcategory=20},
      impact = {[Imp Fact 2.303]},
      cited = {http://scholar.google.com/scholar?cites=7420290031458634039},
      icon = {http://www.lambertoballan.net/images/tmm12.jpg},
      year = {2012}
    }
  • L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari, and G. Serra, "Recognizing Human Actions by Fusing Spatio-temporal Appearance and Motion Descriptors," in Proc. of IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 2009. (Poster)
    @inproceedings{icip09,
      author = {Ballan, Lamberto and Bertini, Marco and Del Bimbo, Alberto and Seidenari, Lorenzo and Serra, Giuseppe},
      title = {Recognizing Human Actions by Fusing Spatio-temporal Appearance and Motion Descriptors},
      booktitle = {Proc. of IEEE International Conference on Image Processing (ICIP)},
      address = {Cairo, Egypt},
      month = {November},
      note = {(Poster)},
      url = {http://www.micc.unifi.it/publications/2009/BBDSS09/bbdss-icip09.pdf},
      cited = {http://scholar.google.com/scholar?cites=1110163141728059540},
      icon = {http://www.lambertoballan.net/images/icip_2009.jpg},
      year = {2009}
    }
