Posts tagged: bag-of-words

Lab Bag-of-Words

November 14, 2013

University of Florence
Course on Multimedia Databases – 2013/14 (Prof. A. Del Bimbo)
Instructors: Lamberto Ballan and Lorenzo Seidenari

Goal

The goal of this laboratory is to gain basic practical experience with image classification. We will implement a system based on the bag-of-visual-words image representation and apply it to the classification of four image classes: airplanes, cars, faces, and motorbikes.

We will follow three steps (a minimal code sketch follows the list):

  1. Load pre-computed image features, construct visual dictionary, quantize features
  2. Represent images by histograms of quantized features
  3. Classify images with Nearest Neighbor / SVM classifiers
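
The fragment below is a minimal, illustrative Matlab sketch of these three steps; the real lab code is in exercises.m. Here train_descriptors, image_descriptors, train_histograms, train_labels and the vocabulary size K are placeholder names, and kmeans/pdist2 assume the Statistics Toolbox:

    K = 500;                                     % vocabulary size (placeholder)
    [~, vocab] = kmeans(train_descriptors, K);   % step 1: build the visual dictionary

    d = pdist2(image_descriptors, vocab);        % distances to the K visual words
    [~, words] = min(d, [], 2);                  % quantize each feature to its nearest word

    h = histc(words(:)', 1:K);                   % step 2: histogram of word counts
    h = h / sum(h);                              % L1 normalization

    [~, nn] = min(pdist2(h, train_histograms), [], 2);  % step 3: 1-nearest-neighbor rule
    predicted_label = train_labels(nn);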

Getting started

  • Download excercises-description.pdf
  • Download lab-bow.zip, which includes the Matlab code (type the password given in class to uncompress the file)
  • Download 4_ObjectCategories.zip including images and precomputed SIFT features; uncompress this file in lab-bow/img
  • Download 15_ObjectCategories.zip including images and precomputed SIFT features; uncompress this file in lab-bow/img
  • Start Matlab in the directory lab-bow/matlab and run exercises.m

ICIAP 2013 Tutorial: Hands on Advanced Bag-of-Words Models for Visual Recognition

September 7, 2013

Lorenzo Seidenari and I will give a tutorial titled “Hands on Advanced Bag-of-Words Models for Visual Recognition” at the forthcoming ICIAP 2013 conference (September 9, Naples, Italy). All materials (slides, Matlab code, etc.) and more details can be found on this webpage.

Generative & discriminative models for classifying social images on the MICC-Flickr101 dataset

June 17, 2012

Our paper “Combining Generative and Discriminative Models for Classifying Social Images from 101 Object Categories” has been accepted at ICPR’12. We use a hybrid generative-discriminative approach (LDA + SVM with non-linear kernels) over several visual descriptors (SIFT, GIST, colorSIFT).

A major contribution of our work is also the introduction of a novel dataset, called MICC-Flickr101, based on the popular Caltech 101 and collected from Flickr. We demonstrate the effectiveness and efficiency of our method by testing it on both datasets, and we evaluate the impact of combining image features and tags for object recognition.
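
As a loose illustration of the hybrid idea (not the paper's exact pipeline), topic proportions inferred by a generative model can serve as features for a non-linear, multiclass SVM. In recent Matlab this could look as follows; fitlda requires the Text Analytics Toolbox, and bow_counts, labels and the number of topics are placeholder assumptions:

    mdl = fitlda(bow_counts, 50);                 % generative step: LDA over BoW counts
    topics = transform(mdl, bow_counts);          % per-image topic proportions
    tmpl = templateSVM('KernelFunction', 'rbf');  % non-linear kernel (assumption)
    svm = fitcecoc(topics, labels, 'Learners', tmpl);  % one-vs-one multiclass SVM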

Effective Codebooks for Action Recognition in Unconstrained Videos

March 12, 2012

Our paper entitled “Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos” by L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari and G. Serra has been accepted for publication in the IEEE Transactions on Multimedia.

Recognizing and classifying human actions for the annotation of unconstrained video sequences has proven challenging because of variations in the environment, in the appearance of the actors, in the ways the same action is performed by different people, in its speed and duration, and in the point of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and of deriving appropriate and effective codebooks for action categorization.

In this paper we propose a novel and effective solution for classifying human actions in unconstrained videos. It improves on previous contributions through a novel local descriptor that uses the image gradient and optic flow to model, respectively, the appearance and motion of human actions at interest-point regions. To form the codebook we employ radius-based clustering with soft assignment, creating a rich vocabulary that can account for the high variability of human actions. We show that our solution achieves very good performance with no need for parameter tuning, and that computation time can be reduced substantially, with little loss of accuracy, by shrinking the codebook with Deep Belief Networks.
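
A minimal Matlab sketch of the soft-assignment step, in the spirit of kernel codebook weighting (the paper's exact scheme may differ; vocab, descriptors and the bandwidth sigma are placeholders):

    sigma = 0.2;                           % smoothing bandwidth (to be tuned)
    d = pdist2(descriptors, vocab);        % N x K distances to the visual words
    w = exp(-d.^2 / (2 * sigma^2));        % Gaussian weight per (feature, word) pair
    w = bsxfun(@rdivide, w, sum(w, 2));    % each feature distributes unit mass
    h = sum(w, 1);                         % soft bag-of-words histogram
    h = h / sum(h);                        % L1 normalization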

Our method has obtained very competitive performance on several popular action-recognition datasets such as KTH (accuracy = 92.7%), Weizmann (accuracy = 95.4%) and Hollywood-2 (mAP = 0.451).

Human action recognition: ICIP and ICCV VOEC 2009 papers online

July 17, 2009

Our ICIP 2009 and ICCV VOEC 2009 papers are available online. We are working on a novel method based on an effective visual bag-of-words model and a new spatio-temporal descriptor.

First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning (ICIP paper).
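
For illustration only, here is a bare-bones magnitude-weighted histogram of 3D gradient orientations over a spatio-temporal cuboid V (an H x W x T grayscale volume around an interest point); the descriptor in the paper is more elaborate and is paired with optic flow:

    [gx, gy, gt] = gradient(double(V));           % spatial and temporal gradients
    mag = sqrt(gx(:).^2 + gy(:).^2 + gt(:).^2);   % gradient magnitude
    phi = atan2(gy(:), gx(:));                    % spatial orientation, in [-pi, pi]
    theta = atan2(gt(:), hypot(gx(:), gy(:)));    % temporal elevation, in [-pi/2, pi/2]

    nbins = 8;
    b1 = min(1 + floor((phi + pi) / (2*pi) * nbins), nbins);    % orientation bin
    b2 = min(1 + floor((theta + pi/2) / pi * nbins), nbins);    % elevation bin
    desc = accumarray([b1 b2], mag, [nbins nbins]);  % joint 2D histogram
    desc = desc(:) / norm(desc(:));                  % final descriptor vector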

Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient: cluster centers are attracted to the denser regions of the sample distribution, giving a non-uniform description of the feature space and thus failing to code other informative regions. For this reason we use a radius-based clustering method with a soft assignment that considers two or more relevant candidate words, obtaining a more effective codebook (ICCV VOEC paper). We extensively test our approach on the standard KTH and Weizmann action datasets, showing its validity and outperforming other recent approaches.
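
A rough Matlab sketch of a leader-style, radius-based clustering rule in this spirit (a generic illustration, not the exact algorithm of the papers; X is an N x D descriptor matrix and r the cluster radius):

    function centers = radius_clustering(X, r)
    % Greedy radius-based clustering: any sample farther than r from every
    % existing center seeds a new visual word, so sparse but informative
    % regions of the feature space still get covered.
    centers = X(1, :);
    for i = 2:size(X, 1)
        if min(pdist2(X(i, :), centers)) > r
            centers = [centers; X(i, :)]; %#ok<AGROW>
        end
    end
    end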
