Tag refinement and localization in web videos

September 28, 2011

Collaborators: Marco Bertini, Giuseppe Serra, Alberto Del Bimbo

Nowadays, almost every photo-sharing service or social network provides tagging functionality that lets users annotate their visual media. The tags are then used to retrieve the uploaded content and to ease browsing and exploration of these collections. However, not all media are tagged equally well: with current browsers it is easy to tag a single photo, and even tagging part of a photo, such as a face, has become quite common. Tagging a video, on the other hand, is more complicated and time-consuming, so users tend to tag only the overall content of a video.

We present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally by associating tags with keyframes. Our approach exploits the collective knowledge embedded in user-generated tags and web sources, together with the visual similarity between keyframes and images from social sites like YouTube and Flickr, as well as web sources like Google and Bing.
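The idea of associating tags with keyframes through visual similarity can be illustrated with a small sketch. This is not the method from the papers listed below, only a simplified nearest-neighbour vote under stated assumptions: feature vectors are assumed to be precomputed by some visual descriptor, and the names `cosine`, `localize_tags`, `k` and `min_votes` are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def localize_tags(keyframes, tagged_images, k=3, min_votes=2):
    """Vote on tags for each keyframe using its k most similar tagged images.

    keyframes:     list of (timestamp_seconds, feature_vector)
    tagged_images: list of (feature_vector, set_of_tags)
    Returns {timestamp: sorted list of tags with at least min_votes votes}.
    """
    result = {}
    for timestamp, feat in keyframes:
        # Rank the tagged images by visual similarity to this keyframe.
        neighbours = sorted(
            tagged_images,
            key=lambda img: cosine(feat, img[0]),
            reverse=True,
        )[:k]
        # Each neighbour casts one vote for every tag it carries.
        votes = Counter(tag for _, tags in neighbours for tag in tags)
        result[timestamp] = sorted(t for t, n in votes.items() if n >= min_votes)
    return result


# Toy data (made up): two keyframes and four tagged images in a 2-D feature space.
keyframes = [(0.0, [1.0, 0.0]), (12.5, [0.0, 1.0])]
tagged_images = [
    ([0.9, 0.1], {"beach", "sea"}),
    ([0.8, 0.2], {"beach"}),
    ([0.1, 0.9], {"dog"}),
    ([0.2, 0.8], {"dog", "park"}),
]

print(localize_tags(keyframes, tagged_images, k=2, min_votes=2))
```

In this toy setting a tag is attached to a keyframe only when at least `min_votes` of its nearest images agree on it, which is one simple way to suppress tags that appear on a single, possibly noisy, neighbour.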

Related publications:

  • L. Ballan, M. Bertini, G. Serra, and A. Del Bimbo, "A Data-Driven Approach for Tag Refinement and Localization in Web Videos," Computer Vision and Image Understanding, vol. 140, pp. 58-67, 2015. [IF: 2.134]
    @article{cviu15,
      author = {Ballan, Lamberto and Bertini, Marco and Serra, Giuseppe and Del Bimbo, Alberto},
      title = {A Data-Driven Approach for Tag Refinement and Localization in Web Videos},
      journal = {Computer Vision and Image Understanding},
      publisher = {Elsevier},
      volume = {140},
      pages = {58--67},
      month = {Nov.},
      doi = {10.1016/j.cviu.2015.05.009},
      url = {http://arxiv.org/abs/1407.0623},
      impact = {[IF: 2.134]},
      reviews = {http://www.lambertoballan.net/downloads/2015_CVIU_REVIEWS.txt},
      cited = {http://scholar.google.com/scholar?cites=13219841874483285369},
      icon = {http://www.lambertoballan.net/images/cviu.gif},
      year = {2015}
    }
  • L. Ballan, M. Bertini, A. Del Bimbo, and G. Serra, "Enriching and Localizing Semantic Tags in Internet Videos," in Proc. of ACM International Conference on Multimedia (ACM-MM), Scottsdale, AZ, USA, 2011. (Poster, Accept Rate 35%)
    @inproceedings{mm11,
      author = {Ballan, Lamberto and Bertini, Marco and Del Bimbo, Alberto and Serra, Giuseppe},
      title = {Enriching and Localizing Semantic Tags in Internet Videos},
      booktitle = {Proc. of ACM International Conference on Multimedia (ACM-MM)},
      address = {Scottsdale, AZ, USA},
      month = {November},
      note = {(Poster, Accept Rate 35%)},
      url = {http://www.micc.unifi.it/publications/2011/BBDS11/mm11.pdf},
      poster = {http://www.lambertoballan.net/downloads/2011_mm_poster.pdf},
      data = {http://www.lambertoballan.net/research/tag-webvideos},
      cited = {http://scholar.google.com/scholar?cites=18389573881424473914},
      icon = {http://www.lambertoballan.net/images/acmmm_2011.jpg},
      year = {2011}
    }

Datasets:

  • MICC-YouTube60: this dataset is composed of 60 web videos downloaded from YouTube in May 2010; 4 videos are provided for each of the 15 YouTube categories. To get access to the dataset, please send an e-mail requesting the download link to: lamberto.ballan@unifi.it.

If you use this dataset, please cite the paper: L. Ballan, M. Bertini, A. Del Bimbo, M. Meoni, G. Serra. “Tag suggestion and localization in user-generated videos based on social knowledge,” in Proc. of ACM Multimedia Int’l Workshop on Social Media (WSM), Firenze, Italy, 2010.
