Correlation based feature fusion for the temporal video scene segmentation task (2018)

  • Authors: KISHI, Rodrigo Mitsuo; TROJAHN, Tiago Henrique; GOULARTE, Rudinei
  • USP affiliated authors: GOULARTE, RUDINEI - ICMC
  • USP Schools: ICMC
  • DOI: 10.1007/s11042-018-6959-4
  • Subjects: INTERACTIVE MULTIMEDIA; INFORMATION RETRIEVAL; VIDEO
  • Keywords: Temporal scene segmentation; Early fusion
  • Language: English
  • Imprint: Amsterdam, Springer, 2018
  • Source: Multimedia Tools and Applications

    DOI information: 10.1007/s11042-018-6959-4 (Source: oaDOI API)
    • This journal is subscription-based
    • This article is NOT open access

    How to cite
    The citation is generated automatically and may not fully conform to the standards

    • ABNT

      KISHI, Rodrigo Mitsuo; TROJAHN, Tiago Henrique; GOULARTE, Rudinei. Correlation based feature fusion for the temporal video scene segmentation task. Multimedia Tools and Applications, Amsterdam, Springer, 2018. Available at: <http://dx.doi.org/10.1007/s11042-018-6959-4>. DOI: 10.1007/s11042-018-6959-4.
    • APA

      Kishi, R. M., Trojahn, T. H., & Goularte, R. (2018). Correlation based feature fusion for the temporal video scene segmentation task. Multimedia Tools and Applications. doi:10.1007/s11042-018-6959-4
    • NLM

      Kishi RM, Trojahn TH, Goularte R. Correlation based feature fusion for the temporal video scene segmentation task [Internet]. Multimedia Tools and Applications. 2018; Available from: http://dx.doi.org/10.1007/s11042-018-6959-4
    • Vancouver

      Kishi RM, Trojahn TH, Goularte R. Correlation based feature fusion for the temporal video scene segmentation task [Internet]. Multimedia Tools and Applications. 2018; Available from: http://dx.doi.org/10.1007/s11042-018-6959-4
