A new multimodal deep-learning model to video scene segmentation (2018)

  • Authors: TROJAHN, Tiago Henrique; KISHI, Rodrigo Mitsuo; GOULARTE, Rudinei
  • USP affiliated authors: GOULARTE, RUDINEI - ICMC
  • USP Schools: ICMC
  • DOI: 10.1145/3243082.3243108
  • Subjects: INTERACTIVE MULTIMEDIA; INFORMATION RETRIEVAL; MACHINE LEARNING; VIDEO
  • Keywords: Scene segmentation; deep learning; RNN; CNN
  • Language: English
  • Imprint: New York: ACM, 2018
  • Conference titles: Brazilian Symposium on Multimedia and the Web - WebMedia
  • Online access to the document

    DOI information: 10.1145/3243082.3243108 (Source: oaDOI API)
    • This periodical is subscription-based
    • This article is NOT open access

    How to cite
    The citation is generated automatically and may not fully comply with the citation standards.

    • ABNT

      TROJAHN, Tiago Henrique; KISHI, Rodrigo Mitsuo; GOULARTE, Rudinei. A new multimodal deep-learning model to video scene segmentation. Proceedings... New York: ACM, 2018. DOI: 10.1145/3243082.3243108.
    • APA

      Trojahn, T. H., Kishi, R. M., & Goularte, R. (2018). A new multimodal deep-learning model to video scene segmentation. In Proceedings. New York: ACM. doi:10.1145/3243082.3243108
    • NLM

      Trojahn TH, Kishi RM, Goularte R. A new multimodal deep-learning model to video scene segmentation [Internet]. Proceedings. 2018; Available from: http://dx.doi.org/10.1145/3243082.3243108
    • Vancouver

      Trojahn TH, Kishi RM, Goularte R. A new multimodal deep-learning model to video scene segmentation [Internet]. Proceedings. 2018; Available from: http://dx.doi.org/10.1145/3243082.3243108
