4th GNY Area Multimedia and Vision Meeting

Friday, October 3rd, 2014

NYU Polytechnic Engineering School, Brooklyn, New York

The Fourth Multimedia and Vision Day will bring together multimedia and computer vision researchers and students from academic and industrial research institutions in the Greater NY area. It is a forum featuring technical talks from invited speakers, poster presentations by researchers and students, and open discussions among participants. It aims to provide a regular venue for researchers, practitioners, and students to exchange ideas, present current work, address the latest topics, and share information in the broad areas of multimedia analytics, search, and management, as well as machine learning, pattern recognition, and computer vision. It is jointly sponsored by the IBM T. J. Watson Research Center and the Polytechnic Engineering School of NYU. The meeting will be hosted by the Polytechnic Engineering School of NYU on Friday, October 3rd, 2014.

Location: Polytechnic Engineering School of NYU, Brooklyn, New York

Date: Friday, October 3rd, 2014, 9:00am - 5:00pm

9:00-9:10 AM: Welcome and Chair’s opening remarks

9:10-11:30 AM: Invited Presentations (AM session)


Terrence Chen, Siemens Corporate Research

Computer assisted intervention by robust image-based tracking

Abstract: Tracking of instruments and devices relative to the patient anatomy is an important topic in computer-assisted interventions. To meet the high demands for accuracy and robustness in daily clinical applications, hardware-based mapping systems are often adopted. Their cost and complicated system integration processes, however, have limited their usage to selected medical centers. In recent years, image-based tracking methods have become popular and show the potential to provide significant benefits to clinics. Despite their popularity, the development of robust and practical solutions has progressed slowly, mainly due to the difficulty of data acquisition. Insufficient validation also renders the usage of such systems doubtful. This presentation will discuss the challenges of tracking in interventions and demonstrate several robust algorithms developed and validated for practical clinical solutions.


John R. Smith, IBM T. J. Watson Research Center

Semantics of Visual Discrimination

Abstract: The multimedia and vision community is making great progress on image recognition using data-driven machine learning techniques. The recent ImageNet results are an impressive example, where convolutional neural nets are achieving a breakthrough in performance. However, given that image recognition capabilities are starting to work at a more significant scale, it is time to focus on more meaningful formulations of visual discrimination tasks. This requires more careful design and modeling of the visual semantic space that supports multiple facets of visual content description across semantic concepts of people, objects, scenes, actions, activities, events, etc. We highlight a number of semantic modeling issues related to inheritance, mutual exclusivity, and completeness across the visual semantic space and discuss their important role in effective visual discriminative learning. We demonstrate sample results that explore different semantic modeling techniques and study their effect on visual discrimination in different image and video data domains.


Coffee Break

Sponsored by NYC Media Lab


Ajay Divakaran, SRI International

Comprehensive Human State Modeling and Its Applications

Abstract: We present a suite of multimodal techniques for assessing human behavior with cameras and microphones. These techniques drive the sensing module of an interactive simulation trainer in which the trainee has lifelike interaction with a virtual character so as to learn social interaction. We recognize facial expressions, gaze behaviors, gestures, postures, speech, and paralinguistics in real time and transmit the results to the simulation environment, which reacts to the trainee's behavior in a manner that serves the overall pedagogical purpose. We will describe the techniques developed and the results obtained for each of the behavioral cues, which are comparable to or better than the state of the art, and identify avenues for further research. Behavior sensing in social interactions poses a few key challenges for each of the cues, including the large number of possible behaviors, the high variability in execution of the same behavior within and across individuals, and real-time execution. Furthermore, we face the challenge of appropriately fusing the multimodal cues so as to arrive at a comprehensive assessment of the behavior at multiple time scales. We will also discuss our approach to social interaction modeling, using our sensing capability to monitor and model dyadic interactions. We will present a video demonstration of the end-to-end simulation trainer.


Alejandro Jaimes, Yahoo Research

Computer Vision at Yahoo: Challenges and Opportunities at Web Scale

Abstract: Everybody loves images and video: most of us watch video every day (it's a daily habit!) and some of us produce lots of it. At Yahoo, video is critical, not just because our users love it, but also because many of our products are highly visual (Screen, Magazines, News, Tumblr, Flickr, Homepage, and many others). Industry has taken note of what video means to consumers, and digital video ad spending is expected to double to $12.7 billion by 2018. In this talk, I will go over some of our most important projects on video (and images), from automatically selecting thumbnails to recommendations and automatic labeling based on video content alone, among others. I will describe the technical challenges and how we're building a world-class technology stack that impacts our products and users, highlighting the technical challenges ahead and our opportunities. I'll place special emphasis on discussing how academic research makes it into our products, what strategies we use to work with product partners, and how we balance product impact and academic contributions.


Announcement of the NYC Media Lab MLB Advanced Media Corporate Member Challenge

Justin Hendrix

Dirk Van Dall

11:35 AM-1:30 PM: Posters and Demos Session, with Lunch


Posters, Demos and Lunch

Sponsored by IBM Research

1:30-3:50 PM: Invited Presentations (PM session)

1:30-2:00 PM

Sanjiv Kumar, Google Research

Circulant Binary Embedding for High Dimensional Data

Abstract: Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. In this talk, I will describe Circulant Binary Embedding (CBE), which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of the Fast Fourier Transform to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from O(d^2) to O(d log d), and the space complexity from O(d^2) to O(d), where d is the input dimensionality. We also propose a novel time-frequency alternating optimization to learn data-dependent circulant projections. Extensive experiments show that the proposed approach gives much better performance than state-of-the-art approaches for a fixed computation time, and provides much faster computation with no performance loss for a fixed number of bits.
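
The core projection here reduces to a circular convolution, which the FFT computes directly. Below is a minimal NumPy sketch of that idea (ours, not the speaker's); it omits the randomized preprocessing and the learned, data-dependent projections described in the talk, and all names in it are illustrative.

    import numpy as np

    def circulant_binary_embedding(x, r):
        # Multiplying by the circulant matrix circ(r) (first column r) is a
        # circular convolution, so the FFT computes it in O(d log d) time and
        # O(d) space, versus O(d^2) for an explicit dense projection matrix.
        projection = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(x)))
        return (projection >= 0).astype(np.int8)  # threshold at zero -> bits

    d = 1024
    rng = np.random.default_rng(0)
    x = rng.standard_normal(d)  # input vector
    r = rng.standard_normal(d)  # first column defining the circulant matrix
    code = circulant_binary_embedding(x, r)

    # Sanity check against the explicit O(d^2) dense projection.
    from scipy.linalg import circulant
    assert np.array_equal(code, ((circulant(r) @ x) >= 0).astype(np.int8))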

2:00-2:30 PM

Yann LeCun, New York University and Facebook AI Research

The unreasonable effectiveness of deep learning

Abstract: Over the last two years, deep learning systems have been rapidly deployed for a wide variety of industrial applications, including speech recognition, image and video tagging, face recognition, and various tasks in natural language processing and information retrieval. Companies such as Facebook, Google, Baidu, Microsoft, Yandex, IBM, and Yahoo! have all deployed large-scale services and products built around deep learning.
The essential advantage of deep learning approaches is the ability to train a multi-stage system from end to end. A particularly popular deep learning architecture is the convolutional network. Fed with raw inputs, convolutional nets simultaneously learn low-level features, mid-level features, high-level features, and classifiers. The success of deep learning has demonstrated the rather humbling fact that gradient descent applied to the training of large convolutional nets with tens of millions of parameters on millions of examples produces better recognition systems than the best human engineers.
I will describe some of the latest developments in deep learning and discuss some of the most promising avenues for future research.
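
As a concrete illustration of end-to-end training, here is a minimal sketch (ours, not the speaker's; it assumes the PyTorch library): a small convolutional net whose feature stages and classifier are optimized jointly by gradient descent.

    import torch
    import torch.nn as nn

    # Each conv/pool stage learns progressively higher-level features;
    # the final linear layer is the classifier. All stages are trained
    # jointly ("end to end") by gradient descent.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # low-level
        nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # mid-level
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),  # 10-way classifier (assumes 32x32 RGB inputs)
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # One gradient step on a random dummy batch, for illustration only.
    x = torch.randn(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()  # gradients reach every stage, features and classifier alike
    optimizer.step()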

2:30-2:50 PM

Coffee Break

Sponsored by NYC Media Lab

2:50-3:20 PM

Tsuhan Chen, Cornell University

Referring Expressions: How to point by not pointing

Abstract: Humans interact with computers in many ways. For example, one may tell a robot to pick up "the red bottle on that round table". Vice versa, a video surveillance system may tell a human guard to chase "the man in sunglasses carrying a brown suitcase". Likewise, a navigation system may tell the driver to go "50 meters beyond the restaurant on the right with a yellow awning", or to follow "that white car that's turning left at the upcoming T-intersection". In this talk, we will present our recent findings in the automatic generation of these expressions, which are often called "referring expressions". Combining techniques in computer vision with key concepts in language and psychology, we can generate efficient referring expressions that take into account the saliency of objects, uncertainty due to imperfect visual attributes, and object location and relative attributes. Using crowdsourcing to collect user data and to validate our hypotheses, we show that our referring expressions are effective in directing the viewer to a specific image, or a specific object within an image. Such a multimodal approach to human-machine interaction presents exciting research opportunities.

3:20-3:50 PM

Yingli Tian, The City College, City University of New York

Computer Vision-based Assistive Technology for Blind Persons

Abstract: Recent technology developments in computer vision, digital cameras, and portable computers make it possible to develop practical computer vision-based assistive technologies that help blind persons independently explore unfamiliar environments and improve the quality of their daily life. In this talk, I will introduce the research conducted in the CCNY Media Lab on applying computer vision technologies to assist people who are visually impaired, including indoor navigation and wayfinding, text reading, banknote recognition, and clothing pattern recognition.

Biography: Dr. Ying-Li Tian is a professor in the Department of Electrical Engineering at the City College of New York. She received her PhD from the Department of Electronic Engineering at the Chinese University of Hong Kong in 1996, and her BS and MS from Tianjin University, China, in 1987 and 1990, respectively. After holding an associate professor position in the National Laboratory of Pattern Recognition at the Chinese Academy of Sciences, Beijing, China, Dr. Tian joined the Robotics Institute at Carnegie Mellon University as a postdoctoral fellow in 1998. From 2001 to 2008, she was a research staff member at the IBM T. J. Watson Research Center, where she was one of the inventors of the IBM Smart Surveillance Solutions (SSS) product and led the video analytics team. She received several IBM Invention Achievement Awards and the IBM Outstanding Innovation Achievement Award for her contributions to IBM SSS. Dr. Tian has published more than 150 papers in journals and conferences and holds 20 patents. Her current research focuses on a wide range of computer vision problems, from assistive technology to human identification, facial expression analysis, and video surveillance. She is an area editor for Computer Vision and Image Understanding and a senior member of the IEEE.

3:50-4:00 PM


4:00-5:00 PM: Panel Discussion (Session Chair: Alexander Haubold, Google)

Theme: Future of Big Data for Multimedia and Computer Vision


David Gibbon

(AT&T Research)


Sanjiv Kumar

(Google Research)


Yinglong Xia

(IBM Research)


Nasir Memon

(New York University)

Tsuhan Chen

(Cornell University)


Additional panelists will be announced soon.

5:00-5:15 PM: Award Presentation and Closing Remarks


Posters and Demos

Call for Papers/Demos

Researchers in the GNY area are invited to submit posters and demos to showcase their work at the meeting.
The following awards ($500 each) will be chosen based on voting by workshop participants: Best Poster Prize, sponsored by Siemens Corporation; Best Demo Prize, sponsored by Google Research; and Best Student Poster Prize (first author must be a student), sponsored by Google Research. Details on the submission procedure can be found in the Call for Papers document here.


Poster Session

  1. Circulant Binary Embedding --- abstract --- Project page

    Felix Yu1, Sanjiv Kumar2, Yunchao Gong3 and Shih-Fu Chang1. (1 Columbia University; 2 Google Research, NY; 3 University of North Carolina at Chapel Hill)

  2. PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding --- abstract

    Yinda Zhang1, Shuran Song1, Ping Tan2 and Jianxiong Xiao1. (1 Princeton University; 2 National University of Singapore)

  3. A Framework of Changing Image Emotion Using Emotion Prediction --- abstract

    Kuan-Chuan Peng1, Kolbeinn Karlsson1, Tsuhan Chen1, Dong-Qing Zhang2 and Heather Yu2. (1 Cornell University; 2 Huawei)

  4. Nested Graph Cut for Automatic Segmentation of Nested Objects and Application to Mouse Embryo Segmentation --- abstract

    Jen-wei Kuo. (New York University)

  5. A New Approach to Music/Voice Separation Using Resonance-Based Signal Decomposition --- abstract --- poster

    Roozbeh Soleymani. (New York University)

  6. Recognition of Ancient Roman Coin with Alignment and Spatial Encoding --- abstract

    Jongpil Kim and Vladimir Pavlovic. (Rutgers)

  7. Pose Invariant Activity Classification for Multi-Floor Indoor Localization --- abstract

    Saehoon Yi1, Piotr Mirowski2, Tin Kam Ho3 and Vladimir Pavlovic1. (1 Rutgers; 2 Microsoft Bing; 3 IBM Watson Research Center)

  8. Detecting Affective Audio Content to Identify Propaganda Videos --- abstract

    Dave Chisholm, Behjat Siddiquie, Elizabeth Shriberg and Ajay Divakaran. (SRI)

  9. Highly Efficient Multimedia Event Recounting from User Semantic Preferences --- abstract

    Chun-Yu Tsai, Michelle L. Alexander, Nnenna Okwara and John R. Kender. (Columbia University)

  10. Depth Recovery with Face Prior --- abstract

    Chongyu Chen1,2, Hai Xuan Pham3, Vladimir Pavlovic3, Jianfei Cai4, and Guangming Shi2. (1 Nanyang Technological University, Singapore; 2 Xidian University, China; 3 Rutgers University; 4 Nanyang Technological University, Singapore)

  11. We are not All Equal: Personalizing Models for Facial Expression Analysis with Transductive Parameter Transfer --- abstract

    Gloria Zen. (University of Trento)

  12. Large-scale multimedia event detection with multiple modalities --- abstract

    Guangnan Ye1, Zhu Liu2, Yadong Mu2, Eric Zavesky2, David Gibbon2, Behzad Shahraray2. (1 Columbia University; 2 AT&T Labs)

  13. Action detection with dense trajectories and sliding window --- abstract

    Zhixin Shu, Kiwon Yun, Dimitris Samaras. (Stony Brook University)

  14. 3D Video Visualization for Event Summarization --- abstract --- poster

    Yueming Yang1, Ming-Ching Chang2, Siwei Lyu1, Peter Tu2. (1 University at Albany, SUNY; 2 GE Global Research)

  15. Gaze Behavior Analysis --- abstract

    Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky and Tamara L. Berg. (Stony Brook University, NY)

  16. Generalized Twin Gaussian Processes using Sharma-Mittal Divergence --- abstract

    Mohamed Elhoseiny and Ahmed Elgammal. (Rutgers University)

  17. Estimating a Patient Surface Model for Optimizing the Medical Scanning Workflow --- abstract

    Vivek Kumar Singh. (Siemens Corporation, Corporate Technology)

  18. A Deep Look Into Educational Videos Indexing --- abstract

    Junjie Cai1, Michele Merler2 and Sharath Pankanti2. (1 University of Texas at San Antonio; 2 IBM Research)

  19. Measuring Typicality of Images and Recognizing Strangeness of Objects --- abstract

    Babak Saleh1, Ahmed Elgammal1 and Ali Farhadi2. (1 Rutgers University; 2 University of Washington)

  20. Coronary Vessel Structure Detection in 2D X-Ray Angiograms Using 3D Prior information --- abstract

    Shih-Yu Sun, Peng Wang, Shanhui Sun, and Terrence Chen (Siemens Corporation, Corporate Technology)

  21. Traffic Problem – How Long Will You React --- abstract

    An-Ti Chiang, Gregory Dobler and Yao Wang. (Polytechnic Institute of NYU)

  22. Untangling the Object-View Manifold for Visual Recognition --- abstract

    Amr Bakry and Ahmed Elgammal. (Rutgers University)

  23. Correlating Speaker Gestures in Political Debates with Audience Engagement Measured via EEG --- poster

    John R. Zhang1, Jason Sherwin1, Jacek Dmochowski2, Paul Sajda1 and John R. Kender1. (1 Columbia University; 2 Stanford University)

  24. From Low-cost Depth Sensor to CAD: Cross-domain 3D Shape Retrieval via Regression Tree Fields --- abstract

    Yan Wang1, Jie Feng1, Zhixiang Wu1, Jun Wang2 and Shih-Fu Chang1 (1 Columbia University; 2 IBM T. J. Watson Research)

  25. Social Interaction Analysis at a Distance --- abstract --- poster

    Peter Tu, Tian Tai-Peng, Ming-Ching Chang, Jixu Chen and Ting Yu (GE Global Research)

Demo Session

  1. Social Music Video Recommender-MvRock --- abstract

    Hao Ding and Yong Liu. (NYU)

  2. People Clustering in Fisheye Video --- abstract

    Yuan Yang and Yao Wang. (NYU)

  3. Semantic Grouping in Videos based on Normalized Graph Cut and Kanade–Lucas–Tomasi Tracking Trajectory --- abstract

    Chenge Li and Yao Wang. (NYU)

  4. Multimodal Integrated Behavior Analytics --- abstract

    Ajay Divakaran. (SRI)

  5. Enhancing Human-Machine Interaction via Gestural Understanding --- abstract - video

    Vinay Venkataraman1 and Jonathan Lenchner2. (1 Arizona State University; 2 IBM T.J. Watson)

  6. Streamloading: High-quality, Low-cost, Efficient Video Delivery for Mobile Users --- abstract - video

    Fraida Fund, S. Amir Hosseini, Shivendra S. Panwar. (NYU)



The event is free, but please register so that we can have an accurate count of participants.
Registered attendees will be provided a free lunch, sponsored by IBM Research.

Register now




Steering Committee

John R. Smith (IBM Research)

Shih-Fu Chang (Columbia University)

Tsuhan Chen (Cornell University)

Ajay Divakaran (SRI International)

Yingli Tian (CCNY, CUNY)




General Chairs

Yao Wang (New York University)

John R. Kender (Columbia University)







Program Chairs

Quanfu Fan (IBM Research)

Kevin Chang (Siemens)







Poster Chair

Quanfu Fan (IBM Research)








Demo Chair

Kevin Chang (Siemens)







Local Arrangement Chair

Raquel Thompson (New York University)

rct274 at nyu.edu






Panel Chair

Alexander Haubold (Google)








Michele Merler (IBM Research)










Previous GNY Meetings

1st GNY Area Multimedia and Vision Meeting: Tuesday, February 7th, 2012, Stevens Institute of Technology, Hoboken, NJ

2nd GNY Area Multimedia and Vision Meeting: Friday, June 15th, 2012, Columbia University, New York, NY

3rd GNY Area Multimedia and Vision Meeting: Friday, June 14th, 2013, The City College of New York, New York, NY


Travel Tips

Directions to the Polytechnic Engineering School of NYU


5 MetroTech Center
Dibner Building, Pfizer Auditorium
Brooklyn, NY 11201


By Subway

By Train

By Car


Parking Info

By Shuttle

Visit engineering.nyu.edu/shuttle for schedule and routes. NYU Polytechnic School of Engineering is a member of MetroTech Center.

Hotels near the Polytechnic Engineering School of NYU

Note: the prices on this page are not guaranteed; they are estimates only, for a one-night stay on October 2/3, 2014.


Copyright © Research.IBM.com