4th GNY Area Multimedia and Vision Meeting

Friday, October 3rd, 2014

NYU Polytechnic Engineering School, Brooklyn, New York

The Fourth Multimedia and Vision Day will bring together multimedia and computer vision researchers and students from academic and industrial research institutions in the Greater NY area. The forum features technical talks from invited speakers, poster presentations by researchers and students, and open discussions among participants. It aims to provide a regular venue for researchers, practitioners, and students to exchange ideas, present current work, address the latest topics, and share information in the broad areas of multimedia analytics, search, and management, as well as machine learning, pattern recognition, and computer vision. The meeting is jointly sponsored by the IBM T. J. Watson Research Center and the Polytechnic Engineering School of NYU, which will host it on Friday, October 3rd, 2014.

Location: Polytechnic Engineering School of NYU, Brooklyn, New York

Date: Friday, October 3rd, 2014, 9:00am - 5:00pm










9:00-9:10 AM: Welcome and Chair’s opening remarks

9:10-11:30 AM: Invited Presentations (AM session)


Terrence Chen, Siemens Corporate Research

Computer assisted intervention by robust image-based tracking


Tsuhan Chen, Cornell University

Referring Expressions: How to point by not pointing

Abstract: Humans interact with computers in many ways. For example, one may tell a robot to pick up "the red bottle on that round table". Vice versa, a video surveillance system may tell a human guard to chase "the man in sunglasses carrying a brown suitcase". Likewise, a navigation system may tell the driver to go "50 meters beyond the restaurant on the right with a yellow awning", or to follow "that white car that's turning left at the upcoming T-intersection". In this talk, we will present our recent findings in the automatic generation of these expressions, often called "referring expressions." Combining techniques in computer vision with key concepts from language and psychology, we can generate efficient referring expressions that take into account the saliency of objects, uncertainty due to imperfect visual attributes, and object location and relative attributes. Using crowdsourcing to collect user data and validate our hypotheses, we show that our referring expressions are effective in directing the viewer to a specific image, or to a specific object within an image. Such a multimodal approach to human-machine interaction presents exciting research opportunities.


Coffee Break


Ajay Divakaran, SRI International

Comprehensive Human State Modeling and Its Applications

Abstract: We present a suite of multimodal techniques for the assessment of human behavior with cameras and microphones. These techniques drive the sensing module of an interactive simulation trainer in which the trainee has lifelike interaction with a virtual character so as to learn social interaction. We recognize facial expressions, gaze behaviors, gestures, postures, speech, and paralinguistics in real time and transmit the results to the simulation environment, which reacts to the trainee's behavior in a manner that serves the overall pedagogical purpose. We will describe the techniques developed and the results obtained for each of the behavioral cues, comparable to or better than the state of the art, as well as identify avenues for further research. Behavior sensing in social interactions poses a few key challenges for each of the cues, including the large number of possible behaviors, the high variability in execution of the same behavior within and across individuals, and real-time execution. Furthermore, we face the challenge of appropriately fusing the multimodal cues so as to arrive at a comprehensive assessment of behavior at multiple time scales. We will also discuss our approach to social interaction modeling, using our sensing capability to monitor and model dyadic interactions. We will present a video demonstration of the end-to-end simulation trainer.


Alejandro Jaimes, Yahoo Research


11:30 AM-1:30 PM: Poster, Demo, and Lunch


1:30-3:50 PM: Invited Presentations (PM session)

1:30-2:00 PM

Sanjiv Kumar, Google Research

Circulant Binary Embedding for High Dimensional Data

Abstract: Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. In this talk, I will describe Circulant Binary Embedding (CBE), which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of the Fast Fourier Transform to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from O(d^2) to O(d log d), and the space complexity from O(d^2) to O(d), where d is the input dimensionality. We also propose a novel time-frequency alternating optimization to learn data-dependent circulant projections. Extensive experiments show that the proposed approach gives much better performance than state-of-the-art approaches for a fixed time budget, and provides much faster computation with no performance loss for a fixed number of bits.
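The circulant trick at the heart of CBE can be sketched in a few lines of numpy. This is an illustrative sketch, not the speaker's implementation: here the projection vector r is random, whereas CBE learns it with the time-frequency alternating optimization mentioned in the abstract.

```python
import numpy as np

def cbe_code(x, r):
    """Sketch of Circulant Binary Embedding: sign(circ(r) @ x).

    Multiplying by the circulant matrix circ(r) is a circular
    convolution of r with x, so FFTs compute it in O(d log d) time
    and O(d) space (only the defining vector r needs to be stored).
    """
    proj = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(x)))
    return np.sign(proj)  # d binary bits in {-1, +1}

rng = np.random.default_rng(0)
d = 8
r = rng.standard_normal(d)  # first column of the circulant projection (random here)
x = rng.standard_normal(d)  # input vector
code = cbe_code(x, r)

# Sanity check against the explicit O(d^2) circulant multiplication
C = np.array([[r[(i - j) % d] for j in range(d)] for i in range(d)])
assert np.allclose(np.sign(C @ x), code)
```

The same FFT of r can be reused across all inputs, which is where the speedup over an unstructured d x d projection comes from.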

2:00-2:30 PM

Yann LeCun, New York University and Facebook AI Research

The unreasonable effectiveness of deep learning

Abstract: Over the last two years, deep learning systems have been rapidly deployed for a wide variety of industrial applications, including speech recognition, image and video tagging, face recognition, and various tasks in natural language processing and information retrieval. Companies such as Facebook, Google, Baidu, Microsoft, Yandex, IBM, and Yahoo! have all deployed large-scale services and products built around deep learning.
The essential advantage of deep learning approaches is the ability to train a multi-stage system from end to end. A particularly popular deep learning architecture is the convolutional network. Fed with raw inputs, convolutional nets simultaneously learn low-level features, mid-level features, high-level features, and classifiers. The success of deep learning has demonstrated the rather humbling fact that gradient descent applied to the training of large convolutional nets with tens of millions of parameters on millions of examples produces better recognition systems than the best human engineers.
I will describe some of the latest developments in deep learning and discuss some of the most promising avenues for future research.
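As a toy illustration of the multi-stage architecture described in the abstract, the numpy sketch below runs a single convolution-ReLU-pooling stage followed by a linear classifier. All weights are random here and the shapes are arbitrary; a real convolutional net learns these weights end to end by gradient descent over millions of examples.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation) of image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Elementwise nonlinearity."""
    return np.maximum(x, 0)

def pool2(x):
    """2x2 max pooling (spatial downsampling)."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))       # raw input fed directly to the net
kernel = rng.standard_normal((3, 3))    # one learned filter (random in this sketch)
features = pool2(relu(conv2d(img, kernel)))  # one conv-relu-pool stage: 8x8 -> 3x3
w = rng.standard_normal(features.size)  # linear classifier on top of the features
score = w @ features.ravel()
```

Stacking several such stages yields the low-, mid-, and high-level feature hierarchy the abstract describes, with the classifier trained jointly with the filters.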

2:30-2:50 PM

Coffee Break

2:50-3:20 PM

John R. Smith, IBM T. J. Watson Research Center

Semantics of Visual Discrimination

Abstract: The multimedia and vision community is making great progress on image recognition using data-driven machine learning techniques. The recent ImageNet results are an impressive example where convolutional neural nets are achieving a breakthrough in performance. However, given that image recognition capabilities are starting to work at a more significant scale, it is time to focus on more meaningful formulation of visual discrimination tasks. What this requires is more careful design and modeling of the visual semantic space that supports multiple facets of visual content description across semantic concepts of people, objects, scenes, actions, activities, events, etc. We highlight a number of semantic modeling issues related to inheritance, mutual exclusivity and completeness across the visual semantic space and discuss their important role for effective visual discriminative learning. We demonstrate sample results that explore different semantic modeling techniques and study their effect on visual discrimination in different image and video data domains.
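As a hypothetical illustration of how inheritance and mutual exclusivity constraints can interact with raw detector scores, consider the toy post-processing step below. The hierarchy, scores, and rules are invented for illustration only; they are not the speaker's method.

```python
# Toy semantic space: each parent concept has mutually exclusive children.
hierarchy = {"animal": ["dog", "cat"], "vehicle": ["car", "bus"]}

def enforce_semantics(scores, hierarchy):
    """Adjust raw detector scores to respect two toy constraints:
    inheritance  - a parent scores at least as high as any of its children;
    exclusivity  - mutually exclusive siblings keep only the top score."""
    out = dict(scores)
    for parent, children in hierarchy.items():
        # inheritance: parent >= max over its children
        out[parent] = max(out.get(parent, 0.0), max(out[c] for c in children))
        # exclusivity: suppress every sibling except the best one
        best = max(children, key=lambda c: out[c])
        for c in children:
            if c != best:
                out[c] = 0.0
    return out

raw = {"animal": 0.2, "dog": 0.9, "cat": 0.6,
       "vehicle": 0.1, "car": 0.3, "bus": 0.2}
clean = enforce_semantics(raw, hierarchy)
```

Even this toy version shows why the structure of the semantic space matters: a strong "dog" detection should raise "animal" and suppress the conflicting "cat" hypothesis.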

3:20-3:50 PM

Yingli Tian, The City College, City University of New York

Computer Vision-based Assistive Technology for Blind Persons

Abstract: Recent technology developments in computer vision, digital cameras, and portable computers make it possible to develop practical computer vision-based assistive technologies to help blind persons independently explore unfamiliar environments and improve the quality of daily life. In this talk, I will introduce the research conducted in the CCNY Media Lab on applying computer vision technologies to assist people who are visually impaired, including indoor navigation and wayfinding, text reading, banknote recognition, and clothing pattern recognition.

Biography: Dr. Ying-Li Tian is a professor in the Department of Electrical Engineering at the City College of New York. She received her PhD from the Department of Electronic Engineering at the Chinese University of Hong Kong in 1996, and her BS and MS from Tianjin University, China, in 1987 and 1990. After holding an associate professor position in the National Laboratory of Pattern Recognition at the Chinese Academy of Sciences, Beijing, China, Dr. Tian joined the Robotics Institute at Carnegie Mellon University as a postdoctoral fellow in 1998. From 2001 to 2008, she was a research staff member at the IBM T. J. Watson Research Center, where she was one of the inventors of the IBM Smart Surveillance Solutions (SSS) product and led the video analytics team. She received several IBM Invention Achievement Awards and the IBM Outstanding Innovation Achievement Award for her contributions to IBM SSS. Dr. Tian has published more than 150 papers in journals and conferences and holds 20 patents. Her current research focuses on a wide range of computer vision problems, from assistive technology to human identification, facial expression analysis, and video surveillance. She is an area editor for Computer Vision and Image Understanding and a senior member of the IEEE.

3:50-4:00 PM


4:00-5:00 PM: Panel Discussion (Session Chair: Alexander Haubold, Google Research)

Theme: Future of Big Data for Multimedia and Computer Vision

Panelists: David Gibbon (AT&T Research), Sanjiv Kumar (Google Research), Ching-Yung Lin (IBM Research).
Additional panelists will be announced soon.

5:00-5:15 PM: Award Presentation and Closing Remarks


Posters and Demos

Call for Papers/Demos

Researchers in the GNY area are invited to submit posters and demos to showcase their work at the meeting.
The following awards ($500 each) will be chosen based on voting by workshop participants: the Best Poster Prize, sponsored by Siemens Corporation; the Best Demo Prize, sponsored by Google Research; and the Best Student Poster Prize (first author must be a student), sponsored by Google Research. Details on the submission procedure can be found in the Call for Papers document here.



The event is free, but please register so that we can have an accurate count of participants.
Registered attendees will be provided a free lunch, sponsored by IBM Research.
Register now  



Steering Committee

John R. Smith (IBM Research)

Shih-Fu Chang (Columbia University)

Tsuhan Chen (Cornell University)

Ajay Divakaran (SRI International)

Yingli Tian (CCNY, CUNY)




General Chairs

Yao Wang (New York University)

John R Kender (Columbia University)







Program Chairs

Quanfu Fan (IBM Research)

Kevin Chang (Siemens)







Poster Chair

Quanfu Zhu (IBM Research)








Demo Chair

Kevin Chang (Siemens)







Local Arrangement Chair

Raquel Thompson (New York University)

rct274 at nyu.edu






Panel Chair

Alexander Haubold (Google)








Michele Merler (IBM Research)










Previous GNY Meetings

1st GNY Area Multimedia and Vision Meeting, Tuesday, February 7th, 2012, Stevens Institute of Technology, Hoboken, NJ

2nd GNY Area Multimedia and Vision Meeting, Friday, June 15th, 2012, Columbia University, New York, NY

3rd GNY Area Multimedia and Vision Meeting, Friday, June 14th, 2013, The City College of New York, New York, NY


Travel Tips

Directions to the Polytechnic Engineering School of NYU



Hotels near the Polytechnic Engineering School of NYU

Note: the prices on this page are not guaranteed; they are estimates for a one-night stay on October 2/3, 2014.


Copyright © Research.IBM.com