3rd GNY Area Multimedia and Vision Meeting

Friday, June 14th, 2013

The City College of New York, New York

Registration is now available: Register Now

The Third Multimedia and Vision Day will bring together multimedia and computer vision researchers and students from both academic and industrial research institutions in the Greater NY area. It is a forum that features technical talks from invited speakers, poster presentations by researchers and students, and open discussions among participants. It aims to provide a regular forum for researchers, practitioners, and students to exchange ideas, present current work, address the latest topics, and share information in the broad areas of multimedia analytics, search, and management, as well as machine learning, pattern recognition, and computer vision. It is jointly sponsored by IBM T. J. Watson Research Center and The City College of New York (CCNY). The meeting will be hosted by CCNY on Friday, June 14, 2013.

Location: Lecture Hall in Steinman Hall (Grove School of Engineering Building), The City College of New York (at the intersection of 140th St and Convent Ave), New York City, NY 10031 (see the map)

Date: Friday, June 14, 2013, 9:00am - 5:00pm

Best Poster and Demo Awards

Best Poster Award:

Felix Yu, Liangliang Cao, Rogerio Feris, John Smith, and Shih-Fu Chang (Columbia University), "Designing category-level attributes for discriminative visual recognition"

Best Demo Award:

Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff, and Shih-Fu Chang (Columbia University), "News Rover: Exploring Topical Structures and Serendipity in Heterogeneous Multimedia News"


9:00am-9:10am Welcome and Chairs' opening remarks

Session 1: Human and Cameras (Session Chair: John Wright, Columbia University)

Time: 9:10am-9:40am

Sharath Pankanti, IBM T.J Watson

Actionable Video Analytics: Insights from Practical Case Studies

Abstract: For the first time in the history of the universe, the video data generated by humans exceeds all other forms of data. While no one denies the utility of the information hidden within this deluge, it is humanly impossible to browse, navigate, and search the data produced by cameras and other sensors in applications such as surveillance, railroad inspection, driver assistance, and biometrics. The practical systems that we built, although in pursuit of different business objectives, share a common goal: to intelligently and efficiently glean important actionable information from an overwhelming amount of data, while effectively ignoring the large portion that is uneventful and/or noisy. I will summarize insights from our experience that helped us successfully address different technical and business challenges, and deliver differentiating performance to meet our customers' expectations. Contributors: Lisa Brown, Jon Connell, Ankur Datta, Quanfu Fan, Rogerio Feris, Norman Haas, Jun Li, Ying Li, Sachiko Miyazawa, Juan Moreno, Hoang Trinh, Nalini Ratha.

Biography: Sharath Pankanti is a Research Staff Member and Manager of the Exploratory Computer Vision Group at the IBM T. J. Watson Research Center. He received a B.S. degree in Electrical and Electronics Engineering from the College of Engineering Pune in 1984, an M.Tech. in Computer Science from Hyderabad Central University in 1988, and a Ph.D. in Computer Science from Michigan State University in 1995. He has published over 100 peer-reviewed publications and has contributed to over 50 inventions related to biometrics, privacy, object detection, and recognition. Dr. Pankanti is an IEEE Fellow. His experience spans a number of safety-, productivity-, and security-focused projects involving biometrics, multi-sensor surveillance, and driver assistance technologies.

Time: 9:40am-10:10am

Gang Hua, Stevens Institute of Technology

Probabilistic Elastic Part Model: A Data Driven Pose-Invariant Representation for Face Recognition

Abstract: One of the major visual complications confronting face recognition is pose variation. It is generally perceived that a part-based representation for faces would be more robust to such pose variations. Instead of adopting a set of hand-crafted parts, we take a data-driven approach to a probabilistically aligned part model, namely the probabilistic elastic part (PEP) model. The model is obtained by fitting a spatial-appearance Gaussian mixture model (GMM) to dense local features extracted from a set of pose-variant face images. For a single face image or a track of face images, each mixture component of the learned spatial-appearance GMM selects the one local feature that induces the highest probability on it. These selected local features are concatenated to form the final pose-invariant representation, namely the PEP representation. We apply the PEP representation to both unconstrained face verification and unsupervised face detector adaptation. For face verification, the PEP model achieved the highest verification accuracy on both the Labeled Faces in the Wild and YouTube Faces datasets. For unsupervised face detector adaptation, we observed significant detection performance improvement when adapting two state-of-the-art face detectors on three different datasets.
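The construction sketched in the abstract (fit a spatial-appearance GMM, let each component pick its highest-density local feature, concatenate) can be illustrated in a few lines. This is a toy sketch, not the authors' implementation: the feature dimensions, the diagonal covariances, and the helper name `pep_descriptor` are assumptions of the illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def pep_descriptor(appearance, positions, gmm):
    """For each GMM component, select the spatial-appearance feature with
    the highest density under that component, then concatenate the selected
    appearance vectors into one fixed-length, pose-robust descriptor."""
    aug = np.hstack([appearance, positions])  # spatial-appearance features
    parts = []
    for k in range(gmm.n_components):
        dens = multivariate_normal.logpdf(
            aug, mean=gmm.means_[k], cov=np.diag(gmm.covariances_[k]))
        parts.append(appearance[np.argmax(dens)])
    return np.concatenate(parts)

rng = np.random.default_rng(0)
appearance = rng.normal(size=(500, 8))   # stand-in for dense local descriptors
positions = rng.uniform(size=(500, 2))   # normalized (x, y) locations
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(np.hstack([appearance, positions]))
desc = pep_descriptor(appearance, positions, gmm)
print(desc.shape)  # (32,) = n_components * appearance dimension
```

Because every image (or track) maps to the same fixed-length descriptor regardless of pose, two faces can then be compared directly through their descriptors, which is what makes the representation usable for verification.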

Biography: Gang Hua is an Associate Professor of Computer Science at Stevens Institute of Technology. He also currently holds an Academic Advisor position at the IBM T. J. Watson Research Center, and was a Consulting Researcher at Microsoft Research in 2012. Before joining Stevens, he worked as a full-time researcher at leading industrial research labs of IBM, Nokia, and Microsoft. He received the Ph.D. degree in Electrical and Computer Engineering from Northwestern University in 2006. His research in computer vision studies the interconnections and synergies among visual data, the semantic and situated context, and the users in the expanded physical world, and can be categorized into three themes: human-centered visual computing, big visual data analytics, and vision-based cyber-physical systems. He is the author of more than 60 peer-reviewed publications in prestigious international journals and conferences. To date, he holds 9 US patents and has 13 more US patents pending. He is a Senior Member of the IEEE and a life member of the ACM.

Time: 10:10am-10:30am

Coffee Break

Time: 10:30am-11:00am

Chandra Kambhamettu, University of Delaware

Large Scale Depth Analytics

Abstract: With increasing amounts of multiple-view imagery available, large-scale depth analytics becomes necessary for efficient depth generation, storage, and scene understanding. Techniques for analyzing large multiple-view data, as well as for modeling the resulting depth data for efficient storage and recognition, become important. I will present some recent techniques that we developed using structure from motion, shape from shading, and stereo analysis on the largest sea ice stereo dataset captured to date. Ice represents a unique and challenging scene for 3D reconstruction techniques, as it is predominantly white and textureless, with sparse features. This work is part of the Walrus Habitat and Ice Terrain Mapping Using Video Imaging (WHITeMUVI) project, which required the construction of a camera system for the RV Polarstern, a research icebreaker operated by the Alfred Wegener Institute of Germany. The Polar Sea Ice Topography REconstruction System (PSITRES) was designed and built in the UDel VIMS lab. PSITRES was operated during the ICE ARK XXVII/3 summer 2012 research cruise, where it captured imagery over 2500 kilometers of the central Arctic.

Biography: Chandra Kambhamettu is currently a Professor in the Department of Computer Science, University of Delaware, Newark, where he leads the Video/Image Modeling and Synthesis (VIMS) group. From 1994 to 1996, he was a Research Scientist at the NASA Goddard Space Flight Center (GSFC). His research interests include video modeling and image analysis for biomedical, remote sensing, and multimedia applications. He is best known for his work in motion analysis of deformable bodies, for which he received the NSF CAREER award in 2000. He has published over 200 peer-reviewed papers and supervised ten Ph.D. students and several Master's students in his areas of interest. Dr. Kambhamettu received the Excellence in Research Award from NASA in 1995 while at GSFC. He has served as an Area Chair and technical committee member for leading computer vision and medical conferences, and as Associate Editor for the journals Pattern Recognition, Pattern Recognition Letters, and the IEEE Transactions on Pattern Analysis and Machine Intelligence.

Time: 11:00am-11:30am

Kristin J. Dana, Rutgers University

Illumination Modeling for Visual MIMO

Abstract: Our modern society has pervasive electronic displays such as billboards, computers, tablets, signage, and kiosks. The prevalence of these displays provides opportunities to develop photographic methods for active scenes, where intentional information is encoded in the display images and must be recovered by a camera. These active scenes are fundamentally different from traditional passive scenes because image formation is based on display emittance, not surface reflectance. QR codes on billboards are one example of an active scene with intentional information, albeit a very simple case. The problem becomes more challenging when the message is hidden and dynamic. Detecting and decoding the message requires careful photometric modeling for computational message recovery. We present a novel method for communicating between a camera and a display by embedding and recovering information within a displayed image. A handheld camera pointed at the display can receive not only the display image, but also the underlying message. Unlike standard watermarking and steganography, which lie outside the domain of computer vision, our message recovery algorithm uses illumination to optically communicate hidden messages in real-world scenes. The key innovation of our approach is an algorithm that performs simultaneous radiometric calibration and message recovery in one convex optimization problem. By modeling the photometry of the system using a camera-display transfer function (CDTF), we derive a physics-based kernel function for support vector machine classification. We demonstrate that our method of optimal online radiometric calibration (OORC) leads to an efficient and robust algorithm for computational messaging between various commercial cameras and displays. We evaluate the approach using video messaging with nine different combinations of commercial cameras and displays.
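A much-simplified illustration of the camera-display channel follows. This is a toy sketch under assumed parameters (the gamma-style `cdtf` function, the carrier level, and the offset size are all invented for the illustration); the actual OORC method solves radiometric calibration and message recovery jointly in one convex optimization, which this sketch does not attempt.

```python
import numpy as np

def cdtf(x, gamma=2.2, gain=0.9, bias=0.05):
    """Toy camera-display transfer function: the camera observes a
    nonlinear but monotone distortion of the displayed intensity."""
    return bias + gain * np.power(x, gamma)

# Display side: embed a binary message as small +/- offsets on a carrier.
rng = np.random.default_rng(1)
msg = rng.integers(0, 2, size=64)
displayed = 0.5 + np.where(msg == 1, 0.02, -0.02)

# Camera side: capture through the (to the receiver, unknown) CDTF.
captured = cdtf(displayed)

# Receiver: calibrate against the known carrier level, then threshold.
# Because the CDTF is monotone, intensity ordering survives the channel.
recovered = (captured > cdtf(0.5)).astype(int)
print((recovered == msg).all())  # True in this noise-free simulation
```

The simulation shows why photometric modeling matters: the raw captured intensities are distorted by the display and camera response, so the message is only recoverable relative to a radiometric calibration of the channel.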

Biography: Kristin J. Dana received the Ph.D. from Columbia University in 1999, the M.S. degree from the Massachusetts Institute of Technology in 1992, and the B.S. degree from The Cooper Union in 1990. She is an associate professor in the Department of Electrical and Computer Engineering at Rutgers, The State University of New Jersey. Her research interests in computer vision include computational photography, machine learning, illumination modeling, texture and reflectance, motion estimation, optical devices, optimization in vision, and applications of robotics. Dr. Dana is the inventor of the "texture camera" for convenient measurement of reflectance and texture. She is also a member of the Rutgers Center for Cognitive Science and of the Graduate Faculty of the Computer Science Department. From 1992 to 1995 she was on the research staff at Sarnoff Corporation, developing real-time motion estimation algorithms for applications in the defense, biomedical, and entertainment industries. She is the recipient of the General Electric "Faculty of the Future" fellowship in 1990, the Sarnoff Corporation Technical Achievement Award in 1994 for the development of a practical algorithm for the real-time alignment of visible and infrared video images, and the National Science Foundation CAREER Award (2001) for a program investigating surface science for vision and graphics.

Session 2: Poster and Demo (Session Chair: Michele Merler and Quanfu Fan, IBM Research)

Time: 11:30am-1:30pm Lunch, Posters/Demos

Session 3: Social Media and Crowd Sourcing (Session Chair: Zhigang Zhu, CCNY)

Time: 1:30pm-2:00pm

Shih-Fu Chang, Columbia University

1200 Popular Concepts and Classifiers for Describing Visual Sentiment in Social Multimedia

Abstract: A picture is worth a thousand words, but what words should be used to describe the sentiments and emotions conveyed in the increasingly popular social multimedia? I will present a principled approach that combines sound structures from psychology with folksonomy information extracted from social multimedia to develop a large visual sentiment ontology. I will also show machine learning classifiers trained on this ontology, and visualization tools supporting intuitive exploration of the rich visual sentiment space. The ontology, dataset, and classifiers will be made available.

Biography: Shih-Fu Chang is Richard Dicker Chair Professor, Director of the Digital Video and Multimedia Lab, and Senior Vice Dean of the Engineering School at Columbia University. He is an active researcher leading the development of theories, algorithms, and systems for multimedia analysis and retrieval. In the last two decades, he and his students developed some of the earliest image/video search engines, such as VisualSEEk, VideoQ, and WebSEEk, contributing to the foundation of the vibrant field of content-based visual search and commercial systems for Web image search today. Recognized by many best paper awards and high citation impact, his scholarly work set trends in several important areas, such as compressed-domain video manipulation, video structure parsing, image authentication, large-scale indexing, and semantic video analysis. His group demonstrated the best performance in the international video retrieval evaluation forum TRECVID (2008 and 2010). The video concept classifier library, ontology, and annotated video corpora released by his group have been used by more than 100 groups. He co-led the ADVENT university-industry research consortium with the participation of more than 25 industry sponsors. He has received the ACM SIG Multimedia Technical Achievement Award, the IEEE Kiyo Tomiyasu Award, an IBM Faculty Award, and Service Recognition Awards from IEEE and ACM. He served as general co-chair of the ACM Multimedia conference in 2000 and 2010, Editor-in-Chief of the IEEE Signal Processing Magazine (2006-8), Chairman of the Columbia Electrical Engineering Department (2007-2010), Senior Vice Dean of the Columbia Engineering School (2012-present), and advisor for several companies and research institutes. His research has been broadly supported by government agencies (NSF, DARPA, IARPA, NGA, ONR, NY State) as well as many industry sponsors. He is a Fellow of the IEEE and of the American Association for the Advancement of Science.

Time: 2:00pm-2:30pm

Serge J. Belongie, UC San Diego

Fine Grained Visual Categorization with Humans in the Loop

Abstract: We present an interactive, hybrid human-computer method for object classification. The method applies to classes of problems that are difficult for most people, but are recognizable by people with the appropriate expertise (e.g., animal species or airplane model recognition). The classification method can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. Incorporating user input drives up recognition accuracy to levels that are good enough for practical applications; at the same time, computer vision reduces the amount of human interaction required. The resulting hybrid system is able to handle difficult, large multi-class problems with tightly-related categories. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate the accuracy and computational properties of different computer vision algorithms and the effects of noisy user responses on a dataset of 200 bird species and on the Animals With Attributes dataset. Our results demonstrate the effectiveness and practicality of the hybrid human-computer classification paradigm.
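The question-selection step of such a visual 20-questions game can be illustrated with a small sketch based on expected information gain over the class posterior. The function `next_question` and the noise-free yes/no answer model are assumptions of this illustration; the framework described in the talk additionally models imperfect user responses and unreliable vision scores.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def next_question(posterior, p_yes):
    """Pick the attribute question with the largest expected reduction in
    class entropy. p_yes[q, c] = P(answer "yes" to question q | class c)."""
    h0 = entropy(posterior)
    gains = []
    for q in range(p_yes.shape[0]):
        pq = float((p_yes[q] * posterior).sum())   # P(answer is "yes")
        post_yes = p_yes[q] * posterior
        post_yes /= post_yes.sum() + 1e-12         # posterior after "yes"
        post_no = (1 - p_yes[q]) * posterior
        post_no /= post_no.sum() + 1e-12           # posterior after "no"
        gains.append(h0 - pq * entropy(post_yes)
                        - (1 - pq) * entropy(post_no))
    return int(np.argmax(gains))

posterior = np.full(3, 1 / 3)                 # uniform belief over 3 classes
p_yes = np.array([[0.5, 0.5, 0.5],            # uninformative question
                  [0.99, 0.01, 0.01]])        # splits class 0 from the rest
print(next_question(posterior, p_yes))        # 1: the informative question
```

In the hybrid system, the prior over classes would come from the computer vision algorithm rather than being uniform, which is how vision reduces the number of questions a user must answer.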

Biography: Serge Belongie received the B.S. degree (with honor) in Electrical Engineering from the California Institute of Technology in 1995 and the M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences (EECS) at U.C. Berkeley in 1997 and 2000, respectively. While at Berkeley, his research was supported by a National Science Foundation Graduate Research Fellowship. He is also a co-founder of Digital Persona, Inc., and the principal architect of the Digital Persona fingerprint recognition algorithm. He is currently a Professor in the Computer Science and Engineering Department at U.C. San Diego. His research interests include computer vision and pattern recognition. He is a recipient of the NSF CAREER Award and the Alfred P. Sloan Research Fellowship. In 2004 MIT Technology Review named him to the list of the 100 top young technology innovators in the world (TR100).

Time: 2:30pm-3:00pm

Amitha Perera, Kitware

Semantics in Machine Analysis of Images and Video

Abstract: The computer vision community has made tremendous progress in extracting semantic information from images and video over the last decade, as evidenced by various fairly large datasets and detection algorithms, including ImageNet and Object Bank, PASCAL VOC, the YouTube action dataset, and others. However, our recent research into applying state-of-the-art detectors to the TRECVID MED data (a large "in the wild" video collection) shows that there is still work to be done in reliably extracting semantics across datasets. In this talk, we will look at the current state of the art and see how it stands up to application in the real world. Through this, we will suggest some areas that we believe require further research to truly make semantic extraction a commodity.

Biography: Dr. Perera’s current research is in video image analysis including moving object detection, tracking, and object recognition to derive high-level understanding from video (e.g., via activity recognition). He is also interested in active vision, robust statistics and estimation, and image segmentation. Dr. Perera's research in video is focused on developing robust algorithms that can be applied to real-world data, and in particular on developing mechanisms to cope gracefully with failure. Dr. Perera received his B.S., B.S. (Hons), and M.S. degrees from the University of the Witwatersrand, Johannesburg, South Africa, and his Ph.D. from Rensselaer Polytechnic Institute. Prior to joining Kitware, Dr. Perera was at the Visualization and Computer Vision group at GE Global Research, where he was involved in a number of projects spanning aerial and ground video analysis, satellite and aerial image analysis, computer-aided detection in mammography, and iris biometrics.

Time: 3:00pm-3:20pm

Coffee Break

Session 4: Cognitive Computing (Session Chair: Ying Li, IBM Research)

Time: 3:20pm-3:50pm

Noah Snavely, Cornell University

The Distributed Camera

Abstract: We live in a world of ubiquitous imagery, in which the number of images at our fingertips is growing at a seemingly exponential rate. These images come from a wide variety of sources, including mapping sites, webcams, and millions of photographers around the world uploading billions and billions of images to social media and photo-sharing websites, such as Facebook. Taken together, these sources of imagery can be thought of as constituting a distributed camera capturing the entire world at unprecedented scale, and continually documenting its cities, mountains, buildings, people, and events. This talk will focus on how we might use this distributed camera as a fundamental new tool for science, engineering, and environmental monitoring, and how a key problem is *calibration* -- determining the geometry of each photo, and relating it to all other photos, in an efficient, automatic way. I will describe our work on building a massive geometric database of images, and on using this database to automatically calibrate new photos.

Biography: Noah Snavely is an assistant professor of Computer Science at Cornell University, where he has been on the faculty since 2009. He received a B.S. in Computer Science and Mathematics from the University of Arizona in 2003, and a Ph.D. in Computer Science and Engineering from the University of Washington in 2008. Noah works in computer graphics and computer vision, with a particular interest in using vast amounts of imagery from the Internet to reconstruct and visualize our world in 3D, and in creating new tools for enabling people to capture and share their environments. His thesis work was the basis for Microsoft's Photosynth, a tool for building 3D visualizations from photo collections that has been used by many thousands of people. Noah is the recipient of a Microsoft New Faculty Fellowship and an NSF CAREER Award, and has been recognized by Technology Review's TR35.

Time: 3:50pm-4:20pm

James Hays, Brown University

Scene Attributes and Object Sketches

Abstract: I will discuss two recent recognition tasks which are possible because of new, crowdsourced databases. First, I will talk about attribute-based representations of scenes in which images are described by over one hundred attribute labels related to materials, surface properties, lighting, affordances, and spatial layout of scenes. Second, I will present a new database of non-expert sketches of 250 everyday objects such as 'teapot' or 'car' and compare human and machine recognition of such sketches.

Biography: James Hays is the Manning Assistant Professor of Computer Science at Brown University. His research interests span computer graphics, computer vision, and computational photography. His research focuses on using "Internet-scale" data and crowdsourcing to improve scene understanding and allow smarter image synthesis and manipulation. Before joining Brown, James worked with Antonio Torralba as a post-doc at the Massachusetts Institute of Technology. He received a Ph.D. in Computer Science from Carnegie Mellon University in 2009 while working with Alexei Efros, and a B.S. in Computer Science from the Georgia Institute of Technology in 2003. James is funded by an NSF CAREER award and gifts from Microsoft, Adobe, and Google.

Time: 4:20pm-4:50pm

Vladimir Pavlovic, Rutgers University

Beyond Categorization: Ordinal Modeling in Vision and Affective Computing

Abstract: Categorization or classification is a common paradigm for solving many problems in computer vision and multimedia, ranging from object recognition and image annotation to the prediction of human emotions. However, some problems are better described as ordinal assignment (grading or rating) tasks. I will describe two instances of such problems: modeling of temporal phases or intensity in facial affect, and the assignment of ratings to images. Both tasks leverage a new modeling framework for structured intensity data, known as the Conditional Ordinal Random Field (CORF). I will explain how the intrinsic topology of multidimensional continuous facial affect data can be modeled by an ordinal manifold. The resulting model attains simultaneous dynamic recognition and intensity estimation of facial expressions of multiple emotions, and the proposed method is the first to achieve this on both deliberate and spontaneous facial affect data. I will then show extensions of this approach to the modeling of action units, to pain intensity estimation, and to the rating of image annotations.

Biography: Vladimir Pavlovic is an Associate Professor in the Computer Science Department at Rutgers University. He received the Ph.D. in electrical engineering from the University of Illinois at Urbana-Champaign in 1999. From 1999 until 2001 he was a member of the research staff at the Cambridge Research Laboratory, Cambridge, MA. Before joining Rutgers in 2002, he held a research professor position in the Bioinformatics Program at Boston University. Vladimir's research interests include probabilistic system modeling, time-series analysis, statistical computer vision, and bioinformatics. He has published over 100 peer-reviewed papers in major computer vision, machine learning, and pattern recognition journals and conferences.

Session 5: Panel Discussion

Time: 5:00pm-6:00pm Panel Discussion
Time: 6:00pm Announcement of Best Poster/Demo

Panel List: Chandra Kambhamettu (U Delaware), Amitha Perera (Kitware Inc.), Noah Snavely (Cornell U), James Hays (Brown U), and Gang Hua (Stevens IT).

Posters and Demos

Call for Papers/Demos

Poster Session I: Image/Video Classification/Retrieval/Detection

  1. Andrew Kae, Kihyuk Sohn, Honglak Lee, and Erik Learned-Miller (University of Massachusetts Amherst), “Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling” abstract poster website paper link
  2. Shahriar Shariat Talkhoonche and Vladimir Pavlovic (Rutgers University), “Robust Time-series Retrieval by Adaptive Segmental Alignment” abstract
  3. Benjamin Elizalde, Mirco Ravanelli, and Gerald Friedland (ICSI Berkeley), “Audio Concept Ranking for Video Event Detection on User-Generated Content” abstract poster
  4. George Kamberov, Olga Koteoglou, Lazaros Karydas, and Matt Burlick (Stevens Institute of Technology), “Sub-Scenes and Semantic Video Shot Detection” abstract poster
  5. George Kamberov, Matt Burlick, Lazaros Karydas, and Olga Koteoglou (Stevens Institute of Technology), “SCAR: Dynamic adaptation for person detection and persistence analysis in unconstrained videos” abstract poster
  6. Saehoon Yi and Vladimir Pavlovic (Rutgers University), “Spatio-Temporal Context Modeling for BoW-Based Video Classification” abstract
  7. Felix Yu, Liangliang Cao, Rogerio Feris, John Smith, and Shih-Fu Chang (Columbia University), “Designing category-level attributes for discriminative visual recognition” abstract poster website paper link
  8. Carol Mazuera, Xiaodong Yang, Shizhi Chen, and YingLi Tian (CUNY City College), “Visual Speech Segmentation and Recognition Using Dynamic Lip Movement” abstract
  9. Dong Liu, Kuai-Ting Lai, Guangnan Ye, Ming-Syan Chen and Shih-Fu Chang (Columbia University), “Sample-Specific Late Fusion for Visual Category Recognition” abstract
  10. An-Ti Chiang and Yao Wang (NYU Poly), “Human Detection in Fish-Eye Video Using Histogram of Oriented Gradient over Rotated Windows” abstract
  11. Liang Du and Haibin Ling (Temple University), “Dynamic Scene Classification Using Spatially Redundant Instances” abstract poster
  12. Shizhi Chen, Xiaodong Yang, and Yingli Tian (CUNY City College), “Back Propagated Hierarchical K-Means Tree for Large Scale Image Classification” abstract
  13. Mohamed Elhoseiny and Ahmed Elgammal (Rutgers University), "Hierarchical MindMap Generation from Purely Text Description” abstract poster website paper link flier

Poster Session II: Geometry Analysis

  1. Parneet Kaur and Kristin J. Dana (Rutgers University), “Computer Vision for Automated Bridge Deck Evaluation using Ground Penetrating Radar Scan” abstract
  2. Ali Osman Ulusoy and Joseph Mundy (Brown University), “Probabilistic volumetric framework for image based modeling of general dynamic 3D scenes” abstract
  3. Yuqian Zhang, Cun Mu, Han-wen Kuo, and John Wright (Columbia University), “Towards Guaranteed Illumination Models for Non-Convex Objects” abstract
  4. Manjunath Narayana, Erik G. Learned-Miller, Allen Hanson (University of Massachusetts Amherst), “Coherent motion segmentation in moving camera videos using optical flow orientations” abstract
  5. Greg Olmschenk and Zhigang Zhu (CUNY City College), "3D Corridor Modeling Using a Single Image” abstract poster
  6. Feng Hu and Zhigang Zhu (CUNY City College), “Vertical Line Detection and Matching for an iPhone Navigation System with a Portable Omnidirectional Lens” abstract poster
  7. Wenjia Yuan, Kristin Dana, A. Ashok, M. Gruteser, N. Mandayam (Rutgers University), “Spatially Varying Radiometric Calibration for Camera-Display Messaging” abstract

Poster Session III: Subjective Image/Video Analysis

  1. John R. Zhang, John R. Kender, and Xiang Ma (Columbia University), “Dramatic Speaker Gestures as Indicators of Segments of Interest for Video Browsing” abstract poster
  2. Matt Burlick, George Kamberov, Olga Koteoglou, and Lazaros Karydas (Stevens Institute of Technology), "Leveraging Crowdsourced Data for Creating Temporal Segmentation Ground Truths of Subjective Tasks” abstract poster
  3. Po-Yu Chen and Ivan W. Selesnick (NYU Poly), "Speech Enhancement by Translation-Invariant Group Shrinkage/Thresholding” abstract poster website
  4. Jongpil Kim, Sejong Yoon and Vladimir Pavlovic (Rutgers University), "Relative Spatial Features for Image Memorability” abstract poster
  5. Eymen Kurdoglu, Yao Wang, Yong Liu (NYU Poly), “Video quality assessment in Multiparty video conferencing” abstract
  6. Yuanyi Xue, Beril Erkin and Yao Wang (NYU Poly), “A Novel No-reference Temporal Jerkiness Quality Metric for Videos” abstract
  7. Kota Yamaguchi, Luis E. Ortiz and Tamara L. Berg (Stony Brook University), “What makes a popular fashion picture” abstract poster

Demo Session

  1. Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff, and Shih-Fu Chang (Columbia University), “News Rover: Exploring Topical Structures and Serendipity in Heterogeneous Multimedia News” abstract
  2. Yin Cui, Yongzhou Xiang, and Kun Rong (Columbia University), “Large-scale Galaxy Image Retrieval” abstract
  3. Noel C. F. Codella, Michele Merler, Liangliang Cao, Leiguang Gong, John R. Smith (IBM Research), "Content Based Medical Image Retrieval for Modality, Body Region, View, and Disease" abstract
  4. Edgardo Molina, Frank Palmer, Lei Ai (The City College of New York), "Vista Wearable" abstract
  5. Lisa M. Brown, Ankur Datta, Quanfu Fan, Rogerio Feris, Rick Kjeldsen, Russell Bobbitt, Sharath Pankanti, Chiao-Fe Shu (IBM Research), "IBM IVA: Attribute-Based People Search" abstract
  6. Jonathan Connell, Etienne Marcheret, Sharath Pankanti, Michiharu Kudoh, Risa Nishiyama (IBM Research), "An Extensible Language Interface for Robot Manipulation" abstract
  7. Nalini Ratha and Jon Connell (IBM Research), "Privacy enhancements in biometrics" abstract
  8. Nalini Ratha and team (IBM Research), "Trusted strong authentication on Mobile devices" abstract


Everyone is welcome; registration is free (register here).



Steering Committee

John R. Smith (IBM Research)

Shih-Fu Chang (Columbia)

Tsuhan Chen (Cornell University)

Rogerio Feris (IBM Research)

Ying Li (IBM Research)


General Chairs

Yingli Tian (CCNY, CUNY)

Liangliang Cao (IBM Research)

Harpreet Singh Sawhney (SRI International-Sarnoff)




Program Chairs

Zhigang Zhu (CCNY, CUNY)

Ajay Divakaran (SRI International-Sarnoff)

Sanjiv Kumar (Google Research)




Poster Chairs

Michele Merler (IBM Research)

Haibin Ling (Temple University)





Demo Chairs

Quanfu Fan (IBM Research)

Lu Wang (SRI International-Sarnoff)


Local Arrangement Chairs

Hanghang Tong (CCNY, CUNY)

Yang Xian (CCNY, CUNY)


Panel Chair

Noel Codella (IBM Research)




Website Chair

Chenyang Zhang (CCNY, CUNY)






IBM T. J. Watson Research Center

The City College of New York (CCNY)

Previous GNY Meetings

1st GNY Area Multimedia and Vision Meeting . Tuesday, February 7, 2012. Stevens Institute of Technology, Hoboken, NJ

2nd GNY Area Multimedia and Vision Meeting . Friday, June 15th, 2012. Columbia University, New York, NY


Travel Tips

Direction to CCNY


Directions to CCNY: From the Times Square area (Penn Station), take the MTA subway A train (express preferred), or the B or D train, uptown to the 145th Street station (on St. Nicholas Ave); or take the 1 train uptown to the 137th Street–City College station (on Broadway). Walk to Convent Ave & 140th Street, or take a taxi to CCNY at Convent Ave & 140th Street. Steinman Hall is the Grove School of Engineering building.

Hotels near The City College of New York

Note: the prices on this page are not guaranteed; they are only estimated prices for a one-night stay on June 13/14, 2013.


Copyright © Research.IBM.com