|
Time: TBD
|
Shih-Fu Chang, Columbia University
1200 Popular Concepts and Classifiers for Describing Visual
Sentiment in Social Multimedia
|
Abstract: A picture is worth one thousand words, but what words should be used to
describe the sentiments and emotions conveyed in the increasingly
popular social multimedia? I will present a principled approach
combining sound structures from psychology and the folksonomy
information extracted from social multimedia to develop a large
visual sentiment ontology. I will also show machine learning classifiers
trained using this ontology, and visualization tools supporting
intuitive exploration of the rich visual sentiment space. The ontology,
dataset, and classifiers will be made available.
Biography: Shih-Fu Chang is Richard Dicker Chair Professor, Director of the Digital Video and Multimedia Lab, and Senior Vice Dean of the Engineering School at Columbia University. He is an active researcher leading development of theories, algorithms, and systems for multimedia analysis and retrieval. In the last two decades, he and his students developed some of the earliest image/video search engines, such as VisualSEEk, VideoQ, and WebSEEk, contributing to the foundation of the vibrant field of content-based visual search and commercial systems for Web image search today. Recognized by many best paper awards and high citation impacts, his scholarly work set trends in several important areas, such as compressed-domain video manipulation, video structure parsing, image authentication, large-scale indexing, and semantic video analysis. His group demonstrated the best performance in the international video retrieval evaluation forum TRECVID (2008 and 2010). The video concept classifier library, ontology, and annotated video corpora released by his group have been used by more than 100 groups. He co-led the ADVENT university-industry research consortium with the participation of more than 25 industry sponsors. He has received the ACM SIG Multimedia Technical Achievement Award, the IEEE Kiyo Tomiyasu Award, an IBM Faculty Award, and Service Recognition Awards from IEEE and ACM. He served as the general co-chair of the ACM Multimedia conference in 2000 and 2010, Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), Chairman of the Columbia Electrical Engineering Department (2007-2010), Senior Vice Dean of Columbia Engineering School (2012-present), and advisor for several companies and research institutes. His research has been broadly supported by government agencies (NSF, DARPA, IARPA, NGA, ONR, NY State) as well as many industry sponsors. He is a Fellow of IEEE and the American Association for the Advancement of Science.
|
|
Time: TBD
|
Serge J. Belongie, UC San Diego
Fine Grained Visual Categorization with Humans in the Loop
|
Abstract: We present an interactive, hybrid human-computer method for object classification. The method applies to classes of problems that are difficult for most people, but are recognizable by people with the appropriate expertise (e.g., animal species or airplane model recognition). The classification method can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. Incorporating user input drives up recognition accuracy to levels that are good enough for practical applications; at the same time, computer vision reduces the amount of human interaction required. The resulting hybrid system is able to handle difficult, large multi-class problems with tightly-related categories. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate the accuracy and computational properties of different computer vision algorithms and the effects of noisy user responses on a dataset of 200 bird species and on the Animals With Attributes dataset. Our results demonstrate the effectiveness and practicality of the hybrid human-computer classification paradigm.
Biography: Serge Belongie received the B.S. degree (with honor) in Electrical Engineering from the California Institute of Technology in 1995 and the M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences (EECS) at U.C. Berkeley in 1997 and 2000, respectively. While at Berkeley, his research was supported by a National Science Foundation Graduate Research Fellowship. He is also a co-founder of Digital Persona, Inc., and the principal architect of the Digital Persona fingerprint recognition algorithm. He is currently a Professor in the Computer Science and Engineering Department at U.C. San Diego. His research interests include computer vision and pattern recognition. He is a recipient of the NSF CAREER Award and the Alfred P. Sloan Research Fellowship. In 2004 MIT Technology Review named him to the list of the 100 top young technology innovators in the world (TR100).
|
|
Time: TBD
|
Kristin J. Dana, Rutgers University
Illumination Modeling for Visual MIMO
|
Abstract: Our modern society has pervasive electronic displays such as billboards, computers, tablets, signage and kiosks. The prevalence of these displays provides opportunities to develop photographic methods for active scenes where intentional information is encoded in the display images and must be recovered by a camera. These active scenes are fundamentally different from traditional passive scenes because image formation is based on display emittance, not surface reflectance. QR-codes on billboards are one example of an active scene with intentional information, albeit a very simple case. The problem becomes more challenging when the message is hidden and dynamic. Detecting and decoding the message requires careful photometric modeling for computational message recovery. We present a novel method for communicating between a camera and display by embedding and recovering information within a displayed image. A handheld camera pointed at the display can receive not only the display image, but also the underlying message. Unlike standard watermarking and steganography, which lie outside the domain of computer vision, our message recovery algorithm uses illumination in order to optically communicate hidden messages in real world scenes. The key innovation of our approach is an algorithm to perform simultaneous radiometric calibration and message recovery in one convex optimization problem. By modeling the photometry of the system using a camera-display transfer function (CDTF), we derive a physics-based kernel function for support vector machine classification. We demonstrate that our method of optimal online radiometric calibration (OORC) leads to an efficient and robust algorithm for computational messaging between various commercial cameras and displays. We evaluate the method using video messaging with nine different combinations of commercial cameras and displays.
Biography: Kristin J. Dana received the PhD from Columbia University (NY, NY) in 1999, the MS degree from Massachusetts Institute of Technology in 1992, and a BS degree in 1990 from the Cooper Union (NY, NY). She is an associate professor in the Department of Electrical and Computer Engineering at Rutgers, The State University of New Jersey.
Her research interests in computer vision include computational photography, machine learning, illumination modeling, texture and reflectance, motion estimation, optical devices, optimization in vision and applications of robotics. Dr. Dana is the inventor of the "texture camera" for convenient measurement of reflectance and texture. She is also a member of the Rutgers Center for Cognitive Science and a member of Graduate Faculty of the Computer Science Department. From 1992-1995 she was on the research staff at Sarnoff Corporation developing real-time motion estimation algorithms for applications in defense, biomedicine and entertainment industries. She is the recipient of the General Electric "Faculty of the Future" fellowship in 1990, the Sarnoff Corporation Technical Achievement Award in 1994 for the development of a practical algorithm for the real-time alignment of visible and infrared video images, and the National Science Foundation Career Award (2001) for a program investigating surface science for vision and graphics.
|
|
Time: TBD
|
James Hays, Brown University
title: TBD
|
Abstract: TBD
Biography: James Hays received his Ph.D. from Carnegie Mellon University in 2009, working with Alexei Efros. He worked with Antonio Torralba as a postdoc at the Massachusetts Institute of Technology.
He is now an assistant professor at Brown University.
His research interests span computer graphics, computer vision, and computational photography. His research focuses on using "Internet-scale" data and crowd-sourcing to improve scene understanding and allow smarter image synthesis and manipulation. He is part of the Graphics, Visualization, and Interaction group at Brown.
|
|
Time: TBD
|
Gang Hua, Stevens Institute of Technology
title: TBD
|
Abstract: TBD
Biography: Gang Hua is an Associate Professor of Computer Science at Stevens Institute of Technology. He also holds an Academic Visiting Researcher position at IBM T. J. Watson Research Center. Before that, he was a Research Staff Member at IBM Research T. J. Watson Center from 2010 to 2011, a Senior Researcher at Nokia Research Center, Hollywood from 2009 to 2010, and a Scientist at Microsoft Live Labs Research from 2006 to 2009. He received the Ph.D. degree in Electrical and Computer Engineering from Northwestern University in 2006, and an M.S. in pattern recognition and intelligence system from Xi'an Jiaotong University (XJTU) in 2002. He was selected to the Special Class for the Gifted Young of XJTU in 1994 and received a B.S. in Electrical Engineering in 1999. He received the Richter Fellowship and the Walter P. Murphy Fellowship from Northwestern University in 2005 and 2002, respectively. He is a Senior Member of the IEEE and a Member of the ACM. To date, he holds 8 US patents and has 12 more patents pending.
|
|
Time: TBD
|
Chandra Kambhamettu, University of Delaware
title: TBD
|
Abstract: TBD
Biography: Chandra Kambhamettu is currently a Professor in the Department of Computer Science, University of Delaware, Newark, where he leads the Video/Image Modeling and Synthesis (VIMS) group. From 1994 to 1996, he was a Research Scientist at the NASA Goddard Space Flight Center (GSFC). His research interests include video modeling and image analysis
for biomedical, remote sensing, and multimedia applications. He is best known for his work in motion analysis of deformable bodies, for which he received the NSF CAREER award in 2000. He has published over 200 peer-reviewed papers and supervised ten Ph.D. students and several
Masters students in his areas of interest. Dr. Kambhamettu received the Excellence in Research Award from NASA in 1995 while at GSFC. He has served as Area Chair and has been a technical
committee member for leading computer vision and medical conferences. He has also served as Associate Editor for the journals Pattern Recognition, Pattern Recognition Letters, and IEEE Transactions on Pattern Analysis and Machine Intelligence.
|
|
Time: TBD
|
Aude Oliva, MIT
Predicting Image Memorability
|
Abstract: When glancing at a magazine or browsing the Internet,
we are continuously exposed to images. Despite this overflow of visual information,
humans are extremely good at remembering thousands of pictures along with their visual details.
But not all images are created equal. Whereas some images stick in our minds,
others are ignored or quickly forgotten. What makes an image memorable?
Our recent work shows that one can predict image memorability,
opening a new domain of investigation at the interface between human cognition and computer vision.
Biography: After a French baccalaureate in Physics and Mathematics and a B.Sc. in Psychology, Aude Oliva received two M.Sc. degrees, in Experimental Psychology and in Cognitive Science, and a Ph.D. from the Institut National Polytechnique of Grenoble, France. She joined the MIT faculty in the Department of Brain and Cognitive Sciences in 2004 and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2012.
Her research is cross-disciplinary, spanning human perception/cognition, computer vision, and cognitive neuroscience, focusing on research questions at the intersection of the three domains. Her work has been featured in the scientific and popular press and has made its way into textbooks of Perception, Cognition, Computer Vision, and Design, as well as into museums of Art and Science. She is the recipient of a National Science Foundation CAREER Award. Her research programs are funded by the National Science Foundation, the National Eye Institute, QCRI, Google and Xerox.
|
|
Time: TBD
|
Sharath Pankanti, IBM T.J. Watson
title: TBD
|
Abstract: TBD
Biography: Sharath Pankanti (sharat@us.ibm.com) is a Research Staff Member in the Software Research Department at the Thomas
J. Watson Research Center. He received his Ph.D. degree in Computer Science from Michigan State University.
He is the manager of the Exploratory Computer Vision Group at the T. J. Watson Research Center, where he has led a number
of safety, productivity, and security focused projects. He is coauthor of more than 80 inventions and more than 125
technical papers. Dr. Pankanti co-edited the first comprehensive book on biometrics, “Biometrics: Personal
Identification” (Kluwer, 1999), and co-authored “A Guide to Biometrics” (Springer, 2004), which is being used in many
undergraduate and graduate biometrics curricula. He is a member of ACM and a fellow of IEEE.
|
|
Time: TBD
|
Amitha Perera, Kitware
title: TBD
|
Abstract: TBD
Biography: Dr. Perera’s current research is in video image analysis including moving object detection, tracking, and object recognition to derive high-level understanding from video (e.g., via activity recognition). He is also interested in active vision, robust statistics and estimation, and image segmentation. Dr. Perera's research in video is focused on developing robust algorithms that can be applied to real-world data, and in particular on developing mechanisms to cope gracefully with failure.
Dr. Perera received his B.S., B.S. (Hons), and M.S. degrees from the University of the Witwatersrand, Johannesburg, South Africa, and his Ph.D. from Rensselaer Polytechnic Institute. Prior to joining Kitware, Dr. Perera was at the Visualization and Computer Vision group at GE Global Research, where he was involved in a number of projects spanning aerial and ground video analysis, satellite and aerial image analysis, computer-aided detection in mammography, and iris biometrics.
|
|
Time: TBD
|
Vladimir Pavlovic, Rutgers University
Beyond Categorization: Ordinal Modeling in Vision and Affective Computing
|
Abstract: Categorization or classification is a common paradigm used for solving
many problems in computer vision and multimedia, ranging from object
recognition and image annotation, to the prediction of human emotions.
However, some problems can be better described as ordinal assignment
(grading or rating) tasks. I will describe two instances of such
problems, modeling of temporal phases or intensity in facial affect
and the assignment of ratings to images. Both tasks leverage a new
modeling framework for dealing with structured intensity data, known
as the Conditional Ordinal Random Field (CORF). I will explain how
the intrinsic topology of multidimensional continuous facial affect
data can be modeled by an ordinal manifold. The resulting model
attains simultaneous dynamic recognition and intensity estimation of
facial expressions of multiple emotions. The proposed method is the
first to achieve this on both deliberate and spontaneous facial
affect data. I will then show extensions of this approach to modeling
of action units, pain intensity estimation, and rating of image
annotations.
Biography: Vladimir Pavlovic is an Associate Professor in the Computer Science
Department at Rutgers University. He received the PhD in electrical
engineering from the University of Illinois at Urbana-Champaign in
1999. From 1999 until 2001 he was a member of the research staff at the
Cambridge Research Laboratory, Cambridge, MA. Before joining Rutgers
in 2002, he held a research professor position in the Bioinformatics
Program at Boston University. Vladimir's research interests include
probabilistic system modeling, time-series analysis, statistical
computer vision and bioinformatics. He has published over 100
peer-reviewed papers in major computer vision, machine learning and
pattern recognition journals and conferences.
|
|
Time: TBD
|
Noah Snavely, Cornell University
The Distributed Camera
|
Abstract: We live in a world of ubiquitous imagery, in which the number of images at our fingertips is growing at a seemingly exponential rate. These images come from a wide variety of sources, including mapping sites, webcams, and millions of photographers around the world uploading billions and billions of images to social media and photo-sharing websites, such as Facebook. Taken together, these sources of imagery can be thought of as constituting a distributed camera capturing the entire world at unprecedented scale, and
continually documenting its cities, mountains, buildings, people, and events. This talk will focus on how we might use this distributed camera as a fundamental new tool for science, engineering, and environmental monitoring, and how a key problem is *calibration* -- determining the geometry of each photo, and relating it to all other photos, in an
efficient, automatic way. I will describe our work on building a massive geometric database of images, and on using this database to automatically calibrate new photos.
Biography: Noah Snavely is an assistant professor of Computer Science at Cornell University, where he has been on the faculty since 2009. He received a B.S. in Computer Science and Mathematics from the University of Arizona in 2003, and a Ph.D. in Computer Science and Engineering from the University of Washington in 2008. Noah works in computer graphics and computer vision, with a particular interest in using vast amounts of imagery from the Internet to reconstruct and visualize our world in 3D, and in creating new
tools for enabling people to capture and share their environments. His thesis work was the basis for Microsoft's Photosynth, a tool for building 3D visualizations from photo collections that has been used by many thousands of people. Noah is the recipient of a Microsoft New Faculty Fellowship and an NSF CAREER Award, and has been recognized by Technology Review's TR35.
|