Publications

(For a full list see below)

Conferences

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Light Field Neural Rendering
M. Suhail, C. Esteves, L. Sigal, A. Makadia

Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state-of-the-art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.
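
As a rough illustration of the two-stage aggregation described above (not the authors' implementation; the module names, feature dimensions, and the use of torch.nn.MultiheadAttention are assumptions made for the sketch), features sampled along each epipolar line are first pooled into one feature per reference view, and those per-view features are then pooled into the color of the target ray:

```python
# Minimal sketch of two-stage epipolar/view aggregation (illustrative only,
# not the published Light Field Neural Rendering architecture).
import torch
import torch.nn as nn

class TwoStageRayAggregator(nn.Module):
    def __init__(self, feat_dim=64, heads=4):
        super().__init__()
        # Stage 1: attend over the points sampled along one epipolar line.
        self.epipolar_attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        # Stage 2: attend over the per-view summaries to fuse the reference views.
        self.view_attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))  # learned target-ray query
        self.to_rgb = nn.Linear(feat_dim, 3)

    def forward(self, epipolar_feats):
        # epipolar_feats: (num_views, points_per_line, feat_dim) for one target ray.
        num_views = epipolar_feats.shape[0]
        q = self.query.expand(num_views, -1, -1)              # one query per reference view
        per_view, _ = self.epipolar_attn(q, epipolar_feats, epipolar_feats)
        per_view = per_view.transpose(0, 1)                    # (1, num_views, feat_dim)
        fused, _ = self.view_attn(self.query, per_view, per_view)
        return torch.sigmoid(self.to_rgb(fused.squeeze(0).squeeze(0)))  # RGB in [0, 1]

# Example: 8 reference views, 32 samples per epipolar line, 64-d features.
print(TwoStageRayAggregator()(torch.randn(8, 32, 64)).shape)  # torch.Size([3])
```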

GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
X. He, B. Wandt, H. Rhodin

Segmenting an image into its parts is a frequent preprocessing step for high-level vision tasks such as image editing. However, annotating masks for supervised training is expensive. Weakly-supervised and unsupervised methods exist, but they depend on comparing pairs of images, such as multiple views, video frames, or augmented versions of the same image, which limits their applicability. To address this, we propose a GAN-based approach that generates images conditioned on latent masks, thereby alleviating the need for the full or weak mask annotations required by previous approaches. We show that such mask-conditioned image generation can be learned faithfully when the masks are conditioned hierarchically on latent keypoints that explicitly define the positions of parts. Without requiring supervision of masks or points, this strategy increases robustness to changes in viewpoint and object position. It also lets us generate image-mask pairs for training a segmentation network, which outperforms state-of-the-art unsupervised segmentation methods on established benchmarks.
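
A minimal sketch of the hierarchical conditioning idea, under the assumption that part masks are rendered as soft Gaussian blobs around latent keypoints (the function name and the blob parameterization are illustrative, not the paper's exact formulation):

```python
# Turning latent keypoints into soft part masks that can condition a generator.
# Illustrative assumption, not the GANSeg model itself.
import torch

def keypoints_to_masks(keypoints, height, width, sigma=0.05, temperature=0.1):
    # keypoints: (num_parts, 2) with (x, y) in [-1, 1] normalized coordinates.
    ys = torch.linspace(-1.0, 1.0, height)
    xs = torch.linspace(-1.0, 1.0, width)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([grid_x, grid_y], dim=-1)                 # (H, W, 2)
    diff = grid[None] - keypoints[:, None, None, :]              # (K, H, W, 2)
    heat = torch.exp(-(diff ** 2).sum(-1) / (2.0 * sigma ** 2))  # Gaussian blob per part
    # Softmax over parts yields a soft segmentation that sums to 1 at every pixel.
    return torch.softmax(heat / temperature, dim=0)              # (K, H, W)

masks = keypoints_to_masks(torch.tensor([[0.0, 0.0], [0.5, -0.3]]), 64, 64)
print(masks.shape, bool(masks.sum(0).allclose(torch.ones(64, 64))))
```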

LOLNeRF: Learn from One Look
D. Rebain, M. Matthews, K. M. Yi, D. Lagun, A. Tagliasacchi

We present a method for learning a generative 3D model based on neural radiance fields, trained solely from data with only single views of each object. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure so that it can be rendered from different views is non-trivial. We show that, unlike existing methods, one does not need multi-view data to achieve this goal. Specifically, we show that by reconstructing many images aligned to an approximate canonical pose with a single network conditioned on a shared latent space, we can learn a space of radiance fields that models shape and appearance for a class of objects. We demonstrate this by training models to reconstruct object categories using datasets that contain only one view of each subject and no depth or geometry information. Our experiments show that we achieve state-of-the-art results in novel view synthesis and competitive results for monocular depth prediction.
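
The shared-latent idea can be sketched as an auto-decoder: one learned code per training image, optimized jointly with a single shared network. The tiny coordinate MLP and the photometric loss below are stand-ins for the full conditioned radiance field and volume rendering, so this is an assumption-laden sketch rather than the paper's pipeline:

```python
# Auto-decoder sketch: per-image latent codes plus one shared conditioned network.
import torch
import torch.nn as nn

num_images, latent_dim = 1000, 32
codes = nn.Embedding(num_images, latent_dim)          # one learned code per training image
net = nn.Sequential(nn.Linear(latent_dim + 2, 128), nn.ReLU(),
                    nn.Linear(128, 3))                # (code, pixel coords) -> RGB
opt = torch.optim.Adam(list(codes.parameters()) + list(net.parameters()), lr=1e-3)

def training_step(image_ids, coords, target_rgb):
    # image_ids: (B,), coords: (B, 2) in [-1, 1], target_rgb: (B, 3)
    z = codes(image_ids)                              # look up each image's latent code
    pred = net(torch.cat([z, coords], dim=-1))
    loss = ((pred - target_rgb) ** 2).mean()          # photometric reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(training_step(torch.randint(0, num_images, (16,)),
                    torch.rand(16, 2) * 2 - 1, torch.rand(16, 3)))
```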

Conference on Neural Information Processing Systems (NeurIPS), 2021

TriBERT: Human-centric Audio-visual Representation Learning
T. Rahman, M. Yang, L. Sigal

The recent success of transformer models in language, such as BERT, has motivated the use of such architectures for multi-modal feature learning and tasks. However, most multi-modal variants (e.g., ViLBERT) have limited themselves to visual-linguistic data. Relatively few have explored their use with audio-visual modalities, and none, to our knowledge, in the context of granular audio-visual detection or segmentation tasks such as sound source separation and localization. In this work, we introduce TriBERT, a transformer-based architecture inspired by ViLBERT, which enables contextual feature learning across three modalities: vision, pose, and audio, with the use of flexible co-attention. The use of pose keypoints is inspired by recent work illustrating that such representations can significantly boost performance in many audio-visual scenarios where one or more persons are often responsible for the sound explicitly (e.g., talking) or implicitly (e.g., sound produced by a human manipulating an object). From a technical perspective, as part of the TriBERT architecture, we introduce a learned visual tokenization scheme based on spatial attention and leverage weak supervision to allow granular cross-modal interactions for the visual and pose modalities. Further, we supplement learning with a sound-source separation loss formulated across all three streams. We pre-train our model on the large MUSIC21 dataset and demonstrate improved performance in audio-visual sound source separation on that dataset as well as on other datasets through fine-tuning. In addition, we show that the learned TriBERT representations are generic and significantly improve performance on other audio-visual tasks, such as cross-modal audio-visual-pose retrieval, by as much as 66.7% in top-1 accuracy.

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose
S.-Y. Su, F. Yu, M. Zollhoefer, H. Rhodin

While deep learning has reshaped the classical motion capture pipeline with feed-forward networks, generative models are still required to recover fine alignment via iterative refinement. Unfortunately, existing models are usually hand-crafted or learned in controlled conditions, and are only applicable to limited domains. We propose a method to learn a generative neural body model from unlabelled monocular videos by extending Neural Radiance Fields (NeRFs). We equip them with a skeleton so that they can be applied to time-varying and articulated motion. A key insight is that implicit models require the inverse of the forward kinematics used in explicit surface models. Our reparameterization defines spatial latent variables relative to the pose of body parts and thereby overcomes ill-posed inverse operations with an overparameterization. This enables learning volumetric body shape and appearance from scratch while jointly refining the articulated pose, all without ground-truth labels for appearance, pose, or 3D shape on the input videos. When used for novel view synthesis and motion capture, our neural model improves accuracy on diverse datasets.
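
A minimal sketch of the skeleton-relative reparameterization, assuming per-bone rigid transforms obtained from forward kinematics (the function and its inputs are illustrative; the embedding used by A-NeRF is richer):

```python
# Re-express a world-space query point in every bone's local coordinate frame
# before it is fed to the implicit body model. Illustrative sketch only.
import numpy as np

def bone_relative_coords(query_world, bone_rotations, bone_translations):
    # query_world: (3,); bone_rotations: (J, 3, 3) world-from-bone rotations;
    # bone_translations: (J, 3) bone origins in world space (forward kinematics output).
    local = []
    for R, t in zip(bone_rotations, bone_translations):
        # The inverse of each bone's rigid transform maps world points into its frame.
        local.append(R.T @ (query_world - t))
    return np.stack(local)                            # (J, 3): one coordinate per bone

num_bones = 4
rotations = np.stack([np.eye(3)] * num_bones)
translations = np.random.randn(num_bones, 3)
print(bone_relative_coords(np.zeros(3), rotations, translations).shape)  # (4, 3)
```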

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
F. Yu, M. Salzmann, P. Fua, H. Rhodin

Local processing is an essential feature of CNNs and other neural network architectures - it is one of the reasons why they work so well on images where relevant information is, to a large extent, local. However, perspective effects stemming from the projection in a conventional camera vary for different global positions in the image. We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry - and show that accounting for the perspective consistently improves the accuracy of state-of-the-art 3D pose reconstruction methods. PCLs are modular neural network layers, which, when inserted into existing CNN and MLP architectures, deterministically remove the location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unchanged. We demonstrate that PCL leads to improved 3D human pose reconstruction accuracy for CNN architectures that use cropping operations, such as spatial transformer networks (STN), and, somewhat surprisingly, MLPs used for 2D-to-3D keypoint lifting. Our conclusion is that it is important to utilize camera calibration information when available, for classical and deep-learning-based computer vision alike. PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware.
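
The underlying geometric idea can be sketched in a few lines of NumPy: rotate a virtual camera so that the crop center lies on its optical axis, then warp pixels with the induced homography. This is only an illustration under assumed intrinsics; the published PCL is a differentiable layer inserted into the network:

```python
# Perspective-aware crop sketch: a pure-rotation homography that moves the crop
# center onto the optical axis of a virtual camera. Illustrative only.
import numpy as np

def perspective_crop_homography(K, crop_center):
    # K: (3, 3) camera intrinsics; crop_center: (u, v) pixel location of the crop.
    d = np.linalg.inv(K) @ np.array([crop_center[0], crop_center[1], 1.0])
    z = d / np.linalg.norm(d)                  # new optical axis through the crop center
    x = np.cross(np.array([0.0, 1.0, 0.0]), z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=1)            # columns: virtual camera axes in camera frame
    # Pixels map as x_virtual ~ K R^T K^{-1} x_original (homography of a pure rotation).
    return K @ R.T @ np.linalg.inv(K)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
H = perspective_crop_homography(K, (100.0, 400.0))
c = H @ np.array([100.0, 400.0, 1.0])
print(c[:2] / c[2])                            # crop center maps to the principal point
```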

Energy-Based Learning for Scene Graph Generation
M. Suhail, A. Mittal, B. Siddiquie, C. Broaddus, J. Eledath, G. Medioni, L. Sigal

Traditional scene graph generation methods are trained using cross-entropy losses that treat objects and relationships as independent entities. Such a formulation, however, ignores the structure in the output space of an inherently structured prediction problem. In this work, we introduce a novel energy-based learning framework for generating scene graphs. The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space. This additional constraint in the learning framework acts as an inductive bias and allows models to learn efficiently from a small number of labels. We use the proposed energy-based framework to train existing state-of-the-art models and obtain a significant performance improvement of up to 21% and 27% on the Visual Genome and GQA benchmark datasets, respectively. Furthermore, we showcase the learning efficiency of the proposed framework by demonstrating superior performance in the zero- and few-shot settings where data is scarce.
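
For intuition, a generic margin-based energy loss is sketched below; the hinge form and the way a negative graph is obtained are placeholders rather than the paper's exact formulation:

```python
# Push the energy of the ground-truth scene graph below the energy of a
# negative/predicted graph by a margin. Generic energy-based sketch only.
import torch

def energy_margin_loss(energy_gt, energy_neg, margin=1.0):
    # energy_gt / energy_neg: (B,) scalar energies of (image, scene graph) pairs.
    return torch.clamp(margin + energy_gt - energy_neg, min=0.0).mean()

print(energy_margin_loss(torch.tensor([0.2, 0.5]), torch.tensor([1.5, 0.4])))
# The hinge is active only for the second pair, where the negative has lower energy.
```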

MIST: Multiple Instance Spatial Transformer Network
B. Angles, Y. Jin, S. Kornblith, A. Tagliasacchi, K. M. Yi

We propose a deep network that can be trained to tackle image reconstruction and classification problems that involve detection of multiple object instances, without any supervision regarding their whereabouts. The network learns to extract the most significant top-K patches and feeds these patches to a task-specific network (e.g., an auto-encoder or a classifier) to solve a domain-specific problem. The challenge in training such a network is the non-differentiable top-K selection process. To address this issue, we lift the training optimization problem by treating the result of top-K selection as a slack variable, resulting in a simple yet effective multi-stage training scheme. Our method learns to detect recurrent structures in the training dataset by learning to reconstruct images. It can also learn to localize structures when only knowledge of the occurrence of the object is provided, and in doing so it outperforms the state of the art.
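
The non-differentiable step in question can be sketched as follows, assuming a per-pixel score map and fixed-size crops (illustrative only; the contribution of MIST is the multi-stage scheme wrapped around this step):

```python
# Score a heatmap, keep the K highest-scoring locations, crop fixed-size patches.
# Illustrative sketch of the non-differentiable top-K selection step.
import torch
import torch.nn.functional as F

def extract_topk_patches(image, score_map, k=4, patch=32):
    # image: (C, H, W); score_map: (H, W) detector scores on the same grid.
    w = score_map.shape[1]
    flat_idx = torch.topk(score_map.flatten(), k).indices
    ys, xs = flat_idx // w, flat_idx % w
    half = patch // 2
    padded = F.pad(image, (half, half, half, half))    # keep border crops in bounds
    patches = [padded[:, y:y + patch, x:x + patch] for y, x in zip(ys, xs)]
    return torch.stack(patches)                        # (k, C, patch, patch)

print(extract_topk_patches(torch.rand(3, 128, 128), torch.rand(128, 128)).shape)
# torch.Size([4, 3, 32, 32]); these patches would feed the task-specific network.
```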

DeRF: Decomposed Radiance Fields
D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, and A. Tagliasacchi

With the advent of Neural Radiance Fields (NeRF), neural networks can now render novel views of a 3D scene with quality that fools the human eye. Yet, generating these images is very computationally intensive, limiting their applicability in practical scenarios. In this paper, we propose a technique based on spatial decomposition capable of mitigating this issue. Our key observation is that there are diminishing returns in employing larger (deeper and/or wider) networks. Hence, we propose to spatially decompose a scene and dedicate smaller networks to each decomposed part. When working together, these networks can render the whole scene. This allows us to achieve near-constant inference time regardless of the number of decomposed parts. Moreover, we show that a Voronoi spatial decomposition is preferable for this purpose, as it is provably compatible with the Painter's Algorithm for efficient and GPU-friendly rendering. Our experiments show that for real-world scenes, our method provides up to 3x more efficient inference than NeRF (with the same rendering quality), or an improvement of up to 1.0 dB in PSNR (for the same inference cost).
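
A minimal sketch of the spatial routing, assuming learned Voronoi sites and one small MLP per cell (the per-cell rendering and Painter's Algorithm compositing are omitted, and all names are illustrative):

```python
# Route each query point to the small network whose Voronoi site is nearest.
# Illustrative sketch; DeRF additionally renders cells separately and composites.
import torch
import torch.nn as nn

class VoronoiField(nn.Module):
    def __init__(self, num_cells=8, hidden=64):
        super().__init__()
        self.sites = nn.Parameter(torch.rand(num_cells, 3))   # Voronoi site positions
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 4))
            for _ in range(num_cells))                         # one small MLP per cell

    def forward(self, points):
        # points: (N, 3) -> (N, 4) = (RGB, density), each point handled by one head.
        cell = torch.cdist(points, self.sites).argmin(dim=1)   # nearest-site assignment
        out = torch.zeros(points.shape[0], 4)
        for i, head in enumerate(self.heads):
            mask = cell == i
            if mask.any():
                out[mask] = head(points[mask])
        return out

print(VoronoiField()(torch.rand(10, 3)).shape)  # torch.Size([10, 4])
```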

VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
J. Choi, K. M. Yi, J. Kim, J. Choo, B. Kim, J.-Y. Chang, Y. Gwon, H. J. Chang

Active Learning for discriminative models has largely been studied with a focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful. We propose a method based on Bayes' rule that can naturally incorporate class imbalance into the Active Learning framework. We derive that three terms should be considered together when estimating the probability of a classifier making a mistake for a given sample: i) the probability of mislabelling a class, ii) the likelihood of the data given a predicted class, and iii) the prior probability on the abundance of a predicted class. Implementing these terms requires a generative model and an intractable likelihood estimation; we therefore train a Variational Autoencoder (VAE) for this purpose. To further tie the VAE to the classifier and facilitate VAE training, we use the classifier's deep feature representations as input to the VAE. By considering all three probabilities, and especially the class imbalance, we can substantially improve the potential of existing methods under a limited data budget. We show that our method can be applied to classification tasks on multiple different datasets, including a real-world dataset with heavy data imbalance, significantly outperforming the state of the art.
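
In notation chosen purely for illustration (following the abstract's description rather than the paper's exact symbols), the three terms combine via Bayes' rule, under the assumption that the error rate depends on the sample only through its predicted class:

```latex
% Illustrative notation; assumes P(err | \hat{y}=c, x) is approximated by P(err | \hat{y}=c).
\[
P(\mathrm{err} \mid x)
  \;\approx\; \sum_{c} P(\mathrm{err} \mid \hat{y}=c)\, P(\hat{y}=c \mid x)
  \;=\; \sum_{c}
    \underbrace{P(\mathrm{err} \mid \hat{y}=c)}_{\text{i) mislabelling rate}}
    \cdot
    \frac{\overbrace{p(x \mid \hat{y}=c)}^{\text{ii) likelihood, via the VAE}}\,
          \overbrace{P(\hat{y}=c)}^{\text{iii) class prior}}}
         {p(x)}
\]
```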

Full List

Light Field Neural Rendering
M. Suhail, C. Esteves, L. Sigal, A. Makadia
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
X. He, B. Wandt, H. Rhodin
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

LOLNeRF: Learn from One Look
D. Rebain, M. Matthews, K. M. Yi, D. Lagun, A. Tagliasacchi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Weakly-supervised Audio-visual Sound Source Detection and Separation
T. Rahman and L. Sigal
IEEE International Conference on Multimedia and Expo (ICME), 2021

TriBERT: Human-centric Audio-visual Representation Learning
T. Rahman, M. Yang, L. Sigal
Conference on Neural Information Processing Systems (NeurIPS), 2021

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose
S.-Y. Su, F. Yu, M. Zollhoefer, H. Rhodin
Conference on Neural Information Processing Systems (NeurIPS), 2021

PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
F. Yu, M. Salzmann, P. Fua, H. Rhodin
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Energy-Based Learning for Scene Graph Generation
M. Suhail, A. Mittal, B. Siddiquie, C. Broaddus, J. Eledath, G. Medioni, L. Sigal
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Canonical Capsules: Unsupervised Capsules in Canonical Pose
W. Sun, A. Tagliasacchi, B. Deng, S. Sabour, S. Yazdani, G. E. Hinton, and K. M. Yi
arXiv:2012.04718, 2020

LatentKeypointGAN: Controlling GANs via Latent Keypoints
X. He and B. Wandt and H. Rhodin
arXiv:2103.15812, 2021

MIST: Multiple Instance Spatial Transformer Network
B. Angles, Y. Jin, S. Kornblith, A. Tagliasacchi, K. M. Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

DeRF: Decomposed Radiance Fields
D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, and A. Tagliasacchi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
J. Choi, K. M. Yi, J. Kim, J. Choo, B. Kim, J.-Y. Chang, Y. Gwon, H. J. Chang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning
W. Sun, W. Jiang, E. Trulls, A. Tagliasacchi, K. M. Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020

Image Matching Across Wide Baselines: From Paper to Practice
Y. Jin, D. Mishkin, A. Mishchuk, J. Matas, P. Fua, K. M. Yi, E. Trulls
International Journal of Computer Vision (IJCV), 2020

Optimizing Through Learned Errors for Accurate Sports Field Registration
W. Jiang, J. C. G. Higuera, B. Angles, W. Sun, M. Javan, K. M. Yi
Winter Conference on Applications of Computer Vision (WACV), 2020

Linearized Multi-Sampling for Differentiable Image Transformation
W. Jiang, W. Sun, A. Tagliasacchi, E. Trulls, K. M. Yi
IEEE/CVF International Conference on Computer Vision (ICCV), 2019

Mixture-Kernel Graph Attention Network for Situation Recognition
M. Suhail and L. Sigal
IEEE/CVF International Conference on Computer Vision (ICCV), 2019

ATTENTIONRNN: A Structured Spatial Attention Mechanism
S. Khandelwal and L. Sigal
IEEE/CVF International Conference on Computer Vision (ICCV), 2019

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
T. Rahman, B. Xu and L. Sigal
IEEE/CVF International Conference on Computer Vision (ICCV), 2019

GraphGROUND: Graph-based Language Grounding
M. Bajaj, L. Wang and L. Sigal
IEEE/CVF International Conference on Computer Vision (ICCV), 2019

DwNet: Dense warp-based network for pose-guided human video generation
P. Zablotskaia, A. Siarohin, B. Zhao and L. Sigal
British Machine Vision Conference (BMVC), 2019

Spatio-temporal Relational Reasoning for Video Question Answering
G. Singh, L. Sigal and J. Little
British Machine Vision Conference (BMVC), 2019

A Less Biased Evaluation of Out-of-distribution Sample Detectors
A. Shafaei, M. Schmidt, J. J. Little
British Machine Vision Conference (BMVC), 2019

Pan-tilt-zoom SLAM for Sports Videos
J. Lu, J. Chen and J. J. Little
British Machine Vision Conference (BMVC), 2019

Image Generation from Layout
B. Zhao, L. Meng, W. Yin and L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Modular Generative Adversarial Networks
B. Zhao, B. Chang, Z. Jie and L. Sigal
European Conference on Computer Vision (ECCV), 2018

Probabilistic Video Generation using Holistic Attribute Control
J. He, A. Lehrmann, J. Marino, G. Mori and L. Sigal
European Conference on Computer Vision (ECCV), 2018

A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
P. Dogan, B. Li, L. Sigal and M. Gross
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Show Me a Story: Towards Coherent Neural Story Illustration
H. Ravi, L. Wang, C. Muniz, L. Sigal, D. Metaxas and M. Kapadia
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018

Predicting Personality from Book Preferences with User-Generated Content Labels
N. Annalyn, M. W. Bos, L. Sigal and B. Li
IEEE Transactions on Affective Computing (TAC), 2018

Where should cameras look at soccer games: improving smoothness using the overlapped hidden Markov model
J. Chen and J. J. Little
Computer Vision and Image Understanding (CVIU), 2017

Story Albums: Creating Fictional Stories from Personal Photograph Sets
O. Radiano, Y. Graber, M. Mahler, L. Sigal and A. Shamir
Computer Graphics Forum, Volume 36, 2017

Non-parametric Structured Outputs Networks
A. Lehrmann and L. Sigal
Neural Information Processing Systems (NIPS), 2017

Visual Reference Resolution using Attention Memory for Visual Dialog
P. H. Seo, A. Lehrmann, B. Han and L. Sigal
Neural Information Processing Systems (NIPS), 2017

Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
F. Xiao, L. Sigal and Y. J. Lee
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Learn How to Choose: Independent Detectors versus Composite Visual Phrases

Winter Conference on Applications of Computer Vision (WACV), 2017

Learning Online Smooth Predictions for Realtime Camera Planning using Recurrent Decision Trees
J. Chen, H. M. Le, P. Carr, Y. Yue, J. J. Little
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Real-time Physics-based Motion Capture with Sparse Sensors
S. Andrews, I. Huerta, T. Komura, L. Sigal and K. Mitchell
European Conference on Visual Media Production (CVMP), 2016

Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
B. Xu, Y. Fu, Y.-G. Jiang, B. Li and L. Sigal
IEEE Transactions on Affective Computing (TAC), 2016

Learning Language-Visual Embedding for Movie Understanding with Natural-Language
A. Torabi, N. Tandon and L. Sigal
arXiv:1609.08124, 2016

Semi-supervised Vocabulary-informed Learning
Y. Fu and L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Learning Activity Progression in LSTMs for Activity Detection and Early Detection
S. Ma, L. Sigal and S. Sclaroff
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Harnessing Object and Scene Semantics for Large-Scale Video Understanding
Z. Wu, Y. Fu, Y.-G. Jiang and L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Video Emotion Recognition with Transferred Deep Feature Encodings
B. Xu, Y. Fu, Y.-G. Jiang, B. Li and L. Sigal
ACM International Conference in Multimedia Retrieval (ICMR), 2016

Knowledge Transfer with Interactive Learning of Semantics Relationships
J. Choi, S. Hwang, L. Sigal and L. Davis
AAAI Conference on Artificial Intelligence (AAAI), 2016

Exploiting View-Specific Appearance Similarities Across Classes for Zero-shot Pose Prediction: A Metric Learning Approach
A. Kuznetsova, S. Hwang, B. Rosenhahn and L. Sigal
AAAI Conference on Artificial Intelligence (AAAI), 2016

Learning to Generate Posters of Scientific Papers
Y. Qiang, Y. Fu, Y. Guo, Z.-H. Zhou and L. Sigal
AAAI Conference on Artificial Intelligence (AAAI), 2016

Play and Learn: Using Video Games to Train Computer Vision Models
A. Shafaei, J. J. Little, M. Schmidt
British Machine Vision Conference (BMVC), 2016

Real-Time Human Motion Capture with Multiple Depth Cameras
A. Shafaei, J. J. Little
Conference on Computer and Robot Vision (CRV), 2016

Storyline Representation of Egocentric Videos with an Application to Story-based Search
B. Xiong, G. Kim and L. Sigal
IEEE International Conference on Computer Vision (ICCV), 2015

Learning from Synthetic Data Using a Stacked Multichannel Autoencoder
X. Zhang, Y. Fu, S. Jiang, L. Sigal and G. Agam
IEEE International Conference on Machine Learning and Applications (ICMLA), 2015

Cross-Domain Matching with Squared-Loss Mutual Information
M. Yamada, L. Sigal, M. Raptis, M. Toyoda, Y. Chang and M. Sugiyama
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2015

A Perceptual Control Space for Garment Simulation
L. Sigal, M. Mahler, S. Diaz, K. McIntosh, E. Carter, T. Richards and J. Hodgins
ACM Transactions on Graphics (Proc. SIGGRAPH), 2015

Discovering Collective Narratives of Theme Parks from Large Collections of Visitors Photo Streams
G. Kim and L. Sigal
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015

Hierarchical Maximum-Margin Clustering
G.-T. Zhou, S. Hwang, M. Schmidt, L. Sigal and G. Mori
arXiv:1502.01827, 2015

Joint Photo Stream and Blog Post Summarization and Exploration
G. Kim, S. Moon, L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries
G. Kim, S. Moon, L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Space-Time Tree Ensemble for Action Recognition
S. Ma, L. Sigal, S. Sclaroff
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos
A. Kuznetsova, S.-J. Hwang, B. Rosenhahn, L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

Learning to Select and Order Vacation Photographs
F. Sadeghi, J. R. Tena, A. Farhadi, L. Sigal
IEEE Winter Conference on Applications of Computer Vision (WACV), 2015

Family Member Identification from Photo Collections
Q. Dai, P. Carr, L. Sigal, D. Hoiem
IEEE Winter Conference on Applications of Computer Vision (WACV), 2015

Unlabelled 3D Motion Examples Improve Cross View Action Recognition
A. Gupta, A. Shafaei, J. J. Little and R. J. Woodham
British Machine Vision Conference (BMVC), 2014

A Unified Semantic Embedding: Relating Taxonomies and Attributes
S.-J. Hwang, L. Sigal
Neural Information Processing Systems (NIPS), 2014

Parameterizing Object Detectors in the Continuous Pose Space
K. He, L. Sigal, S. Sclaroff
European Conference on Computer Vision (ECCV), 2014

Nonparametric Clustering with Distance Dependent Hierarchies
S. Ghosh, M. Raptis, L. Sigal, E. Sudderth
Conference on Uncertainty in Artificial Intelligence (UAI), 2014

Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction
G. Kim, L. Sigal, E. P. Xing
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

Domain Adaptation for Structured Regression
M. Yamada, Y. Chang and L. Sigal
International Journal of Computer Vision (IJCV), Special Issue on Domain Adaptation for Vision Applications, 2014

High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
M. Yamada, W. Jitkrittum, L. Sigal, E. P. Xing and M. Sugiyama
Neural Computation (NC), 26(1):185-207, 2014

Covariate Shift Adaptation for Discriminative 3D Pose Estimation
M. Yamada, L. Sigal and M. Raptis
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2013

Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
N. Shapovalova, M. Raptis, L. Sigal and G. Mori
Neural Information Processing Systems (NIPS), 2013

From Subcategories to Visual Composites: A Multi-Level Framework for Object Detection
T. Lan, M. Raptis, L. Sigal, G. Mori
IEEE International Conference on Computer Vision (ICCV), 2013

Poselet Key-framing: A Model for Human Activity Recognition
M. Raptis, L. Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Dynamical Simulation Priors for Human Motion Tracking
M. Vondrak, L. Sigal and O. C. Jenkins
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(1):52-65, 2013

Canonical Locality Preserving Latent Variable Model for Discriminative Pose Inference
Y. Tian, L. Sigal, F. De la Torre and Y. Jia
Image and Vision Computing (IVC), 31(3):223-230, 2013

Destination Flow for Crowd Simulation
S. Pellegrini, J. Gall, L. Sigal, L. van Gool
Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams (ARTEMIS'12), 2012

No Bias Left Behind: Covariate Shift Adaptation for Discriminative 3D Pose Estimation
M. Yamada, L. Sigal, M. Raptis
European Conference on Computer Vision (ECCV), 2012

Multi-linear Data-Driven Dynamic Hair Model with Efficient Hair-Body Collision Handling
P. Guan, L. Sigal, V. Reznitskaya, J. K. Hodgins
ACM/Eurographics Symposium on Computer Animation (SCA), 2012

Video-based 3D Motion Capture through Biped Control
M. Vondrak, L. Sigal, J. K. Hodgins and O. C. Jenkins
ACM Transactions on Graphics (Proc. SIGGRAPH), 2012

Human Context: Modeling human-human interactions for monocular 3D pose estimation
M. Andriluka and L. Sigal
VII Conference on Articulated Motion and Deformable Objects (AMDO), 2012

Social Roles in Hierarchical Models for Human Activity Recognition
T. Lan, L. Sigal and G. Mori
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

Human attributes from 3D pose tracking
M. Livne, L. Sigal, N. Troje and D. Fleet
Computer Vision and Image Understanding (CVIU), 116:648-660, 2012

Shared kernel information embedding for discriminative inference
R. Memisevic, L. Sigal and D. Fleet
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(4):778-790, 2012

Loose-limbed People: Estimating Human Pose and Motion using Non-parametric Belief Propagation
L. Sigal, M. Isard, H. Haussecker and M. J. Black
International Journal of Computer Vision (IJCV), 98(1):15-48, 2012

Recognizing Character-directed Utterances in Multi-child Interactions
H. Hajishirzi, J. Lehman, K. Kumatani, L. Sigal, and J. Hodgins
ACM/IEEE International Conference on Human-Robot Interaction (HRI), Late-Breaking Reports, 2012

Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
M. Zeiler, G. Taylor, L. Sigal, I. Matthews and R. Fergus
Neural Information Processing Systems (NIPS), 2011

Visual Analysis of Humans: Looking at People
T. Moeslund, A. Hilton, V. Krüger and L. Sigal
Springer, 2011. ISBN 978-0-85729-996-3

Benchmark Datasets for Pose Estimation and Tracking
M. Andriluka, L. Sigal and M. J. Black
In Visual Analysis of Humans: Looking at People, T. Moeslund, A. Hilton, V. Krüger and L. Sigal (Eds.), Springer, 2011. ISBN 978-0-85729-996-3

Human Pose Estimation
L. Sigal
Encyclopedia of Computer Vision, Springer, 2011

Motion Capture from Body-Mounted Cameras
T. Shiratori, H. S. Park, L. Sigal, Y. Sheikh and J. K. Hodgins
ACM Transactions on Graphics (Proc. SIGGRAPH), July 2011

Inferring 3D Body Pose Using Variational Semi-parametric Regression
Y. Tian, Y. Jia, Y. Shi, Y. Liu, J. Hao and L. Sigal
IEEE International Conference on Image Processing (ICIP), 2011

Latent Gaussian Mixture Regression for Human Pose Estimation
Y. Tian, L. Sigal, H. Badino, F. De la Torre and Y. Liu
Asian Conference on Computer Vision (ACCV), 2010

Human Attributes from 3D Pose Tracking
L. Sigal, D. Fleet, N. Troje, M. Livne
European Conference on Computer Vision (ECCV), 2010

Stable Spaces for Real-time Clothing
E. de Aguiar, L. Sigal, A. Treuille and J. K. Hodgins
ACM Transactions on Graphics (Proc. SIGGRAPH), July 2010

Dynamical Binary Latent Variable Models for 3D Human Pose Tracking
G. Taylor, L. Sigal, D. Fleet, G. Hinton
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010

HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion
L. Sigal, A. Balan and M. J. Black
International Journal of Computer Vision (IJCV), Special Issue on Evaluation of Articulated Human Motion and Pose Estimation, 2010

Estimating Contact Dynamics
M. Brubaker, L. Sigal, D. Fleet
IEEE International Conference on Computer Vision (ICCV), 2009

Dynamics and Control of Multibody Systems
M. Vondrak, L. Sigal and O. C. Jenkins
Motion Control, A. Lazinica (Ed.), ISBN 978-953-7619-X-X, 2009

Shared Kernel Information Embedding for Discriminative Inference
L. Sigal, R. Memisevic, D. Fleet
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009

Video-Based People Tracking
M. Brubaker, L. Sigal and D. Fleet
Handbook on Ambient Intelligence and Smart Environments, H. Nakashima, H. Aghajan, and J.C. Augusto (Eds), Springer Verlag, 2009

Physical Simulation for Probabilistic Motion Tracking
M. Vondrak, L. Sigal and O. C. Jenkins
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008