| Type | Title | Homepage | Code | GitHub Stars |
|---|---|---|---|---|
| Best Paper | Planning-oriented Autonomous Driving | Link | Github | |
| Best Paper | Visual Programming: Compositional visual reasoning without training | Link | Github | |
| Best Paper Honorable Mention | DynIBaR: Neural Dynamic Image-Based Rendering | Link | Github | |
| Best Student Paper | 3D Registration with Maximal Cliques | Link | Github | |
| Best Student Paper Honorable Mention | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | Link | Github | |

The following CVPR 2023 paper information was extracted from the web pages below and saved in the papers_info.json file:

https://openaccess.thecvf.com/CVPR2023?day=all

https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers

If you find any errors in the paper information or any missing GitHub links, you are welcome to correct the corresponding entries in the papers_info_refined.json file and submit a pull request.

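The field layout of papers_info_refined.json is not shown in this README. To preview an edit as a table row before opening a pull request, something like the minimal Python sketch below may help. It assumes, hypothetically, that the file holds a top-level JSON list of objects with `title`, `paper_url`, `code_url` and `stars` keys; check the actual file and adjust those key names before relying on it. The URLs in the demo entry are placeholders.

```python
import json
from typing import Any, Dict, List

# Header matching the paper table in this README.
# NOTE: the JSON key names used below are hypothetical assumptions.
HEADER = "| Title | Paper | Code | GitHub Stars |\n|---|---|---|---|"


def escape_cell(text: str) -> str:
    """Escape pipe characters so a title cannot break the Markdown table layout."""
    return text.replace("|", "\\|")


def render_row(entry: Dict[str, Any]) -> str:
    """Render one paper entry as a table row, leaving cells empty for missing fields."""
    title = escape_cell(entry.get("title", ""))
    paper = f"[Link]({entry['paper_url']})" if entry.get("paper_url") else ""
    code = f"[Github]({entry['code_url']})" if entry.get("code_url") else ""
    stars = str(entry["stars"]) if entry.get("stars") is not None else ""
    return f"| {title} | {paper} | {code} | {stars} |"


def render_table(entries: List[Dict[str, Any]]) -> str:
    """Build the full table, header included, from a list of entries."""
    return "\n".join([HEADER] + [render_row(e) for e in entries])


def load_entries(path: str) -> List[Dict[str, Any]]:
    """Load entries from a JSON file assumed to contain a top-level list of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder entry; real data would come from load_entries("papers_info_refined.json").
    sample = [
        {
            "title": "Token Turing Machines",
            "paper_url": "https://example.com/ttm.pdf",
            "code_url": "https://example.com/ttm",
            "stars": None,
        }
    ]
    print(render_table(sample))
```

Escaping `|` matters because an unescaped pipe inside a title would split the cell and shift every following column of that row.
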
| Title | Paper | Code | GitHub Stars |
|---|---|---|---|
| YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors | Link | Github | |
| From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models | Link | Github | |
| Co-Training 2^L Submodels for Visual Recognition | Link | Github | |
| Token Turing Machines | Link | Github | |
| How Can Objects Help Action Recognition? | Link | Github | |
| GINA-3D: Learning To Generate Implicit Neural Assets in the Wild | Link | Github | |
| Images Speak in Images: A Generalist Painter for In-Context Visual Learning | Link | Github | |
| Planning-Oriented Autonomous Driving | Link | Github | |
| Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks | Link | Github | |
| InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions | Link | Github | |
| DepGraph: Towards Any Structural Pruning | Link | Github | |
| EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | Link | Github | |
| Universal Instance Perception As Object Discovery and Retrieval | Link | Github | |
| PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° | Link | Github | |
| EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention | Link | Github | |
| Unifying Vision, Text, and Layout for Universal Document Processing | Link | Github | |
| ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders | Link | Github | |
| FlexiViT: One Model for All Patch Sizes | Link | Github | |
| CLIPPO: Image-and-Language Understanding From Pixels Only | Link | Github | |
| Neighborhood Attention Transformer | Link | Github | |
| SeqTrack: Sequence to Sequence Learning for Visual Object Tracking | Link | Github | |
| Deep Learning of Partial Graph Matching via Differentiable Top-K | Link | Github | |
| Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation | Link | Github | |
| Paint by Example: Exemplar-Based Image Editing With Diffusion Models | Link | Github | |
| Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Link | Github | |
| Masked Image Modeling With Local Multi-Scale Reconstruction | Link | Github | |
| PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters | Link | Github | |
| Learning To Generate Image Embeddings With User-Level Differential Privacy | Link | Github | |
| Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures | Link | Github | |
| InstMove: Instance Motion for Object-Centric Video Segmentation | Link | Github | |
| Activating More Pixels in Image Super-Resolution Transformer | Link | Github | |
| VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking | Link | Github | |
| Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Link | Github | |
| OpenGait: Revisiting Gait Recognition Towards Better Practicality | Link | Github | |
| Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks | Link | Github | |
| All Are Worth Words: A ViT Backbone for Diffusion Models | Link | Github | |
| Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion | Link | Github | |
| MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis | Link | Github | |
| Mask-Free Video Instance Segmentation | Link | Github | |
| Compressing Volumetric Radiance Fields to 1 MB | Link | Github | |
| PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers | Link | Github | |
| DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network | Link | Github | |
| FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction | Link | Github | |
| Detecting Everything in the Open World: Towards Universal Object Detection | Link | Github | |
| Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning | Link | Github | |
| Cross-Domain Image Captioning With Discriminative Finetuning | Link | Github | |
| NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views | Link | Github | |
| Scaling Language-Image Pre-Training via Masking | Link | Github | |
| Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation | Link | Github | |
| RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation | Link | Github | |
| MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | Link | Github | |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Link | Github | |
| BiFormer: Vision Transformer With Bi-Level Routing Attention | Link | Github | |
| All in One: Exploring Unified Video-Language Pre-Training | Link | Github | |
| Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation | Link | Github | |
| Wavelet Diffusion Models Are Fast and Scalable Image Generators | Link | Github | |
| Efficient and Explicit Modelling of Image Hierarchies for Image Restoration | Link | Github | |
| 3D Registration With Maximal Cliques | Link | Github | |
| Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering | Link | Github | |
| Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks | Link | Github | |
| DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets | Link | Github | |
| BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points | Link | Github | |
| EDICT: Exact Diffusion Inversion via Coupled Transformations | Link | Github | |
| Disentangling Writer and Character Styles for Handwriting Generation | Link | Github | |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Link | Github | |
| Conditional Image-to-Video Generation With Latent Flow Diffusion Models | Link | Github | |
| Inversion-Based Style Transfer With Diffusion Models | Link | Github | |
| Recurrent Vision Transformers for Object Detection With Event Cameras | Link | Github | |
| Dense Distinct Query for End-to-End Object Detection | Link | Github | |
| Neural Video Compression With Diverse Contexts | Link | Github | |
| Spherical Transformer for LiDAR-Based 3D Recognition | Link | Github | |
| You Only Segment Once: Towards Real-Time Panoptic Segmentation | Link | Github | |
| Referring Image Matting | Link | Github | |
| VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking | Link | Github | |
| Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation | Link | Github | |
| NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation | Link | Github | |
| High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization | Link | Github | |
| GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction | Link | Github | |
| OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering | Link | Github | |
| PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces | Link | Github | |
| MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | Link | Github | |
| Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation | Link | Github | |
| LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs | Link | Github | |
| Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation | Link | Github | |
| GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis | Link | Github | |
| Learning a Sparse Transformer Network for Effective Image Deraining | Link | Github | |
| Visual Prompt Multi-Modal Tracking | Link | Github | |
| DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting | Link | Github | |
| HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining | Link | Github | |
| Learning Visual Representations via Language-Guided Sampling | Link | Github | |
| GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning | Link | Github | |
| MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection | Link | Github | |
| NeRF-RPN: A General Framework for Object Detection in NeRFs | Link | Github | |
| ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation | Link | Github | |
| Position-Guided Text Prompt for Vision-Language Pre-Training | Link | Github | |
| Query-Centric Trajectory Prediction | Link | Github | |
| Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need | Link | Github | |
| LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion | Link | Github | |
| Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training | Link | Github | |
| BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection | Link | Github | |
| SimpleNet: A Simple Network for Image Anomaly Detection and Localization | Link | Github | |
| Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving | Link | Github | |
| Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention | Link | Github | |
| CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion | Link | Github | |
| Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking | Link | Github | |
| Identity-Preserving Talking Face Generation With Landmark and Appearance Priors | Link | Github | |
| LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation | Link | Github | |
| Delving Into Shape-Aware Zero-Shot Semantic Segmentation | Link | Github | |
| Aligning Bag of Regions for Open-Vocabulary Object Detection | Link | Github | |
| ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation | Link | Github | |
| MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | Link | Github | |
| Data-Driven Feature Tracking for Event Cameras | Link | Github | |
| FeatureBooster: Boosting Feature Descriptors With a Lightweight Neural Network | Link | Github | |
| Omni Aggregation Networks for Lightweight Image Super-Resolution | Link | Github | |
| Shifted Diffusion for Text-to-Image Generation | Link | Github | |
| A Generalized Framework for Video Instance Segmentation | Link | Github | |
| Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild | Link | Github | |
| LANA: A Language-Capable Navigator for Instruction Following and Generation | Link | Github | |
| Learning Generative Structure Prior for Blind Text Image Super-Resolution | Link | Github | |
| Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement | Link | Github | |
| TriDet: Temporal Action Detection With Relative Boundary Modeling | Link | Github | |
| GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds | Link | Github | |
| Fix the Noise: Disentangling Source Feature for Controllable Domain Translation | Link | Github | |
| Multimodal Prompting With Missing Modalities for Visual Recognition | Link | Github | |
| Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving | Link | Github | |
| Enhanced Training of Query-Based Object Detection via Selective Query Recollection | Link | Github | |
| Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision | Link | Github | |
| Super-Resolution Neural Operator | Link | Github | |
| Revisiting Rotation Averaging: Uncertainties and Robust Losses | Link | Github | |
| PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes | Link | Github | |
| Human Guided Ground-Truth Generation for Realistic Image Super-Resolution | Link | Github | |
| DynamicDet: A Unified Dynamic Architecture for Object Detection | Link | Github | |
| FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation | Link | Github | |
| HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization | Link | Github | |
| Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information | Link | Github | |
| UniHCP: A Unified Model for Human-Centric Perceptions | Link | Github | |
| NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images | Link | Github | |
| Adaptive Assignment for Geometry Aware Local Feature Matching | Link | Github | |
| Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs | Link | Github | |
| CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation | Link | Github | |
| Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection | Link | Github | |
| Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision | Link | Github | |
| CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search | Link | Github | |
| DNF: Decouple and Feedback Network for Seeing in the Dark | Link | Github | |
| Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing | Link | Github | |
| Scalable, Detailed and Mask-Free Universal Photometric Stereo | Link | Github | |
| Learning To Dub Movies via Hierarchical Prosody Models | Link | Github | |
| BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation | Link | Github | |
| Generic-to-Specific Distillation of Masked Autoencoders | Link | Github | |
| EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding | Link | Github | |
| Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning | Link | Github | |
| Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | Link | Github | |
| Unifying Short and Long-Term Tracking With Graph Hierarchies | Link | Github | |
| Hierarchical Fine-Grained Image Forgery Detection and Localization | Link | Github | |
| CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution | Link | Github | |
| Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting | Link | Github | |
| Masked Image Training for Generalizable Deep Image Denoising | Link | Github | |
| CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP | Link | Github | |
| Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring | Link | Github | |
| Multimodal Industrial Anomaly Detection via Hybrid Fusion | Link | Github | |
| LinK: Linear Kernel for LiDAR-Based 3D Perception | Link | Github | |
| V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting | Link | Github | |
| Meta Architecture for Point Cloud Analysis | Link | Github | |
| CF-Font: Content Fusion for Few-Shot Font Generation | Link | Github | |
| ViTs for SITS: Vision Transformers for Satellite Image Time Series | Link | Github | |
| ISBNet: A 3D Point Cloud Instance Segmentation Network With Instance-Aware Sampling and Box-Aware Dynamic Convolution | Link | Github | |
| A Light Weight Model for Active Speaker Detection | Link | Github | |
| Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark | Link | Github | |
| DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation | Link | Github | |
| Understanding Imbalanced Semantic Segmentation Through Neural Collapse | Link | Github | |
| MP-Former: Mask-Piloted Transformer for Image Segmentation | Link | Github | |
| Hierarchical Dense Correlation Distillation for Few-Shot Segmentation | Link | Github | |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Link | Github | |
| IFSeg: Image-Free Semantic Segmentation via Vision-Language Model | Link | Github | |
| AutoFocusFormer: Image Segmentation off the Grid | Link | Github | |
| EqMotion: Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning | Link | Github | |
| GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds | Link | Github | |
| Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models | Link | Github | |
| Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models | Link | Github | |
| Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation | Link | Github | |
| Two-View Geometry Scoring Without Correspondences | Link | Github | |
| CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability | Link | Github | |
| Learning Semantic Relationship Among Instances for Image-Text Matching | Link | Github | |
| LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation | Link | Github | |
| Robust Mean Teacher for Continual and Gradual Test-Time Adaptation | Link | Github | |
| AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders | Link | Github | |
| Directional Connectivity-Based Segmentation of Medical Images | Link | Github | |
| Zero-Shot Referring Image Segmentation With Global-Local Context Features | Link | Github | |
| Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank | Link | Github | |
| Dynamic Focus-Aware Positional Queries for Semantic Segmentation | Link | Github | |
| Vision Transformer With Super Token Sampling | Link | Github | |
| Sampling Is Matter: Point-Guided 3D Human Mesh Reconstruction | Link | Github | |
| 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds | Link | Github | |
| PROB: Probabilistic Objectness for Open World Object Detection | Link | Github | |
| Benchmarking Robustness of 3D Object Detection to Common Corruptions | Link | Github | |
| Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images | Link | Github | |
| MARLIN: Masked Autoencoder for Facial Video Representation LearnINg | Link | Github | |
| ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing | Link | Github | |
| Interactive and Explainable Region-Guided Radiology Report Generation | Link | Github | |
| SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection | Link | Github | |
| Real-Time 6K Image Rescaling With Rate-Distortion Optimization | Link | Github | |
| Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring | Link | Github | |
| Frequency-Modulated Point Cloud Rendering With Easy Editing | Link | Github | |
| Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning | Link | Github | |
| BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models | Link | Github | |
| LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling | Link | Github | |
| DynaFed: Tackling Client Data Heterogeneity With Global Dynamics | Link | Github | |
| Frame Flexible Network | Link | Github | |
| GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training | Link | Github | |
| Collaboration Helps Camera Overtake LiDAR in 3D Detection | Link | Github | |
| CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning | Link | Github | |
| RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving | Link | Github | |
| Generalized Relation Modeling for Transformer Tracking | Link | Github | |
| WildLight: In-the-Wild Inverse Rendering With a Flashlight | Link | Github | |
| Equiangular Basis Vectors | Link | Github | |
| DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium | Link | Github | |
| Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification | Link | Github | |
| Diversity-Aware Meta Visual Prompting | Link | Github | |
| MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training | Link | Github | |
| Texts as Images in Prompt Tuning for Multi-Label Image Recognition | Link | Github | |
| PointConvFormer: Revenge of the Point-Based Convolution | Link | Github | |
| Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection | Link | Github | |
| RILS: Masked Visual Reconstruction in Language Semantic Space | Link | Github | |
| Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization | Link | Github | |
| StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN | Link | Github | |
| SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation | Link | Github | |
| Learning With Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning | Link | Github | |
| Handwritten Text Generation From Visual Archetypes | Link | Github | |
| Post-Training Quantization on Diffusion Models | Link | Github | |
| DPF: Learning Dense Prediction Fields With Weak Supervision | Link | Github | |
| OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer | Link | Github | |
| SCPNet: Semantic Scene Completion on Point Cloud | Link | Github | |
| Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation | Link | Github | |
| Novel Class Discovery for 3D Point Cloud Semantic Segmentation | Link | Github | |
| Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With Cross-Scale Distortion Awareness | Link | Github | |
| M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis | Link | Github | |
| Masked and Adaptive Transformer for Exemplar Based Image Translation | Link | Github | |
| DCFace: Synthetic Face Generation With Dual Condition Diffusion Model | Link | Github | |
| T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection | Link | Github | |
| SMPConv: Self-Moving Point Representations for Continuous Convolution | Link | Github | |
| N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution | Link | Github | |
| A Large-Scale Homography Benchmark | Link | Github | |
| GeoMVSNet: Learning Multi-View Stereo With Geometry Perception | Link | Github | |
| Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression | Link | Github | |
| FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks | Link | Github | |
| Learning Transferable Spatiotemporal Representations From Natural Script Knowledge | Link | Github | |
| Rethinking Federated Learning With Domain Shift: A Prototype View | Link | Github | |
| Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization | Link | Github | |
| Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection | Link | Github | |
| Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning | Link | Github | |
| Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time | Link | Github | |
| Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation | Link | Github | |
| Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization | Link | Github | |
| Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process | Link | Github | |
| A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image | Link | Github | |
| DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects | Link | Github | |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models | Link | Github | |
| Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Link | Github | |
| Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation | Link | Github | |
| Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark | Link | Github | |
| VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud | Link | Github | |
| Sharpness-Aware Gradient Matching for Domain Generalization | Link | Github | |
| Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration | Link | Github | |
| Decoupled Multimodal Distilling for Emotion Recognition | Link | Github | |
| Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation | Link | Github | |
| An Image Quality Assessment Dataset for Portraits | Link | Github | |
| Leveraging Hidden Positives for Unsupervised Semantic Segmentation | Link | Github | |
| Semantic-Conditional Diffusion Networks for Image Captioning | Link | Github | |
| STMixer: A One-Stage Sparse Action Detector | Link | Github | |
| Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset | Link | Github | |
| Joint Visual Grounding and Tracking With Natural Language Specification | Link | Github | |
| Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization | Link | Github | |
| Power Bundle Adjustment for Large-Scale 3D Reconstruction | Link | Github | |
| Rethinking Domain Generalization for Face Anti-Spoofing: Separability and Alignment | Link | Github | |
| A Unified Pyramid Recurrent Network for Video Frame Interpolation | Link | Github | |
| Revisiting Reverse Distillation for Anomaly Detection | Link | Github | |
| SOOD: Towards Semi-Supervised Oriented Object Detection | Link | Github | |
| POEM: Reconstructing Hand in a Point Embedded Multi-View Stereo | Link | Github | |
| Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors | Link | Github | |
| QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation | Link | Github | |
| MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID | Link | Github | |
| Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment | Link | Github | |
| Task Residual for Tuning Vision-Language Models | Link | Github | |
| Structured Sparsity Learning for Efficient Video Super-Resolution | Link | Github | |
| Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior | Link | Github | |
| Imitation Learning As State Matching via Differentiable Physics | Link | Github | |
| PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration | Link | Github | |
| Twin Contrastive Learning With Noisy Labels | Link | Github | |
| TarViS: A Unified Approach for Target-Based Video Segmentation | Link | Github | |
| Clover: Towards a Unified Video-Language Alignment and Fusion Model | Link | Github | |
| Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need | Link | Github | |
| Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers | Link | Github | |
| Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images | Link | Github | |
| Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos | Link | Github | |
| Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision | Link | Github | |
| Interactive Segmentation As Gaussion Process Classification | Link | Github | |
| PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation | Link | Github | |
| Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization | Link | Github | |
| Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo | Link | Github | |
| TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets | Link | Github | |
| Exploring Discontinuity for Video Frame Interpolation | Link | Github | |
| Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections | Link | Github | |
| Affordance Grounding From Demonstration Video To Target Image | Link | Github | |
| Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection | Link | Github | |
| How to Backdoor Diffusion Models? | Link | Github | |
| LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising | Link | Github | |
| Neuron Structure Modeling for Generalizable Remote Physiological Measurement | Link | Github | |
| Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation | Link | Github | |
| STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection | Link | Github | |
| RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor | Link | Github | |
| Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation | Link | Github | |
| Learning Federated Visual Prompt in Null Space for MRI Reconstruction | Link | Github | |
| Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution | Link | Github | |
| Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective | Link | Github | |
| PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery | Link | Github | |
| MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences | Link | Github | |
| CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | Link | Github | |
| Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective | Link | Github | |
| Polynomial Implicit Neural Representations for Large Diverse Datasets | Link | Github | |
| 3D-Aware Multi-Class Image-to-Image Translation With NeRFs | Link | Github | |
| Masked Motion Encoding for Self-Supervised Video Representation Learning | Link | Github | |
| Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning | Link | Github | |
| Towards Scalable Neural Representation for Diverse Videos | Link | Github | |
| CLOTH4D: A Dataset for Clothed Human Reconstruction | Link | Github | |
| Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration | Link | Github | |
| Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations | Link | Github | |
| Robust Test-Time Adaptation in Dynamic Scenarios | Link | Github | |
| Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification | Link | Github | |
| FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training | Link | Github | |
| MOSO: Decomposing MOtion, Scene and Object for Video Prediction | Link | Github | |
| ALOFT: A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform for Domain Generalization | Link | Github | |
| A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | Link | Github | |
| SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency | Link | Github | |
| Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data | Link | Github | |
| Viewpoint Equivariance for Multi-View 3D Object Detection | Link | Github | |
| DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection | Link | Github | |
| Regularizing Second-Order Influences for Continual Learning | Link | Github | |
| Backdoor Defense via Adaptively Splitting Poisoned Dataset | Link | Github | |
| Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and a New Method | Link | Github | |
| JacobiNeRF: NeRF Shaping With Mutual Information Gradients | Link | Github | |
| Accelerating Vision-Language Pretraining With Free Language Modeling | Link | Github | |
| Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection | Link | Github | |
| PA&DA: Jointly Sampling Path and Data for Consistent NAS | Link | Github | |
| An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling | Link | Github | |
| QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity | Link | Github | |
| Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | Link | Github | |
| ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection | Link | Github | |
| Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation | Link | Github | |
| AdaptiveMix: Improving GAN Training via Feature Space Shrinkage | Link | Github | |
| Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation | Link | Github | |
| Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction | Link | Github | |
| A Strong Baseline for Generalized Few-Shot Semantic Segmentation | Link | Github | |
| FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection | Link | Github | |
| Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation | Link | Github | |
| Siamese DETR | Link | Github | |
| Distribution Shift Inversion for Out-of-Distribution Prediction | Link | Github | |
| Towards Unified Scene Text Spotting Based on Sequence Generation | Link | Github | |
| CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer | Link | Github | |
| Supervised Masked Knowledge Distillation for Few-Shot Transformers | Link | Github | |
| MELTR: Meta Loss Transformer for Learning To Fine-Tune Video Foundation Models | Link | Github | |
| Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors | Link | Github | |
| KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation | Link | Github | |
| Adaptive Human Matting for Dynamic Videos | Link | Github | |
| Making Vision Transformers Efficient From a Token Sparsification View | Link | Github | |
| ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | Link | Github | |
| Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning | Link | Github | |
| ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion | Link | Github | |
| Weakly Supervised Posture Mining for Fine-Grained Classification | Link | Github | |
| H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction | Link | Github | |
| E2PN: Efficient SE(3)-Equivariant Point Network | Link | Github | |
| Audio-Visual Grouping Network for Sound Localization From Mixtures | Link | Github | |
| StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping | Link | Github | |
| MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset | Link | Github | |
| Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation | Link | Github | |
| Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation | Link | Github | |
| Glocal Energy-Based Learning for Few-Shot Open-Set Recognition | Link | Github | |
| Indiscernible Object Counting in Underwater Scenes | Link | Github | |
| Curricular Object Manipulation in LiDAR-Based Object Detection | Link | Github | |
| TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification | Link | Github | |
| Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification | Link | Github | |
| HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models | Link | Github | |
| Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers | Link | Github | |
| Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection | Link | Github | |
| DAA: A Delta Age AdaIN Operation for Age Estimation via Binary Code Transformer | Link | Github | |
| Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution | Link | Github | |
| The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks | Link | Github | |
| Dynamic Conceptional Contrastive Learning for Generalized Category Discovery | Link | Github | |
| Class Adaptive Network Calibration | Link | Github | |
| Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation | Link | Github | |
| FAC: 3D Representation Learning via Foreground Aware Feature Contrast | Link | Github | |
| NICO++: Towards Better Benchmarking for Domain Generalization | Link | Github | |
| Bridging Search Region Interaction With Template for RGB-T Tracking | Link | Github | |
| Rotation-Invariant Transformer for Point Cloud Matching | Link | Github | |
| Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm | Link | Github | |
| CXTrack: Improving 3D Point Cloud Tracking With Contextual Information | Link | Github | |
| CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment | Link | Github | |
| Revisiting Residual Networks for Adversarial Robustness | Link | Github | |
| Upcycling Models Under Domain and Category Shift | Link | Github | |
| Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video | Link | Github | |
| PDPP: Projected Diffusion for Procedure Planning in Instructional Videos | Link | Github | |
| NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation | Link | Github | |
| Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification | Link | Github | |
| Detecting Backdoors in Pre-Trained Encoders | Link | Github | |
| Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution | Link | Github | |
| TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision | Link | Github | |
| Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container | Link | Github | |
| VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution | Link | Github | |
| Re-Thinking Federated Active Learning Based on Inter-Class Diversity | Link | Github | |
| Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction | Link | Github | |
| Federated Incremental Semantic Segmentation | Link | Github | |
| Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces | Link | Github | |
| Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems | Link | Github | |
| Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition | Link | Github | |
| Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data | Link | Github | |
| Optimization-Inspired Cross-Attention Transformer for Compressive Sensing | Link | Github | |
| Context-Based Trit-Plane Coding for Progressive Image Compression | Link | Github | |
| Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation | Link | Github | |
| Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection | Link | Github | |
| GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency | Link | Github | |
| BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation | Link | Github | |
| On the Effects of Self-Supervision and Contrastive Alignment in Deep Multi-View Clustering | Link | Github | |
| Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral Hand Disentanglement | Link | Github | |
| sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model | Link | Github | |
| Reliability in Semantic Segmentation: Are We on the Right Track? | Link | Github | |
| Diversity-Measurable Anomaly Detection | Link | Github | |
| ABCD: Arbitrary Bitwise Coefficient for De-Quantization | Link | Github | |
| Block Selection Method for Using Feature Norm in Out-of-Distribution Detection | Link | Github | |
| Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution | Link | Github | |
| Two-Shot Video Object Segmentation | Link | Github | |
| MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition | Link | Github | |
| Extracting Class Activation Maps From Non-Discriminative Features As Well | Link | Github | |
| Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception | Link | Github | |
| MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos | Link | Github | |
| Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction | Link | Github | |
| Visual Prompt Tuning for Generative Transfer Learning | Link | Github | |
| Improved Test-Time Adaptation for Domain Generalization | Link | Github | |
| Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring | Link | Github | |
| Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition | Link | Github | |
| Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis | Link | Github | |
| DiGA: Distil To Generalize and Then Adapt for Domain Adaptive Semantic Segmentation | Link | Github | |
| Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models | Link | Github | |
| SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation | Link | Github | |
| DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection | Link | Github | |
| On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks | Link | Github | |
| ScarceNet: Animal Pose Estimation With Scarce Annotations | Link | Github | |
| Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric | Link | Github | |
| Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction | Link | Github | |
| Preserving Linear Separability in Continual Learning by Backward Feature Projection | Link | Github | |
| Generalizable Implicit Neural Representations via Instance Pattern Composers | Link | Github | |
| Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching | Link | Github | |
| Progressive Neighbor Consistency Mining for Correspondence Pruning | Link | Github | |
| Trainable Projected Gradient Method for Robust Fine-Tuning | Link | Github | |
| Independent Component Alignment for Multi-Task Learning | Link | Github | |
| Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit | Link | Github | |
| DualVector: Unsupervised Vector Font Synthesis With Dual-Part Representation | Link | Github | |
| Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images | Link | Github | |
| Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection | Link | Github | |
| Partial Network Cloning | Link | Github | |
| Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark | Link | Github | |
| Object Detection With Self-Supervised Scene Adaptation | Link | Github | |
| Generative Bias for Robust Visual Question Answering | Link | Github | |
| MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation | Link | Github | |
| Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning | Link | Github | |
| Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures | Link | Github | |
| SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence | Link | Github | |
| B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution | Link | Github | |
| High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors | Link | Github | |
| DivClust: Controlling Diversity in Deep Clustering | Link | Github | |
| Large-Scale Training Data Search for Object Re-Identification | Link | Github | |
| Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning | Link | Github | |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | Link | Github | |
| Semi-Supervised Domain Adaptation With Source Label Adaptation | Link | Github | |
| StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning | Link | Github | |
| Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples | Link | Github | |
| ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images | Link | Github | |
| PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification | Link | Github | |
| DIP: Dual Incongruity Perceiving Network for Sarcasm Detection | Link | Github | |
| Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos | Link | Github | |
| PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer | Link | Github | |
| Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation | Link | Github | |
| VQACL: A Novel Visual Question Answering Continual Learning Setting | Link | Github | |
| RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval | Link | Github | |
| PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations | Link | Github | |
| MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection | Link | Github | |
| The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Link | Github | |
| Computationally Budgeted Continual Learning: What Does Matter? | Link | Github | |
| PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers | Link | Github | |
| Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network | Link | Github | |
| R2Former: Unified Retrieval and Reranking Transformer for Place Recognition | Link | Github | |
| Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization | Link | Github | |
| Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement | Link | Github | |
| DistilPose: Tokenized Pose Regression With Heatmap Distillation | Link | Github | |
| Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration | Link | Github | |
| DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks | Link | Github | |
| BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency | Link | Github | |
| Representation Learning for Visual Object Tracking by Masked Appearance Transfer | Link | Github | |
| AnchorFormer: Point Cloud Completion From Discriminative Nodes | Link | Github | |
| TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation | Link | Github | |
| Proximal Splitting Adversarial Attack for Semantic Segmentation | Link | Github | |
| NVTC: Nonlinear Vector Transform Coding | Link | Github | |
| CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose | Link | Github | |
| Enhancing the Self-Universality for Transferable Targeted Attacks | Link | Github | |
| Randomized Adversarial Training via Taylor Expansion | Link | Github | |
| Long Range Pooling for 3D Large-Scale Scene Understanding | Link | Github | |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Link | Github | |
| Federated Domain Generalization With Generalization Adjustment | Link | Github | |
| CoMFormer: Continual Learning in Semantic and Panoptic Segmentation | Link | Github | |
| Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning | Link | Github | |
| MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering | Link | Github | |
| STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition | Link | Github | |
| An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions | Link | Github | |
| Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions | Link | Github | |
| Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning | Link | Github | |
| Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation | Link | Github | |
| Bias Mimicking: A Simple Sampling Approach for Bias Mitigation | Link | Github | |
| OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields | Link | Github | |
| Multi-Level Logit Distillation | Link | Github | |
| Real-Time Evaluation in Online Continual Learning: A New Hope | Link | Github | |
| Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction | Link | Github | |
| CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network With Large Input | Link | Github | |
| Boosting Video Object Segmentation via Space-Time Correspondence Learning | Link | Github | |
| Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation | Link | Github | |
| TINC: Tree-Structured Implicit Neural Compression | Link | Github | |
| Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels | Link | Github | |
| DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting | Link | Github | |
| Large-Capacity and Flexible Video Steganography via Invertible Neural Network | Link | Github | |
| VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization | Link | Github | |
| LINe: Out-of-Distribution Detection by Leveraging Important Neurons | Link | Github | |
| Neural Transformation Fields for Arbitrary-Styled Font Generation | Link | Github | |
| Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning | Link | Github | |
| Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation | Link | Github | |
| Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation | Link | Github | |
| FCC: Feature Clusters Compression for Long-Tailed Visual Recognition | Link | Github | |
| Neural Vector Fields: Implicit Representation by Explicit Learning | Link | Github | |
| Learning Action Changes by Measuring Verb-Adverb Textual Relationships | Link | Github | |
| Make Landscape Flatter in Differentially Private Federated Learning | Link | Github | |
| Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization | Link | Github | |
| Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning | Link | Github | |
| Knowledge Combination To Learn Rotated Detection Without Rotated Annotation | Link | Github | |
| Uncurated Image-Text Datasets: Shedding Light on Demographic Bias | Link | Github | |
| Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion | Link | Github | |
| PointCert: Point Cloud Classification With Deterministic Certified Robustness Guarantees | Link | Github | |
| Advancing Visual Grounding With Scene Knowledge: Benchmark and Method | Link | Github | |
| Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt | Link | Github | |
| 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention | Link | Github | |
| Self-Supervised 3D Scene Flow Estimation Guided by Superpoints | Link | Github | |
| End-to-End Video Matting With Trimap Propagation | Link | Github | |
| Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement | Link | Github | |
| Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection | Link | Github | |
| RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts | Link | Github | |
| Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising | Link | Github | |
| Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Link | Github | |
| MAGVLT: Masked Generative Vision-and-Language Transformer | Link | Github | |
| Focused and Collaborative Feedback Integration for Interactive Image Segmentation | Link | Github | |
| OpenMix: Exploring Outlier Samples for Misclassification Detection | Link | Github | |
| Adaptive Data-Free Quantization | Link | Github | |
| VideoTrack: Learning To Track Objects via Video Transformer | Link | Github | |
| Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module | Link | Github | |
| Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation | Link | Github | |
| Contrastive Grouping With Transformer for Referring Image Segmentation | Link | Github | |
| Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation | Link | Github | |
| 3D-POP – An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture | Link | Github | |
| PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering | Link | Github | |
| Towards Open-World Segmentation of Parts | Link | Github | |
| PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning | Link | Github | |
| Quantum Multi-Model Fitting | Link | Github | |
| Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment | Link | Github | |
| Practical Network Acceleration With Tiny Sets | Link | Github | |
| Feature Alignment and Uniformity for Test Time Adaptation | Link | Github | |
| Finding Geometric Models by Clustering in the Consensus Space | Link | Github | |
| VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation | Link | Github | |
| Meta-Learning With a Geometry-Adaptive Preconditioner | Link | Github | |
| Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning | Link | Github | |
| Physical-World Optical Adversarial Attacks on 3D Face Recognition | Link | Github | |
| Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Link | Github | |
| On Calibrating Semantic Segmentation Models: Analyses and an Algorithm | Link | Github | |
| Binary Latent Diffusion | Link | Github | |
| Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! | Link | Github | |
| MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection | Link | Github | |
| Behavioral Analysis of Vision-and-Language Navigation Agents | Link | Github | |
| FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding | Link | Github | |
| Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation | Link | Github | |
| Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections | Link | Github | |
| Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection | Link | Github | |
| Non-Contrastive Unsupervised Learning of Physiological Signals From Video | Link | Github | |
| Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning | Link | Github | |
| Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Transfer | Link | Github | |
| Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning | Link | Github | |
| PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation | Link | Github | |
| Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation | Link | Github | |
| Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning | Link | Github | |
| Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification | Link | Github | |
| Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning | Link | Github | |
| Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices | Link | Github | |
| Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup | Link | Github | |
| Boosting Verified Training for Robust Image Classifications via Abstraction | Link | Github | |
| DaFKD: Domain-Aware Federated Knowledge Distillation | Link | Github | |
| Resource-Efficient RGBD Aerial Tracking | Link | Github | |
| BiasBed – Rigorous Texture Bias Evaluation | Link | Github | |
| Progressive Open Space Expansion for Open-Set Model Attribution | Link | Github | |
| Harmonious Feature Learning for Interactive Hand-Object Pose Estimation | Link | Github | |
| Masked Images Are Counterfactual Samples for Robust Fine-Tuning | Link | Github | |
| MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning | Link | Github | |
| CFA: Class-Wise Calibrated Fair Adversarial Training | Link | Github | |
| Regularization of Polynomial Networks for Image Recognition | Link | Github | |
| SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples | Link | Github | |
| Depth Estimation From Indoor Panoramas With Neural Scene Representation | Link | Github | |
| Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions | Link | Github | |
| EfficientSCI: Densely Connected Network With Space-Time Factorization for Large-Scale Video Snapshot Compressive Imaging | Link | Github | |
| GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task | Link | Github | |
| Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval | Link | Github | |
| Towards Practical Plug-and-Play Diffusion Models | Link | Github | |
| Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes | Link | Github | |
| PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training | Link | Github | |
| From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm | Link | Github | |
| Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning With Hyperspherical Embeddings | Link | Github | |
| Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning | Link | Github | |
| Layout-Based Causal Inference for Object Navigation | Link | Github | |
| Ensemble-Based Blackbox Attacks on Dense Prediction | Link | Github | |
| Adversarial Robustness via Random Projection Filters | Link | Github | |
| NLOST: Non-Line-of-Sight Imaging With Transformer | Link | Github | |
| Fast Contextual Scene Graph Generation With Unbiased Context Augmentation | Link | Github | |
| Event-Based Blurry Frame Interpolation Under Blind Exposure | Link | Github | |
| Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning | Link | Github | |
| GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting | Link | Github | |
| Balanced Product of Calibrated Experts for Long-Tailed Recognition | Link | Github | |
| Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions | Link | Github | |
| Annealing-Based Label-Transfer Learning for Open World Object Detection | Link | Github | |
| Make-a-Story: Visual Memory Conditioned Consistent Story Generation | Link | Github | |
| Revisiting Prototypical Network for Cross Domain Few-Shot Learning | Link | Github | |
| Perception and Semantic Aware Regularization for Sequential Confidence Calibration | Link | Github | |
| Semi-Weakly Supervised Object Kinematic Motion Prediction | Link | Github | |
| Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding | Link | Github | |
| MaLP: Manipulation Localization Using a Proactive Scheme | Link | Github | |
| Adjustment and Alignment for Unbiased Open Set Domain Adaptation | Link | Github | |
| Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions | Link | Github | |
| Sliced Optimal Partial Transport | Link | Github | |
| HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions | Link | Github | |
| Trap Attention: Monocular Depth Estimation With Manual Traps | Link | Github | |
| GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection | Link | Github | |
| Learning From Noisy Labels With Decoupled Meta Label Purifier | Link | Github | |
| Local Connectivity-Based Density Estimation for Face Clustering | Link | Github | |
| Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography | Link | Github | |
| Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks | Link | Github | |
| A Probabilistic Framework for Lifelong Test-Time Adaptation | Link | Github | |
| PointCMP: Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud Videos | Link | Github | |
| Deep Polarization Reconstruction With PDAVIS Events | Link | Github | |
| Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting | Link | Github | |
| Probabilistic Debiasing of Scene Graphs | Link | Github | |
| PMR: Prototypical Modal Rebalance for Multimodal Learning | Link | Github | |
| Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning | Link | Github | |
| HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering | Link | Github | |
| Document Image Shadow Removal Guided by Color-Aware Background | Link | Github | |
| DLBD: A Self-Supervised Direct-Learned Binary Descriptor | Link | Github | |
| Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning | Link | Github | |
| Learning Debiased Representations via Conditional Attribute Interpolation | Link | Github | |
| Bayesian Posterior Approximation With Stochastic Ensembles | Link | Github | |
| Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning | Link | Github | |
| Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning | Link | Github | |
| Noisy Correspondence Learning With Meta Similarity Correction | Link | Github | |
| RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases | Link | Github | |
| Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval | Link | Github | |
| BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration | Link | Github | |
| Are Data-Driven Explanations Robust Against Out-of-Distribution Data? | Link | Github | |
| Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection | Link | Github | |
| Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning | Link | Github | |
| High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition | Link | Github | |
| A Bag-of-Prototypes Representation for Dataset-Level Applications | Link | Github | |
| Neural Dependencies Emerging From Learning Massive Categories | Link | Github | |
| Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking | Link | Github | |
| CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset | Link | Github | |
| Balanced Energy Regularization Loss for Out-of-Distribution Detection | Link | Github | |
| Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training | Link | Github | |
| Masked Representation Learning for Domain Generalized Stereo Matching | Link | Github | |
| Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization | Link | Github | |
| Genie: Show Me the Data for Quantization | Link | Github | |
| G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors | Link | Github | |
| TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers | Link | Github | |
| Hierarchical Prompt Learning for Multi-Task Learning | Link | Github | |
| Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising | Link | Github | |
| Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration | Link | Github | |
| Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization | Link | Github | |
| Towards Effective Visual Representations for Partial-Label Learning | Link | Github | |
| Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation | Link | Github | |
| Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation | Link | Github | |
| Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic Segmentation | Link | Github | |
| Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint | Link | Github | |
| Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval | Link | Github | |
| Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder | Link | Github | |
| Towards Bridging the Performance Gaps of Joint Energy-Based Models | Link | Github | |
| Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection | Link | Github | |
| AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection | Link | Github | |
| ConStruct-VL: Data-Free Continual Structured VL Concepts Learning | Link | Github | |
| X-Pruner: eXplainable Pruning for Vision Transformers | Link | Github | |
| Efficient Mask Correction for Click-Based Interactive Image Segmentation | Link | Github | |
| Dynamic Aggregated Network for Gait Recognition | Link | Github | |
| Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery | Link | Github | |
| Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor | Link | Github | |
| Adaptive Plasticity Improvement for Continual Learning | Link | Github | |
| Jedi: Entropy-Based Localization and Removal of Adversarial Patches | Link | Github | |
| BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling | Link | Github | |
| Leverage Interactive Affinity for Affordance Learning | Link | Github | |
| Evolved Part Masking for Self-Supervised Learning | Link | Github | |
| CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning | Link | Github | |
| High-Fidelity Event-Radiance Recovery via Transient Event Frequency | Link | Github | |
| Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures | Link | Github | |
| Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns | Link | Github | |
| Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains | Link | Github | |
| A Soma Segmentation Benchmark in Full Adult Fly Brain | Link | Github | |
| KD-DLGAN: Data Limited Image Generation via Knowledge Distillation | Link | Github | |
| PIVOT: Prompting for Video Continual Learning | Link | Github | |
| Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks | Link | Github | |
| L-CoIns: Language-Based Colorization With Instance Awareness | Link | Github | |
| Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph | Link | Github | |
| Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration | Link | Github | |
| Dense Network Expansion for Class Incremental Learning | Link | Github | |
| Unsupervised Intrinsic Image Decomposition With LiDAR Intensity | Link | Github | |
| Neuralizer: General Neuroimage Analysis Without Re-Training | Link | Github | |
| Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | Link | Github | |
| Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling | Link | Github | |
| Modular Memorability: Tiered Representations for Video Memorability Prediction | Link | Github | |
| Federated Learning With Data-Agnostic Distribution Fusion | Link | Github | |
| Four-View Geometry With Unknown Radial Distortion | Link | Github | |
| Manipulating Transfer Learning for Property Inference | Link | Github | |
| BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image | Link | Github | |
| 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud | Link | Github | |
| Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks | Link | Github | |
| Towards Professional Level Crowd Annotation of Expert Domain Data | Link | Github | |
| Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation | Link | Github | |
| Similarity Metric Learning for RGB-Infrared Group Re-Identification | Link | Github | |
| On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer | Link | Github | |
| Camouflaged Instance Segmentation via Explicit De-Camouflaging | Link | Github | |
| Global Vision Transformer Pruning With Hessian-Aware Saliency | Link | Github | |
| DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation | Link | Github | |
| ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer | Link | Github | |
| AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning | Link | Github | |
| Simulated Annealing in Early Layers Leads to Better Generalization | Link | Github | |
| Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding | Link | Github | |
| Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation | Link | Github | |
| Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation | Link | Github | |
| MEDIC: Remove Model Backdoors via Importance Driven Cloning | Link | Github | |
| Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives | Link | Github | |
| Adaptive Graph Convolutional Subspace Clustering | Link | Github | |
| Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language | Link | Github | |
| Correlational Image Modeling for Self-Supervised Visual Pre-Training | Link | Github | |
| Text With Knowledge Graph Augmented Transformer for Video Captioning | Link | Github | |
| Panoptic Video Scene Graph Generation | Link | Github | |
| DartBlur: Privacy Preservation With Detection Artifact Suppression | Link | Github | |
| IDGI: A Framework To Eliminate Explanation Noise From Integrated Gradients | Link | Github | |
| Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity | Link | Github | |
| Vector Quantization With Self-Attention for Quality-Independent Representation Learning | Link | Github | |
| Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses | Link | Github | |
| DETRs With Hybrid Matching | Link | Github | |
| GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods | Link | Github | |
| AltFreezing for More General Video Face Forgery Detection | Link | Github | |
| Heterogeneous Continual Learning | Link | Github | |
| EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets | Link | Github | |
| Efficient Movie Scene Detection Using State-Space Transformers | Link | Github | |
| Private Image Generation With Dual-Purpose Auxiliary Classifier | Link | Github | |
| BASiS: Batch Aligned Spectral Embedding Space | Link | Github | |
| A Large-Scale Robustness Analysis of Video Action Recognition Models | Link | Github | |
| Neumann Network With Recursive Kernels for Single Image Defocus Deblurring | Link | Github | |
| Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning | Link | Github | |
| ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling | Link | Github | |
| Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization | Link | Github | |
| S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning | Link | Github | |
| DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling | Link | Github | |
| Patch-Craft Self-Supervised Training for Correlated Image Denoising | Link | Github | |
| Learning Decorrelated Representations Efficiently Using Fast Fourier Transform | Link | Github | |
| AstroNet: When Astrocyte Meets Artificial Neural Network | Link | Github | |
| PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding | Link | Github | |
| Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge | Link | Github | |
| Polarized Color Image Denoising | Link | Github | |