| 2025-07-23 |
Yume: An Interactive World Generation Model |
Xiaofeng Mao et.al. |
2507.17744 |
null |
| 2025-07-23 |
STQE: Spatial-Temporal Quality Enhancement for G-PCC Compressed Dynamic Point Clouds |
Tian Guo et.al. |
2507.17522 |
null |
| 2025-07-23 |
Unfolding Data Quality Dimensions in Practice: A Survey |
Vasileios Papastergios et.al. |
2507.17507 |
null |
| 2025-07-23 |
DFDNet: Dynamic Frequency-Guided De-Flare Network |
Minglong Xue et.al. |
2507.17489 |
null |
| 2025-07-23 |
Development of a Standardized Testing Environment for QRNGs based on Semiconductor Laser Phase Noise |
Matthias Ostner et.al. |
2507.17471 |
null |
| 2025-07-23 |
Parametric Integration with Neural Integral Operators |
Christoph Schied et.al. |
2507.17440 |
null |
| 2025-07-23 |
CAPRI-CT: Causal Analysis and Predictive Reasoning for Image Quality Optimization in Computed Tomography |
Sneha George Gnanakalavathy et.al. |
2507.17420 |
null |
| 2025-07-23 |
Perceptual Classifiers: Detecting Generative Images using Perceptual Features |
Krishna Srikar Durbha et.al. |
2507.17240 |
null |
| 2025-07-23 |
Hierarchical Fusion and Joint Aggregation: A Multi-Level Feature Representation Method for AIGC Image Quality Assessment |
Linghe Meng et.al. |
2507.17182 |
null |
| 2025-07-23 |
UNICE: Training A Universal Image Contrast Enhancer |
Ruodai Cui et.al. |
2507.17157 |
null |
| 2025-07-22 |
A Tutorial on MRI Reconstruction: From Modern Methods to Clinical Implications |
Tolga Çukur et.al. |
2507.16715 |
null |
| 2025-07-22 |
Introducing Quality Estimation to Machine Translation Post-editing Workflow: An Empirical Study on Its Usefulness |
Siqi Liu et.al. |
2507.16515 |
null |
| 2025-07-22 |
DenseSR: Image Shadow Removal as Dense Prediction |
Yu-Fan Lin et.al. |
2507.16472 |
null |
| 2025-07-22 |
Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model |
Mingtao Guo et.al. |
2507.16341 |
null |
| 2025-07-22 |
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs |
Zitong Xu et.al. |
2507.16193 |
null |
| 2025-07-22 |
LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image Generation |
Jyun-Ze Tang et.al. |
2507.16154 |
null |
| 2025-07-21 |
Improving Personalized Image Generation through Social Context Feedback |
Parul Gupta et.al. |
2507.16095 |
null |
| 2025-07-21 |
"Just a strange pic": Evaluating 'safety' in GenAI Image safety annotation tasks from diverse annotators' perspectives |
Ding Wang et.al. |
2507.16033 |
null |
| 2025-07-21 |
From Logic to Language: A Trust Index for Problem Solving with LLMs |
Tehseen Rug et.al. |
2507.16028 |
null |
| 2025-07-21 |
Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars |
Marcel C. Bühler et.al. |
2507.15979 |
null |
| 2025-07-21 |
A Lightweight Face Quality Assessment Framework to Improve Face Verification Performance in Real-Time Screening Applications |
Ahmed Aman Ibrahim et.al. |
2507.15961 |
null |
| 2025-07-21 |
Efficient Face Image Quality Assessment via Self-training and Knowledge Distillation |
Wei Sun et.al. |
2507.15709 |
null |
| 2025-07-23 |
Visual-Language Model Knowledge Distillation Method for Image Quality Assessment |
Yongkang Hou et.al. |
2507.15680 |
null |
| 2025-07-21 |
SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models |
Giordano d'Aloisio et.al. |
2507.15663 |
null |
| 2025-07-21 |
tiDAS: a time invariant approximation of the Delay and Sum algorithm for biomedical ultrasound PSF reconstructions |
Chiara Razzetta et.al. |
2507.15464 |
null |
| 2025-07-21 |
Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation |
Hui-Guan Yuan et.al. |
2507.15396 |
null |
| 2025-07-21 |
BEAM-Net: A Deep Learning Framework with Bone Enhancement Attention Mechanism for High Resolution High Frame Rate Ultrasound Beamforming |
Midhila Madhusoodanan et.al. |
2507.15306 |
null |
| 2025-07-23 |
EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control |
An Wang et.al. |
2507.15292 |
null |
| 2025-07-21 |
Conditional Video Generation for High-Efficiency Video Compression |
Fangqiu Yi et.al. |
2507.15269 |
null |
| 2025-07-20 |
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback |
Qiaoyu Tang et.al. |
2507.15024 |
null |
| 2025-07-20 |
FastSmoothSAM: A Fast Smooth Method For Segment Anything Model |
Jiasheng Xu et.al. |
2507.15008 |
null |
| 2025-07-20 |
Rate-Distortion-Perception Trade-off with Strong Realism Constraints: Role of Side Information and Common Randomness |
Yassine Hamdi et.al. |
2507.14825 |
null |
| 2025-07-20 |
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models |
Beier Zhu et.al. |
2507.14797 |
null |
| 2025-07-19 |
QUTCC: Quantile Uncertainty Training and Conformal Calibration for Imaging Inverse Problems |
Cassandra Tong Ye et.al. |
2507.14760 |
null |
| 2025-07-19 |
Benchmarking GANs, Diffusion Models, and Flow Matching for T1w-to-T2w MRI Translation |
Andrea Moschetto et.al. |
2507.14575 |
null |
| 2025-07-19 |
Adaptive 3D Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate Adaptation |
Han Gong et.al. |
2507.14454 |
null |
| 2025-07-19 |
Adaptive 3D Gaussian Splatting Video Streaming |
Han Gong et.al. |
2507.14432 |
null |
| 2025-07-18 |
Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution |
Weiming Ren et.al. |
2507.14367 |
null |
| 2025-07-21 |
Lessons from the TREC Plain Language Adaptation of Biomedical Abstracts (PLABA) track |
Brian Ondov et.al. |
2507.14096 |
null |
| 2025-07-18 |
D2IP: Deep Dynamic Image Prior for 3D Time-sequence Pulmonary Impedance Imaging |
Hao Fang et.al. |
2507.14046 |
null |
| 2025-07-18 |
Converting T1-weighted MRI from 3T to 7T quality using deep learning |
Malo Gicquel et.al. |
2507.13782 |
null |
| 2025-07-18 |
Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis |
Tongtong Su et.al. |
2507.13753 |
null |
| 2025-07-18 |
ATRO: A Fast Solver-Free Algorithm for Topology and Routing Optimization of Reconfigurable Datacenter Networks |
Yingming Mao et.al. |
2507.13717 |
null |
| 2025-07-18 |
Global Modeling Matters: A Fast, Lightweight and Effective Baseline for Efficient Image Restoration |
Xingyu Jiang et.al. |
2507.13663 |
null |
| 2025-07-18 |
Isotropic Remeshing with Inter-Angle Optimization |
Hanbing Zheng et.al. |
2507.13641 |
null |
| 2025-07-18 |
Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition |
Cheng-Hung Hu et.al. |
2507.13626 |
null |
| 2025-07-18 |
Efficient Burst Super-Resolution with One-step Diffusion |
Kento Kawai et.al. |
2507.13607 |
null |
| 2025-07-18 |
TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting |
Kaiyuan Tang et.al. |
2507.13586 |
null |
| 2025-07-17 |
$\nabla$ NABLA: Neighborhood Adaptive Block-Level Attention |
Dmitrii Mikhailov et.al. |
2507.13546 |
null |
| 2025-07-17 |
IConMark: Robust Interpretable Concept-Based Watermark For AI Images |
Vinu Sankar Sadasivan et.al. |
2507.13407 |
null |
| 2025-07-17 |
Taming Diffusion Transformer for Real-Time Mobile Video Generation |
Yushu Wu et.al. |
2507.13343 |
null |
| 2025-07-17 |
Label-Consistent Dataset Distillation with Detector-Guided Refinement |
Yawen Zou et.al. |
2507.13074 |
null |
| 2025-07-17 |
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes |
Zhou Feng et.al. |
2507.12932 |
null |
| 2025-07-17 |
DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment |
Junjie Gao et.al. |
2507.12796 |
null |
| 2025-07-17 |
Local Representative Token Guided Merging for Text-to-Image Generation |
Min-Jeong Lee et.al. |
2507.12771 |
null |
| 2025-07-17 |
HairShifter: Consistent and High-Fidelity Video Hair Transfer via Anchor-Guided Animation |
Wangzheng Shi et.al. |
2507.12758 |
null |
| 2025-07-17 |
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images |
Zahra TehraniNasab et.al. |
2507.12698 |
null |
| 2025-07-16 |
TRIQA: Image Quality Assessment by Contrastive Pretraining on Ordered Distortion Triplets |
Rajesh Sureddi et.al. |
2507.12687 |
null |
| 2025-07-16 |
InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion |
Ananya Raghu et.al. |
2507.12669 |
null |
| 2025-07-16 |
Pathology-Guided Virtual Staining Metric for Evaluation and Training |
Qiankai Wang et.al. |
2507.12624 |
null |
| 2025-07-16 |
FADE: Adversarial Concept Erasure in Flow Models |
Zixuan Fu et.al. |
2507.12283 |
null |
| 2025-07-16 |
Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese |
Yikang Liu et.al. |
2507.12260 |
null |
| 2025-07-16 |
MambaRate: Speech Quality Assessment Across Different Sampling Rates |
Panos Kakoulidis et.al. |
2507.12090 |
null |
| 2025-07-16 |
CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos |
Wei Sun et.al. |
2507.11900 |
null |
| 2025-07-15 |
JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs |
Junyi Fan et.al. |
2507.11636 |
null |
| 2025-07-14 |
Expert Operational GANS: Towards Real-Color Underwater Image Restoration |
Ozer Can Devecioglu et.al. |
2507.11562 |
null |
| 2025-07-14 |
3D Wavelet Latent Diffusion Model for Whole-Body MR-to-CT Modality Translation |
Jiaxu Zheng et.al. |
2507.11557 |
null |
| 2025-07-15 |
CATVis: Context-Aware Thought Visualization |
Tariq Mehmood et.al. |
2507.11522 |
null |
| 2025-07-15 |
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge |
Marvin Sach et.al. |
2507.11306 |
null |
| 2025-07-15 |
Robust ID-Specific Face Restoration via Alignment Learning |
Yushun Fang et.al. |
2507.10943 |
null |
| 2025-07-15 |
Evaluating Generated Commit Messages with Large Language Models |
Qunhong Zeng et.al. |
2507.10906 |
null |
| 2025-07-15 |
Digital defocus aberration interference for automated optical microscopy |
Haowen Zhou et.al. |
2507.10867 |
null |
| 2025-07-16 |
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment |
Qiang Li et.al. |
2507.10432 |
null |
| 2025-07-14 |
Spatial Lifting for Dense Prediction |
Mingzhi Xu et.al. |
2507.10222 |
null |
| 2025-07-14 |
Nonlinear Spectral Fusion Super-Resolution Fluorescence Microscopy based on Progressively Saturated Upconversion Nanoparticles |
Yongtao Liu et.al. |
2507.10129 |
null |
| 2025-07-15 |
Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS) in Edge Iterative MRI Lesion Localization System (EdgeIMLocSys) |
Guohao Huo et.al. |
2507.09995 |
null |
| 2025-07-14 |
Aligning Generative Speech Enhancement with Human Preferences via Direct Preference Optimization |
Haoyang Li et.al. |
2507.09929 |
null |
| 2025-07-15 |
IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution |
Sejin Park et.al. |
2507.09923 |
null |
| 2025-07-13 |
Towards Robust RTC in Sparse LEO Constellations |
Aashish Gottipati et.al. |
2507.09798 |
null |
| 2025-07-13 |
Continental scale habitat modelling with artificial intelligence and multimodal earth observation |
Sara Si-Moussi et.al. |
2507.09732 |
null |
| 2025-07-13 |
Hybrid Quantum-Classical Generative Adversarial Networks with Transfer Learning |
Asma Al-Othni et.al. |
2507.09706 |
null |
| 2025-07-13 |
prNet: Data-Driven Phase Retrieval via Stochastic Refinement |
Mehmet Onurcan Kaya et.al. |
2507.09608 |
null |
| 2025-07-13 |
Demystifying Flux Architecture |
Or Greenberg et.al. |
2507.09595 |
null |
| 2025-07-13 |
WordCraft: Interactive Artistic Typography with Attention Awareness and Noise Blending |
Zhe Wang et.al. |
2507.09573 |
null |
| 2025-07-13 |
RectifiedHR: High-Resolution Diffusion via Energy Profiling and Adaptive Guidance Scheduling |
Ankit Sanjyal et.al. |
2507.09441 |
null |
| 2025-07-12 |
Deep Image Prior Assisted ISAR Imaging for Missing Data Case |
Necmettin Bayar et.al. |
2507.09393 |
null |
| 2025-07-11 |
LLMCup: Ranking-Enhanced Comment Updating with LLMs |
Hua Ge et.al. |
2507.08671 |
null |
| 2025-07-11 |
Refraction corrected specular beamforming applied to cortical bone enhances interface visibility of bone-soft tissues interfaces |
Amadou S. Dia et.al. |
2507.08497 |
null |
| 2025-07-11 |
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers |
Wongi Jeong et.al. |
2507.08422 |
null |
| 2025-07-11 |
Enforcing Speech Content Privacy in Environmental Sound Recordings using Segment-wise Waveform Reversal |
Modan Tailleur et.al. |
2507.08412 |
null |
| 2025-07-11 |
From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning |
Sen Wang et.al. |
2507.08380 |
null |
| 2025-07-11 |
Unsupervised Methods for Video Quality Improvement: A Survey of Restoration and Enhancement Techniques |
Alexandra Malyugina et.al. |
2507.08375 |
null |
| 2025-07-11 |
CoCo-Bot: Energy-based Composable Concept Bottlenecks for Interpretable Generative Models |
Sangwon Kim et.al. |
2507.08334 |
null |
| 2025-07-10 |
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling |
Haoyu Wu et.al. |
2507.07982 |
null |
| 2025-07-10 |
Assessing the Alignment of Audio Representations with Timbre Similarity Ratings |
Haokun Tian et.al. |
2507.07764 |
null |
| 2025-07-10 |
Energy Efficient p-Circuits for Generative Neural Networks |
Lakshmi A. Ghantasala et.al. |
2507.07763 |
null |
| 2025-07-10 |
IRAF-SLAM: An Illumination-Robust and Adaptive Feature-Culling Front-End for Visual SLAM in Challenging Environments |
Thanh Nguyen Canh et.al. |
2507.07752 |
null |
| 2025-07-10 |
Generic Speech Enhancement with Self-Supervised Representation Space Loss |
Hiroshi Sato et.al. |
2507.07631 |
null |
| 2025-07-10 |
Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation |
Chunyan Wang et.al. |
2507.07578 |
null |
| 2025-07-10 |
SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction |
Wei Yao et.al. |
2507.07465 |
null |
| 2025-07-10 |
Degradation-Agnostic Statistical Facial Feature Transformation for Blind Face Restoration in Adverse Weather Conditions |
Chang-Hwan Son et.al. |
2507.07464 |
null |
| 2025-07-10 |
Subject grounding to reduce electromagnetic interference for MRI scanners operating in unshielded environments |
Beatrice Lena et.al. |
2507.07459 |
null |
| 2025-07-08 |
Generative Panoramic Image Stitching |
Mathieu Tuli et.al. |
2507.07133 |
null |
| 2025-07-09 |
4KAgent: Agentic Any Image to 4K Super-Resolution |
Yushen Zuo et.al. |
2507.07105 |
null |
| 2025-07-10 |
Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting |
Fei Teng et.al. |
2507.06971 |
null |
| 2025-07-09 |
Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception |
Noah Jaffe et.al. |
2507.06917 |
null |
| 2025-07-09 |
Speckle2Self: Self-Supervised Ultrasound Speckle Reduction Without Clean Data |
Xuesong Li et.al. |
2507.06828 |
null |
| 2025-07-09 |
Democratizing High-Fidelity Co-Speech Gesture Video Generation |
Xu Yang et.al. |
2507.06812 |
null |
| 2025-07-09 |
Better frame rates or better visuals? An early report of Esports player practice in Dota 2 |
Arjun Madhusudan et.al. |
2507.06790 |
null |
| 2025-07-08 |
Deprecating Benchmarks: Criteria and Framework |
Ayrton San Joaquin et.al. |
2507.06434 |
null |
| 2025-07-08 |
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models |
Zhihao Chen et.al. |
2507.06140 |
null |
| 2025-07-08 |
Bridging Sequential Deep Operator Network and Video Diffusion: Residual Refinement of Spatio-Temporal PDE Solutions |
Jaewan Park et.al. |
2507.06133 |
null |
| 2025-07-08 |
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis |
Xintong Hu et.al. |
2507.06116 |
null |
| 2025-07-08 |
ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models |
Chihan Huang et.al. |
2507.06078 |
null |
| 2025-07-08 |
Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration |
Maximilian Tschuchnig et.al. |
2507.06067 |
null |
| 2025-07-08 |
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis |
Alexandre Symeonidis-Herzig et.al. |
2507.06060 |
null |
| 2025-07-08 |
Semantic Certainty Assessment in Vector Retrieval Systems: A Novel Framework for Embedding Quality Evaluation |
Y. Du et.al. |
2507.05933 |
null |
| 2025-07-08 |
Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data |
Rui Huang et.al. |
2507.05914 |
null |
| 2025-07-08 |
D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos |
Wenkang Zhang et.al. |
2507.05859 |
null |
| 2025-07-08 |
TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model |
Yujie Hu et.al. |
2507.05790 |
null |
| 2025-07-08 |
Text-Guided Token Communication for Wireless Image Transmission |
Bole Liu et.al. |
2507.05781 |
null |
| 2025-07-08 |
MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos |
Rongsheng Wang et.al. |
2507.05675 |
null |
| 2025-07-08 |
Diffusion-Based Limited-Angle CT Reconstruction under Noisy Conditions |
Jiaqi Guo et.al. |
2507.05647 |
null |
| 2025-07-08 |
AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework |
Suoxiang Zhang et.al. |
2507.05621 |
null |
| 2025-07-07 |
Conversational Education at Scale: A Multi-LLM Agent Workflow for Procedural Learning and Pedagogic Quality Assessment |
Jiahuan Pei et.al. |
2507.05528 |
null |
| 2025-07-07 |
LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving |
Giulio Federico et.al. |
2507.05499 |
null |
| 2025-07-07 |
Self-supervised Deep Learning for Denoising in Ultrasound Microvascular Imaging |
Lijie Huang et.al. |
2507.05451 |
null |
| 2025-07-07 |
Enhancing Underwater Images Using Deep Learning with Subjective Image Quality Integration |
Jose M. Montero et.al. |
2507.05393 |
null |
| 2025-07-07 |
SegmentDreamer: Towards High-fidelity Text-to-3D Synthesis with Segmented Consistency Trajectory Distillation |
Jiahao Zhu et.al. |
2507.05256 |
null |
| 2025-07-07 |
In-Context Learning as an Effective Estimator of Functional Correctness of LLM-Generated Code |
Susmita Das et.al. |
2507.05200 |
null |
| 2025-07-07 |
SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model |
Chun Xie et.al. |
2507.05148 |
null |
| 2025-07-07 |
Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation |
Jianjiang Yang et.al. |
2507.04946 |
null |
| 2025-07-07 |
HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding |
Yuxuan Cai et.al. |
2507.04909 |
null |
| 2025-07-07 |
Efficacy of Image Similarity as a Metric for Augmenting Small Dataset Retinal Image Segmentation |
Thomas Wallace et.al. |
2507.04862 |
null |
| 2025-07-07 |
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations |
Yuji Wang et.al. |
2507.04705 |
null |
| 2025-07-07 |
Quantitative Single-particle Profiling of Extracellular Vesicles via Fluorescent Nanoparticle Tracking Analysis |
Yiting Liu et.al. |
2507.04655 |
null |
| 2025-07-07 |
Introduction to the China Space Station Telescope (CSST) |
CSST Collaboration et.al. |
2507.04618 |
null |
| 2025-07-06 |
TeleSim: A Network-Aware Testbed and Benchmark Dataset for Telerobotic Applications |
Zexin Deng et.al. |
2507.04425 |
null |
| 2025-07-06 |
DMAT: An End-to-End Framework for Joint Atmospheric Turbulence Mitigation and Object Detection |
Paul Hill et.al. |
2507.04323 |
null |
| 2025-07-06 |
Towards Lightest Low-Light Image Enhancement Architecture for Mobile Devices |
Guangrui Bai et.al. |
2507.04277 |
null |
| 2025-07-06 |
Siberian radioheliograph image classification using ensemble of CLIP, EfficientNet and CatBoost models |
Yaroslav Egorov et.al. |
2507.04211 |
null |
| 2025-07-05 |
Towards Spatially-Varying Gain and Binning |
Anqi Yang et.al. |
2507.04190 |
null |
| 2025-07-05 |
A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality |
Shuo Xin et.al. |
2507.04147 |
null |
| 2025-07-05 |
MMMOS: Multi-domain Multi-axis Audio Quality Assessment |
Yi-Cheng Lin et.al. |
2507.04094 |
null |
| 2025-07-05 |
Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM |
Xiaolei Lang et.al. |
2507.04004 |
null |
| 2025-07-08 |
LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression Artifacts |
Aleksandr Gushchin et.al. |
2507.03990 |
null |
| 2025-07-08 |
StreamDiT: Real-Time Streaming Text-to-Video Generation |
Akio Kodaira et.al. |
2507.03745 |
null |
| 2025-07-04 |
TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection |
Xixiang He et.al. |
2507.03673 |
null |
| 2025-07-03 |
RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation |
Liheng Zhang et.al. |
2507.02792 |
null |
| 2025-07-03 |
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation |
Xiangyang Luo et.al. |
2507.02691 |
null |
| 2025-07-03 |
Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data |
Irena Girshovitz et.al. |
2507.02628 |
null |
| 2025-07-03 |
Addressing Camera Sensors Faults in Vision-Based Navigation: Simulation and Dataset Development |
Riccardo Gallon et.al. |
2507.02602 |
null |
| 2025-07-03 |
IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising |
Hailong Yan et.al. |
2507.02445 |
null |
| 2025-07-03 |
Are Synthetic Videos Useful? A Benchmark for Retrieval-Centric Evaluation of Synthetic Videos |
Zecheng Zhao et.al. |
2507.02316 |
null |
| 2025-07-03 |
MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement |
Fanghai Yi et.al. |
2507.02270 |
null |
| 2025-07-02 |
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices |
Hailong Yan et.al. |
2507.01838 |
null |
| 2025-07-02 |
Enhancing Multi-Exposure High Dynamic Range Imaging with Overlapped Codebook for Improved Representation Learning |
Keuntek Lee et.al. |
2507.01588 |
null |
| 2025-07-02 |
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation |
Jimyeong Kim et.al. |
2507.01496 |
null |
| 2025-07-02 |
SD-Acc: Accelerating Stable Diffusion through Phase-aware Sampling and Hardware Co-Optimizations |
Zhican Wang et.al. |
2507.01309 |
null |
| 2025-07-02 |
Robust Brain Tumor Segmentation with Incomplete MRI Modalities Using Hölder Divergence and Mutual Information-Enhanced Knowledge Transfer |
Runze Cheng et.al. |
2507.01254 |
null |
| 2025-07-01 |
Empirical Analysis Of Heuristic and Approximation Algorithms for the Mutual-Visibility Problem |
Vanja Stojanović et.al. |
2507.01076 |
null |
| 2025-07-01 |
Diffusion Classifier Guidance for Non-robust Classifiers |
Philipp Vaeth et.al. |
2507.00687 |
null |
| 2025-07-01 |
Mind the Detail: Uncovering Clinically Relevant Image Details in Accelerated MRI with Semantically Diverse Reconstructions |
Jan Nikolas Morshuis et.al. |
2507.00670 |
null |
| 2025-07-03 |
MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentation in 4D Ultrasound |
Rusi Chen et.al. |
2507.00660 |
null |
| 2025-07-01 |
Integration of quantum random number generators with post-quantum cryptography algorithms |
Paula Alonso Blanco et.al. |
2507.00658 |
null |
| 2025-07-01 |
Physics-Informed Neural ODEs for Temporal Dynamics Modeling in Cardiac T1 Mapping |
Nuno Capitão et.al. |
2507.00613 |
null |
| 2025-07-01 |
Latent Posterior-Mean Rectified Flow for Higher-Fidelity Perceptual Face Restoration |
Xin Luo et.al. |
2507.00447 |
null |
| 2025-07-01 |
MedDiff-FT: Data-Efficient Diffusion Model Fine-tuning with Structural Guidance for Controllable Medical Image Synthesis |
Jianhao Xie et.al. |
2507.00377 |
null |
| 2025-07-01 |
GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control |
Xingjun Wang et.al. |
2507.00363 |
null |
| 2025-06-30 |
A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss |
Tarikul Islam Tamiti et.al. |
2507.00229 |
null |
| 2025-06-30 |
How to Design and Train Your Implicit Neural Representation for Video Compression |
Matthew Gwilliam et.al. |
2506.24127 |
null |
| 2025-06-30 |
Epona: Autoregressive Diffusion World Model for Autonomous Driving |
Kaiwen Zhang et.al. |
2506.24113 |
null |
| 2025-06-30 |
Navigating with Annealing Guidance Scale in Diffusion Space |
Shai Yehezkel et.al. |
2506.24108 |
null |
| 2025-06-30 |
URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition |
Jiahe Wang et.al. |
2506.23874 |
null |
| 2025-07-03 |
RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment |
Jianing Jin et.al. |
2506.23852 |
null |
| 2025-06-30 |
Transition Matching: Scalable and Flexible Generative Modeling |
Neta Shaul et.al. |
2506.23589 |
null |
| 2025-06-30 |
Metadata, Wavelet, and Time Aware Diffusion Models for Satellite Image Super Resolution |
Luigi Sigillo et.al. |
2506.23566 |
null |
| 2025-06-30 |
TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity |
Yuzhuo Chen et.al. |
2506.23484 |
null |
| 2025-06-29 |
Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement |
Siyuan Chai et.al. |
2506.23353 |
null |
| 2025-06-29 |
DiffFit: Disentangled Garment Warping and Texture Refinement for Virtual Try-On |
Xiang Xu et.al. |
2506.23295 |
null |
| 2025-06-29 |
PixelBoost: Leveraging Brownian Motion for Realistic-Image Super-Resolution |
Aradhana Mishra et.al. |
2506.23254 |
null |
| 2025-06-29 |
Research on Comprehensive Classroom Evaluation System Based on Multiple AI Models |
Cong Xie et.al. |
2506.23079 |
null |
| 2025-06-28 |
Point Cloud Compression and Objective Quality Assessment: A Survey |
Yiling Xu et.al. |
2506.22902 |
null |
| 2025-06-28 |
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing |
Junsung Lee et.al. |
2506.22868 |
null |
| 2025-06-28 |
ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge |
Yixu Chen et.al. |
2506.22790 |
null |
| 2025-06-28 |
VSRM: A Robust Mamba-Based Framework for Video Super-Resolution |
Dinh Phu Tran et.al. |
2506.22762 |
null |
| 2025-06-28 |
Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography |
Jianing Zhang et.al. |
2506.22753 |
null |
| 2025-06-27 |
High Resolution Isotropic 3D Cine imaging with Automated Segmentation using Concatenated 2D Real-time Imaging and Deep Learning |
Mark Wrobel et.al. |
2506.22532 |
null |
| 2025-07-01 |
Dehazing Light Microscopy Images with Guided Conditional Flow Matching: finding a sweet spot between fidelity and realism |
Anirban Ray et.al. |
2506.22397 |
null |
| 2025-06-27 |
DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model |
Yuliang Huang et.al. |
2506.22280 |
null |
| 2025-06-27 |
ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning |
Ming Zhao et.al. |
2506.22216 |
null |
| 2025-06-27 |
RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation |
Liudi Yang et.al. |
2506.22007 |
null |
| 2025-06-27 |
HighRateMOS: Sampling-Rate Aware Modeling for Speech Quality Assessment |
Wenze Ren et.al. |
2506.21951 |
null |
| 2025-06-27 |
Quality Assessment and Distortion-aware Saliency Prediction for AI-Generated Omnidirectional Images |
Liu Yang et.al. |
2506.21925 |
null |
| 2025-06-27 |
GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles |
Mengyi Shan et.al. |
2506.21839 |
null |
| 2025-06-26 |
TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation |
Hakan Çapuk et.al. |
2506.21681 |
null |
| 2025-06-26 |
Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising |
Hojat Asgariandehkordi et.al. |
2506.21499 |
null |
| 2025-06-26 |
Counterfactual Voting Adjustment for Quality Assessment and Fairer Voting in Online Platforms with Helpfulness Evaluation |
Chang Liu et.al. |
2506.21362 |
null |
| 2025-06-26 |
Bridging Video Quality Scoring and Justification via Large Multimodal Models |
Qizhi Xie et.al. |
2506.21011 |
null |
| 2025-06-26 |
Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology |
Qiuyi Qi et.al. |
2506.21001 |
null |
| 2025-06-26 |
3D Scene-Camera Representation with Joint Camera Photometric Optimization |
Weichen Dai et.al. |
2506.20979 |
null |
| 2025-06-26 |
Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality |
Naihe Feng et.al. |
2506.20978 |
null |
| 2025-06-26 |
Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models |
Donggoo Kang et.al. |
2506.20946 |
null |
| 2025-06-25 |
Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models |
Cansu Korkmaz et.al. |
2506.20832 |
null |
| 2025-06-25 |
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling |
Tobias Vontobel et.al. |
2506.20452 |
null |
| 2025-06-25 |
DreamAnywhere: Object-Centric Panoramic 3D Scene Generation |
Edoardo Alberto Dominici et.al. |
2506.20367 |
null |
| 2025-06-25 |
FundaQ-8: A Clinically-Inspired Scoring Framework for Automated Fundus Image Quality Assessment |
Lee Qi Zun et.al. |
2506.20303 |
null |
| 2025-06-25 |
TDiR: Transformer based Diffusion for Image Restoration Tasks |
Abbas Anwar et.al. |
2506.20302 |
null |
| 2025-06-25 |
Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations |
Shunqi Mao et.al. |
2506.20294 |
null |
| 2025-06-26 |
Signal-to-noise and spatial resolution in in-line imaging. 2. Phase-contrast tomography |
T. E. Gureyev et.al. |
2506.20277 |
null |
| 2025-06-25 |
RaRa Clipper: A Clipper for Gaussian Splatting Based on Ray Tracer and Rasterizer |
Da Li et.al. |
2506.20202 |
null |
| 2025-06-25 |
MS-IQA: A Multi-Scale Feature Fusion Network for PET/CT Image Quality Assessment |
Siqiao Li et.al. |
2506.20200 |
null |
| 2025-06-24 |
Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack |
Xuesong Wang et.al. |
2506.19886 |
null |
| 2025-06-24 |
Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation |
Xingyang Li et.al. |
2506.19852 |
null |
| 2025-06-24 |
Active View Selector: Fast and Accurate Active View Selection with Cross Reference Image Quality Assessment |
Zirui Wang et.al. |
2506.19844 |
null |
| 2025-06-24 |
Improving Progressive Generation with Decomposable Flow Matching |
Moayed Haji-Ali et.al. |
2506.19839 |
null |
| 2025-06-24 |
Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models |
Johannes Rückert et.al. |
2506.19825 |
null |
| 2025-06-24 |
Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales |
Seyedmorteza Sadat et.al. |
2506.19713 |
null |
| 2025-06-24 |
Filling of incomplete sinograms from sparse PET detector configurations using a residual U-Net |
Klara Leffler et.al. |
2506.19600 |
null |
| 2025-06-24 |
ReMAR-DS: Recalibrated Feature Learning for Metal Artifact Reduction and CT Domain Transformation |
Mubashara Rehman et.al. |
2506.19531 |
null |
| 2025-06-24 |
Angio-Diff: Learning a Self-Supervised Adversarial Diffusion Model for Angiographic Geometry Generation |
Zhifeng Wang et.al. |
2506.19455 |
null |
| 2025-06-24 |
NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs |
Khuram Naveed et.al. |
2506.19387 |
null |
| 2025-06-24 |
Learning to assess subjective impressions from speech |
Yuto Kondo et.al. |
2506.19335 |
null |
| 2025-06-24 |
Style Transfer: A Decade Survey |
Tianshan Zhang et.al. |
2506.19278 |
null |
| 2025-06-24 |
Automated Image Recognition Framework |
Quang-Binh Nguyen et.al. |
2506.19261 |
null |
| 2025-06-23 |
VHU-Net: Variational Hadamard U-Net for Body MRI Bias Field Correction |
Xin Zhu et.al. |
2506.19181 |
null |
| 2025-06-23 |
A B-Spline Finite Element Method for Cloth Simulation |
Yuqi Meng et.al. |
2506.18867 |
null |
| 2025-06-23 |
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset |
Zhuowei Chen et.al. |
2506.18851 |
null |
| 2025-06-23 |
ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs |
Michal Nazarczuk et.al. |
2506.18792 |
null |
| 2025-06-23 |
Matrix-Game: Interactive World Foundation Model |
Yifan Zhang et.al. |
2506.18701 |
null |
| 2025-06-23 |
RDPO: Real Data Preference Optimization for Physics Consistency Video Generation |
Wenxu Qian et.al. |
2506.18655 |
null |
| 2025-06-23 |
2D Triangle Splatting for Direct Differentiable Mesh Training |
Kaifeng Sheng et.al. |
2506.18575 |
link |
| 2025-06-23 |
VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning |
Xuanyu Zhang et.al. |
2506.18564 |
null |
| 2025-06-23 |
What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models |
Yiyao Wang et.al. |
2506.18407 |
null |
| 2025-06-23 |
Selecting N-lowest scores for training MOS prediction models |
Yuto Kondo et.al. |
2506.18326 |
null |
| 2025-06-23 |
ARSAR-Net: Adaptively Regularized SAR Imaging Network via Non-matrix-inversion ADMM |
Shiping Fu et.al. |
2506.18324 |
null |
| 2025-06-23 |
A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement |
Muhammad Azeem Aslam et.al. |
2506.18323 |
null |
| 2025-06-23 |
Rethinking Mean Opinion Scores in Speech Quality Assessment: Aggregation through Quantized Distribution Fitting |
Yuto Kondo et.al. |
2506.18307 |
null |
| 2025-06-22 |
InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating |
Fuyu Wang et.al. |
2506.18102 |
null |
| 2025-06-22 |
Face-Voice Association for Audiovisual Active Speaker Detection in Egocentric Recordings |
Jason Clarke et.al. |
2506.18055 |
null |
| 2025-06-22 |
BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP |
Chenyue Song et.al. |
2506.17969 |
null |
| 2025-06-21 |
DreamJourney: Perpetual View Generation with Video Diffusion Models |
Bo Pan et.al. |
2506.17705 |
null |
| 2025-06-21 |
Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning |
Shih-Wen Liu et.al. |
2506.17645 |
null |
| 2025-06-21 |
MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization |
Tingting Liu et.al. |
2506.17540 |
null |
| 2025-06-20 |
The Hidden Cost of an Image: Quantifying the Energy Consumption of AI Image Generation |
Giulia Bertazzini et.al. |
2506.17016 |
null |
| 2025-06-20 |
PET Tracer Separation Using Conditional Diffusion Transformer with Multi-latent Space Learning |
Bin Huang et.al. |
2506.16934 |
null |
| 2025-06-20 |
Infrared and Visible Image Fusion Based on Implicit Neural Representations |
Shuchen Sun et.al. |
2506.16773 |
null |
| 2025-06-19 |
MetaQAP -- A Meta-Learning Approach for Quality-Aware Pretraining in Image Quality Assessment |
Muhammad Azeem Aslam et.al. |
2506.16601 |
null |
| 2025-06-19 |
DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates |
Chanung Park et.al. |
2506.16572 |
link |
| 2025-06-19 |
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ |
Yunkee Chae et.al. |
2506.16538 |
null |
| 2025-06-19 |
Data Compression with Relative Entropy Coding |
Gergely Flamich et.al. |
2506.16309 |
null |
| 2025-06-19 |
Active MRI Acquisition with Diffusion Guided Bayesian Experimental Design |
Jacopo Iollo et.al. |
2506.16237 |
null |
| 2025-06-19 |
From Coarse to Continuous: Progressive Refinement Implicit Neural Representation for Motion-Robust Anisotropic MRI Reconstruction |
Zhenxuan Zhang et.al. |
2506.16210 |
null |
| 2025-06-19 |
Neural Prioritisation for Web Crawling |
Francesca Pezzuti et.al. |
2506.16146 |
null |
| 2025-06-19 |
Enhanced Dermatology Image Quality Assessment via Cross-Domain Training |
Ignacio Hernández Montilla et.al. |
2506.16116 |
null |
| 2025-06-19 |
Fast Training-free Perceptual Image Compression |
Ziran Zhu et.al. |
2506.16102 |
null |
| 2025-06-19 |
STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution |
Yucheng Jin et.al. |
2506.16061 |
null |
| 2025-06-19 |
Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization |
Cong Wang et.al. |
2506.15980 |
link |
| 2025-06-19 |
CORAL: Disentangling Latent Representations in Long-Tailed Diffusion |
Esther Rodriguez et.al. |
2506.15933 |
null |
| 2025-06-18 |
Demystifying the Visual Quality Paradox in Multimodal Large Language Models |
Shuo Xing et.al. |
2506.15645 |
null |
| 2025-06-20 |
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution |
Yujing Sun et.al. |
2506.15591 |
link |
| 2025-06-18 |
Advanced cervical cancer classification: enhancing pap smear images with hybrid PMD Filter-CLAHE |
Ach Khozaimi et.al. |
2506.15489 |
null |
| 2025-06-18 |
When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class |
Yujin Kim et.al. |
2506.15381 |
null |
| 2025-06-20 |
Privacy-Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference |
Jonghun Kim et.al. |
2506.15258 |
link |
| 2025-06-18 |
RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories |
Qingsong Yan et.al. |
2506.15242 |
null |
| 2025-06-18 |
DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder |
Dan He et.al. |
2506.15218 |
link |
| 2025-06-18 |
Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models |
Xuelin Shen et.al. |
2506.15201 |
link |
| 2025-06-18 |
You Only Render Once: Enhancing Energy and Computation Efficiency of Mobile Virtual Reality |
Xingyu Chen et.al. |
2506.15183 |
null |
| 2025-06-17 |
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments |
Md Jahangir Alam Khondkar et.al. |
2506.15000 |
link |
| 2025-06-17 |
Improved Image Reconstruction and Diffusion Parameter Estimation Using a Temporal Convolutional Network Model of Gradient Trajectory Errors |
Jonathan B. Martin et.al. |
2506.14995 |
link |
| 2025-06-17 |
Plug-and-Play with 2.5D Artifact Reduction Prior for Fast and Accurate Industrial Computed Tomography Reconstruction |
Haley Duba-Sullivan et.al. |
2506.14719 |
null |
| 2025-06-17 |
3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting |
Yuke Xing et.al. |
2506.14642 |
link |
| 2025-06-17 |
QUEST: Quality-aware Semi-supervised Table Extraction for Business Documents |
Eliott Thomas et.al. |
2506.14568 |
null |
| 2025-06-17 |
Causally Steered Diffusion for Automated Video Counterfactual Generation |
Nikos Spyrou et.al. |
2506.14404 |
link |
| 2025-06-17 |
Compressed Video Super-Resolution based on Hierarchical Encoding |
Yuxuan Jiang et.al. |
2506.14381 |
null |
| 2025-06-18 |
ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies |
Jinyan Yuan et.al. |
2506.14315 |
null |
| 2025-06-17 |
Quality Assessment of Python Tests Generated by Large Language Models |
Victor Alves et.al. |
2506.14297 |
link |
| 2025-06-17 |
synth-dacl: Does Synthetic Defect Data Enhance Segmentation Accuracy and Robustness for Real-World Bridge Inspections? |
Johannes Flotzinger et.al. |
2506.14255 |
null |
| 2025-06-17 |
MAS-LitEval: Multi-Agent System for Literary Translation Quality Assessment |
Junghwan Kim et.al. |
2506.14199 |
null |
| 2025-06-17 |
Breaking the Multi-Enhancement Bottleneck: Domain-Consistent Quality Enhancement for Compressed Images |
Qunliang Xing et.al. |
2506.14152 |
null |
| 2025-06-15 |
Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing |
Zhuoying Li et.al. |
2506.13827 |
null |
| 2025-06-14 |
ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering |
Lufei Liu et.al. |
2506.13814 |
null |
| 2025-06-16 |
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms |
Sirui Li et.al. |
2506.13709 |
null |
| 2025-06-16 |
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions |
Zhucun Xue et.al. |
2506.13691 |
null |
| 2025-06-17 |
First Positronium Lifetime Imaging with Scandium-44 on a Long Axial Field-of-view PET/CT |
Lorenzo Mercolli et.al. |
2506.13460 |
null |
| 2025-06-16 |
DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration |
Yan Chen et.al. |
2506.13355 |
null |
| 2025-06-16 |
Efficient Approximate Temporal Triangle Counting in Streaming with Predictions |
Giorgio Venturin et.al. |
2506.13173 |
link |
| 2025-06-14 |
Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting |
Xingzhong Hou et.al. |
2506.12530 |
null |
| 2025-06-14 |
Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity |
Mohsen Jenadeleh et.al. |
2506.12505 |
null |
| 2025-06-14 |
Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments |
Zaiqiang Wu et.al. |
2506.12348 |
link |
| 2025-06-13 |
ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing |
Babak Naderi et.al. |
2506.12269 |
link |
| 2025-06-13 |
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment |
Wei Wang et.al. |
2506.12260 |
null |
| 2025-06-13 |
SphereDrag: Spherical Geometry-Aware Panoramic Image Editing |
Zhiao Feng et.al. |
2506.11863 |
null |
| 2025-06-13 |
Fast MRI of bones in the knee -- An AI-driven reconstruction approach for adiabatic inversion recovery prepared ultra-short echo time sequences |
Philipp Hans Nunn et.al. |
2506.11771 |
null |
| 2025-06-13 |
EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment |
Zhaoyang Wang et.al. |
2506.11549 |
null |
| 2025-06-13 |
CGVQM+D: Computer Graphics Video Quality Metric and Dataset |
Akshay Jindal et.al. |
2506.11546 |
link |
| 2025-06-13 |
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders |
Xingwei Sun et.al. |
2506.11514 |
link |
| 2025-06-13 |
Taming Stable Diffusion for Computed Tomography Blind Super-Resolution |
Chunlei Li et.al. |
2506.11496 |
null |
| 2025-06-13 |
Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates |
Kai Yue et.al. |
2506.11413 |
null |
| 2025-06-13 |
A Watermark for Auto-Regressive Image Generation Models |
Yihan Wu et.al. |
2506.11371 |
null |
| 2025-06-12 |
Score-based Generative Diffusion Models to Synthesize Full-dose FDG Brain PET from MRI in Epilepsy Patients |
Jiaqi Wu et.al. |
2506.11297 |
null |
| 2025-06-12 |
M4V: Multi-Modal Mamba for Text-to-Video Generation |
Jiancheng Huang et.al. |
2506.10915 |
null |
| 2025-06-12 |
AIR: Zero-shot Generative Model Adaptation with Iterative Refinement |
Guimeng Liu et.al. |
2506.10895 |
link |
| 2025-06-12 |
Stroke-based Cyclic Amplifier: Image Super-Resolution at Arbitrary Ultra-Large Scales |
Wenhao Guo et.al. |
2506.10774 |
null |
| 2025-06-12 |
Underage Detection through a Multi-Task and MultiAge Approach for Screening Minors in Unconstrained Imagery |
Christopher Gaul et.al. |
2506.10689 |
null |
| 2025-06-12 |
Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework |
Xia Du et.al. |
2506.10685 |
null |
| 2025-06-12 |
High-resolution efficient image generation from WiFi CSI using a pretrained latent diffusion model |
Eshan Ramesh et.al. |
2506.10605 |
null |
| 2025-06-12 |
A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks |
Junhua Lin et.al. |
2506.10502 |
null |
| 2025-06-12 |
Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On |
Zaiqiang Wu et.al. |
2506.10468 |
link |
| 2025-06-12 |
PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting |
Lintao Xiang et.al. |
2506.10335 |
null |
| 2025-06-12 |
Research on Audio-Visual Quality Assessment Dataset and Method for User-Generated Omnidirectional Video |
Fei Zhao et.al. |
2506.10331 |
null |
| 2025-06-12 |
DUN-SRE: Deep Unrolling Network with Spatiotemporal Rotation Equivariance for Dynamic MRI Reconstruction |
Yuliang Zhu et.al. |
2506.10309 |
null |
| 2025-06-12 |
Discrete Audio Tokens: More Than a Survey! |
Pooneh Mousavi et.al. |
2506.10274 |
null |
| 2025-06-11 |
D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning |
Muqi Zou et.al. |
2506.10125 |
null |
| 2025-06-10 |
Ambient Diffusion Omni: Training Good Models with Bad Data |
Giannis Daras et.al. |
2506.10038 |
link |
| 2025-06-10 |
FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training |
Fuhan Cai et.al. |
2506.10035 |
null |
| 2025-06-11 |
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits |
Ron Yosef et.al. |
2506.09988 |
null |
| 2025-06-11 |
Error-Guided Pose Augmentation: Enhancing Rehabilitation Exercise Assessment through Targeted Data Generation |
Omar Sherif et.al. |
2506.09833 |
null |
| 2025-06-11 |
Metritocracy: Representative Metrics for Lite Benchmarks |
Ariel Procaccia et.al. |
2506.09813 |
null |
| 2025-06-11 |
Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment |
Amritha Premkumar et.al. |
2506.09795 |
null |
| 2025-06-11 |
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation |
Yukang Feng et.al. |
2506.09427 |
null |
| 2025-06-10 |
Seedance 1.0: Exploring the Boundaries of Video Generation Models |
Yu Gao et.al. |
2506.09113 |
null |
| 2025-06-10 |
Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment |
Maximilian Tschuchnig et.al. |
2506.08716 |
null |
| 2025-06-10 |
Brevity is the soul of sustainability: Characterizing LLM response lengths |
Soham Poddar et.al. |
2506.08686 |
link |
| 2025-06-10 |
Biologically Inspired Deep Learning Approaches for Fetal Ultrasound Image Classification |
Rinat Prochii et.al. |
2506.08623 |
null |
| 2025-06-10 |
LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4 $\times$ RTX 4090s |
Xijun Wang et.al. |
2506.08529 |
null |
| 2025-06-10 |
Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance |
June Suk Choi et.al. |
2506.08456 |
null |
| 2025-06-10 |
Better Reasoning with Less Data: Enhancing VLMs Through Unified Modality Scoring |
Mingjie Xu et.al. |
2506.08429 |
null |
| 2025-06-10 |
Image Demoiréing Using Dual Camera Fusion on Mobile Phones |
Yanting Mei et.al. |
2506.08361 |
link |
| 2025-06-10 |
How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models |
Huixuan Zhang et.al. |
2506.08351 |
null |
| 2025-06-10 |
Complex-Valued Holographic Radiance Fields |
Yicheng Zhan et.al. |
2506.08350 |
null |
| 2025-06-09 |
High-density three-dimensional holography using rapid modulation of light |
Jorge-Alberto Peralta-Ángeles et.al. |
2506.08253 |
null |
| 2025-06-09 |
Dreamland: Controllable World Creation with Simulator and Generative Models |
Sicheng Mo et.al. |
2506.08006 |
null |
| 2025-06-09 |
Audio-Sync Video Generation with Multi-Stream Temporal Control |
Shuchen Weng et.al. |
2506.08003 |
null |
| 2025-06-09 |
Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor |
Rishit Dagli et.al. |
2506.07932 |
null |
| 2025-06-09 |
Efficient Seismic Data Interpolation via Sparse Attention Transformer and Diffusion Model |
Xiaoli Wei et.al. |
2506.07923 |
null |
| 2025-06-09 |
Video Unlearning via Low-Rank Refusal Vector |
Simone Facchiano et.al. |
2506.07891 |
null |
| 2025-06-09 |
Diffusion Counterfactual Generation with Semantic Abduction |
Rajat Rasal et.al. |
2506.07883 |
link |
| 2025-06-09 |
M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration |
Yongzhen Wang et.al. |
2506.07814 |
null |
| 2025-06-09 |
Research quality evaluation by AI in the era of Large Language Models: Advantages, disadvantages, and systemic effects |
Mike Thelwall et.al. |
2506.07748 |
null |
| 2025-06-09 |
PIG: Physically-based Multi-Material Interaction with 3D Gaussians |
Zeyu Xiao et.al. |
2506.07657 |
null |
| 2025-06-09 |
Information-guided optimization of image-based sensorless adaptive optics methods |
Biwei Zhang et.al. |
2506.07482 |
null |
| 2025-06-09 |
Compressed Feature Quality Assessment: Dataset and Baselines |
Changsheng Gao et.al. |
2506.07412 |
null |
| 2025-06-09 |
Distributed Image Semantic Communication via Nonlinear Transform Coding |
Yufei Bo et.al. |
2506.07391 |
null |
| 2025-06-08 |
Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI |
Aditya Chakravarty et.al. |
2506.07286 |
null |
| 2025-06-08 |
First positronium imaging using $^{44}$Sc with the J-PET scanner: a case study on the NEMA-Image Quality phantom |
Manish Das et.al. |
2506.07230 |
null |
| 2025-06-08 |
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models |
Sangwon Jang et.al. |
2506.07177 |
null |
| 2025-06-08 |
Mathesis: Towards Formal Theorem Proving from Natural Languages |
Yu Xuejun et.al. |
2506.07047 |
null |
| 2025-06-08 |
Deep regularization networks for inverse problems with noisy operators |
Fatemeh Pourahmadian et.al. |
2506.07008 |
null |
| 2025-06-07 |
SPC to 3D: Novel View Synthesis from Binary SPC via I2I translation |
Sumit Sharma et.al. |
2506.06890 |
null |
| 2025-06-07 |
Controllable Coupled Image Generation via Diffusion Models |
Chenfei Yuan et.al. |
2506.06826 |
null |
| 2025-06-07 |
An Efficient Digital Watermarking Technique for Small Scale devices |
Kaushik Talathi et.al. |
2506.06691 |
null |
| 2025-06-06 |
Bidirectional Image-Event Guided Low-Light Image Enhancement |
Zhanwen Liu et.al. |
2506.06120 |
null |
| 2025-06-06 |
On Inverse Problems, Parameter Estimation, and Domain Generalization |
Deborah Pereg et.al. |
2506.06024 |
null |
| 2025-06-06 |
Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness |
Steven Landgraf et.al. |
2506.05917 |
null |
| 2025-06-05 |
On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images |
Andreas Meuleman et.al. |
2506.05558 |
null |
| 2025-06-05 |
EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh |
Tao Hu et.al. |
2506.05554 |
null |
| 2025-06-05 |
F2T2-HiT: A U-Shaped FFT Transformer and Hierarchical Transformer for Reflection Removal |
Jie Cai et.al. |
2506.05489 |
null |
| 2025-06-05 |
Implicit Neural Representation for Video Restoration |
Mary Aiyetigbo et.al. |
2506.05488 |
null |
| 2025-06-05 |
Degradation-Aware Image Enhancement via Vision-Language Classification |
Jie Cai et.al. |
2506.05450 |
null |
| 2025-06-05 |
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training |
Jianyi Wang et.al. |
2506.05301 |
null |
| 2025-06-06 |
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers |
Haosong Liu et.al. |
2506.05096 |
null |
| 2025-06-05 |
Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations |
Igor Meleshin et.al. |
2506.04951 |
null |
| 2025-06-05 |
Time-Lapse Video-Based Embryo Grading via Complementary Spatial-Temporal Pattern Mining |
Yong Sun et.al. |
2506.04950 |
null |
| 2025-06-05 |
Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations |
Christoph Schirninger et.al. |
2506.04781 |
null |
| 2025-06-05 |
Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement |
Niki Martinel et.al. |
2506.04753 |
null |
| 2025-06-05 |
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model |
Zelu Qi et.al. |
2506.04715 |
link |
| 2025-06-05 |
StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation |
Ranjith Merugu et.al. |
2506.04567 |
link |
| 2025-06-04 |
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation |
Hermann Kumbong et.al. |
2506.04421 |
null |
| 2025-06-04 |
Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks |
Jubayer Ahmed Bhuiyan Shawon et.al. |
2506.04367 |
null |
| 2025-06-04 |
SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization |
Junpyo Seo et.al. |
2506.04283 |
link |
| 2025-06-04 |
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation |
Tianyu Huang et.al. |
2506.04225 |
null |
| 2025-06-04 |
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models |
Yuhao Wu et.al. |
2506.04180 |
null |
| 2025-06-04 |
Synthetic multi-inversion time magnetic resonance images for visualization of subcortical structures |
Savannah P. Hays et.al. |
2506.04173 |
null |
| 2025-06-04 |
Point Cloud Quality Assessment Using the Perceptual Clustering Weighted Graph (PCW-Graph) and Attention Fusion Network |
Abdelouahed Laazoufi et.al. |
2506.04081 |
null |
| 2025-06-04 |
Collaborative On-Sensor Array Cameras |
Jipeng Sun et.al. |
2506.04061 |
null |
| 2025-06-04 |
Joint Video Enhancement with Deblurring, Super-Resolution, and Frame Interpolation Network |
Giyong Choi et.al. |
2506.03892 |
null |
| 2025-06-04 |
EuroGEST: Investigating gender stereotypes in multilingual language models |
Jacqueline Rowe et.al. |
2506.03867 |
null |
| 2025-06-04 |
Conformer-based Ultrasound-to-Speech Conversion |
Ibrahim Ibrahimov et.al. |
2506.03831 |
link |
| 2025-06-04 |
ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning |
Feng Han et.al. |
2506.03596 |
link |
| 2025-06-04 |
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models |
Ziyi Wu et.al. |
2506.03517 |
null |
| 2025-06-03 |
CamCloneMaster: Enabling Reference-based Camera Control for Video Generation |
Yawen Luo et.al. |
2506.03140 |
null |
| 2025-06-03 |
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation |
Zhengyao Lv et.al. |
2506.03123 |
null |
| 2025-06-03 |
Astrophotography turbulence mitigation via generative models |
Joonyeoup Kim et.al. |
2506.02981 |
null |
| 2025-06-03 |
IMPARA-GED: Grammatical Error Detection is Boosting Reference-free Grammatical Error Quality Estimator |
Yusuke Sakai et.al. |
2506.02899 |
null |
| 2025-06-03 |
NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results |
Xiaohong Liu et.al. |
2506.02875 |
null |
| 2025-06-03 |
ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration |
Cheng Yang et.al. |
2506.02633 |
null |
| 2025-06-03 |
One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation |
Xue Wu et.al. |
2506.02605 |
null |
| 2025-06-03 |
Multi-modal brain MRI synthesis based on SwinUNETR |
Haowen Pang et.al. |
2506.02467 |
null |
| 2025-06-02 |
Motion aware video generative model |
Bowen Xue et.al. |
2506.02244 |
null |
| 2025-06-02 |
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction |
Saurabh Agrawal et.al. |
2506.02082 |
null |
| 2025-06-02 |
IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout |
Fei Shen et.al. |
2506.01949 |
null |
| 2025-06-04 |
MedEBench: Revisiting Text-instructed Image Editing on Medical Domain |
Minghao Liu et.al. |
2506.01921 |
null |
| 2025-06-02 |
Beyond Pixel Agreement: Large Language Models as Clinical Guardrails for Reliable Medical Image Segmentation |
Jiaxi Sheng et.al. |
2506.01841 |
null |
| 2025-06-02 |
WorldExplorer: Towards Generating Fully Navigable 3D Scenes |
Manuel-Andreas Schneider et.al. |
2506.01799 |
null |
| 2025-06-03 |
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability |
Genta Indra Winata et.al. |
2506.01789 |
link |
| 2025-06-02 |
STORM: Benchmarking Visual Rating of MLLMs with a Comprehensive Ordinal Regression Dataset |
Jinhong Wang et.al. |
2506.01738 |
null |
| 2025-06-02 |
Observation of the Crab Nebula with the Single-Mirror Small-Size Telescope stereoscopic system at low altitude |
C. Alispach et.al. |
2506.01733 |
null |
| 2025-06-02 |
Self-Supervised Speech Quality Assessment (S3QA): Leveraging Speech Foundation Models for a Scalable Speech Quality Metric |
Mattson Ogg et.al. |
2506.01655 |
null |
| 2025-06-02 |
Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment |
Kaixun Jiang et.al. |
2506.01511 |
null |
| 2025-06-02 |
Universal Preference-Score-based Pairwise Speech Quality Assessment |
Yu-Fei Shi et.al. |
2506.01455 |
null |
| 2025-05-30 |
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL |
Yu Zhang et.al. |
2505.24875 |
null |
| 2025-05-30 |
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text |
Li yunhan et.al. |
2505.24826 |
link |
| 2025-05-30 |
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation |
Yucheng Zhou et.al. |
2505.24787 |
link |
| 2025-05-30 |
RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement |
Raman Jha et.al. |
2505.24705 |
link |
| 2025-05-30 |
ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation |
Jiatong Shi et.al. |
2505.24518 |
null |
| 2025-05-30 |
Digital twins enable full-reference quality assessment of photoacoustic image reconstructions |
Janek Gröhl et.al. |
2505.24514 |
null |
| 2025-05-30 |
Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation |
Ximing Xing et.al. |
2505.24499 |
null |
| 2025-05-30 |
VietMix: A Naturally Occurring Vietnamese-English Code-Mixed Corpus with Iterative Augmentation for Machine Translation |
Hieu Tran et.al. |
2505.24472 |
null |
| 2025-05-30 |
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering |
Runnan Lu et.al. |
2505.24417 |
link |
| 2025-05-30 |
PCIE_Interaction Solution for Ego4D Social Interaction Challenge |
Kanokphan Lertniphonphan et.al. |
2505.24404 |
null |
| 2025-05-30 |
Interactive Video Generation via Domain Adaptation |
Ishaan Rawal et.al. |
2505.24253 |
null |
| 2025-05-30 |
STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models |
Zheng Tan et.al. |
2505.24210 |
link |
| 2025-05-29 |
DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment |
Vaishnav Ramesh et.al. |
2505.24002 |
link |
| 2025-05-29 |
DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP |
Amber Yijia Zheng et.al. |
2505.23743 |
null |
| 2025-05-29 |
SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods |
Roksana Goworek et.al. |
2505.23714 |
null |
| 2025-05-29 |
Multilook Coherent Imaging: Theoretical Guarantees and Algorithms |
Xi Chen et.al. |
2505.23594 |
link |
| 2025-05-29 |
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation |
Kaijie Chen et.al. |
2505.23493 |
null |
| 2025-05-29 |
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation |
Shi-Xue Zhang et.al. |
2505.23484 |
link |
| 2025-05-29 |
Fine-Tuning Next-Scale Visual Autoregressive Models with Group Relative Policy Optimization |
Matteo Gallici et.al. |
2505.23331 |
null |
| 2025-05-29 |
Quality assessment of 3D human animation: Subjective and objective evaluation |
Rim Rekik et.al. |
2505.23301 |
null |
| 2025-05-29 |
UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes |
Yixun Liang et.al. |
2505.23253 |
link |
| 2025-05-29 |
Advancing Image Super-resolution Techniques in Remote Sensing: A Comprehensive Survey |
Yunliang Qi et.al. |
2505.23248 |
null |
| 2025-05-30 |
WTEFNet: Real-Time Low-Light Object Detection for Advanced Driver Assistance Systems |
Hao Wu et.al. |
2505.23201 |
null |
| 2025-05-29 |
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement |
Gabriele Sarti et.al. |
2505.23183 |
link |
| 2025-05-29 |
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation |
Siyuan Wang et.al. |
2505.23120 |
link |
| 2025-05-29 |
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance |
Keren Ye et.al. |
2505.23119 |
null |
| 2025-05-28 |
ATI: Any Trajectory Instruction for Controllable Video Generation |
Angtian Wang et.al. |
2505.22944 |
null |
| 2025-05-28 |
Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape |
Ruichen Chen et.al. |
2505.22918 |
link |
| 2025-05-28 |
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation |
Zhe Kong et.al. |
2505.22647 |
link |
| 2025-05-28 |
Scaling-up Perceptual Video Quality Assessment |
Ziheng Jia et.al. |
2505.22543 |
null |
| 2025-05-28 |
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models |
Junwen Chen et.al. |
2505.22523 |
null |
| 2025-05-28 |
Understanding Adversarial Training with Energy-based Models |
Mujtaba Hussain Mirza et.al. |
2505.22486 |
null |
| 2025-05-28 |
Large-Area Fabrication-aware Computational Diffractive Optics |
Kaixuan Wei et.al. |
2505.22313 |
null |
| 2025-05-28 |
FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing |
Guanwen Feng et.al. |
2505.22141 |
null |
| 2025-05-28 |
Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach |
Alejandro D. Mousist et.al. |
2505.22128 |
null |
| 2025-05-28 |
SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model |
Yifan Chang et.al. |
2505.22126 |
null |
| 2025-05-28 |
High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models |
Tristan S. W. Stevens et.al. |
2505.22090 |
null |
| 2025-05-28 |
AquaMonitor: A multimodal multi-view image sequence dataset for real-life aquatic invertebrate biodiversity monitoring |
Mikko Impiö et.al. |
2505.22065 |
null |
| 2025-05-28 |
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models |
Senmao Li et.al. |
2505.21960 |
null |
| 2025-05-28 |
Patch-based Reconstruction for Unsupervised Dynamic MRI using Learnable Tensor Function with Implicit Neural Representation |
Yuanyuan Liu et.al. |
2505.21894 |
null |
| 2025-05-28 |
FPAN: Mitigating Replication in Diffusion Models through the Fine-Grained Probabilistic Addition of Noise to Token Embeddings |
Jingqi Xu et.al. |
2505.21848 |
null |
| 2025-05-27 |
HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation |
Bowen Chen et.al. |
2505.21831 |
null |
| 2025-05-29 |
Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis |
Sinchana Ramakanth Bhat et.al. |
2505.21700 |
link |
| 2025-05-27 |
Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis |
Yipengjing Sun et.al. |
2505.21502 |
null |
| 2025-05-27 |
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers |
Wei Pang et.al. |
2505.21497 |
link |
| 2025-05-27 |
Policy Optimized Text-to-Image Pipeline Design |
Uri Gadot et.al. |
2505.21478 |
null |
| 2025-05-27 |
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers |
Ziqiao Peng et.al. |
2505.21448 |
null |
| 2025-05-28 |
Towards Robust Automated Perceptual Voice Quality Assessment with Speech Foundation Models |
Whenty Ariyanti et.al. |
2505.21356 |
null |
| 2025-05-27 |
Model as Loss: A Self-Consistent Training Paradigm |
Saisamarth Rajesh Phaye et.al. |
2505.21156 |
null |
| 2025-05-27 |
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance |
Badr Moufad et.al. |
2505.21101 |
link |
| 2025-05-27 |
All-optical discrete illumination-based compressed ultrafast photography |
Long Cheng et.al. |
2505.21086 |
null |
| 2025-05-27 |
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals |
Davide Lobba et.al. |
2505.21062 |
link |
| 2025-05-27 |
Proposal for the optical design of three robust and highly performing FPI systems for the European Solar Telescope |
Goran B. Scharmer et.al. |
2505.21053 |
null |
| 2025-05-28 |
CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians |
Weihang Liu et.al. |
2505.21041 |
null |
| 2025-05-27 |
RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy |
Aiyue Chen et.al. |
2505.21036 |
null |
| 2025-05-27 |
Generative Image Compression by Estimating Gradients of the Rate-variable Feature Distribution |
Minghao Han et.al. |
2505.20984 |
null |
| 2025-05-28 |
Stereo Radargrammetry Using Deep Learning from Airborne SAR Images |
Tatsuya Sasayama et.al. |
2505.20876 |
null |
| 2025-05-27 |
Uni-VERSA: Versatile Speech Assessment with a Unified Network |
Jiatong Shi et.al. |
2505.20741 |
null |
| 2025-05-27 |
LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation |
Pascal Zwick et.al. |
2505.20723 |
link |
| 2025-05-27 |
Photography Perspective Composition: Towards Aesthetic Perspective Recommendation |
Lujian Yao et.al. |
2505.20655 |
null |
| 2025-05-27 |
InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling |
Xiaoxiao Jiang et.al. |
2505.20600 |
null |
| 2025-05-26 |
MultLFG: Training-free Multi-LoRA composition using Frequency-domain Guidance |
Aniket Roy et.al. |
2505.20525 |
null |
| 2025-05-26 |
In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation |
Yu Xu et.al. |
2505.20271 |
null |
| 2025-05-26 |
Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion |
Zheqi Lv et.al. |
2505.20053 |
link |
| 2025-05-26 |
ICDM: Interference Cancellation Diffusion Models for Wireless Semantic Communications |
Tong Wu et.al. |
2505.19983 |
null |
| 2025-05-26 |
PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction |
Kanglei Zhou et.al. |
2505.19972 |
link |
| 2025-05-27 |
Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM |
Peng Liu et.al. |
2505.19901 |
null |
| 2025-05-26 |
Navigating PESQ: Up-to-Date Versions and Open Implementations |
Matteo Torcoli et.al. |
2505.19760 |
link |
| 2025-05-26 |
CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement |
Maria Dziuba et.al. |
2505.19757 |
null |
| 2025-05-26 |
HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance |
Jue Gong et.al. |
2505.19742 |
link |
| 2025-05-26 |
Modeling Beyond MOS: Quality Assessment Models Must Integrate Context, Reasoning, and Multimodality |
Mohamed Amine Kerkouri et.al. |
2505.19696 |
null |
| 2025-05-26 |
DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving |
Wenchao Sun et.al. |
2505.19692 |
link |
| 2025-05-26 |
Burst Image Super-Resolution via Multi-Cross Attention Encoding and Multi-Scan State-Space Decoding |
Tengda Huang et.al. |
2505.19668 |
null |
| 2025-05-26 |
VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models |
Hu Xiaobin et.al. |
2505.19571 |
link |
| 2025-05-26 |
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs |
Juntong Wang et.al. |
2505.19535 |
null |
| 2025-05-26 |
Ring artifacts correction method in x-ray computed tomography based on stripe classification and removal in sinogram images |
Yang Zou et.al. |
2505.19513 |
null |
| 2025-05-26 |
ViewCraft3D: High-Fidelity and View-Consistent 3D Vector Graphics Synthesis |
Chuang Wang et.al. |
2505.19492 |
null |
| 2025-05-26 |
Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals |
Nate Gillman et.al. |
2505.19386 |
null |
| 2025-05-26 |
Neural nanophotonic object detector with ultra-wide field-of-view |
Ji Chen et.al. |
2505.19379 |
null |
| 2025-05-25 |
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline |
Helin Wang et.al. |
2505.19314 |
link |
| 2025-05-25 |
Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning |
Xinyao Liao et.al. |
2505.19196 |
link |
| 2025-05-25 |
Triangle Splatting for Real-Time Radiance Field Rendering |
Jan Held et.al. |
2505.19175 |
null |
| 2025-05-25 |
MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection |
Shuyu Wang et.al. |
2505.19149 |
null |
| 2025-05-23 |
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration |
Sudarshan Rajagopalan et.al. |
2505.18047 |
null |
| 2025-05-23 |
DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning |
Bin Wu et.al. |
2505.17910 |
null |
| 2025-05-23 |
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding |
Anjie Le et.al. |
2505.17779 |
null |
| 2025-05-23 |
SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain |
Jiawei Zhou et.al. |
2505.17727 |
null |
| 2025-05-23 |
CAS-IQA: Teaching Vision-Language Models for Synthetic Angiography Quality Assessment |
Bo Wang et.al. |
2505.17619 |
null |
| 2025-05-23 |
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model |
Kwanyoung Kim et.al. |
2505.17561 |
null |
| 2025-05-23 |
Deeper Diffusion Models Amplify Bias |
Shahin Hakemi et.al. |
2505.17560 |
null |
| 2025-05-23 |
PD$^3$: A Project Duplication Detection Framework via Adapted Multi-Agent Debate |
Dezheng Bao et.al. |
2505.17492 |
null |
| 2025-05-23 |
Dual Ascent Diffusion for Inverse Problems |
Minseo Kim et.al. |
2505.17353 |
null |
| 2025-05-22 |
Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models |
Pushkar Shukla et.al. |
2505.17280 |
null |
| 2025-05-22 |
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning |
Chengqi Duan et.al. |
2505.17022 |
link |
| 2025-05-23 |
Active Speech Enhancement: Active Speech Denoising Decliping and Deveraberation |
Ofir Yaish et.al. |
2505.16911 |
null |
| 2025-05-22 |
Perceptual Quality Assessment for Embodied AI |
Chunyi Li et.al. |
2505.16815 |
link |
| 2025-05-22 |
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining |
Shangquan Sun et.al. |
2505.16811 |
null |
| 2025-05-22 |
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training |
Ziqiao Wang et.al. |
2505.16792 |
link |
| 2025-05-22 |
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning |
Jian Liu et.al. |
2505.16761 |
null |
| 2025-05-22 |
One-Step Diffusion-Based Image Compression with Semantic Distillation |
Naifu Xue et.al. |
2505.16687 |
null |
| 2025-05-22 |
Performance of Objective Speech Quality Metrics on Languages Beyond Validation Data: A Study of Turkish and Korean |
Javier Perez et.al. |
2505.16616 |
null |
| 2025-05-22 |
ScholarBench: A Bilingual Benchmark for Abstraction, Comprehension, and Reasoning Evaluation in Academic Contexts |
Dongwon Noh et.al. |
2505.16566 |
null |
| 2025-05-22 |
SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion |
Asrar Alruwayqi et.al. |
2505.16535 |
null |
| 2025-05-22 |
Utilizing citation index and synthetic quality measure to compare Wikipedia languages across various topics |
Włodzimierz Lewoniewski et.al. |
2505.16506 |
null |
| 2025-05-22 |
InspectionV3: Enhancing Tobacco Quality Assessment with Deep Convolutional Neural Networks for Automated Workshop Management |
Yao Wei et.al. |
2505.16485 |
null |
| 2025-05-22 |
UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension |
Kishan Gupta et.al. |
2505.16404 |
null |
| 2025-05-22 |
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design |
Renjie Wei et.al. |
2505.16335 |
link |
| 2025-05-22 |
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment |
Shuhao Han et.al. |
2505.16314 |
null |
| 2025-05-22 |
A Shape-Aware Total Body Photography System for In-focus Surface Coverage Optimization |
Wei-Lun Huang et.al. |
2505.16228 |
null |
| 2025-05-22 |
Generative Latent Coding for Ultra-Low Bitrate Image and Video Compression |
Linfeng Qi et.al. |
2505.16177 |
null |
| 2025-05-22 |
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods |
Hyang Cui et.al. |
2505.16129 |
link |
| 2025-05-22 |
OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates |
Jinpei Guo et.al. |
2505.16091 |
link |
| 2025-05-21 |
CP-LLM: Context and Pixel Aware Large Language Model for Video Quality Assessment |
Wen Wen et.al. |
2505.16025 |
null |
| 2025-05-21 |
Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization |
Satoshi Kosugi et.al. |
2505.15812 |
link |
| 2025-05-21 |
RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction |
Zhuodong Jiang et.al. |
2505.15737 |
null |
| 2025-05-21 |
Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks |
Nick Kocher et.al. |
2505.15631 |
null |
| 2025-05-21 |
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding |
Zijian Lin et.al. |
2505.15380 |
null |
| 2025-05-21 |
SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit |
Wen-Chin Huang et.al. |
2505.15061 |
link |
| 2025-05-20 |
Training-Free Watermarking for Autoregressive Image Generation |
Yu Tong et.al. |
2505.14673 |
link |
| 2025-05-20 |
Neural Inverse Scattering with Score-based Regularization |
Yuan Gao et.al. |
2505.14560 |
null |
| 2025-05-20 |
Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation |
Fan Yi et.al. |
2505.14540 |
null |
| 2025-05-20 |
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank |
Tianhe Wu et.al. |
2505.14460 |
link |
| 2025-05-20 |
Accuracy and Fairness of Facial Recognition Technology in Low-Quality Police Images: An Experiment With Synthetic Faces |
Maria Cuellar et.al. |
2505.14320 |
null |
| 2025-05-20 |
Towards Generating Realistic Underwater Images |
Abdul-Kazeem Shamba et.al. |
2505.14296 |
null |
| 2025-05-20 |
High-energy X-ray phase-contrast CT of an adult human chest phantom |
Jannis N. Ahlers et.al. |
2505.14075 |
null |
| 2025-05-21 |
Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings |
Owais Mujtaba Khanday et.al. |
2505.14074 |
link |
| 2025-05-20 |
OmniStyle: Filtering High Quality Style Transfer Data at Scale |
Ye Wang et.al. |
2505.14028 |
null |
| 2025-05-20 |
RLVR-World: Training World Models with Reinforcement Learning |
Jialong Wu et.al. |
2505.13934 |
link |
| 2025-05-20 |
Automated Quality Evaluation of Cervical Cytopathology Whole Slide Images Based on Content Analysis |
Lanlan Kang et.al. |
2505.13875 |
null |
| 2025-05-20 |
Exploring Image Quality Assessment from a New Perspective: Pupil Size |
Yixuan Gao et.al. |
2505.13841 |
null |
| 2025-05-19 |
Learning Wavelet-Sparse FDK for 3D Cone-Beam CT Reconstruction |
Yipeng Sun et.al. |
2505.13579 |
null |
| 2025-05-19 |
Neural-Enhanced Rate Adaptation and Computation Distribution for Emerging mmWave Multi-User 3D Video Streaming Systems |
Babak Badnava et.al. |
2505.13337 |
null |
| 2025-05-19 |
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models |
Lucas Berry et.al. |
2505.13273 |
null |
| 2025-05-19 |
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation |
Seungjun Oh et.al. |
2505.13215 |
link |
| 2025-05-19 |
Combinatorial Sample-and Back-Focal-Plane (BFP) Imaging. Pt. I: Instrument and acquisition parameters affecting BFP images and their analysis |
Omer Shavit et.al. |
2505.13190 |
null |
| 2025-05-19 |
Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model |
Jonas Brenig et.al. |
2505.13152 |
link |
| 2025-05-19 |
ARIW-Framework: Adaptive Robust Iterative Watermarking Framework |
Shaowu Wu et.al. |
2505.13101 |
null |
| 2025-05-19 |
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking |
Zihan Su et.al. |
2505.12667 |
null |
| 2025-05-18 |
DPCD: A Quality Assessment Database for Dynamic Point Clouds |
Yating Liu et.al. |
2505.12431 |
null |
| 2025-05-18 |
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning |
Qianyue Hu et.al. |
2505.12332 |
null |
| 2025-05-18 |
Beyond Single-Point Judgment: Distribution Alignment for LLM-as-a-Judge |
Luyu Chen et.al. |
2505.12301 |
null |
| 2025-05-18 |
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind |
Qingmei Li et.al. |
2505.12207 |
null |
| 2025-05-18 |
CTLformer: A Hybrid Denoising Model Combining Convolutional Layers and Self-Attention for Enhanced CT Image Reconstruction |
Zhiting Zheng et.al. |
2505.12203 |
null |
| 2025-05-17 |
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation |
Jiarui Wang et.al. |
2505.12098 |
link |
| 2025-05-17 |
Accelerating Diffusion-based Super-Resolution with Dynamic Time-Spatial Sampling |
Rui Qin et.al. |
2505.12048 |
null |
| 2025-05-17 |
Towards Comprehensive Argument Analysis in Education: Dataset, Tasks, and Method |
Yupei Ren et.al. |
2505.12028 |
null |
| 2025-05-17 |
BINAQUAL: A Full-Reference Objective Localization Similarity Metric for Binaural Audio |
Davoud Shariat Panah et.al. |
2505.11915 |
link |
| 2025-05-16 |
Semantically-Aware Game Image Quality Assessment |
Kai Zhu et.al. |
2505.11724 |
null |
| 2025-05-16 |
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration |
Haipeng Fang et.al. |
2505.11707 |
null |
| 2025-05-16 |
No Gold Standard, No Problem: Reference-Free Evaluation of Taxonomies |
Pascal Wullschleger et.al. |
2505.11470 |
null |
| 2025-05-16 |
Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement |
Nirjhor Datta et.al. |
2505.11246 |
link |
| 2025-05-16 |
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling |
Yuang Ai et.al. |
2505.11196 |
link |
| 2025-05-16 |
Controlling spatial correlation in k-space interpolation networks for MRI reconstruction: denoising versus apparent blurring |
Istvan Homolya et.al. |
2505.11155 |
null |
| 2025-05-16 |
Pseudo-Label Quality Decoupling and Correction for Semi-Supervised Instance Segmentation |
Jianghang Lin et.al. |
2505.11075 |
null |
| 2025-05-16 |
Shackled Dancing: A Bit-Locked Diffusion Algorithm for Lossless and Controllable Image Steganography |
Tianshuo Zhang et.al. |
2505.10950 |
null |
| 2025-05-16 |
ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications |
Li Qiao et.al. |
2505.10946 |
null |
| 2025-05-16 |
Textured mesh Quality Assessment using Geometry and Color Field Similarity |
Kaifa Yang et.al. |
2505.10824 |
link |
| 2025-05-14 |
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography |
Germani Elodie et.al. |
2505.10579 |
null |
| 2025-05-15 |
Learned Lightweight Smartphone ISP with Unpaired Data |
Andrei Arhire et.al. |
2505.10420 |
link |
| 2025-05-15 |
Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding |
Jianhao Huang et.al. |
2505.10405 |
null |
| 2025-05-15 |
Comparative Analysis of Richardson-Lucy Deconvolution and Data Unfolding with Mean Integrated Square Error Optimization |
Nikolay D. Gagunashvili et.al. |
2505.10283 |
null |
| 2025-05-15 |
Ordered-subsets Multi-diffusion Model for Sparse-view CT Reconstruction |
Pengfei Yu et.al. |
2505.09985 |
null |
| 2025-05-14 |
Conformal Bounds on Full-Reference Image Quality for Imaging Inverse Problems |
Jeffrey Wen et.al. |
2505.09528 |
link |
| 2025-05-14 |
PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement |
Tong Li et.al. |
2505.09196 |
null |
| 2025-05-14 |
TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation |
Zechao Guan et.al. |
2505.09140 |
link |
| 2025-05-13 |
Learning Cocoercive Conservative Denoisers via Helmholtz Decomposition for Poisson Inverse Problems |
Deliang Wei et.al. |
2505.08909 |
null |
| 2025-05-12 |
Towards SFW sampling for diffusion models via external conditioning |
Camilo Carvajal Reyes et.al. |
2505.08817 |
link |
| 2025-05-14 |
WaveGuard: Robust Deepfake Detection and Source Tracing via Dual-Tree Complex Wavelet and Graph Neural Networks |
Ziyuan He et.al. |
2505.08614 |
link |
| 2025-05-14 |
The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning |
Mohamed Lamine Mekhalfi et.al. |
2505.08537 |
null |
| 2025-05-13 |
A Deep Learning-Driven Framework for Inhalation Injury Grading Using Bronchoscopy Images |
Yifan Li et.al. |
2505.08517 |
null |
| 2025-05-13 |
The Evolutionary Map of the Universe: A new radio atlas for the southern hemisphere sky |
A. M. Hopkins et.al. |
2505.08271 |
null |
| 2025-05-13 |
LLM-Based Detection of Tangled Code Changes for Higher-Quality Method-Level Bug Datasets |
Md Nahidul Islam Opu et.al. |
2505.08263 |
null |
| 2025-05-13 |
Removing Watermarks with Partial Regeneration using Semantic Information |
Krti Tallam et.al. |
2505.08234 |
link |
| 2025-05-13 |
Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations |
Petrus H. Zwart et.al. |
2505.08176 |
null |
| 2025-05-12 |
Asymptotically Efficient Data-adaptive Penalized Shrinkage Estimation with Application to Causal Inference |
Herbert P. Susmann et.al. |
2505.08065 |
link |
| 2025-05-12 |
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach |
Ruikun Hou et.al. |
2505.07902 |
null |
| 2025-05-12 |
PtyRAD: A High-performance and Flexible Ptychographic Reconstruction Framework with Automatic Differentiation |
Chia-Hao Lee et.al. |
2505.07814 |
link |
| 2025-05-12 |
Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation |
Dragos-Patru Covei et.al. |
2505.07699 |
null |
| 2025-05-12 |
A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development |
Werner Geyer et.al. |
2505.07664 |
null |
| 2025-05-12 |
Addressing degeneracies in latent interpolation for diffusion models |
Erik Landolsi et.al. |
2505.07481 |
null |
| 2025-05-13 |
Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model |
Wei Li et.al. |
2505.07449 |
link |
| 2025-05-12 |
Few-shot Semantic Encoding and Decoding for Video Surveillance |
Baoping Cheng et.al. |
2505.07381 |
null |
| 2025-05-12 |
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data |
David de-Fitero-Dominguez et.al. |
2505.07372 |
null |
| 2025-05-12 |
GAN-based synthetic FDG PET images from T1 brain MRI can serve to improve performance of deep unsupervised anomaly detection models |
Daria Zotova et.al. |
2505.07364 |
null |
| 2025-05-12 |
Synthetic Similarity Search in Automotive Production |
Christoph Huber et.al. |
2505.07256 |
null |
| 2025-05-12 |
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding |
Dianwen Ng et.al. |
2505.07235 |
link |
| 2025-05-12 |
Metrics that matter: Evaluating image quality metrics for medical image generation |
Yash Deo et.al. |
2505.07175 |
link |
| 2025-05-11 |
Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution |
Zihang Liu et.al. |
2505.07071 |
link |
| 2025-05-11 |
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models |
Junhao Xia et.al. |
2505.07057 |
null |
| 2025-05-11 |
Replay-Based Continual Learning with Dual-Layered Distillation and a Streamlined U-Net for Efficient Text-to-Image Generation |
Md. Naimur Asif Borno et.al. |
2505.06995 |
null |
| 2025-05-10 |
Tuning Butterworth filter's parameters in SPECT reconstructions via kernel-based Bayesian optimization with a no-reference image evaluation metric |
Luca Pastrello et.al. |
2505.06692 |
null |
| 2025-05-10 |
MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning |
Zixian Zhao et.al. |
2505.06665 |
null |
| 2025-05-10 |
Two-Stage Random Alternation Framework for Zero-Shot Pansharpening |
Haorui Chen et.al. |
2505.06576 |
null |
| 2025-05-10 |
Good Things Come in Pairs: Paired Autoencoders for Inverse Problems |
Matthias Chung et.al. |
2505.06549 |
null |
| 2025-05-10 |
HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models |
Shuhan Zhuang et.al. |
2505.06543 |
null |
| 2025-05-10 |
Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes |
Xijie Yang et.al. |
2505.06523 |
null |
| 2025-05-09 |
Towards Facial Image Compression with Consistency Preserving Diffusion Prior |
Yimin Zhou et.al. |
2505.05870 |
null |
| 2025-05-09 |
PICD: Versatile Perceptual Image Compression with Diffusion Rendering |
Tongda Xu et.al. |
2505.05853 |
null |
| 2025-05-09 |
Towards order of magnitude X-ray dose reduction in breast cancer imaging using phase contrast and deep denoising |
Ashkan Pakzad et.al. |
2505.05812 |
link |
| 2025-05-09 |
Hybrid Learning: A Novel Combination of Self-Supervised and Supervised Learning for MRI Reconstruction without High-Quality Training Reference |
Haoyang Pei et.al. |
2505.05703 |
null |
| 2025-05-08 |
A New k-Space Model for Non-Cartesian Fourier Imaging |
Chin-Cheng Chan et.al. |
2505.05647 |
null |
| 2025-05-08 |
Score-based Self-supervised MRI Denoising |
Jiachen Tu et.al. |
2505.05631 |
null |
| 2025-05-08 |
OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours |
Hanie Moghaddasi et.al. |
2505.05531 |
null |
| 2025-05-08 |
Flow-GRPO: Training Flow Matching Models via Online RL |
Jie Liu et.al. |
2505.05470 |
link |
| 2025-05-08 |
A Study on Improvement of Image Quality in Quantum Polarized Microscopy using an Entangled-Photon Source |
Mousume Samad et.al. |
2505.05457 |
null |
| 2025-05-09 |
LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering |
Ran Zhang et.al. |
2505.05423 |
link |
| 2025-05-08 |
EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution |
Haizhen Xie et.al. |
2505.05209 |
null |
| 2025-05-08 |
PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting |
Elad Feldman et.al. |
2505.05183 |
null |
| 2025-05-08 |
MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models |
Hongyang Zhu et.al. |
2505.05101 |
null |
| 2025-05-08 |
MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation |
Tengya Peng et.al. |
2505.04959 |
null |
| 2025-05-07 |
Assessing Suburban Air Quality Constraints on Free Cooling in an Irish City |
Paul D. O Sullivan et.al. |
2505.04746 |
null |
| 2025-05-07 |
Securing Immersive 360 Video Streams through Attribute-Based Selective Encryption |
Mohammad Waquas Usmani et.al. |
2505.04466 |
null |
| 2025-05-06 |
dfreproject: A Python package for astronomical reprojection |
Carter Lee Rhea et.al. |
2505.03932 |
link |
| 2025-05-06 |
DISARM++: Beyond scanner-free harmonization |
Luca Caldera et.al. |
2505.03715 |
link |
| 2025-05-06 |
Towards Smart Point-and-Shoot Photography |
Jiawan Li et.al. |
2505.03638 |
null |
| 2025-05-07 |
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision |
Linhan Cao et.al. |
2505.03631 |
link |
| 2025-05-07 |
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model |
Y. B. Wang et.al. |
2505.03603 |
null |
| 2025-05-06 |
Real-Time Person Image Synthesis Using a Flow Matching Model |
Jiwoo Jeong et.al. |
2505.03562 |
link |
| 2025-05-06 |
MRI motion correction via efficient residual-guided denoising diffusion probabilistic models |
Mojtaba Safari et.al. |
2505.03498 |
null |
| 2025-05-06 |
EOPose: Exemplar-based object reposing using Generalized Pose Correspondences |
Sarthak Mehrotra et.al. |
2505.03394 |
null |
| 2025-05-06 |
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor |
Wei-Ting Chen et.al. |
2505.03261 |
null |
| 2025-05-06 |
Is AI currently capable of identifying wild oysters? A comparison of human annotators against the AI model, ODYSSEE |
Brendan Campbell et.al. |
2505.03108 |
null |
| 2025-05-05 |
NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results |
Nikolay Safonov et.al. |
2505.03007 |
link |
| 2025-05-05 |
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing |
Zinan Guo et.al. |
2505.02823 |
link |
| 2025-05-08 |
Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge |
Vladyslav Zalevskyi et.al. |
2505.02784 |
null |
| 2025-05-05 |
DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction |
Yiqun Lin et.al. |
2505.02628 |
null |
| 2025-05-05 |
Deep learning of personalized priors from past MRI scans enables fast, quality-enhanced point-of-care MRI with low-cost systems |
Tal Oved et.al. |
2505.02470 |
null |
| 2025-05-05 |
MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection |
Jiaqi Zhang et.al. |
2505.02441 |
link |
| 2025-05-04 |
Saliency-Guided Training for Fingerprint Presentation Attack Detection |
Samuel Webster et.al. |
2505.02176 |
null |
| 2025-05-04 |
HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement |
Xiaorui Zhao et.al. |
2505.02134 |
null |
| 2025-05-06 |
Regression is all you need for medical image translation |
Sebastian Rassmann et.al. |
2505.02048 |
link |
| 2025-05-04 |
Hybrid Image Resolution Quality Metric (HIRQM): A Comprehensive Perceptual Image Quality Assessment Framework |
Vineesh Kumar Reddy Mondem et.al. |
2505.02001 |
null |
| 2025-05-03 |
GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting |
Anushka Agarwal et.al. |
2505.01928 |
null |
| 2025-05-03 |
ResiTok: A Resilient Tokenization-Enabled Framework for Ultra-Low-Rate and Robust Image Transmission |
Zhenyu Liu et.al. |
2505.01870 |
null |
| 2025-05-03 |
Continuous Filtered Backprojection by Learnable Interpolation Network |
Hui Lin et.al. |
2505.01768 |
null |
| 2025-05-03 |
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models |
Tobias Domhan et.al. |
2505.01761 |
null |
| 2025-05-03 |
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study |
Tamim Ahmed et.al. |
2505.01680 |
null |
| 2025-05-02 |
VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models |
Mohammadreza Teymoorianfard et.al. |
2505.01406 |
link |
| 2025-05-02 |
Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes |
Wallace Peaslee et.al. |
2505.01388 |
link |
| 2025-05-02 |
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis |
Jiangtong Tan et.al. |
2505.01172 |
link |
| 2025-05-02 |
VSC: Visual Search Compositional Text-to-Image Diffusion Model |
Do Huu Dat et.al. |
2505.01104 |
null |
| 2025-05-02 |
Diffuse Optical Ptychography |
Mingwei He et.al. |
2505.01090 |
null |
| 2025-05-02 |
Enhancing Realism in Holographic Augmented Reality Displays through Occlusion Handling |
Woongseob Han et.al. |
2505.00942 |
null |
| 2025-05-01 |
The Comparability of Model Fusion to Measured Data in Confuser Rejection |
Conor Flynn et.al. |
2505.00836 |
null |
| 2025-05-01 |
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution |
Aditya Arora et.al. |
2505.00687 |
null |
| 2025-05-01 |
Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI |
Merve Gülle et.al. |
2505.00643 |
null |
| 2025-05-01 |
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution |
Antoni Bigata et.al. |
2505.00497 |
null |
| 2025-05-01 |
Self-supervised surface-related multiple suppression with multidimensional convolution |
Shijun Cheng et.al. |
2505.00419 |
null |
| 2025-05-01 |
Efficient Neural Video Representation with Temporally Coherent Modulation |
Seungjun Shin et.al. |
2505.00335 |
null |
| 2025-05-01 |
Quaternion Wavelet-Conditioned Diffusion Models for Image Super-Resolution |
Luigi Sigillo et.al. |
2505.00334 |
null |
| 2025-05-01 |
AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality |
Biling Wang et.al. |
2505.00308 |
null |
| 2025-04-30 |
Efficient and robust 3D blind harmonization for large domain gaps |
Hwihun Jeong et.al. |
2505.00133 |
null |
| 2025-04-30 |
From Aesthetics to Human Preferences: Comparative Perspectives of Evaluating Text-to-Music Systems |
Huan Zhang et.al. |
2504.21815 |
null |
| 2025-04-30 |
Anatomical Similarity as a New Metric to Evaluate Brain Generative Models |
Bahram Jafrasteh et.al. |
2504.21771 |
null |
| 2025-04-30 |
Visual Text Processing: A Comprehensive Review and Unified Evaluation |
Yan Shu et.al. |
2504.21682 |
link |
| 2025-04-30 |
Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection |
Liqin Wang et.al. |
2504.21646 |
null |
| 2025-04-30 |
RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations |
Jonas Gwozdz et.al. |
2504.21605 |
null |
| 2025-04-30 |
Latent Feature-Guided Conditional Diffusion for High-Fidelity Generative Image Semantic Communication |
Zehao Chen et.al. |
2504.21577 |
null |
| 2025-04-30 |
The First Theoretical Approximation Guarantees for the Non-Dominated Sorting Genetic Algorithm III (NSGA-III) |
Renzhong Deng et.al. |
2504.21552 |
null |
| 2025-04-30 |
Impairments are Clustered in Latents of Deep Neural Network-based Speech Quality Models |
Fredrik Cumlin et.al. |
2504.21528 |
link |
| 2025-04-30 |
Simple Visual Artifact Detection in Sora-Generated Videos |
Misora Sugiyama et.al. |
2504.21334 |
null |
| 2025-04-30 |
Redundancy Analysis and Mitigation for Machine Learning-Based Process Monitoring of Additive Manufacturing |
Jiarui Xie et.al. |
2504.21317 |
null |
| 2025-04-30 |
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images |
Yunhao Li et.al. |
2504.21308 |
null |
| 2025-04-30 |
High-Fidelity Single-Pixel Imaging at Ultra-Low Sampling Ratios via Physically Enhanced Laguerre Gaussian Encoding |
JunYi Xiong et.al. |
2504.21290 |
null |
| 2025-04-29 |
GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction |
Yuhan Xie et.al. |
2504.21067 |
link |
| 2025-04-29 |
YoChameleon: Personalized Vision and Language Generation |
Thao Nguyen et.al. |
2504.20998 |
null |
| 2025-04-29 |
All-dielectric metasurface polarization scrambler for imaging applications |
Edith Hartmann et.al. |
2504.20727 |
null |
| 2025-04-29 |
Histogram-Probabilistic Multi-Hypothesis Tracking with Integrated Target Existence |
Lukas Herrmann et.al. |
2504.20526 |
null |
| 2025-04-29 |
LMM4Gen3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs |
Woo Yi Yang et.al. |
2504.20466 |
null |
| 2025-04-29 |
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech |
Zhicheng Lian et.al. |
2504.20447 |
null |
| 2025-04-28 |
CompleteMe: Reference-based Human Image Completion |
Yu-Ju Tsai et.al. |
2504.20042 |
null |
| 2025-04-28 |
Enhancing Quality for VVC Compressed Videos with Omniscient Quality Enhancement Model |
Xiem HoangVan et.al. |
2504.19935 |
null |
| 2025-04-28 |
Accelerated 3D-3D rigid registration of echocardiographic images obtained from apical window using particle filter |
Thanuja Uruththirakodeeswaran et.al. |
2504.19930 |
null |
| 2025-04-28 |
Prompt Guiding Multi-Scale Adaptive Sparse Representation-driven Network for Low-Dose CT MAR |
Baoshun Shi et.al. |
2504.19687 |
null |
| 2025-04-27 |
HumMorph: Generalized Dynamic Human Neural Fields from Few Views |
Jakub Zadrożny et.al. |
2504.19390 |
null |
| 2025-04-27 |
Machine Learning-Based Modeling of the Anode Heel Effect in X-ray Beam Monte Carlo Simulations |
Hussein Harb et.al. |
2504.19155 |
null |
| 2025-04-26 |
Calibrating Translation Decoding with Quality Estimation on LLMs |
Di Wu et.al. |
2504.19044 |
link |
| 2025-04-26 |
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models |
Gal Almog et.al. |
2504.18989 |
link |
| 2025-04-26 |
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning |
Yifan Xie et.al. |
2504.18810 |
null |
| 2025-04-25 |
Augmenting Perceptual Super-Resolution via Image Quality Predictors |
Fengjia Zhang et.al. |
2504.18524 |
null |
| 2025-04-25 |
NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration |
Haotian Dong et.al. |
2504.18448 |
null |
| 2025-04-25 |
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation |
Weipeng Tan et.al. |
2504.18087 |
null |
| 2025-04-28 |
From Cluster to Desktop: A Cache-Accelerated INR framework for Interactive Visualization of Tera-Scale Data |
Daniel Zavorotny et.al. |
2504.18001 |
null |
| 2025-04-23 |
Learning Underwater Active Perception in Simulation |
Alexandre Cardaillac et.al. |
2504.17817 |
null |
| 2025-04-24 |
Self-Supervised Noise Adaptive MRI Denoising via Repetition to Repetition (Rep2Rep) Learning |
Nikola Janjušević et.al. |
2504.17698 |
null |
| 2025-04-24 |
Tamper-evident Image using JPEG Fixed Points |
Zhaofeng Si et.al. |
2504.17594 |
null |
| 2025-04-24 |
ESDiff: Encoding Strategy-inspired Diffusion Model with Few-shot Learning for Color Image Inpainting |
Junyan Zhang et.al. |
2504.17524 |
null |
| 2025-04-24 |
Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems |
Jaegang Jo et.al. |
2504.17368 |
null |
| 2025-04-24 |
Scene Perceived Image Perceptual Score (SPIPS): combining global and local perception for image quality assessment |
Zhiqiang Lao et.al. |
2504.17234 |
null |
| 2025-04-23 |
Distilling semantically aware orders for autoregressive image generation |
Rishav Pramanik et.al. |
2504.17069 |
null |
| 2025-04-23 |
Diffusion Probabilistic Models for Compressive SAR Imaging |
Odysseas Pappas et.al. |
2504.17053 |
null |
| 2025-04-23 |
Dense Air Pollution Estimation from Sparse in-situ Measurements and Satellite Data |
Ruben Gonzalez Avilés et.al. |
2504.17039 |
null |
| 2025-04-22 |
TVC: Tokenized Video Compression with Ultra-Low Bitrate |
Lebin Zhou et.al. |
2504.16953 |
null |
| 2025-04-23 |
Beyond Anonymization: Object Scrubbing for Privacy-Preserving 2D and 3D Vision Tasks |
Murat Bilgehan Ertan et.al. |
2504.16557 |
null |
| 2025-04-23 |
ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance |
Ying Li et.al. |
2504.16464 |
null |
| 2025-04-23 |
VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models |
Xuming Hu et.al. |
2504.16359 |
null |
| 2025-04-23 |
Property-Preserving Hashing for $\ell_1$-Distance Predicates: Applications to Countering Adversarial Input Attacks |
Hassan Asghar et.al. |
2504.16355 |
null |
| 2025-04-22 |
Analytic Fourier ptychotomography for volumetric refractive index imaging |
Zhenyu Dong et.al. |
2504.16247 |
link |
| 2025-04-22 |
Survey of Video Diffusion Models: Foundations, Implementations, and Applications |
Yimu Wang et.al. |
2504.16081 |
link |
| 2025-04-22 |
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning |
Le Zhuo et.al. |
2504.16080 |
null |
| 2025-04-22 |
Boosting Generative Image Modeling via Joint Image-Feature Synthesis |
Theodoros Kouzelis et.al. |
2504.16064 |
null |
| 2025-04-22 |
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment |
Yachun Mi et.al. |
2504.16003 |
null |
| 2025-04-22 |
DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy |
Qirui Yang et.al. |
2504.15756 |
null |
| 2025-04-22 |
You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection |
Jun Dong et.al. |
2504.15694 |
null |
| 2025-04-23 |
Iris: A Next Generation Digital Pathology Rendering Engine |
Ryan Erik Landvater et.al. |
2504.15437 |
link |
| 2025-04-21 |
Plug-and-Play Versatile Compressed Video Enhancement |
Huimin Zeng et.al. |
2504.15380 |
null |
| 2025-04-20 |
Enhancing DR Classification with Swin Transformer and Shifted Window Attention |
Meher Boulaabi et.al. |
2504.15317 |
null |
| 2025-04-21 |
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians |
Cailin Zhuang et.al. |
2504.15281 |
null |
| 2025-04-21 |
Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform |
Xianpan Zhou et.al. |
2504.15182 |
null |
| 2025-04-21 |
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration |
Junyuan Deng et.al. |
2504.15159 |
null |
| 2025-04-21 |
Structure-guided Diffusion Transformer for Low-Light Image Enhancement |
Xiangchen Yin et.al. |
2504.15054 |
null |
| 2025-04-21 |
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: KwaiSR Dataset and Study |
Xin Li et.al. |
2504.15003 |
null |
| 2025-04-21 |
K-DRIFT Preparation: Experimental Verification of an Observation Strategy for Accurate Dark-Sky Flats |
Woowon Byun et.al. |
2504.14914 |
null |
| 2025-04-20 |
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations |
Yuri Balashov et.al. |
2504.14619 |
null |
| 2025-04-20 |
NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results |
Zheng Chen et.al. |
2504.14600 |
link |
| 2025-04-20 |
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization |
Liang Peng et.al. |
2504.14534 |
link |
| 2025-04-20 |
Pulmonary electrical impedance tomography based on deep recurrent neural networks |
Zhenzhong Song et.al. |
2504.14521 |
null |
| 2025-04-19 |
Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network |
Lu Pan et.al. |
2504.14238 |
null |
| 2025-04-19 |
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models |
Xinlin Zhuang et.al. |
2504.14194 |
null |
| 2025-04-18 |
Point-Driven Interactive Text and Image Layer Editing Using Diffusion Models |
Zhenyu Yu et.al. |
2504.14108 |
null |
| 2025-04-18 |
Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design |
Wei Dong et.al. |
2504.14075 |
link |
| 2025-04-18 |
Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation |
Fulvio Sanguigni et.al. |
2504.14011 |
null |
| 2025-04-18 |
Entropy Rectifying Guidance for Diffusion and Flow Models |
Tariq Berrada Ifriqi et.al. |
2504.13987 |
null |
| 2025-04-18 |
SupResDiffGAN a new approach for the Super-Resolution task |
Dawid Kopeć et.al. |
2504.13622 |
null |
| 2025-04-18 |
Entropic Time Schedulers for Generative Diffusion Models |
Dejan Stancevic et.al. |
2504.13612 |
null |
| 2025-04-18 |
U-Shape Mamba: State Space Model for faster diffusion |
Alex Ergasti et.al. |
2504.13499 |
link |
| 2025-04-17 |
Auto-FEDUS: Autoregressive Generative Modeling of Doppler Ultrasound Signals from Fetal Electrocardiograms |
Alireza Rafiei et.al. |
2504.13233 |
null |
| 2025-04-17 |
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results |
Xin Li et.al. |
2504.13131 |
link |
| 2025-04-18 |
SkyReels-V2: Infinite-length Film Generative Model |
Guibin Chen et.al. |
2504.13074 |
link |
| 2025-04-17 |
Imaging for All-Day Wearable Smart Glasses |
Michael Goesele et.al. |
2504.13060 |
null |
| 2025-04-17 |
TTRD3: Texture Transfer Residual Denoising Dual Diffusion Model for Remote Sensing Image Super-Resolution |
Yide Liu et.al. |
2504.13026 |
link |
| 2025-04-17 |
Efficient Chebyshev Reconstruction for the Anisotropic Equilibrium Model in Magnetic Particle Imaging |
Christine Droigk et.al. |
2504.12981 |
null |
| 2025-04-17 |
Sparks of Science: Hypothesis Generation Using Structured Paper Data |
Charles O'Neill et.al. |
2504.12976 |
null |
| 2025-04-17 |
MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection |
Long Qian et.al. |
2504.12970 |
null |
| 2025-04-17 |
Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal |
Inzamamul Alam et.al. |
2504.12809 |
link |
| 2025-04-17 |
ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior |
Xiao Han et.al. |
2504.12788 |
null |
| 2025-04-17 |
Mask Image Watermarking |
Runyi Hu et.al. |
2504.12739 |
link |
| 2025-04-17 |
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding |
Qianqian Sun et.al. |
2504.12704 |
null |
| 2025-04-17 |
Autonomous Drone for Dynamic Smoke Plume Tracking |
Srijan Kumar Pal et.al. |
2504.12664 |
null |
| 2025-04-17 |
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation |
Lvmin Zhang et.al. |
2504.12626 |
link |
| 2025-04-17 |
AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting |
Xin Su et.al. |
2504.12605 |
null |
| 2025-04-16 |
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework |
Jiale Tao et.al. |
2504.12395 |
link |
| 2025-04-16 |
SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction |
Xia Wang et.al. |
2504.12245 |
null |
| 2025-04-16 |
Coding-Prior Guided Diffusion Network for Video Deblurring |
Yike Liu et.al. |
2504.12222 |
null |
| 2025-04-16 |
Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis |
Songping Wang et.al. |
2504.12129 |
null |
| 2025-04-17 |
Understanding Attention Mechanism in Video Diffusion Models |
Bingyan Liu et.al. |
2504.12027 |
null |
| 2025-04-16 |
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching |
Xinli Yue et.al. |
2504.12018 |
null |
| 2025-04-16 |
PCDiff: Proactive Control for Ownership Protection in Diffusion Models with Watermark Compatibility |
Keke Gai et.al. |
2504.11774 |
null |
| 2025-04-17 |
DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment |
Li Yu et.al. |
2504.11733 |
null |
| 2025-04-16 |
Measuring Global Migration Flows using Online Data |
Guanghua Chi et.al. |
2504.11691 |
null |
| 2025-04-15 |
AskQE: Question Answering as Automatic Evaluation for Machine Translation |
Dayeon Ki et.al. |
2504.11582 |
null |
| 2025-04-15 |
A Comparative Evaluation of CT Global Noise Calculation Methods for Clinical Image Quality Assessment |
Charles M Weaver et.al. |
2504.11578 |
null |
| 2025-04-15 |
ADT: Tuning Diffusion Models with Adversarial Supervision |
Dazhong Shen et.al. |
2504.11423 |
null |
| 2025-04-15 |
Ring Artifacts Correction Based on Global-Local Features Interaction Guidance in the Projection Domain |
Yunze Liu et.al. |
2504.11375 |
null |
| 2025-04-16 |
Seedream 3.0 Technical Report |
Yu Gao et.al. |
2504.11346 |
null |
| 2025-04-15 |
Distillation-Supervised Convolutional Low-Rank Adaptation for Efficient Image Super-Resolution |
Xinning Chai et.al. |
2504.11271 |
link |
| 2025-04-15 |
SAR-to-RGB Translation with Latent Diffusion for Earth Observation |
Kaan Aydin et.al. |
2504.11154 |
null |
| 2025-04-15 |
Taming Consistency Distillation for Accelerated Human Image Animation |
Xiang Wang et.al. |
2504.11143 |
null |
| 2025-04-15 |
Document Quality Scoring for Web Crawling |
Francesca Pezzuti et.al. |
2504.11011 |
link |
| 2025-04-15 |
AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent |
Pu Wang et.al. |
2504.10978 |
null |
| 2025-04-15 |
Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models |
Karan Jain et.al. |
2504.10883 |
null |
| 2025-04-15 |
LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation |
Hengyu Shi et.al. |
2504.10829 |
null |
| 2025-04-15 |
Efficient and Robust Remote Sensing Image Denoising Using Randomized Approximation of Geodesics' Gramian on the Manifold Underlying the Patch Space |
Kelum Gajamannage et.al. |
2504.10820 |
null |
| 2025-04-15 |
The Art of Audience Engagement: LLM-Based Thin-Slicing of Scientific Talks |
Ralf Schmälzle et.al. |
2504.10768 |
null |
| 2025-04-15 |
Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study |
Mengdi Wang et.al. |
2504.10267 |
link |
| 2025-04-14 |
Aligning Anime Video Generation with Human Feedback |
Bingwen Zhu et.al. |
2504.10044 |
null |
| 2025-04-14 |
Progressive Transfer Learning for Multi-Pass Fundus Image Restoration |
Uyen Phan et.al. |
2504.10025 |
null |
| 2025-04-14 |
A Theory of Universal Rate-Distortion-Classification Representations for Lossy Compression |
Nam Nguyen et.al. |
2504.09932 |
null |
| 2025-04-14 |
EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise |
Chao Liu et.al. |
2504.09789 |
null |
| 2025-04-13 |
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow |
Kenan Tang et.al. |
2504.09697 |
link |
| 2025-04-13 |
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation |
Xingrui Wang et.al. |
2504.09656 |
null |
| 2025-04-13 |
Trajectory-guided Motion Perception for Facial Expression Quality Assessment in Neurological Disorders |
Shuchao Duan et.al. |
2504.09530 |
link |
| 2025-04-13 |
A Secure Communication Protocol for Remote Keyless Entry System with Adaptive Adjustment of Transmission Parameters |
Jingjing Guo et.al. |
2504.09527 |
null |
| 2025-04-13 |
Enhancing Wide-Angle Image Using Narrow-Angle View of the Same Scene |
Hussain Md. Safwan et.al. |
2504.09455 |
link |
| 2025-04-12 |
Towards Explainable Partial-AIGC Image Quality Assessment |
Jiaying Qian et.al. |
2504.09291 |
null |
| 2025-04-12 |
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment |
Sijing Wu et.al. |
2504.09255 |
link |
| 2025-04-12 |
Universal Rate-Distortion-Classification Representations for Lossy Compression |
Nam Nguyen et.al. |
2504.09025 |
null |
| 2025-04-11 |
End-to-End Demonstration of Quantum Generative Adversarial Networks for Steel Microstructure Image Augmentation on a Trapped-Ion Quantum Computer |
Samwel Sekwao et.al. |
2504.08728 |
null |
| 2025-04-11 |
Generating Fine Details of Entity Interactions |
Xinyi Gu et.al. |
2504.08714 |
null |
| 2025-04-11 |
Quality evaluation of Tabby coding assistant using real source code snippets |
Marta Borek et.al. |
2504.08650 |
link |
| 2025-04-11 |
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization |
Jialu Li et.al. |
2504.08641 |
null |
| 2025-04-11 |
Shadow Erosion and Nighttime Adaptability for Camera-Based Automated Driving Applications |
Mohamed Sabry et.al. |
2504.08551 |
null |
| 2025-04-11 |
Quality Diversity for Variational Quantum Circuit Optimization |
Maximilian Zorn et.al. |
2504.08459 |
link |
| 2025-04-11 |
A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation |
Dawei Zhou et.al. |
2504.08411 |
null |
| 2025-04-11 |
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft |
Junliang Guo et.al. |
2504.08388 |
null |
| 2025-04-11 |
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs |
Jiarui Wang et.al. |
2504.08358 |
link |
| 2025-04-11 |
All-in-Memory Stochastic Computing using ReRAM |
João Paulo C. de Lima et.al. |
2504.08340 |
null |
| 2025-04-10 |
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects |
Shalini Maiti et.al. |
2504.08125 |
null |
| 2025-04-10 |
SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion |
Yuseon Kim et.al. |
2504.08012 |
link |
| 2025-04-10 |
PixelFlow: Pixel-Space Generative Models with Flow |
Shoufa Chen et.al. |
2504.07963 |
link |
| 2025-04-10 |
TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs |
Zijian Zhang et.al. |
2504.07556 |
null |
| 2025-04-10 |
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation |
Tuhin Chakrabarty et.al. |
2504.07532 |
link |
| 2025-04-10 |
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery |
Amirhossein Abaskohi et.al. |
2504.07421 |
link |
| 2025-04-09 |
Dependency Update Adoption Patterns in the Maven Software Ecosystem |
Baltasar Berretta et.al. |
2504.07310 |
null |
| 2025-04-09 |
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution |
Zhe Wang et.al. |
2504.07308 |
link |
| 2025-04-09 |
Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model |
Yingjie Zhou et.al. |
2504.07148 |
null |
| 2025-04-09 |
End2end-ALARA: Approaching the ALARA Law in CT Imaging with End-to-end Learning |
Xi Tao et.al. |
2504.06777 |
null |
| 2025-04-09 |
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism |
Elia Peruzzo et.al. |
2504.06672 |
null |
| 2025-04-10 |
Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression |
Mohsen Jenadeleh et.al. |
2504.06301 |
link |
| 2025-04-08 |
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance |
Jiazi Bu et.al. |
2504.06232 |
null |
| 2025-04-08 |
CamContextI2V: Context-aware Controllable Video Generation |
Luis Denninger et.al. |
2504.06022 |
link |
| 2025-04-08 |
ViralQC: A Tool for Assessing Completeness and Contamination of Predicted Viral Contigs |
Cheng Peng et.al. |
2504.05790 |
link |
| 2025-04-08 |
A Lightweight Multi-Module Fusion Approach for Korean Character Recognition |
Inho Jake Park et.al. |
2504.05770 |
null |
| 2025-04-08 |
STRIVE: A Think & Improve Approach with Iterative Refinement for Enhancing Question Quality Estimation |
Aniket Deroy et.al. |
2504.05693 |
null |
| 2025-04-07 |
Improved Stochastic Texture Filtering Through Sample Reuse |
Bartlomiej Wronski et.al. |
2504.05562 |
null |
| 2025-04-07 |
Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling |
Tasmiah Haque et.al. |
2504.05537 |
null |
| 2025-04-07 |
L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery |
Yi-Zhen Tsai et.al. |
2504.05517 |
link |
| 2025-04-07 |
Let it Snow! Animating Static Gaussian Scenes With Dynamic Weather Effects |
Gal Fiebelman et.al. |
2504.05296 |
null |
| 2025-04-07 |
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion |
Xingyu Hu et.al. |
2504.05164 |
null |
| 2025-04-07 |
Content-Distortion High-Order Interaction for Blind Image Quality Assessment |
Shuai Liu et.al. |
2504.05076 |
null |
| 2025-04-07 |
Low-Rate Semantic Communication with Codebook-based Conditional Generative Models |
Kailang Ye et.al. |
2504.04977 |
null |
| 2025-04-07 |
Video-Bench: Human-Aligned Video Generation Benchmark |
Hui Han et.al. |
2504.04907 |
null |
| 2025-04-07 |
Bidirectional Hierarchical Protein Multi-Modal Representation Learning |
Xuefeng Liu et.al. |
2504.04770 |
null |
| 2025-04-06 |
BrainMRDiff: A Diffusion Model for Anatomically Consistent Brain MRI Synthesis |
Moinak Bhattacharya et.al. |
2504.04532 |
null |
| 2025-04-06 |
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency |
Shiyan Liu et.al. |
2504.04427 |
null |
| 2025-04-05 |
Multi-identity Human Image Animation with Structural Video Diffusion |
Zhenzhi Wang et.al. |
2504.04126 |
null |
| 2025-04-05 |
Mapping at First Sense: A Lightweight Neural Network-Based Indoor Structures Prediction Method for Robot Autonomous Exploration |
Haojia Gao et.al. |
2504.04061 |
null |
| 2025-04-05 |
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs |
Wasi Uddin Ahmad et.al. |
2504.04030 |
null |
| 2025-04-05 |
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion |
Maksim Siniukov et.al. |
2504.04010 |
null |
| 2025-04-04 |
From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images |
Maliheh Toozandehjani et.al. |
2504.03807 |
null |
| 2025-04-04 |
Quantifying the uncertainty of model-based synthetic image quality metrics |
Ciaran Bench et.al. |
2504.03623 |
null |
| 2025-04-04 |
Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal |
Yuyang Hu et.al. |
2504.03607 |
null |
| 2025-04-04 |
BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution |
Zihao He et.al. |
2504.03490 |
null |
| 2025-04-04 |
NeRFlex: Resource-aware Real-time High-quality Rendering of Complex Scenes on Mobile Devices |
Zhe Wang et.al. |
2504.03415 |
null |
| 2025-04-04 |
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation |
Joao Prazeres et.al. |
2504.03381 |
null |
| 2025-04-04 |
Space-Time Encoded Modulation for High-Fidelity Diffuse Optical Imaging |
Ben Wiesel et.al. |
2504.03246 |
null |
| 2025-04-04 |
Three Forensic Cues for JPEG AI Images |
Sandra Bergmann et.al. |
2504.03191 |
null |
| 2025-04-04 |
FontGuard: A Robust Font Watermarking Approach Leveraging Deep Font Knowledge |
Kahim Wong et.al. |
2504.03128 |
link |
| 2025-04-03 |
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization |
Haishan Wang et.al. |
2504.03059 |
link |
| 2025-04-03 |
Fuzzy Implicative Rules: A Unified Approach |
Raquel Fernandez-Peralta et.al. |
2504.03000 |
null |
| 2025-04-03 |
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization |
Kangle Deng et.al. |
2504.02817 |
null |
| 2025-04-03 |
Development of Automated Data Quality Assessment and Evaluation Indices by Analytical Experience |
Yuka Haruki et.al. |
2504.02663 |
null |
| 2025-04-03 |
Charm: The Missing Piece in ViT fine-tuning for Image Aesthetic Assessment |
Fatemeh Behrad et.al. |
2504.02522 |
link |
| 2025-04-03 |
MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields |
Yash Kulthe et.al. |
2504.02517 |
null |
| 2025-04-03 |
Translation of Fetal Brain Ultrasound Images into Pseudo-MRI Images using Artificial Intelligence |
Naomi Silverstein et.al. |
2504.02408 |
null |
| 2025-04-03 |
SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW |
Masakazu Yoshimura et.al. |
2504.02345 |
null |
| 2025-04-03 |
ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation |
Yuan Zhou et.al. |
2504.02316 |
link |
| 2025-04-03 |
Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization |
Samuel Fernández-Menduiña et.al. |
2504.02216 |
null |
| 2025-04-02 |
Foreground Focus: Enhancing Coherence and Fidelity in Camouflaged Image Generation |
Pei-Chi Chen et.al. |
2504.02180 |
null |
| 2025-04-02 |
BioAtt: Anatomical Prior Driven Low-Dose CT Denoising |
Namhun Kim et.al. |
2504.01662 |
null |
| 2025-04-02 |
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning |
Yiting Lu et.al. |
2504.01655 |
link |
| 2025-04-02 |
RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars |
Yahui Li et.al. |
2504.01559 |
null |
| 2025-04-02 |
Multi-Marker Similarity enables reduced-reference and interpretable image quality assessment in optical microscopy |
Elena Corbetta et.al. |
2504.01537 |
null |
| 2025-04-02 |
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment |
Ziteng Cui et.al. |
2504.01503 |
link |
| 2025-04-02 |
FlowMotion: Target-Predictive Flow Matching for Realistic Text-Driven Human Motion Generation |
Manolo Canales Cuba et.al. |
2504.01338 |
null |
| 2025-04-01 |
FUSION: Frequency-guided Underwater Spatial Image recOnstructioN |
Jaskaran Singh Walia et.al. |
2504.01243 |
null |
| 2025-04-01 |
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates |
Gonçalo Gomes et.al. |
2504.01225 |
null |
| 2025-04-01 |
Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery |
Nicholas Clark et.al. |
2504.01205 |
null |
| 2025-04-01 |
Video Quality Assessment for Resolution Cross-Over in Live Sports |
Jingwen Zhu et.al. |
2504.01190 |
null |
| 2025-04-01 |
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations |
Yubo Wang et.al. |
2504.00824 |
null |
| 2025-04-01 |
The GLASS-JWST Early Release Science Programme: The NIRISS Spectroscopic Catalogue |
Peter J. Watson et.al. |
2504.00823 |
link |
| 2025-04-01 |
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting |
Hyunwoo Park et.al. |
2504.00773 |
null |
| 2025-04-01 |
Enhancing Fundus Image-based Glaucoma Screening via Dynamic Global-Local Feature Integration |
Yuzhuo Zhou et.al. |
2504.00431 |
null |
| 2025-03-31 |
Bayesian Imaging of Interferometric Data from Polarized Electromagnetic Signals |
Philipp Arras et.al. |
2504.00227 |
null |
| 2025-03-31 |
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation |
Shengqiong Wu et.al. |
2503.24379 |
null |
| 2025-03-31 |
ERUPT: Efficient Rendering with Unposed Patch Transformer |
Maxim V. Shugaev et.al. |
2503.24374 |
null |
| 2025-03-31 |
StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting |
Shakiba Kheradmand et.al. |
2503.24366 |
null |
| 2025-03-31 |
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting |
Seungjun Lee et.al. |
2503.24210 |
null |
| 2025-03-31 |
FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment |
Ruisheng Han et.al. |
2503.23911 |
link |
| 2025-03-31 |
Training-Free Text-Guided Image Editing with Visual Autoregressive Model |
Yufei Wang et.al. |
2503.23897 |
link |
| 2025-04-01 |
Learned Image Compression and Restoration for Digital Pathology |
SeonYeong Lee et.al. |
2503.23862 |
link |
| 2025-03-30 |
What Makes an Evaluation Useful? Common Pitfalls and Best Practices |
Gil Gekker et.al. |
2503.23424 |
null |
| 2025-03-30 |
Improving underwater semantic segmentation with underwater image quality attention and muti-scale aggregation attention |
Xin Zuo et.al. |
2503.23422 |
link |
| 2025-03-30 |
Visual Acuity Consistent Foveated Rendering towards Retinal Resolution |
Zhi Zhang et.al. |
2503.23410 |
null |
| 2025-03-30 |
Map Feature Perception Metric for Map Generation Quality Assessment and Loss Optimization |
Chenxing Sun et.al. |
2503.23370 |
null |
| 2025-03-29 |
NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations |
Zhenyu Tang et.al. |
2503.23162 |
null |
| 2025-03-29 |
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing |
Zijun Ding et.al. |
2503.23039 |
link |
| 2025-03-28 |
Concept and Demonstration of a Low-cost Compact Electron Microscope Enabled by a Photothermionic Carbon Nanotube Cathode |
Casimir Kuzyk et.al. |
2503.22910 |
null |
| 2025-03-28 |
Learning to Reason for Long-Form Story Generation |
Alexander Gurung et.al. |
2503.22828 |
link |
| 2025-03-28 |
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning |
Weiqi Li et.al. |
2503.22679 |
link |
| 2025-03-28 |
Evaluation of Machine-generated Biomedical Images via A Tally-based Similarity Measure |
Frank J. Brooks et.al. |
2503.22658 |
null |
| 2025-03-28 |
RELD: Regularization by Latent Diffusion Models for Image Restoration |
Pasquale Cascarano et.al. |
2503.22563 |
null |
| 2025-03-28 |
Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance |
Christian Steinhauser et.al. |
2503.22375 |
null |
| 2025-03-28 |
Imperceptible but Forgeable: Practical Invisible Watermark Forgery via Diffusion Models |
Ziping Dong et.al. |
2503.22330 |
null |
| 2025-03-28 |
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion |
Songsong Yu et.al. |
2503.22262 |
null |
| 2025-03-27 |
Multispectral Demosaicing via Dual Cameras |
SaiKiran Tedla et.al. |
2503.22026 |
null |
| 2025-03-27 |
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video |
David Yifan Yao et.al. |
2503.21761 |
link |
| 2025-03-27 |
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework |
Qi Qin et.al. |
2503.21758 |
link |
| 2025-03-27 |
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models |
Yuhan Zhang et.al. |
2503.21745 |
null |
| 2025-03-27 |
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance |
Jaywon Koo et.al. |
2503.21721 |
null |
| 2025-03-27 |
Audio-driven Gesture Generation via Deviation Feature in the Latent Space |
Jiahui Chen et.al. |
2503.21616 |
null |
| 2025-03-27 |
In vivo dynamic optical coherence tomography of human skin with hardware- and software-based motion correction |
Yu Guo et.al. |
2503.21384 |
link |
| 2025-03-27 |
Zero-Shot Visual Concept Blending Without Text Guidance |
Hiroya Makino et.al. |
2503.21277 |
link |
| 2025-03-27 |
Reducing CT Metal Artifacts by Learning Latent Space Alignment with Gemstone Spectral Imaging Data |
Wencheng Han et.al. |
2503.21259 |
null |
| 2025-03-26 |
Generalized Ray Tracing with Basis functions for Tomographic Projections |
Youssef Haouchat et.al. |
2503.20907 |
null |
| 2025-03-26 |
Debiasing Kernel-Based Generative Models |
Tian Qin et.al. |
2503.20825 |
null |
| 2025-03-27 |
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy |
Yinan Sun et.al. |
2503.20673 |
null |
| 2025-03-26 |
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization |
Jiale Cheng et.al. |
2503.20491 |
link |
| 2025-03-26 |
Adaptive Local Clustering over Attributed Graphs |
Haoran Zheng et.al. |
2503.20488 |
link |
| 2025-03-26 |
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability |
Yingdong Shi et.al. |
2503.20483 |
null |
| 2025-03-26 |
3D Convolutional Neural Networks for Improved Detection of Intracranial bleeding in CT Imaging |
Bargava Subramanian et.al. |
2503.20306 |
null |
| 2025-03-26 |
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model |
Yuhan Wang et.al. |
2503.20297 |
null |
| 2025-03-26 |
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions |
Siyin Wang et.al. |
2503.20290 |
null |
| 2025-03-26 |
EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation |
Ziran Zhang et.al. |
2503.20268 |
link |
| 2025-03-25 |
Scaling Down Text Encoders of Text-to-Image Diffusion Models |
Lifu Wang et.al. |
2503.19897 |
link |
| 2025-03-25 |
LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset |
Manjushree Aithal et.al. |
2503.19804 |
null |
| 2025-03-25 |
SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation |
Jingdan Kang et.al. |
2503.19791 |
link |
| 2025-03-25 |
EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction |
Chengjie Ge et.al. |
2503.19721 |
null |
| 2025-03-25 |
Improved tissue sodium concentration quantification in breast cancer by reducing partial volume effects: a preliminary study |
Olgica Zaric et.al. |
2503.19570 |
null |
| 2025-03-25 |
Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution |
Xiaohui Sun et.al. |
2503.19505 |
null |
| 2025-03-25 |
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset |
Haiyu Zhang et.al. |
2503.19462 |
null |
| 2025-03-26 |
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting |
Jiaxin Zhang et.al. |
2503.19443 |
link |
| 2025-03-25 |
Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment |
Guanglu Dong et.al. |
2503.19295 |
link |
| 2025-03-25 |
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing |
Ruiyi Wang et.al. |
2503.19262 |
link |
| 2025-03-24 |
Latent Space Class Dispersion: Effective Test Data Quality Assessment for DNNs |
Vivek Vekariya et.al. |
2503.18799 |
null |
| 2025-03-24 |
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition |
Yifei Zhang et.al. |
2503.18746 |
link |
| 2025-03-24 |
Generative Dataset Distillation using Min-Max Diffusion Model |
Junqiao Fan et.al. |
2503.18626 |
null |
| 2025-03-25 |
AMD-Hummingbird: Towards an Efficient Text-to-Video Model |
Takashi Isobe et.al. |
2503.18559 |
link |
| 2025-03-24 |
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation |
Qiang Qu et.al. |
2503.18552 |
null |
| 2025-03-24 |
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model |
Leheng Zhang et.al. |
2503.18512 |
null |
| 2025-03-24 |
MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing |
Lingting Zhu et.al. |
2503.18461 |
null |
| 2025-03-24 |
Panorama Generation From NFoV Image Done Right |
Dian Zheng et.al. |
2503.18420 |
link |
| 2025-03-24 |
Limited-angle SPECT image reconstruction using deep image prior |
Kensuke Hori et.al. |
2503.18342 |
null |
| 2025-03-23 |
Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance |
Harang Ju et.al. |
2503.18238 |
link |
| 2025-03-23 |
TCFG: Tangential Damping Classifier-free Guidance |
Mingi Kwon et.al. |
2503.18137 |
null |
| 2025-03-23 |
Real-World Remote Sensing Image Dehazing: Benchmark and Baseline |
Zeng-Hui Zhu et.al. |
2503.17966 |
link |
| 2025-03-23 |
Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach |
Zhi Zhang et.al. |
2503.17937 |
null |
| 2025-03-23 |
Guided Diffusion for the Extension of Machine Vision to Human Visual Perception |
Takahiro Shindo et.al. |
2503.17907 |
null |
| 2025-03-22 |
DVG-Diffusion: Dual-View Guided Diffusion Model for CT Reconstruction from X-Rays |
Xing Xie et.al. |
2503.17804 |
null |
| 2025-03-22 |
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes |
Sharan Maiya et.al. |
2503.17755 |
null |
| 2025-03-22 |
MAMAT: 3D Mamba-Based Atmospheric Turbulence Removal and its Object Detection Capability |
Paul Hill et.al. |
2503.17700 |
null |
| 2025-03-22 |
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion |
Jinyuan Liu et.al. |
2503.17673 |
link |
| 2025-03-21 |
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks |
Bhishma Dedhia et.al. |
2503.17539 |
null |
| 2025-03-21 |
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing |
Tianwen Zhou et.al. |
2503.17488 |
link |
| 2025-03-21 |
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images |
Jie Mei et.al. |
2503.17261 |
link |
| 2025-03-21 |
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields |
Kwan Yun et.al. |
2503.17095 |
link |
| 2025-03-21 |
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation |
Tao Feng et.al. |
2503.16989 |
null |
| 2025-03-21 |
Uncertainty-Driven Modeling of Microporosity and Permeability in Clastic Reservoirs Using Random Forest |
Muhammad Risha et.al. |
2503.16957 |
null |
| 2025-03-21 |
MagicColor: Multi-Instance Sketch Colorization |
Yinhan Zhang et.al. |
2503.16948 |
null |
| 2025-03-21 |
Design of 3D Non-Cartesian Trajectories for Fast Volumetric MRI via Analytic Coordinate Discretization |
Kwang Eun Jang et.al. |
2503.16918 |
null |
| 2025-03-21 |
Depth-Aided Color Image Inpainting in Quaternion Domain |
Shunki Tatsumi et.al. |
2503.16818 |
null |
| 2025-03-21 |
A-IDE: Agent-Integrated Denoising Experts |
Uihyun Cho et.al. |
2503.16780 |
null |
| 2025-03-21 |
On Explaining (Large) Language Models For Code Using Global Code-Based Explanations |
David N. Palacio et.al. |
2503.16771 |
null |
| 2025-03-20 |
SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality |
Chiara Schiavo et.al. |
2503.16747 |
null |
| 2025-03-20 |
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention |
Philipp Becker et.al. |
2503.16726 |
null |
| 2025-03-20 |
Euclid: Star clusters in IC 342, NGC 2403, and Holmberg II |
S. S. Larsen et.al. |
2503.16637 |
null |
| 2025-03-20 |
Fed-NDIF: A Noise-Embedded Federated Diffusion Model For Low-Count Whole-Body PET Denoising |
Yinchi Zhou et.al. |
2503.16635 |
null |
| 2025-03-20 |
A Recipe for Generating 3D Worlds From a Single Image |
Katja Schwarz et.al. |
2503.16611 |
null |
| 2025-03-20 |
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering |
Yuheng Yuan et.al. |
2503.16422 |
null |
| 2025-03-20 |
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance |
Quanhao Li et.al. |
2503.16421 |
null |
| 2025-03-20 |
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity |
Liming Jiang et.al. |
2503.16418 |
link |
| 2025-03-20 |
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos |
Haolin Yang et.al. |
2503.16400 |
null |
| 2025-03-20 |
Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images |
Shengjun Zhang et.al. |
2503.16338 |
null |
| 2025-03-20 |
Enhancing Software Quality Assurance with an Adaptive Differential Evolution based Quantum Variational Autoencoder-Transformer Model |
Seshu Babu Barma et.al. |
2503.16335 |
null |
| 2025-03-20 |
Do image and video quality metrics model low-level human vision? |
Dounia Hammou et.al. |
2503.16264 |
null |
| 2025-03-20 |
Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal |
Xiangyu Li et.al. |
2503.16165 |
link |
| 2025-03-20 |
Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems |
Shenbin Qian et.al. |
2503.16158 |
link |
| 2025-03-20 |
3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge |
Marek Wodzinski et.al. |
2503.16075 |
null |
| 2025-03-20 |
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion |
Longbin Ji et.al. |
2503.16068 |
null |
| 2025-03-20 |
Single Image Iterative Subject-driven Generation and Editing |
Yair Shpitzer et.al. |
2503.16025 |
link |
| 2025-03-20 |
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli |
Pengyu Liu et.al. |
2503.15978 |
null |
| 2025-03-20 |
GraPLUS: Graph-based Placement Using Semantics for Image Composition |
Mir Mohammad Khaleghi et.al. |
2503.15761 |
null |
| 2025-03-19 |
5D free-running, reconstruction, variable projection, ADMM, VPAL |
Yitong Yang et.al. |
2503.15711 |
null |
| 2025-03-19 |
Toward task-driven satellite image super-resolution |
Maciej Ziaja et.al. |
2503.15474 |
null |
| 2025-03-19 |
Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis |
Fereshteh Forghani et.al. |
2503.15412 |
null |
| 2025-03-19 |
Boosting HDR Image Reconstruction via Semantic Knowledge Transfer |
Qingsen Yan et.al. |
2503.15361 |
null |
| 2025-03-19 |
Euclid Quick Data Release (Q1): VIS processing and data products |
Euclid Collaboration et.al. |
2503.15303 |
null |
| 2025-03-19 |
Automated Non-Functional Requirements Generation in Software Engineering with Large Language Models: A Comparative Study |
Jomar Thomas Almonte et.al. |
2503.15248 |
null |
| 2025-03-19 |
3D Engine-ready Photorealistic Avatars via Dynamic Textures |
Yifan Wang et.al. |
2503.14943 |
null |
| 2025-03-19 |
FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis |
Yaofei Duan et.al. |
2503.14906 |
null |
| 2025-03-19 |
Temporal-Consistent Video Restoration with Pre-trained Diffusion Models |
Hengkang Wang et.al. |
2503.14863 |
null |
| 2025-03-19 |
ClimateGS: Real-Time Climate Simulation with 3D Gaussian Style Transfer |
Yuezhen Xie et.al. |
2503.14845 |
null |
| 2025-03-18 |
Involution and BSConv Multi-Depth Distillation Network for Lightweight Image Super-Resolution |
Akram Khatami-Rizi et.al. |
2503.14779 |
null |
| 2025-03-18 |
A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising |
Jonas Dornbusch et.al. |
2503.14654 |
null |
| 2025-03-18 |
The Power of Context: How Multimodality Improves Image Super-Resolution |
Kangfu Mei et.al. |
2503.14503 |
null |
| 2025-03-18 |
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing |
Yulin Pan et.al. |
2503.14482 |
null |
| 2025-03-18 |
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation |
Umar Farooq et.al. |
2503.14475 |
null |
| 2025-03-18 |
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment |
Chao Wang et.al. |
2503.14358 |
null |
| 2025-03-18 |
Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation |
Gillian M Raab et.al. |
2503.14211 |
null |
| 2025-03-18 |
RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation |
Zhang Chen et.al. |
2503.14154 |
null |
| 2025-03-18 |
Towards properties of adversarial image perturbations |
Egor Kuznetsov et.al. |
2503.14111 |
null |
| 2025-03-18 |
Image-Based Metrics in Ultrasound for Estimation of Global Speed-of-Sound |
Roman Denkin et.al. |
2503.14094 |
null |
| 2025-03-18 |
Fast Autoregressive Video Generation with Diagonal Decoding |
Yang Ye et.al. |
2503.14070 |
null |
| 2025-03-18 |
YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection via Prior-Guided Enhancement and Multi-Branch Feature Interaction |
Ziyu Lin et.al. |
2503.13883 |
null |
| 2025-03-17 |
Zero-Shot Denoising for Fluorescence Lifetime Imaging Microscopy with Intensity-Guided Learning |
Hao Chen et.al. |
2503.13779 |
link |
| 2025-03-17 |
FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models |
Minghan Li et.al. |
2503.13684 |
null |
| 2025-03-17 |
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation |
Daniil Selikhanovych et.al. |
2503.13358 |
null |
| 2025-03-17 |
MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis |
Shitong Shao et.al. |
2503.13319 |
null |
| 2025-03-19 |
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis |
Luxi Chen et.al. |
2503.13265 |
null |
| 2025-03-17 |
Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks |
Amit Zalcher et.al. |
2503.13260 |
null |
| 2025-03-17 |
MedLoRD: A Medical Low-Resource Diffusion Model for High-Resolution 3D CT Image Synthesis |
Marvin Seyfarth et.al. |
2503.13211 |
null |
| 2025-03-18 |
Rethinking Image Evaluation in Super-Resolution |
Shaolin Su et.al. |
2503.13074 |
null |
| 2025-03-17 |
DehazeMamba: SAR-guided Optical Remote Sensing Image Dehazing with Adaptive State Space Model |
Zhicheng Zhao et.al. |
2503.13073 |
null |
| 2025-03-17 |
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models |
Dewei Zhou et.al. |
2503.12885 |
null |
| 2025-03-17 |
CompMarkGS: Robust Watermarking for Compression 3D Gaussian Splatting |
Sumin In et.al. |
2503.12836 |
null |
| 2025-03-17 |
R3-Avatar: Record and Retrieve Temporal Codebook for Reconstructing Photorealistic Human Avatars |
Yifan Zhan et.al. |
2503.12751 |
null |
| 2025-03-17 |
GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching |
Feng Qiao et.al. |
2503.12720 |
link |
| 2025-03-16 |
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation |
Tao Feng et.al. |
2503.12600 |
null |
| 2025-03-16 |
BalancedDPO: Adaptive Multi-Metric Alignment |
Dipesh Tamboli et.al. |
2503.12575 |
null |
| 2025-03-16 |
Segment Any-Quality Images with Generative Latent Space Enhancement |
Guangqian Guo et.al. |
2503.12507 |
null |
| 2025-03-16 |
SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models |
Jiakang Chen et.al. |
2503.12484 |
null |
| 2025-03-16 |
Pathology Image Restoration via Mixture of Prompts |
Jiangdong Cai et.al. |
2503.12399 |
link |
| 2025-03-15 |
DLA-Count: Dynamic Label Assignment Network for Dense Cell Distribution Counting |
Yuqing Yan et.al. |
2503.12063 |
null |
| 2025-03-15 |
MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models |
Yuchen Xia et.al. |
2503.11972 |
null |
| 2025-03-14 |
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation |
Hongxiang Zhao et.al. |
2503.11423 |
null |
| 2025-03-14 |
TransiT: Transient Transformer for Non-line-of-sight Videography |
Ruiqian Li et.al. |
2503.11328 |
null |
| 2025-03-14 |
Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking |
Ziyi Wang et.al. |
2503.11324 |
null |
| 2025-03-14 |
Leveraging Diffusion Knowledge for Generative Image Compression with Fractal Frequency-Aware Band Learning |
Lingyu Zhu et.al. |
2503.11321 |
null |
| 2025-03-14 |
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption |
Du Chen et.al. |
2503.11221 |
null |
| 2025-03-14 |
Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement |
Yini Li et.al. |
2503.11175 |
link |
| 2025-03-14 |
Unifying Perplexing Behaviors in Modified BP Attributions through Alignment Perspective |
Guanhua Zheng et.al. |
2503.11160 |
null |
| 2025-03-14 |
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior |
Zichen Tang et.al. |
2503.11143 |
link |
| 2025-03-14 |
MobiVital: Self-supervised Time-series Quality Estimation for Contactless Respiration Monitoring Using UWB Radar |
Ziqi Wang et.al. |
2503.11064 |
link |
| 2025-03-14 |
Comparative Analysis of Advanced AI-based Object Detection Models for Pavement Marking Quality Assessment during Daytime |
Gian Antariksa et.al. |
2503.11008 |
null |
| 2025-03-13 |
Statistical Analysis of Sentence Structures through ASCII, Lexical Alignment and PCA |
Abhijeet Sahdev et.al. |
2503.10470 |
null |
| 2025-03-13 |
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models |
Yijing Lin et.al. |
2503.10406 |
null |
| 2025-03-13 |
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment |
Hao Zhou et.al. |
2503.10287 |
null |
| 2025-03-13 |
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception |
Yunpeng Qu et.al. |
2503.10259 |
link |
| 2025-03-13 |
Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction |
Thomas Sanchez et.al. |
2503.10156 |
link |
| 2025-03-13 |
Image Quality Assessment: From Human to Machine Preference |
Chunyi Li et.al. |
2503.10078 |
link |
| 2025-03-12 |
Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos |
Riku Takahashi et.al. |
2503.09787 |
null |
| 2025-03-12 |
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models |
Sangwon Jang et.al. |
2503.09669 |
null |
| 2025-03-12 |
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster |
Shitong Shao et.al. |
2503.09662 |
link |
| 2025-03-12 |
Fair Federated Medical Image Classification Against Quality Shift via Inter-Client Progressive State Matching |
Nannan Wu et.al. |
2503.09587 |
link |
| 2025-03-12 |
FCaS: Fine-grained Cardiac Image Synthesis based on 3D Template Conditional Diffusion Model |
Jiahao Xia et.al. |
2503.09560 |
null |
| 2025-03-12 |
Multi-Agent Image Restoration |
Xu Jiang et.al. |
2503.09403 |
null |
| 2025-03-12 |
Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models |
Xiaozhen Qiao et.al. |
2503.09394 |
null |
| 2025-03-12 |
PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling |
Nikolai Körber et.al. |
2503.09368 |
link |
| 2025-03-12 |
Fully-Synthetic Training for Visual Quality Inspection in Automotive Production |
Christoph Huber et.al. |
2503.09354 |
null |
| 2025-03-12 |
Unified Dense Prediction of Video Diffusion |
Lehan Yang et.al. |
2503.09344 |
null |
| 2025-03-12 |
Experimental study of the first telescope with a toroidal curved detector |
Eduard Muslimov et.al. |
2503.09300 |
null |
| 2025-03-12 |
IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond |
Peng Hu et.al. |
2503.09294 |
null |
| 2025-03-12 |
Better Together: Unified Motion Capture and 3D Avatar Reconstruction |
Arthur Moreau et.al. |
2503.09293 |
null |
| 2025-03-12 |
Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets |
Hannah Kniesel et.al. |
2503.09221 |
null |
| 2025-03-12 |
Teaching LMMs for Image Quality Scoring and Interpreting |
Zicheng Zhang et.al. |
2503.09197 |
link |
| 2025-03-11 |
Residual Learning and Filtering Networks for End-to-End Lossless Video Compression |
Md Baharul Islam et.al. |
2503.08819 |
null |
| 2025-03-11 |
Posterior-Mean Denoising Diffusion Model for Realistic PET Image Reconstruction |
Yiran Sun et.al. |
2503.08546 |
null |
| 2025-03-11 |
Segmentation-Guided CT Synthesis with Pixel-Wise Conformal Uncertainty Bounds |
David Vallmanya Poch et.al. |
2503.08515 |
null |
| 2025-03-11 |
NullFace: Training-Free Localized Face Anonymization |
Han-Wei Kung et.al. |
2503.08478 |
link |
| 2025-03-11 |
DG16M: A Large-Scale Dataset for Dual-Arm Grasping with Force-Optimized Grasps |
Md Faizal Karim et.al. |
2503.08358 |
null |
| 2025-03-11 |
Pathology-Aware Adaptive Watermarking for Text-Driven Medical Image Synthesis |
Chanyoung Kim et.al. |
2503.08346 |
null |
| 2025-03-11 |
Diffusion Transformer Meets Random Masks: An Advanced PET Reconstruction Framework |
Bin Huang et.al. |
2503.08339 |
null |
| 2025-03-11 |
Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution |
Xinyi Liu et.al. |
2503.08300 |
null |
| 2025-03-11 |
PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net |
Jun Yin et.al. |
2503.08276 |
null |
| 2025-03-11 |
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting |
Junfu Guo et.al. |
2503.08135 |
null |
| 2025-03-10 |
Artificial Intelligence in Deliberation: The AI Penalty and the Emergence of a New Deliberative Divide |
Andreas Jungherr et.al. |
2503.07690 |
null |
| 2025-03-10 |
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts |
Minwen Liao et.al. |
2503.07417 |
null |
| 2025-03-10 |
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models |
Ouxiang Li et.al. |
2503.07392 |
link |
| 2025-03-10 |
Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation |
Zhi Qin et.al. |
2503.07032 |
null |
| 2025-03-10 |
Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition |
Xinyu Xi et.al. |
2503.06978 |
null |
| 2025-03-09 |
GenDR: Lightning Generative Detail Restorator |
Yan Wang et.al. |
2503.06790 |
null |
| 2025-03-09 |
Unsupervised Multi-Clustering and Decision-Making Strategies for 4D-STEM Orientation Mapping |
Junhao Cao et.al. |
2503.06699 |
null |
| 2025-03-09 |
PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation |
Yanjie Pan et.al. |
2503.06684 |
null |
| 2025-03-09 |
Learning Few-Step Diffusion Models by Trajectory Distribution Matching |
Yihong Luo et.al. |
2503.06674 |
link |
| 2025-03-09 |
The New CMS Measure of Excessive Radiation Dose or Inadequate CT Image Quality: Methods for Size-Adjusted Dose and Their Variabilities |
Gary Y Ge et.al. |
2503.06644 |
null |
| 2025-03-09 |
One-Step Diffusion Model for Image Motion-Deblurring |
Xiaoyang Liu et.al. |
2503.06537 |
link |
| 2025-03-08 |
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model |
Xiang Gao et.al. |
2503.06186 |
null |
| 2025-03-08 |
BioMoDiffuse: Physics-Guided Biomechanical Diffusion for Controllable and Authentic Human Motion Synthesis |
Zixi Kang et.al. |
2503.06151 |
null |
| 2025-03-08 |
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model |
Mingxing Li et.al. |
2503.06141 |
null |
| 2025-03-08 |
Viewport-Unaware Blind Omnidirectional Image Quality Assessment: A Flexible and Effective Paradigm |
Jiebin Yan et.al. |
2503.06129 |
link |
| 2025-03-08 |
Feature Fusion Attention Network with CycleGAN for Image Dehazing, De-Snowing and De-Raining |
Akshat Jain et.al. |
2503.06107 |
null |
| 2025-03-07 |
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice |
Hongwei Yi et.al. |
2503.05978 |
null |
| 2025-03-07 |
LapLoss: Laplacian Pyramid-based Multiscale loss for Image Translation |
Krish Didwania et.al. |
2503.05974 |
null |
| 2025-03-10 |
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control |
Yuxuan Bian et.al. |
2503.05639 |
link |
| 2025-03-07 |
A-SEE2.0: Active-Sensing End-Effector for Robotic Ultrasound Systems with Dense Contact Surface Perception Enabled Probe Orientation Adjustment |
Yernar Zhetpissov et.al. |
2503.05569 |
null |
| 2025-03-07 |
Development and Enhancement of Text-to-Image Diffusion Models |
Rajdeep Roshan Sahu et.al. |
2503.05149 |
null |
| 2025-03-07 |
SMILENet: Unleashing Extra-Large Capacity Image Steganography via a Synergistic Mosaic InvertibLE Hiding Network |
Jun-Jie Huang et.al. |
2503.05118 |
null |
| 2025-03-06 |
Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation |
Alexey Buzovkin et.al. |
2503.04871 |
link |
| 2025-03-08 |
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation |
Aoxiong Yin et.al. |
2503.04606 |
link |
| 2025-03-06 |
In-Context Reverse Classification Accuracy: Efficient Estimation of Segmentation Quality without Ground-Truth |
Matias Cosarinsky et.al. |
2503.04522 |
null |
| 2025-03-06 |
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement |
Zhihao Shi et.al. |
2503.04501 |
null |
| 2025-03-07 |
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding |
Shen Zhang et.al. |
2503.04344 |
null |
| 2025-03-05 |
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation |
Hiroshi Takahashi et.al. |
2503.03789 |
null |
| 2025-03-05 |
DO-IQS: Dynamics-Aware Offline Inverse Q-Learning for Optimal Stopping with Unknown Gain Functions |
Anna Kuchko et.al. |
2503.03515 |
null |
| 2025-03-05 |
Automatic Drywall Analysis for Progress Tracking and Quality Control in Construction |
Mariusz Trzeciakiewicz et.al. |
2503.03422 |
null |
| 2025-03-05 |
On the Relation Between Speech Quality and Quantized Latent Representations of Neural Codecs |
Mhd Modar Halimeh et.al. |
2503.03304 |
null |
| 2025-03-05 |
Computational Analysis of Degradation Modeling in Blind Panoramic Image Quality Assessment |
Jiebin Yan et.al. |
2503.03255 |
null |
| 2025-03-05 |
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models |
YiQiu Guo et.al. |
2503.03149 |
null |
| 2025-03-04 |
QE4PE: Word-level Quality Estimation for Human Post-Editing |
Gabriele Sarti et.al. |
2503.03044 |
link |
| 2025-03-04 |
A Causal Framework for Aligning Image Quality Metrics and Deep Neural Network Robustness |
Nathan Drenkow et.al. |
2503.02797 |
null |
| 2025-03-04 |
LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs |
Jianghao Chen et.al. |
2503.02502 |
null |
| 2025-03-04 |
Deep Robust Reversible Watermarking |
Jiale Chen et.al. |
2503.02490 |
null |
| 2025-03-04 |
ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement |
Xuejian Guo et.al. |
2503.02484 |
link |
| 2025-03-05 |
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content |
Zicheng Zhang et.al. |
2503.02357 |
link |
| 2025-03-04 |
Exploring Simple Siamese Network for High-Resolution Video Quality Assessment |
Guotao Shen et.al. |
2503.02330 |
null |
| 2025-03-04 |
Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration |
Pengchen Liang et.al. |
2503.02321 |
null |
| 2025-03-04 |
Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation |
Zhichao Yang et.al. |
2503.02206 |
null |
| 2025-03-04 |
DarkDeblur: Learning single-shot image deblurring in low-light condition |
S M A Sharif et.al. |
2503.02194 |
link |
| 2025-03-03 |
Integrating Misclassified EHR Outcomes with Validated Outcomes from a Non-probability Sample |
Jenny Shen et.al. |
2503.02071 |
null |
| 2025-03-03 |
Quality Measures for Dynamic Graph Generative Models |
Ryien Hosseini et.al. |
2503.01720 |
link |
| 2025-03-03 |
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization |
Siya Qi et.al. |
2503.01670 |
link |
| 2025-03-03 |
MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting |
Mojtaba Safari et.al. |
2503.01576 |
link |
| 2025-03-03 |
Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey |
Katerina Korre et.al. |
2503.01513 |
null |
| 2025-03-03 |
FlowDec: A flow-based full-band general audio codec with high perceptual quality |
Simon Welker et.al. |
2503.01485 |
link |
| 2025-03-03 |
Improving the Efficiency of VVC using Partitioning of Reference Frames |
Kamran Qureshi et.al. |
2503.01415 |
null |
| 2025-03-03 |
Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions |
Zihan Shen et.al. |
2503.01339 |
null |
| 2025-03-03 |
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual |
Chong Wang et.al. |
2503.01288 |
link |
| 2025-03-03 |
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond |
Guanyao Wu et.al. |
2503.01210 |
null |
| 2025-03-03 |
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution |
Xingyuan Li et.al. |
2503.01187 |
link |
| 2025-02-28 |
Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos |
Zhiyu Tan et.al. |
2502.21314 |
null |
| 2025-02-28 |
Bilevel Optimized Implicit Neural Representation for Scan-Specific Accelerated MRI Reconstruction |
Hongze Yu et.al. |
2502.21292 |
null |
| 2025-02-28 |
Back to the Future Cyclopean Stereo: a human perception approach unifying deep and geometric constraints |
Sherlon Almeida da Silva et.al. |
2502.21280 |
null |
| 2025-02-28 |
Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion |
Kulin Shah et.al. |
2502.21278 |
null |
| 2025-02-28 |
PET Image Denoising via Text-Guided Diffusion: Integrating Anatomical Priors through Text Prompts |
Boxiao Yu et.al. |
2502.21260 |
null |
| 2025-02-28 |
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation |
Yifei Xia et.al. |
2502.21079 |
null |
| 2025-02-28 |
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal |
Haonan An et.al. |
2502.20924 |
null |
| 2025-02-28 |
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision |
Dawei Zhu et.al. |
2502.20790 |
null |
| 2025-02-28 |
WorldModelBench: Judging Video Generation Models As World Models |
Dacheng Li et.al. |
2502.20694 |
null |
| 2025-02-28 |
Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA |
Ojonugwa Oluwafemi Ejiga Peter et.al. |
2502.20667 |
null |
| 2025-02-27 |
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction |
Siyu Jiao et.al. |
2502.20313 |
link |
| 2025-02-27 |
Mobius: Text to Seamless Looping Video Generation via Latent Shift |
Xiuli Bi et.al. |
2502.20307 |
link |
| 2025-02-27 |
Low-rank tensor completion via a novel minimax $p$-th order concave penalty function |
Hongbing Zhang et.al. |
2502.19979 |
null |
| 2025-02-28 |
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation |
Xiang Geng et.al. |
2502.19941 |
null |
| 2025-02-27 |
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents |
Zhenyu Liu et.al. |
2502.19917 |
link |
| 2025-02-27 |
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model |
Mingtao Guo et.al. |
2502.19894 |
link |
| 2025-02-27 |
Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement |
Nan An et.al. |
2502.19867 |
null |
| 2025-02-27 |
LMHLD: A Large-scale Multi-source High-resolution Landslide Dataset for Landslide Detection based on Deep Learning |
Guanting Liu et.al. |
2502.19866 |
null |
| 2025-02-27 |
Adaptive Score Alignment Learning for Continual Perceptual Quality Assessment of 360-Degree Videos in Virtual Reality |
Kanglei Zhou et.al. |
2502.19644 |
link |
| 2025-02-26 |
3D Nephrographic Image Synthesis in CT Urography with the Diffusion Model and Swin Transformer |
Hongkun Yu et.al. |
2502.19623 |
null |
| 2025-02-26 |
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? |
Yudi Zhang et.al. |
2502.19557 |
null |
| 2025-02-26 |
CLIP-Optimized Multimodal Image Enhancement via ISP-CNN Fusion for Coal Mine IoVT under Uneven Illumination |
Shuai Wang et.al. |
2502.19450 |
null |
| 2025-02-26 |
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering? |
Adam Celarek et.al. |
2502.19318 |
link |
| 2025-02-27 |
RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images |
Yuhan Tang et.al. |
2502.19153 |
null |
| 2025-02-26 |
Max360IQ: Blind Omnidirectional Image Quality Assessment with Multi-axis Attention |
Jiebin Yan et.al. |
2502.19046 |
link |
| 2025-02-26 |
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model |
Fengbin Guan et.al. |
2502.19026 |
null |
| 2025-02-26 |
Hyperspectral image reconstruction by deep learning with super-Rayleigh speckles |
Ziyan Chen et.al. |
2502.18777 |
null |
| 2025-02-25 |
Is OpenAlex Suitable for Research Quality Evaluation and Which Citation Indicator is Best? |
Mike Thelwall et.al. |
2502.18427 |
null |
| 2025-02-25 |
LAG: LLM agents for Leaderboard Auto Generation on Demanding |
Jian Wu et.al. |
2502.18209 |
null |
| 2025-02-25 |
OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation |
Yunpeng Gao et.al. |
2502.18041 |
null |
| 2025-02-25 |
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments |
Patomporn Payoungkhamdee et.al. |
2502.17956 |
null |
| 2025-02-25 |
Integrating Boosted learning with Differential Evolution (DE) Optimizer: A Prediction of Groundwater Quality Risk Assessment in Odisha |
Sonalika Subudhi et.al. |
2502.17929 |
null |
| 2025-02-24 |
Optimized Memory System Architecture for VESA VDC-M Decoder with Multi-Slice Support |
Hannah Yang et.al. |
2502.17729 |
null |
| 2025-02-24 |
Requirements for Quality Assurance of AI Models for Early Detection of Lung Cancer |
Horst K. Hahn et.al. |
2502.17639 |
null |
| 2025-02-25 |
KV-Edit: Training-Free Image Editing for Precise Background Preservation |
Tianrui Zhu et.al. |
2502.17363 |
link |
| 2025-02-24 |
Motion-Robust T2* Quantification from Gradient Echo MRI with Physics-Informed Deep Learning |
Hannah Eichhorn et.al. |
2502.17209 |
null |
| 2025-02-24 |
SFLD: Reducing the content bias for AI-generated Image Detection |
Seoyeon Gye et.al. |
2502.17105 |
null |
| 2025-02-24 |
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence |
Bolin Chen et.al. |
2502.17085 |
null |
| 2025-02-24 |
PQDAST: Depth-Aware Arbitrary Style Transfer for Games via Perceptual Quality-Guided Distillation |
Eleftherios Ioannou et.al. |
2502.16996 |
null |
| 2025-02-24 |
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model |
Kang Fu et.al. |
2502.16915 |
link |
| 2025-02-24 |
CRTrack: Low-Light Semi-Supervised Multi-object Tracking Based on Consistency Regularization |
Zijing Zhao et.al. |
2502.16809 |
null |
| 2025-02-23 |
Automatic Input Rewriting Improves Translation with Large Language Models |
Dayeon Ki et.al. |
2502.16682 |
link |
| 2025-02-23 |
AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs |
Francisco Caetano et.al. |
2502.16610 |
link |
| 2025-02-22 |
Multi-Party Data Pricing for Complex Data Trading Markets: A Rubinstein Bargaining Approach |
Bing Mi et.al. |
2502.16363 |
null |
| 2025-02-21 |
Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement |
Uche A. Nnolim et.al. |
2502.15986 |
null |
| 2025-02-21 |
Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution |
Carlos Eiras-Franco et.al. |
2502.15403 |
null |
| 2025-02-21 |
Super-Resolution for Interferometric Imaging: Model Comparisons and Performance Analysis |
Hasan Berkay Abdioglu et.al. |
2502.15397 |
null |
| 2025-02-21 |
Ultrasound Phase Aberrated Point Spread Function Estimation with Convolutional Neural Network: Simulation Study |
Wei-Hsiang Shen et.al. |
2502.15298 |
null |
| 2025-02-21 |
Omnidirectional Image Quality Captioning: A Large-scale Database and A New Model |
Jiebin Yan et.al. |
2502.15271 |
link |
| 2025-02-21 |
Lung-DDPM: Semantic Layout-guided Diffusion Models for Thoracic CT Image Synthesis |
Yifan Jiang et.al. |
2502.15204 |
link |
| 2025-02-21 |
LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement |
Namrah Siddiqua et.al. |
2502.15186 |
null |
| 2025-02-21 |
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment |
Chuan Cui et.al. |
2502.15167 |
link |
| 2025-02-21 |
Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization |
Ach Khozaimi et.al. |
2502.15156 |
null |
| 2025-02-20 |
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers |
Sanghyun Yi et.al. |
2502.15077 |
null |
| 2025-02-20 |
Multi-Source Static CT with Adaptive Fluence Modulation to Minimize Hallucinations in Generative Reconstructions |
Matthew Tivnan et.al. |
2502.15060 |
null |
| 2025-02-20 |
GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models |
Miao Tao et.al. |
2502.14938 |
null |
| 2025-02-20 |
Compact Latent Representation for Image Compression (CLRIC) |
Ayman A. Ameen et.al. |
2502.14937 |
null |
| 2025-02-20 |
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework |
Yuming Yang et.al. |
2502.14864 |
link |
| 2025-02-20 |
Towards a Perspectivist Turn in Argument Quality Assessment |
Julia Romberg et.al. |
2502.14501 |
link |
| 2025-02-20 |
Early-Exit and Instant Confidence Translation Quality Estimation |
Vilém Zouhar et.al. |
2502.14429 |
link |
| 2025-02-20 |
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis |
Xiaoxing Liu et.al. |
2502.14178 |
null |
| 2025-02-19 |
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior |
Hengyue Liang et.al. |
2502.13998 |
link |
| 2025-02-19 |
Remote Sensing Semantic Segmentation Quality Assessment based on Vision Language Model |
Huiying Shi et.al. |
2502.13990 |
null |
| 2025-02-19 |
A Lightweight Model for Perceptual Image Compression via Implicit Priors |
Hao Wei et.al. |
2502.13988 |
null |
| 2025-02-19 |
An Overall Real-Time Mechanism for Classification and Quality Evaluation of Rice |
Wanke Xia et.al. |
2502.13764 |
null |
| 2025-02-19 |
HawkBench: Investigating Resilience of RAG Methods on Stratified Information-Seeking Tasks |
Hongjin Qian et.al. |
2502.13465 |
null |
| 2025-02-19 |
OGBoost: A Python Package for Ordinal Gradient Boosting |
Mansour T. A. Sharabiani et.al. |
2502.13456 |
null |
| 2025-02-18 |
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection |
Paul Boniol et.al. |
2502.13318 |
link |
| 2025-02-18 |
Optimal covering of rectangular grid graphs with tours of constrained length |
Sergey Bereg et.al. |
2502.13306 |
null |
| 2025-02-18 |
Application of Context-dependent Interpretation of Biosignals Recognition to Control a Bionic Multifunctional Hand Prosthesis |
Pawel Trajdos et.al. |
2502.13301 |
null |
| 2025-02-18 |
Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework |
Manal Rahal et.al. |
2502.13198 |
null |
| 2025-02-18 |
GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis |
Pedro Martin et.al. |
2502.13196 |
null |
| 2025-02-18 |
Language Barriers: Evaluating Cross-Lingual Performance of CNN and Transformer Architectures for Speech Quality Estimation |
Wafaa Wardah et.al. |
2502.13004 |
null |
| 2025-02-18 |
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation |
Xinlong Chen et.al. |
2502.12782 |
link |
| 2025-02-18 |
Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models |
Kamer Ali Yuksel et.al. |
2502.12755 |
link |
| 2025-02-18 |
3D Shape-to-Image Brownian Bridge Diffusion for Brain MRI Synthesis from Cortical Surfaces |
Fabian Bongratz et.al. |
2502.12742 |
null |
| 2025-02-18 |
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral |
António Farinhas et.al. |
2502.12701 |
null |
| 2025-02-19 |
Spherical Dense Text-to-Image Synthesis |
Timon Winter et.al. |
2502.12691 |
null |
| 2025-02-18 |
Design and Implementation of a Dual Uncrewed Surface Vessel Platform for Bathymetry Research under High-flow Conditions |
Dinesh Kumar et.al. |
2502.12539 |
null |
| 2025-02-18 |
Comprehensive Assessment and Analysis for NSFW Content Erasure in Text-to-Image Diffusion Models |
Die Chen et.al. |
2502.12527 |
null |
| 2025-02-18 |
Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs |
Siyu You et.al. |
2502.12512 |
null |
| 2025-02-17 |
Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications |
Li Qiao et.al. |
2502.12096 |
null |
| 2025-02-17 |
Low-Rank Thinning |
Annabelle Michael Carrell et.al. |
2502.12063 |
link |
| 2025-02-17 |
MultiFlow: A unified deep learning framework for multi-vessel classification, segmentation and clustering of phase-contrast MRI validated on a multi-site single ventricle patient cohort |
Tina Yao et.al. |
2502.11993 |
null |
| 2025-02-17 |
Deep Spatio-Temporal Neural Network for Air Quality Reanalysis |
Ammar Kheder et.al. |
2502.11941 |
link |
| 2025-02-17 |
No-reference geometry quality assessment for colorless point clouds via list-wise rank learning |
Zheng Li et.al. |
2502.11726 |
link |
| 2025-02-17 |
The Worse The Better: Content-Aware Viewpoint Generation Network for Projection-related Point Cloud Quality Assessment |
Zhiyong Su et.al. |
2502.11710 |
link |
| 2025-02-17 |
Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation |
Arindam Sharma et.al. |
2502.11620 |
null |
| 2025-02-17 |
Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku |
Chunan Yu et.al. |
2502.11586 |
null |
| 2025-02-18 |
AI-Assisted Thin Section Image Processing for Pore-Throat Characterization in Tight Clastic Rocks |
Muhammad Risha et.al. |
2502.11523 |
null |
| 2025-02-17 |
Semantically Robust Unsupervised Image Translation for Paired Remote Sensing Images |
Sheng Fang et.al. |
2502.11468 |
null |
| 2025-02-17 |
HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning |
Xiaoyuan Li et.al. |
2502.11393 |
null |
| 2025-02-17 |
A Physics-Informed Blur Learning Framework for Imaging Systems |
Liqun Chen et.al. |
2502.11382 |
link |
| 2025-02-17 |
LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing |
Zhengxiang Wang et.al. |
2502.11368 |
link |
| 2025-02-16 |
Generating Skyline Datasets for Data Science Models |
Mengying Wang et.al. |
2502.11262 |
null |
| 2025-02-16 |
Exploiting network optimization stability for enhanced PET image denoising using deep image prior |
Fumio Hashimoto et.al. |
2502.11259 |
null |
| 2025-02-16 |
Are Generative Models Underconfident? An Embarrassingly Simple Quality Estimation Approach |
Tu Anh Dinh et.al. |
2502.11115 |
null |
| 2025-02-16 |
Imaging current flow and injection in scalable graphene devices through NV-magnetometry |
Kaj Dockx et.al. |
2502.11076 |
null |
| 2025-02-15 |
Automatic Quality Assessment of First Trimester Crown-Rump-Length Ultrasound Images |
Sevim Cengiz et.al. |
2502.10908 |
null |
| 2025-02-15 |
AquaScope: Reliable Underwater Image Transmission on Mobile Devices |
Beitong Tian et.al. |
2502.10891 |
null |
| 2025-02-15 |
E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting |
Sohaib Zahid et.al. |
2502.10827 |
null |
| 2025-02-14 |
Large Language Models and Synthetic Data for Monitoring Dataset Mentions in Research Papers |
Aivin V. Solatorio et.al. |
2502.10263 |
link |
| 2025-02-14 |
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model |
Guoqing Ma et.al. |
2502.10248 |
link |
| 2025-02-14 |
ProReco: A Process Discovery Recommender System |
Tsung-Hao Huang et.al. |
2502.10230 |
null |
| 2025-02-14 |
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control |
Teng Li et.al. |
2502.10059 |
null |
| 2025-02-14 |
AffectSRNet: Facial Emotion-Aware Super-Resolution Network |
Syed Sameen Ahmad Rizvi et.al. |
2502.09932 |
null |
| 2025-02-14 |
A Deep Learning Approach to Interface Color Quality Assessment in HCI |
Shixiao Wang et.al. |
2502.09914 |
null |
| 2025-02-14 |
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal |
Jinpei Guo et.al. |
2502.09873 |
link |
| 2025-02-14 |
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering |
Mark Beliaev et.al. |
2502.09573 |
null |
| 2025-02-13 |
Learned Correction Methods for Ultrasound Computed Tomography Imaging Using Simplified Physics Models |
Luke Lozenski et.al. |
2502.09546 |
null |
| 2025-02-13 |
SQ-GAN: Semantic Image Communications Using Masked Vector Quantization |
Francesco Pezone et.al. |
2502.09520 |
link |
| 2025-02-13 |
A Physics-Informed Deep Learning Model for MRI Brain Motion Correction |
Mojtaba Safari et.al. |
2502.09296 |
link |
| 2025-02-13 |
ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization |
Onat Şahin et.al. |
2502.09278 |
null |
| 2025-02-13 |
PixLift: Accelerating Web Browsing via AI Upscaling |
Yonas Atinafu et.al. |
2502.08995 |
null |
| 2025-02-13 |
Some problems of developing astrophysical equipment and combining it with optical telescopes |
Edward Emelianov et.al. |
2502.08992 |
null |
| 2025-02-13 |
Dynamic watermarks in images generated by diffusion models |
Yunzhuo Chen et.al. |
2502.08927 |
null |
| 2025-02-12 |
A procedure for assessing of machine health index data prediction quality |
Daniel Kuzio et.al. |
2502.08837 |
null |
| 2025-02-12 |
Ultrasound imaging of cortical bone: cortex geometry and measurement of porosity based on wave speed for bone remodeling estimation |
Amadou S. Dia et.al. |
2502.08824 |
null |
| 2025-02-12 |
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation |
Hoigi Seo et.al. |
2502.08690 |
null |
| 2025-02-12 |
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion |
Yujie Zhou et.al. |
2502.08590 |
link |
| 2025-02-12 |
Quality-Aware Decoding: Unifying Quality Estimation and Decoding |
Sai Koneru et.al. |
2502.08561 |
null |
| 2025-02-12 |
A Survey on Image Quality Assessment: Insights, Analysis, and Future Outlook |
Chengqian Ma et.al. |
2502.08540 |
null |
| 2025-02-12 |
TuMag: the tunable magnetograph for the Sunrise III mission |
J. C. del Toro Iniesta et.al. |
2502.08268 |
null |
| 2025-02-12 |
Forward and Inverse Problems in Nonlinear Acoustics |
Barbara Kaltenbacher et.al. |
2502.08194 |
null |
| 2025-02-11 |
Automatic Prostate Volume Estimation in Transabdominal Ultrasound Images |
Tiziano Natali et.al. |
2502.07859 |
null |
| 2025-02-11 |
Magic 1-For-1: Generating One Minute Video Clips within One Minute |
Hongwei Yi et.al. |
2502.07701 |
link |
| 2025-02-11 |
An Improved Optimal Proximal Gradient Algorithm for Non-Blind Image Deblurring |
Qingsong Wang et.al. |
2502.07602 |
null |
| 2025-02-13 |
Enhance-A-Video: Better Generated Video for Free |
Yang Luo et.al. |
2502.07508 |
link |
| 2025-02-11 |
Compound Mask for Divergent Wave Imaging in Medical Ultrasound |
Zahraa Alzein et.al. |
2502.07453 |
null |
| 2025-02-11 |
On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o |
Rundong Liu et.al. |
2502.07399 |
link |
| 2025-02-11 |
USRNet: Unified Scene Recovery Network for Enhancing Traffic Imaging under Multiple Adverse Weather Conditions |
Yuxu Lu et.al. |
2502.07372 |
link |
| 2025-02-11 |
Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems |
Ai Chen et.al. |
2502.07351 |
link |
| 2025-02-11 |
Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion |
Xingpei Ma et.al. |
2502.07203 |
null |
| 2025-02-11 |
HDCompression: Hybrid-Diffusion Image Compression for Ultra-Low Bitrates |
Lei Lu et.al. |
2502.07160 |
null |
| 2025-02-10 |
Evaluation of Multilingual Image Captioning: How far can we get with CLIP models? |
Gonçalo Gomes et.al. |
2502.06600 |
link |
| 2025-02-10 |
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution |
Vlad Hosu et.al. |
2502.06476 |
null |
| 2025-02-10 |
How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators |
Shang Liu et.al. |
2502.06387 |
null |
| 2025-02-10 |
Guidance-base Diffusion Models for Improving Photoacoustic Image Quality |
Tatsuhiro Eguchi et.al. |
2502.06354 |
null |
| 2025-02-10 |
LANTERN++: Enhanced Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models |
Sihwan Park et.al. |
2502.06352 |
link |
| 2025-02-10 |
A CT Geometry With Multiple Centers Of Rotation For Solving Sparse View Problem |
Jiayu Duan et.al. |
2502.06125 |
null |
| 2025-02-10 |
Token-Domain Multiple Access: Exploiting Semantic Orthogonality for Collision Mitigation |
Li Qiao et.al. |
2502.06118 |
null |
| 2025-02-09 |
Dual Caption Preference Optimization for Diffusion Models |
Amir Saeidi et.al. |
2502.06023 |
link |
| 2025-02-09 |
A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement |
Muhammad Turab et.al. |
2502.05995 |
null |
| 2025-02-09 |
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search |
Hengzhu Tang et.al. |
2502.05924 |
null |
| 2025-02-09 |
Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models |
Rafał Karczewski et.al. |
2502.05807 |
null |
| 2025-02-08 |
Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks |
Zijiang Yan et.al. |
2502.05695 |
null |
| 2025-02-08 |
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion |
Yufan Zhou et.al. |
2502.05606 |
null |
| 2025-02-07 |
Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment |
Benjamin Stahl et.al. |
2502.05356 |
link |
| 2025-02-07 |
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting |
Chung-Ho Wu et.al. |
2502.05176 |
null |
| 2025-02-07 |
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound |
Andros Tjandra et.al. |
2502.05139 |
link |
| 2025-02-07 |
Cached Multi-Lora Composition for Multi-Concept Image Generation |
Xiandong Zou et.al. |
2502.04923 |
link |
| 2025-02-07 |
Integration Concept of the CBM Micro Vertex Detector |
Franz Matejcek et.al. |
2502.04858 |
null |
| 2025-02-06 |
ADIFF: Explaining audio difference using natural language |
Soham Deshmukh et.al. |
2502.04476 |
link |
| 2025-02-05 |
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization |
Zhenglin Zhou et.al. |
2502.04370 |
null |
| 2025-02-06 |
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation |
The Omnilingual MT Team et.al. |
2502.04314 |
null |
| 2025-02-06 |
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency |
Shangkun Sun et.al. |
2502.04076 |
link |
| 2025-02-06 |
DICE: Distilling Classifier-Free Guidance into Text Embeddings |
Zhenyu Zhou et.al. |
2502.03726 |
null |
| 2025-02-05 |
Quasi-Monte Carlo Methods: What, Why, and How? |
Fred J. Hickernell et.al. |
2502.03644 |
null |
| 2025-02-05 |
Efficient Image Restoration via Latent Consistency Flow Matching |
Elad Cohen et.al. |
2502.03500 |
null |
| 2025-02-05 |
A new method for structural diagnostics with muon tomography and deep learning |
Lorenzo Pezzotti et.al. |
2502.03339 |
null |
| 2025-02-05 |
A Framework for Measuring the Quality of Infrastructure-as-Code Scripts |
Pandu Ranga Reddy Konala et.al. |
2502.03127 |
null |
| 2025-02-05 |
Poisson Flow Joint Model for Multiphase contrast-enhanced CT |
Rongjun Ge et.al. |
2502.03079 |
null |
| 2025-02-05 |
A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions |
Hao Yin et.al. |
2502.02817 |
null |
| 2025-02-04 |
Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications |
William O'Donnell et.al. |
2502.02624 |
null |
| 2025-02-04 |
A comparison of translation performance between DeepL and Supertext |
Alex Flückiger et.al. |
2502.02577 |
link |
| 2025-02-04 |
Privacy Attacks on Image AutoRegressive Models |
Antoni Kowalczuk et.al. |
2502.02514 |
link |
| 2025-02-04 |
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models |
Hila Chefer et.al. |
2502.02492 |
null |
| 2025-02-04 |
High-Fidelity Human Avatars from Laptop Webcams using Edge Compute |
Akash Haridas et.al. |
2502.02468 |
null |
| 2025-02-04 |
Exploring the Feasibility of AI-Assisted Spine MRI Protocol Optimization Using DICOM Image Metadata |
Alice Vian et.al. |
2502.02351 |
null |
| 2025-02-04 |
When Dimensionality Hurts: The Role of LLM Embedding Compression for Noisy Regression Tasks |
Felix Drinkall et.al. |
2502.02199 |
link |
| 2025-02-04 |
PALQA: A Novel Parameterized Position-Aware Lossy Quantum Autoencoder using LSB Control Qubit for Efficient Image Compression |
Ershadul Haque et.al. |
2502.02188 |
null |
| 2025-02-05 |
IPO: Iterative Preference Optimization for Text-to-Video Generation |
Xiaomeng Yang et.al. |
2502.02088 |
null |
| 2025-02-03 |
Spectra of He isotopes and the $^3$He/$^4$He ratio |
M. J. Boschini et.al. |
2502.01887 |
null |
| 2025-02-03 |
Sparse Measurement Medical CT Reconstruction using Multi-Fused Block Matching Denoising Priors |
Maliha Hossain et.al. |
2502.01832 |
null |
| 2025-02-03 |
Generating Multi-Image Synthetic Data for Text-to-Image Customization |
Nupur Kumari et.al. |
2502.01720 |
null |
| 2025-02-03 |
CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP |
Yirui Zeng et.al. |
2502.01707 |
null |
| 2025-02-03 |
Proposal and Evaluation of a Practical CBCT Dose Optimization Method |
S. Gros et.al. |
2502.01509 |
null |
| 2025-02-03 |
Human Body Restoration with One-Step Diffusion Model and A New Benchmark |
Jue Gong et.al. |
2502.01411 |
null |
| 2025-02-03 |
Explainability-Driven Quality Assessment for Rule-Based Systems |
Oshani Seneviratne et.al. |
2502.01253 |
null |
| 2025-02-03 |
Imaging simulation of a dual-panel PET geometry with ultrafast TOF detectors |
Taiyo Ishikawa et.al. |
2502.01006 |
null |
| 2025-02-02 |
Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models |
Julian Perry et.al. |
2502.00826 |
null |
| 2025-02-02 |
EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis |
Junuk Cha et.al. |
2502.00654 |
null |
| 2025-02-01 |
Deep Task-Based Beamforming and Channel Data Augmentations for Enhanced Ultrasound Imaging |
Ariel Amar et.al. |
2502.00524 |
null |
| 2025-02-01 |
A framework for river connectivity classification using temporal image processing and attention based neural networks |
Timothy James Becker et.al. |
2502.00474 |
null |
| 2025-01-31 |
Trust and Trustworthiness from Human-Centered Perspective in HRI -- A Systematic Literature Review |
Debora Firmino de Souza et.al. |
2501.19323 |
null |
| 2025-01-31 |
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search |
Yuta Oshima et.al. |
2501.19252 |
null |
| 2025-01-31 |
Ambient Denoising Diffusion Generative Adversarial Networks for Establishing Stochastic Object Models from Noisy Image Data |
Xichen Xu et.al. |
2501.19094 |
null |
| 2025-01-31 |
OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation |
Yuchen Lin et.al. |
2501.18982 |
null |
| 2025-01-31 |
Distorting Embedding Space for Safety: A Defense Mechanism for Adversarially Robust Diffusion Models |
Jaesin Ahn et.al. |
2501.18877 |
link |
| 2025-01-29 |
Fake News Detection After LLM Laundering: Measurement and Explanation |
Rupak Kumar Das et.al. |
2501.18649 |
link |
| 2025-01-31 |
Task-based Regularization in Penalized Least-Squares for Binary Signal Detection Tasks in Medical Image Denoising |
Wentao Chen et.al. |
2501.18418 |
null |
| 2025-01-30 |
Adaptive Video Streaming with AI-Based Optimization for Dynamic Network Conditions |
Mohammad Tarik et.al. |
2501.18332 |
null |
| 2025-01-30 |
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment |
Yuqin Cao et.al. |
2501.18314 |
null |
| 2025-02-03 |
Efficient Feature Fusion for UAV Object Detection |
Xudong Wang et.al. |
2501.17983 |
link |
| 2025-01-29 |
Discrete Dielectric Coatings for Length Control and Tunability of Half-Wave Dipole Antennas at 300 MHz Magnetic Resonance Imaging Applications |
Aditya A Bhosale et.al. |
2501.17954 |
null |
| 2025-01-29 |
Leveraging In-Context Learning and Retrieval-Augmented Generation for Automatic Question Generation in Educational Domains |
Subhankar Maity et.al. |
2501.17397 |
null |
| 2025-01-29 |
On the Coexistence and Ensembling of Watermarks |
Aleksandar Petrov et.al. |
2501.17356 |
link |
| 2025-01-28 |
Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing |
Sourabh Deoghare et.al. |
2501.17265 |
null |
| 2025-01-27 |
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators |
Chen Chen et.al. |
2501.17202 |
null |
| 2025-01-31 |
IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait |
Han Yang et.al. |
2501.17159 |
null |
| 2025-01-28 |
Three-Dimensional Diffusion-Weighted Multi-Slab MRI With Slice Profile Compensation Using Deep Energy Model |
Reza Ghorbani et.al. |
2501.17152 |
null |
| 2025-01-28 |
Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds |
Xiaohan Sun et.al. |
2501.17085 |
null |
| 2025-01-28 |
EdgeMLOps: Operationalizing ML models with Cumulocity IoT and thin-edge.io for Visual quality Inspection |
Kanishk Chaturvedi et.al. |
2501.17062 |
null |
| 2025-01-28 |
EZOA: Nançay HI follow-up observations in the Zone of Avoidance |
A. C. Schröder et.al. |
2501.17038 |
null |
| 2025-01-28 |
Image-Space Gridding for Nonrigid Motion-Corrected MR Image Reconstruction |
Kwang Eun Jang et.al. |
2501.16713 |
null |
| 2025-01-25 |
MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling |
Sai Tarun Inaganti et.al. |
2501.16384 |
null |
| 2025-01-27 |
Adaptive Iterative Compression for High-Resolution Files: an Approach Focused on Preserving Visual Quality in Cinematic Workflows |
Leonardo Melo et.al. |
2501.16319 |
null |
| 2025-01-27 |
UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images |
Tatiana Taís Schein et.al. |
2501.16211 |
link |
| 2025-01-27 |
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation |
Xing Zhang et.al. |
2501.16050 |
null |
| 2025-01-30 |
Can Location Embeddings Enhance Super-Resolution of Satellite Imagery? |
Daniel Panangian et.al. |
2501.15847 |
null |
| 2025-01-26 |
Advancing quantum imaging through learning theory |
Yunkai Wang et.al. |
2501.15685 |
null |
| 2025-01-26 |
Radiologist-in-the-Loop Self-Training for Generalizable CT Metal Artifact Reduction |
Chenglong Ma et.al. |
2501.15610 |
link |
| 2025-01-26 |
Differentiable Low-computation Global Correlation Loss for Monotonicity Evaluation in Quality Assessment |
Yipeng Liu et.al. |
2501.15485 |
null |
| 2025-01-25 |
Image formation theory of optical coherence tomography with optical aberrations and its application for computational aberration correction |
Shuichi Makita et.al. |
2501.15011 |
null |
| 2025-01-24 |
SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation |
Yujian Liu et.al. |
2501.14646 |
null |
| 2025-01-24 |
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages |
Jia Yu et.al. |
2501.14506 |
link |
| 2025-01-24 |
Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR |
Hao Ma et.al. |
2501.14477 |
null |
| 2025-01-24 |
Deep Learning-Powered Classification of Thoracic Diseases in Chest X-Rays |
Yiming Lei et.al. |
2501.14279 |
null |
| 2025-01-24 |
CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image |
Xiaojun Tang et.al. |
2501.14264 |
null |
| 2025-01-24 |
GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm |
Hanrui Wang et.al. |
2501.14230 |
null |
| 2025-01-24 |
Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images |
Zeyun Deng et.al. |
2501.14198 |
null |
| 2025-01-24 |
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking |
Runyi Hu et.al. |
2501.14195 |
link |
| 2025-01-23 |
AdEval: Alignment-based Dynamic Evaluation to Mitigate Data Contamination in Large Language Models |
Yang Fan et.al. |
2501.13983 |
null |
| 2025-01-23 |
Improving Video Generation with Human Feedback |
Jie Liu et.al. |
2501.13918 |
null |
| 2025-01-23 |
VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing |
Qiang Hu et.al. |
2501.13630 |
link |
| 2025-01-23 |
Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse |
Wenzhuo Ma et.al. |
2501.13528 |
null |
| 2025-01-23 |
LDR-Net: A Novel Framework for AI-generated Image Detection via Localized Discrepancy Representation |
JiaXin Chen et.al. |
2501.13475 |
null |
| 2025-01-23 |
From Images to Point Clouds: An Efficient Solution for Cross-media Blind Quality Assessment without Annotated Training |
Yipeng Liu et.al. |
2501.13387 |
null |
| 2025-01-23 |
Enhanced Extractor-Selector Framework and Symmetrization Weighted Binary Cross-Entropy for Edge Detections |
Hao Shu et.al. |
2501.13365 |
null |
| 2025-01-22 |
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior |
I-Hsiang Chen et.al. |
2501.13134 |
null |
| 2025-01-23 |
Accelerate High-Quality Diffusion Models with Inner Loop Feedback |
Matthew Gwilliam et.al. |
2501.13107 |
null |
| 2025-01-22 |
Real-time Terahertz Compressive Optical-Digital Neural Network Imaging |
Shao-Hsuan Wu et.al. |
2501.13065 |
null |
| 2025-01-22 |
Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes |
Yuang Shi et.al. |
2501.13045 |
null |
| 2025-01-22 |
Characterizing Collective Efforts in Content Sharing and Quality Control for ADHD-relevant Content on Video-sharing Platforms |
Hanxiu 'Hazel' Zhu et.al. |
2501.13020 |
null |
| 2025-01-22 |
Paper Quality Assessment based on Individual Wisdom Metrics from Open Peer Review |
Andrii Zahorodnii et.al. |
2501.13014 |
null |
| 2025-01-22 |
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling |
Shengshi Yao et.al. |
2501.12696 |
null |
| 2025-01-22 |
Approximate Puzzlepiece Compositing |
Xuan Huang et.al. |
2501.12581 |
null |
| 2025-01-21 |
Interaction Dataset of Autonomous Vehicles with Traffic Lights and Signs |
Zheng Li et.al. |
2501.12536 |
null |
| 2025-01-21 |
Bidirectional Brain Image Translation using Transfer Learning from Generic Pre-trained Models |
Fatima Haimour et.al. |
2501.12488 |
null |
| 2025-01-21 |
DiffDoctor: Diagnosing Image Diffusion Models Before Treating |
Yiyang Wang et.al. |
2501.12382 |
null |
| 2025-01-21 |
Regressor-Guided Image Editing Regulates Emotional Response to Reduce Online Engagement |
Christoph Gebhardt et.al. |
2501.12289 |
null |
| 2025-01-21 |
A Dynamic Programming Framework for Generating Approximately Diverse and Optimal Solutions |
Waldo Gálvez et.al. |
2501.12261 |
null |
| 2025-01-21 |
Joint Reconstruction and Motion Estimation in Sparse-View 4DCT Using Diffusion Models within a Blind Inverse Problem Framework |
Antoine De Paepe et.al. |
2501.12249 |
null |
| 2025-01-21 |
DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains |
Junyu Xia et.al. |
2501.12235 |
null |
| 2025-01-21 |
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression |
Uri Gadot et.al. |
2501.12216 |
null |
| 2025-01-21 |
Fast-RF-Shimming: Accelerate RF Shimming in 7T MRI using Deep Learning |
Zhengyi Lu et.al. |
2501.12157 |
null |
| 2025-01-21 |
A Multi-annotated and Multi-modal Dataset for Wide-angle Video Quality Assessment |
Bo Hu et.al. |
2501.12082 |
null |
| 2025-01-22 |
GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting |
Longan Wang et.al. |
2501.12060 |
null |
| 2025-01-21 |
Power Amplifier-Aware Transmit Power Optimization for OFDM and SC-FDMA Systems |
Pawel Kryszkiewicz et.al. |
2501.11994 |
null |
| 2025-01-21 |
Bayesian Despeckling of Structured Sources |
Ali Zafari et.al. |
2501.11860 |
null |
| 2025-01-20 |
EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process |
Mostafa Atef et.al. |
2501.11776 |
null |
| 2025-01-20 |
Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution |
Zhiyuan You et.al. |
2501.11561 |
null |
| 2025-01-20 |
Fundus Image Quality Assessment and Enhancement: a Systematic Review |
Heng Li et.al. |
2501.11520 |
null |
| 2025-01-20 |
Multitask Auxiliary Network for Perceptual Quality Assessment of Non-Uniformly Distorted Omnidirectional Images |
Jiebin Yan et.al. |
2501.11512 |
link |
| 2025-01-20 |
Subjective and Objective Quality Assessment of Non-Uniformly Distorted Omnidirectional Images |
Jiebin Yan et.al. |
2501.11511 |
link |
| 2025-01-20 |
See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization |
Zongqi He et.al. |
2501.11508 |
null |
| 2025-01-20 |
Advancing Oyster Phenotype Segmentation with Multi-Network Ensemble and Multi-Scale mechanism |
Wenli Yang et.al. |
2501.11203 |
null |
| 2025-01-19 |
Unit Region Encoding: A Unified and Compact Geometry-aware Representation for Floorplan Applications |
Huichao Zhang et.al. |
2501.11097 |
null |
| 2025-01-18 |
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation |
Linrui Tian et.al. |
2501.10687 |
null |
| 2025-01-17 |
Fundamental mode power estimation through a $M^2$-measurement |
Filipp Lausch et.al. |
2501.10345 |
null |
| 2025-01-17 |
DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration |
Huiyun Cao et.al. |
2501.10325 |
null |
| 2025-01-17 |
CSHNet: A Novel Information Asymmetric Image Translation Method |
Xi Yang et.al. |
2501.10197 |
link |
| 2025-01-17 |
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency |
Xiaohui Li et.al. |
2501.10110 |
null |
| 2025-01-17 |
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment |
Yating Liu et.al. |
2501.10071 |
link |
| 2025-01-17 |
One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression |
Keita Miwa et.al. |
2501.10064 |
null |
| 2025-01-17 |
CaFA: Cost-aware, Feasible Attacks With Database Constraints Against Neural Tabular Classifiers |
Matan Ben-Tov et.al. |
2501.10013 |
link |
| 2025-01-17 |
IE-Bench: Advancing the Measurement of Text-Driven Image Editing for Human Perception Alignment |
Shangkun Sun et.al. |
2501.09927 |
null |
| 2025-01-17 |
Decoding Patterns of Data Generation Teams for Clinical and Scientific Success: Insights from the Bridge2AI Talent Knowledge Graph |
Jiawei Xu et.al. |
2501.09897 |
null |
| 2025-01-16 |
EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques |
Ibtihel Amara et.al. |
2501.09833 |
null |
| 2025-01-16 |
Scan-Adaptive MRI Undersampling Using Neighbor-based Optimization (SUNO) |
Siddhant Gautam et.al. |
2501.09799 |
link |
| 2025-01-16 |
Evaluating Conversational Recommender Systems with Large Language Models: A User-Centric Evaluation Framework |
Nuo Chen et.al. |
2501.09493 |
null |
| 2025-01-16 |
Joint Transmission and Deblurring: A Semantic Communication Approach Using Events |
Pujing Yang et.al. |
2501.09396 |
null |
| 2025-01-16 |
PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving |
Desen Sun et.al. |
2501.09253 |
null |
| 2025-01-16 |
Estimating Task-based Performance Bounds for Accelerated MRI Image Reconstruction Methods by Use of Learned-Ideal Observers |
Kaiyan Li et.al. |
2501.09224 |
null |
| 2025-01-15 |
UNIR-Net: A Novel Approach for Restoring Underwater Images with Non-Uniform Illumination Using Synthetic Data |
Ezequiel Perez-Zarate et.al. |
2501.09053 |
link |
| 2025-01-15 |
Lights, Camera, Matching: The Role of Image Illumination in Fair Face Recognition |
Gabriella Pangelinan et.al. |
2501.08910 |
null |
| 2025-01-15 |
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework |
Sida Tian et.al. |
2501.08809 |
null |
| 2025-01-16 |
Holoview: Interactive 3D visualization of medical data in AR |
Pankaj Kaushik et.al. |
2501.08736 |
null |
| 2025-01-15 |
DynamicFace: High-Quality and Consistent Video Face Swapping using Composable 3D Facial Priors |
Runqi Wang et.al. |
2501.08553 |
null |
| 2025-01-15 |
Comprehensive Subjective and Objective Evaluation Method for Text-generated Video |
Zelu Qi et.al. |
2501.08545 |
null |
| 2025-01-14 |
Head Motion Degrades Machine Learning Classification of Alzheimer's Disease from Positron Emission Tomography |
Eléonore V. Lieffrig et.al. |
2501.08459 |
null |
| 2025-01-14 |
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models |
Weichen Fan et.al. |
2501.08453 |
null |
| 2025-01-14 |
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics |
Georgii Gotin et.al. |
2501.08415 |
link |
| 2025-01-14 |
Rolling phase modulation regime for dynamic full field OCT |
Tual Monfort et.al. |
2501.08359 |
null |
| 2025-01-15 |
Optical information encryption using general temporal ghost imaging with practical experimental condition |
Juan Wu et.al. |
2501.08136 |
null |
| 2025-01-13 |
Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes |
Yuhang Zhang et.al. |
2501.08072 |
null |
| 2025-01-14 |
VENOM: Text-driven Unrestricted Adversarial Example Generation with Diffusion Models |
Hui Kuurila-Zhang et.al. |
2501.07922 |
link |
| 2025-01-14 |
Demographic Variability in Face Image Quality Measures |
Wassim Kabbani et.al. |
2501.07898 |
null |
| 2025-01-14 |
State-of-the-Art Transformer Models for Image Super-Resolution: Techniques, Challenges, and Applications |
Debasish Dutta et.al. |
2501.07855 |
null |
| 2025-01-13 |
FaceOracle: Chat with a Face Image Oracle |
Wassim Kabbani et.al. |
2501.07202 |
null |
| 2025-01-13 |
Radial Distortion in Face Images: Detection and Impact |
Wassim Kabbani et.al. |
2501.07179 |
null |
| 2025-01-13 |
Eye Sclera for Fair Face Image Quality Assessment |
Wassim Kabbani et.al. |
2501.07158 |
null |
| 2025-01-13 |
Privacy-Preserving Data Quality Assessment for Time-Series IoT Sensors |
Novoneel Chakraborty et.al. |
2501.07154 |
null |
| 2025-01-13 |
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling |
Jiebin Yan et.al. |
2501.07087 |
null |
| 2025-01-12 |
Real-Time Neural-Enhancement for Online Cloud Gaming |
Shan Jiang et.al. |
2501.06880 |
null |
| 2025-01-14 |
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution |
Du Chen et.al. |
2501.06838 |
link |
| 2025-01-11 |
NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References |
Qiang Qu et.al. |
2501.06488 |
link |
| 2025-01-10 |
VideoAuteur: Towards Long Narrative Video Generation |
Junfei Xiao et.al. |
2501.06173 |
null |
| 2025-01-10 |
CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control |
Stefan Popov et.al. |
2501.06006 |
null |
| 2025-01-10 |
Universal-2-TF: Robust All-Neural Text Formatting for ASR |
Yash Khare et.al. |
2501.05948 |
null |
| 2025-01-10 |
UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation |
Felix Duelmer et.al. |
2501.05828 |
null |
| 2025-01-13 |
AI-Driven Diabetic Retinopathy Screening: Multicentric Validation of AIDRSS in India |
Amit Kr Dey et.al. |
2501.05826 |
null |
| 2025-01-10 |
Conditional Diffusion Model for Electrical Impedance Tomography |
Duanpeng Shi et.al. |
2501.05769 |
null |
| 2025-01-10 |
LLVD: LSTM-based Explicit Motion Modeling in Latent Space for Blind Video Denoising |
Loay Rashid et.al. |
2501.05744 |
null |
| 2025-01-10 |
FIRM: Federated Image Reconstruction using Multimodal Tomographic Data |
Geunyeong Byeon et.al. |
2501.05642 |
null |
| 2025-01-09 |
Interpretable deep learning illuminates multiple structures fluorescence imaging: a path toward trustworthy artificial intelligence in microscopy |
Mingyang Chen et.al. |
2501.05490 |
null |
| 2025-01-09 |
Consistent Flow Distillation for Text-to-3D Generation |
Runjie Yan et.al. |
2501.05445 |
null |
| 2025-01-09 |
Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and Photorealistic Mapping |
Wen Tianci et.al. |
2501.05242 |
null |
| 2025-01-09 |
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering |
Dewei Zhou et.al. |
2501.05131 |
null |
| 2025-01-09 |
TipSegNet: Fingertip Segmentation in Contactless Fingerprint Imaging |
Laurenz Ruzicka et.al. |
2501.05076 |
null |
| 2025-01-09 |
Towards Fingerprint Mosaicking Artifact Detection: A Self-Supervised Deep Learning Approach |
Laurenz Ruzicka et.al. |
2501.05034 |
null |
| 2025-01-08 |
Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling |
Nannan Li et.al. |
2501.04666 |
null |
| 2025-01-08 |
Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion |
Yangfan He et.al. |
2501.04606 |
link |
| 2025-01-08 |
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages |
Archchana Sindhujan et.al. |
2501.04473 |
null |
| 2025-01-08 |
Enhancing kidney quality assessment: Power Doppler during normothermic machine perfusion |
Yitian Fang et.al. |
2501.04457 |
null |
| 2025-01-08 |
iFADIT: Invertible Face Anonymization via Disentangled Identity Transform |
Lin Yuan et.al. |
2501.04390 |
null |
| 2025-01-08 |
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models |
Hyogon Ryu et.al. |
2501.04304 |
link |
| 2025-01-07 |
Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections |
Yabo Fu et.al. |
2501.04140 |
link |
| 2025-01-07 |
Motion-Aware Generative Frame Interpolation |
Guozhen Zhang et.al. |
2501.03699 |
null |
| 2025-01-07 |
Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression |
Mengshi Qi et.al. |
2501.03674 |
link |
| 2025-01-07 |
Deep Learning-based Compression Detection for explainable Face Image Quality Assessment |
Laurin Jonientz et.al. |
2501.03619 |
link |
| 2025-01-07 |
A generative approach for lensless imaging in low-light conditions |
Ziyang Liu et.al. |
2501.03511 |
null |
| 2025-01-07 |
Can Deep Learning Trigger Alerts from Mobile-Captured Images? |
Pritisha Sarkar et.al. |
2501.03499 |
null |
| 2025-01-06 |
A Trust-Guided Approach to MR Image Reconstruction with Side Information |
Arda Atalık et.al. |
2501.03021 |
link |
| 2025-01-06 |
Quality Estimation based Feedback Training for Improving Pronoun Translation |
Harshit Dhankhar et.al. |
2501.03008 |
null |
| 2025-01-06 |
GLFC: Unified Global-Local Feature and Contrast Learning with Mamba-Enhanced UNet for Synthetic CT Generation from CBCT |
Xianhao Zhou et.al. |
2501.02992 |
link |
| 2025-01-06 |
Region of Interest based Medical Image Compression |
Utkarsh Prakash Srivastava et.al. |
2501.02895 |
null |
| 2025-01-06 |
COph100: A comprehensive fundus image registration dataset from infants constituting the "RIDIRP" database |
Yan Hu et.al. |
2501.02800 |
null |
| 2025-01-06 |
Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? |
Hongyi Miao et.al. |
2501.02751 |
null |
| 2025-01-06 |
Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising |
Yunlong Yuan et.al. |
2501.02741 |
null |
| 2025-01-06 |
Artificial Intelligence in Creative Industries: Advances Prior to 2025 |
Nantheera Anantrasirichai et.al. |
2501.02725 |
null |
| 2025-01-06 |
Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment |
Jiaze Li et.al. |
2501.02706 |
null |
| 2025-01-05 |
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation |
Ziyang Song et.al. |
2501.02576 |
link |
| 2025-01-05 |
Multi-LLM Collaborative Caption Generation in Scientific Documents |
Jaeyoung Kim et.al. |
2501.02552 |
link |
| 2025-01-05 |
Pixel-Wise Feature Selection for Perceptual Edge Detection without post-processing |
Hao Shu et.al. |
2501.02534 |
null |
| 2025-01-07 |
ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling |
Chaojie Mao et.al. |
2501.02487 |
null |
| 2025-01-05 |
Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module |
Zhongjian Cui et.al. |
2501.02452 |
null |
| 2025-01-05 |
Journey into Automation: Image-Derived Pavement Texture Extraction and Evaluation |
Bingjie Lu et.al. |
2501.02414 |
null |
| 2025-01-04 |
Optimizing Audio Compression Through Entropy-Controlled Dithering |
Ellison Murray et.al. |
2501.02293 |
null |
| 2025-01-04 |
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration |
Yizhou Li et.al. |
2501.02269 |
null |
| 2025-01-04 |
Exploring Secure Machine Learning Through Payload Injection and FGSM Attacks on ResNet-50 |
Umesh Yadav et.al. |
2501.02147 |
null |
| 2025-01-03 |
JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing |
Qili Wang et.al. |
2501.01798 |
link |
| 2025-01-03 |
Multi-modal classification of forest biodiversity potential from 2D orthophotos and 3D airborne laser scanning point clouds |
Simon B. Jensen et.al. |
2501.01728 |
null |
| 2025-01-03 |
Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation |
Junjie Xu et.al. |
2501.01700 |
null |
| 2025-01-02 |
A Metasemantic-Metapragmatic Framework for Taxonomizing Multimodal Communicative Alignment |
Eugene Yu Ji et.al. |
2501.01535 |
null |
| 2025-01-02 |
Embedding Similarity Guided License Plate Super Resolution |
Abderrezzaq Sendjasni et.al. |
2501.01483 |
null |
| 2024-12-31 |
Estimation of 3T MR images from 1.5T images regularized with Physics based Constraint |
Prabhjot Kaur et.al. |
2501.01464 |
null |
| 2024-12-31 |
GDSR: Global-Detail Integration through Dual-Branch Network with Wavelet Losses for Remote Sensing Image Super-Resolution |
Qiwei Zhu et.al. |
2501.01460 |
null |
| 2025-01-02 |
ScarNet: A Novel Foundation Model for Automated Myocardial Scar Quantification from LGE in Cardiac MRI |
Neda Tavakoli et.al. |
2501.01372 |
link |
| 2025-01-02 |
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions |
Vriksha Srihari et.al. |
2501.01156 |
null |
| 2025-01-02 |
HarmonyIQA: Pioneering Benchmark and Model for Image Harmonization Quality Assessment |
Zitong Xu et.al. |
2501.01116 |
null |
| 2025-01-02 |
Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion |
Dong Zhang et.al. |
2501.01114 |
null |
| 2025-01-02 |
EliGen: Entity-Level Controlled Image Generation with Regional Attention |
Hong Zhang et.al. |
2501.01097 |
link |
| 2025-01-02 |
Enhancing Precision of Automated Teller Machines Network Quality Assessment: Machine Learning and Multi Classifier Fusion Approaches |
Alireza Safarzadeh et.al. |
2501.01067 |
null |
| 2025-01-01 |
Deconstructing the emission order of protons, neutrons and $α$-particles following fusion in $^{28,30,32}$Si + $^{28}$Si |
Rohit Kumar et.al. |
2501.00963 |
null |
| 2025-01-01 |
Enhancing Early Diabetic Retinopathy Detection through Synthetic DR1 Image Generation: A StyleGAN3 Approach |
Sagarnil Das et.al. |
2501.00954 |
null |
| 2025-01-01 |
SPADE: Enhancing Adaptive Cyber Deception Strategies with Generative AI and Structured Prompt Engineering |
Shihab Ahmed et.al. |
2501.00940 |
null |
| 2025-01-01 |
Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models |
Emily Johnson et.al. |
2501.00917 |
null |
| 2025-01-01 |
Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model |
Chenyang Liu et.al. |
2501.00895 |
null |
| 2025-01-01 |
RORem: Training a Robust Object Remover with Human-in-the-Loop |
Ruibin Li et.al. |
2501.00740 |
link |
| 2024-12-31 |
Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free |
Evelyn Zhang et.al. |
2501.00375 |
link |
| 2024-12-31 |
SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians |
Yiwen Wang et.al. |
2501.00342 |
null |
| 2024-12-31 |
Improving image quality of the Solar Disk Imager (SDI) of the Lyman-alpha Solar Telescope (LST) onboard the ASO-S mission |
Hui Liu et.al. |
2501.00231 |
null |
| 2024-12-30 |
What Makes for a Good Stereoscopic Image? |
Netanel Y. Tamir et.al. |
2412.21127 |
null |
| 2024-12-30 |
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation |
Jiazheng Xu et.al. |
2412.21059 |
link |
| 2024-12-30 |
DDIM sampling for Generative AIBIM, a faster intelligent structural design framework |
Zhili He et.al. |
2412.20899 |
null |
| 2024-12-30 |
Acquisition-Independent Deep Learning for Quantitative MRI Parameter Estimation using Neural Controlled Differential Equations |
Daan Kuppens et.al. |
2412.20844 |
null |
| 2024-12-30 |
4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives |
Zeyu Yang et.al. |
2412.20720 |
null |
| 2024-12-29 |
Single-image reflection removal via self-supervised diffusion models |
Zhengyang Lu et.al. |
2412.20466 |
null |
| 2024-12-29 |
ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos |
Xilei Zhu et.al. |
2412.20423 |
null |
| 2024-12-29 |
Bringing Objects to Life: 4D generation from 3D objects |
Ohad Rahamim et.al. |
2412.20422 |
null |
| 2024-12-28 |
An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models |
Yuang Wang et.al. |
2412.19992 |
null |
| 2024-12-27 |
Structural Similarity in Deep Features: Image Quality Assessment Robust to Geometrically Disparate Reference |
Keke Zhang et.al. |
2412.19553 |
null |
| 2024-12-30 |
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT |
Xiaotao Hu et.al. |
2412.19505 |
link |
| 2024-12-27 |
RAIN: Real-time Animation of Infinite Video Stream |
Zhilei Shu et.al. |
2412.19489 |
null |
| 2024-12-27 |
Generative Adversarial Network on Motion-Blur Image Restoration |
Zhengdong Li et.al. |
2412.19479 |
null |
| 2024-12-27 |
Adrenaline: Adaptive Rendering Optimization System for Scalable Cloud Gaming |
Jin Heo et.al. |
2412.19446 |
null |
| 2024-12-27 |
The Hobby-Eberly Telescope Dark Energy Experiment Survey (HETDEX) Active Galactic Nuclei Catalog: the Fourth Data Release |
Chenxu Liu et.al. |
2412.19414 |
null |
| 2024-12-26 |
Reflective Gaussian Splatting |
Yuxuan Yao et.al. |
2412.19282 |
null |
| 2024-12-26 |
FineVQ: Fine-Grained User Generated Content Video Quality Assessment |
Huiyu Duan et.al. |
2412.19238 |
null |
| 2024-12-26 |
FACEMUG: A Multimodal Generative and Fusion Framework for Local Facial Editing |
Wanglong Lu et.al. |
2412.19009 |
null |
| 2024-12-25 |
TINQ: Temporal Inconsistency Guided Blind Video Quality Assessment |
Yixiao Li et.al. |
2412.18933 |
link |
| 2024-12-25 |
ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization |
Zixiao Gu et.al. |
[2412.18783](http |
|