Telecommunications, information and communication
Leveraging Deep Reinforcement Learning for Clustered Cell-Free Networking Over User Mobility
-
Clustered cell-free networking paves a new way for enabling scalable joint transmission among access points (APs) by partitioning the whole network into non-overlapping subnetworks. Previous works...
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
-
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on...
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
-
We present MobileGym, a browser-hosted, lightweight, fully controllable environment for everyday mobile use, targeting interaction fidelity without replicating proprietary backends. It enables two...
From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
-
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation...
Helix4D: Complex 4D Mesh Generation
-
Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework by inheriting...
Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning
-
Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tuning. However, real-world deployment...
Looped Diffusion Language Models
-
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains...
On-Policy Adversarial Flow Distillation for Autoregressive Video Generation
-
Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student...
Global Structure-from-Motion Meets Feedforward Reconstruction
-
Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge in computer vision, with many open...
EVIDENT: Routing MLLM Adaptation through Entity-Grounded Visual Evidence for Cross-Domain Video Temporal Grounding
-
Fine-tuning MLLMs for Video Temporal Grounding (VTG) often improves in-domain performance but degrades sharply under domain shift. In this work, we find that this failure is primarily driven not just...
InstructSAM: Segment Any Instance with Any Instructions
-
In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance...
Language Models Need Sleep
-
Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like...
Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models
-
Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, together with the widespread adoption of AI code assistants, make...
Pixel-Level Pavement Distress Assessment Using Instance Segmentation
-
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to...
Quantum Domain Decomposition for Preconditioning the Finite Element Method
-
Even in cases where quantum linear solvers provide significant speedup compared to their classical counterparts, their performance depends on some of the same parameters. In particular, the condition...
OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
-
The deployment of Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices is significantly constrained by memory limitations and the critical timing bottlenecks introduced by...
Channel-wise Vector Quantization
-
We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventional vector quantization, which assigns...
Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World
-
Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet current systems operate over only narrow...
AI-Powered Sustainable Finance: An Integrative Taxonomy and Framework of AI Applications for Sustainable Investment Decision-Making
-
The integration of Artificial Intelligence into sustainable finance represents a transformative paradigm shift in how Environmental, Social, and Governance factors are analyzed, predicted, and...
StakeBench: Evaluating Language Understanding Grounded in Market Commitment
-
Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committed to in the market. We introduce...
Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark
-
Weakly supervised anomaly detection (WSAD) has developed in three primary directions: incomplete, inexact, and inaccurate supervision. However, these directions remain isolated, lacking a unified...
Care services
Agri-food
Automotive and new mobility
Sustainable and efficient energy
Advanced materials
Environment and sustainability
Natural and cultural heritage
Production processes and Industry 4.0
Chemistry and biotechnology
Health and quality of life
Digital transformation



