DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Maxime Poli, Manel Khentout, Angelo Ortiz Tandazo et al.
cs.CL, cs.SD, eess.AS | Mar 19, 2026
We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of p…
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Ziyin Zhang, Zihan Liao, Hang Yu et al.
cs.CL, cs.AI | Mar 19, 2026
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available h…
Rigorous Error Certification for Neural PDE Solvers: From Empirical Residuals to Solution Guarantees
Amartya Mukherjee, Maxwell Fitzsimmons, David C. Del Rey Fernández et al.
cs.LG, math.AP, math.FA | Mar 19, 2026
Uncertainty quantification for partial differential equations is traditionally grounded in discretization theory, where solution error is controlled via mesh/grid refinement. Physics-informed neural n…
Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis
Pronob Kumar Barman, Pronoy Kumar Barman
cs.AI | Mar 19, 2026
Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial …
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
Edward Lin, Sahil Modi, Siva Kumar Sastry Hari et al.
cs.LG, cs.AI | Mar 19, 2026
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to h…
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Zhuolin Yang, Zihan Liu, Yang Chen et al.
cs.CL, cs.AI, cs.LG | Mar 19, 2026
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical an…
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Zehao Li, Zhenyu Wu, Yibo Zhao et al.
cs.AI | Mar 19, 2026
Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing rewar…
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
Huaide Jiang, Yash Chaudhary, Yuping Wang et al.
cs.RO, cs.AI, cs.CV, cs.LG | Mar 19, 2026
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent…
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
Tianjiao Yu, Xinzhuo Li, Muntasir Wahed et al.
cs.CV, cs.AI, cs.LG | Mar 19, 2026
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional stru…
$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence
Dimitri Kanevsky, Julian Salazar, Matt Harvey
math.AG, cs.AI, cs.HC, math.NT | Mar 19, 2026
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special t…
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Qiawen Ella Liu, Marina Dubova, Henry Conklin et al.
cs.AI, cs.CL | Mar 19, 2026
Are large language models (LLMs) creative in the same way humans are, and can the same interventions increase creativity in both? We evaluate a promising but largely untested intervention for creativi…
How Uncertainty Estimation Scales with Sampling in Reasoning Models
Maksym Del, Markus Kängsepp, Marharyta Domnich et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box app…
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah et al.
cs.RO, cs.AI, cs.CL, cs.CV | Mar 19, 2026
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge"…
Conflict-Based Search for Multi Agent Path Finding with Asynchronous Actions
Xuemian Wu, Shizhe Zhao, Zhongqiang Ren
cs.AI | Mar 19, 2026
Multi-Agent Path Finding (MAPF) seeks collision-free paths for multiple agents from their respective start locations to their respective goal locations while minimizing path costs. Most existing MAPF …
SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
cs.CV, cs.AI, cs.CL, cs.LG | Mar 19, 2026
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives th…
DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering
Yilin Wang, Yuchun Fan, Jiaoyang Li et al.
cs.CL, cs.AI | Mar 19, 2026
Retrieval-augmented generation (RAG) systems have made significant progress in solving complex multi-hop question answering (QA) tasks in the English scenario. However, RAG systems inevitably face the…
Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
cs.LG, cs.AI, q-fin.ST | Mar 19, 2026
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states unifor…
CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
Weilin Chen, Jiahao Rao, Wenhao Wang et al.
cs.CV, cs.AI | Mar 19, 2026
The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge. While text-driven methods offer flexibility, they lack the precision for fine-grained, instance-le…
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
Hao Zhang, Mingjie Liu, Shaokun Zhang et al.
cs.AI | Mar 19, 2026
Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL trai…
FinTradeBench: A Financial Reasoning Benchmark for LLMs
Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan et al.
cs.CE, cs.AI, cs.CL, cs.IR | Mar 19, 2026
Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals com…
CAMO: A Conditional Neural Solver for the Multi-objective Multiple Traveling Salesman Problem
Fengxiaoxiao Li, Xiao Mao, Mingfeng Fan et al.
cs.RO, cs.AI | Mar 19, 2026
Robotic systems often require a team of robots to collectively visit multiple targets while optimizing competing objectives, such as total travel cost and makespan. This setting can be formulated as t…
Parallelograms Strike Back: LLMs Generate Better Analogies than People
Qiawen Ella Liu, Raja Marjieh, Jian-Qiao Zhu et al.
cs.CL, cs.AI | Mar 19, 2026
Four-term word analogies (A:B::C:D) are classically modeled geometrically as "parallelograms," yet recent work suggests this model poorly captures how humans produce analogies, with simple local-sim…
cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization
Yuyang Liu
cs.AI, cs.DC | Mar 19, 2026
Combinatorial optimization problems arise in logistics, scheduling, and resource allocation, yet existing approaches face a fundamental trade-off among generality, performance, and usability. We prese…
D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding
Jonathan Lys, Vincent Gripon, Bastien Pasdeloup et al.
cs.AI, cs.LG | Mar 19, 2026
Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard decoding methods for autoregressive mod…
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Zou Qiang
cs.AI, cs.CL | Mar 19, 2026
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such …
I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems
Vedanta S P, Ponnurangam Kumaraguru
cs.AI, cs.MA | Mar 19, 2026
Large language models are increasingly proposed as autonomous agents for high-stakes public workflows, yet we lack systematic evidence about whether they would follow institutional rules when granted …
VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models
Chonghan Liu, Yimin Du, Qi An et al.
cs.CL, cs.AI | Mar 19, 2026
Large language models frequently exhibit suboptimal performance on low resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we pr…
Man and machine: artificial intelligence and judicial decision making
Arthur Dyevre, Ahmad Shahvaroughi
cs.AI | Mar 19, 2026
The integration of artificial intelligence (AI) technologies into judicial decision-making - particularly in pretrial, sentencing, and parole contexts - has generated substantial concerns about transp…
A Dataset and Resources for Identifying Patient Health Literacy Information from Clinical Notes
Madeline Bittner, Dina Demner-Fushman, Yasmeen Shabazz et al.
cs.CL | Mar 19, 2026
Health literacy is a critical determinant of patient outcomes, yet current screening tools are not always feasible and differ considerably in the number of items, question format, and dimensions of he…
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis
Zhan Jin, Yu Luo, Yizhou Zhang et al.
cs.CV, cs.AI | Mar 19, 2026
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADN…