HiLSVA: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization

Abstract

Large language model (LLM) agents enable natural language interaction for scientific visualization (SciVis). However, prior systems have largely prioritized autonomy over human analytical control, limiting transparency and oversight. We present HiLSVA, a human-in-the-loop agentic system that supports mixed-initiative SciVis workflows. HiLSVA integrates a plan-first multi-agent architecture with explicit human oversight, stepwise provenance tracking, and learn-at-test-time adaptation from user feedback.

The system supports fluid handoff between humans and agents through both natural language and direct manipulation of visualizations, while sandboxed execution ensures safe, reproducible workflows. In doing so, HiLSVA reframes agentic SciVis as a collaborative process that augments, rather than replaces, human analytical reasoning. We evaluate HiLSVA through representative case studies and a controlled user study with twelve participants of varying expertise across multiple autonomy settings.

Results show that mixed-initiative interaction improves task completion, user control, and workflow transparency across different levels of user expertise, while revealing a tradeoff between execution efficiency and human oversight. These findings highlight the importance of human-centered design in agentic SciVis and guide the development of future collaborative visualization systems.

System Overview

HiLSVA adopts a multi-agent architecture centered around a lead orchestrator that coordinates a set of specialized sub-agents, and is designed around five pillars that keep the human in control throughout the SciVis workflow:

(a) Mixed-initiative agentic architecture: An orchestrator interprets user intent, constructs explicit stepwise plans, and coordinates specialized agents with bidirectional human–agent interaction.

(b) Specialized visualization agents: A ParaView agent (direct control via the Model Context Protocol, MCP), a code agent (SciVis Python generation with retrieval-augmented generation, RAG, and iterative error correction), a file surfer, and a web surfer execute SciVis actions and report outcomes back to the orchestrator.

(c) Stepwise control & provenance tracking: Workflows are represented as editable steps that record planned and executed actions, software states, and visualization outputs for undo, branching, and reuse.

(d) Sandboxed execution environments: Tool interactions run in isolated Docker containers, enabling safe, parallel execution across sessions.

(e) Learn-at-test-time (LTT) with human feedback: A self-improving agent reflects on execution outcomes, assesses uncertainty, queries users when needed, and updates a knowledge repository during inference.

The current implementation runs ParaView as the visualization engine, uniquely supporting multiple interaction modalities on the same instance — MCP tool calls, generated Python scripts, and direct GUI manipulation. The architecture itself is backend-agnostic.

Overview of HiLSVA. An orchestrator interprets user intent, builds explicit stepwise plans, and coordinates specialized agents (ParaView, code, file surfer, web surfer) with bidirectional human–agent interaction, provenance-aware workflows, sandboxed execution, and a self-improving agent that adapts during inference.
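The plan-first coordination described above can be illustrated with a minimal sketch. It is not HiLSVA's implementation: the `Step` and `Orchestrator` classes, the status strings, and the two stand-in agents are hypothetical names chosen for illustration. The sketch only shows the control flow, in which an explicit plan is proposed first, nothing runs until the human approves it, and each agent's outcome is reported back to the orchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One planned action: which agent runs it and with what instruction."""
    agent: str
    instruction: str
    status: str = "planned"   # planned -> done | failed
    outcome: str = ""


@dataclass
class Orchestrator:
    """Holds an explicit plan, waits for approval, then dispatches steps."""
    agents: Dict[str, Callable[[str], str]]
    plan: List[Step] = field(default_factory=list)

    def propose(self, steps: List[Step]) -> List[Step]:
        # Reject plans that name an agent the orchestrator does not know.
        unknown = [s.agent for s in steps if s.agent not in self.agents]
        if unknown:
            raise ValueError(f"unknown agents: {unknown}")
        self.plan = list(steps)
        return self.plan

    def run(self, approved: bool) -> List[Step]:
        # Plan-first: nothing executes until the human approves the plan.
        if not approved:
            return self.plan
        for step in self.plan:
            try:
                step.outcome = self.agents[step.agent](step.instruction)
                step.status = "done"
            except Exception as exc:
                # Report the failure back and stop instead of crashing.
                step.outcome = str(exc)
                step.status = "failed"
                break
        return self.plan


# Stand-in agents (assumptions); real ones would drive ParaView or browse files.
agents = {
    "file_surfer": lambda text: f"listed files for: {text}",
    "paraview": lambda text: f"rendered: {text}",
}
orchestrator = Orchestrator(agents)
orchestrator.propose([
    Step("file_surfer", "locate the tornado dataset"),
    Step("paraview", "trace streamlines"),
])
results = orchestrator.run(approved=True)
```

Keeping the plan as plain data is what makes it editable before execution: reordering, adding, or removing steps amounts to list operations on `plan`.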

Key Mixed-Initiative Capabilities

HiLSVA introduces fine-grained mechanisms for shared initiative between users and agents while preserving human authority throughout the SciVis workflow. The system supports six core capabilities:

(a) Joint planning: Collaboratively create and approve step-by-step plans; users can reorder, add, or remove steps, or modify step instructions before execution.

(b) Action guards: Require explicit review and approval for critical or potentially irreversible actions, such as executing generated code or sensitive tool calls.

(c) Stepwise rollback: Revert to any previous workflow step to inspect, branch, or revise the visualization, with all software states faithfully restored.

(d) Interactive execution: Interrupt and guide task execution using the visualization engine GUI directly or through chat, with fluid handoff between manual and automated control.

(e) Autonomy control: Adjust the level of system autonomy and user involvement during task execution.

(f) Learn-at-test-time (LTT): Adapt agent behavior during execution through retrieval, self-reflection, and human feedback stored as reusable knowledge.

Key mixed-initiative capabilities of HiLSVA. The system supports (a) collaborative planning, (b) guarded execution, (c) stepwise rollback, (d) interactive user steering, (e) autonomy control, and (f) LTT adaptation through retrieval, reflection, and human feedback.
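Stepwise rollback with branching can be sketched as a log of state snapshots. This is a simplified illustration under stated assumptions, not the system's provenance format: `Record`, `ProvenanceLog`, and the seed counts are hypothetical, and the "software state" is reduced to a dictionary, whereas the real system restores the visualization engine itself.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Record:
    """Snapshot of one executed step."""
    step_id: int
    parent: Optional[int]     # step that was current when this one ran
    action: str
    state: Dict[str, Any]     # deep copy of the state after the step


class ProvenanceLog:
    def __init__(self, initial_state: Dict[str, Any]) -> None:
        self.records: List[Record] = []
        self.state = copy.deepcopy(initial_state)
        self.head: Optional[int] = None  # id of the step the state reflects

    def execute(self, action: str, changes: Dict[str, Any]) -> int:
        # Apply the change, then snapshot the full state for later restore.
        self.state.update(changes)
        record = Record(len(self.records), self.head, action,
                        copy.deepcopy(self.state))
        self.records.append(record)
        self.head = record.step_id
        return record.step_id

    def rollback(self, step_id: int) -> Dict[str, Any]:
        # Restore the snapshot; later records are kept so branches survive.
        self.state = copy.deepcopy(self.records[step_id].state)
        self.head = step_id
        return self.state


# Hypothetical session: two seeding trials, then a branch from the first.
log = ProvenanceLog({"source": "tornado"})
trial_a = log.execute("stream tracer, 100 seeds", {"seeds": 100})
trial_b = log.execute("stream tracer, 400 seeds", {"seeds": 400})
log.rollback(trial_a)                         # user prefers the first trial
branch = log.execute("add glyphs", {"glyphs": True})
```

Because each record stores its parent, the log forms a tree rather than a linear history, so rolling back does not discard the abandoned trial.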

Interactive Interface

HiLSVA provides an interface that tightly couples planning, execution, and visualization. A mixed-initiative chat widget serves as the primary channel for natural language interaction, where users specify tasks, inspect and edit execution plans, approve or reject actions, review generated code, and respond to agent self-reflection. A workflow monitor panel exposes the current execution state as a stepwise process, supporting undo and return to prior states using recorded provenance. A real-time visualization engine and web browser let users directly intervene by taking control of the GUI at any point and seamlessly resume automated execution. A plan gallery and knowledge base accumulate validated workflows and learned experience to support provenance-aware reuse and LTT adaptation.

The HiLSVA interface: (a) mixed-initiative chat widget for planning, refinement, and uncertainty queries; (b) workflow monitor panel showing execution progress and clickable state transitions; (c) real-time visualization engine and web browser where actions directly operate on the ParaView environment; and (d) plan gallery and knowledge base for reuse and adaptation.
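The approve-or-reject interaction in the chat widget can be read as a guard placed in front of each action. The sketch below assumes one possible policy and is not taken from the system: the mapping from autonomy mode to approval behavior, the `CRITICAL` set, and all function names are hypothetical.

```python
from enum import Enum
from typing import Callable


class Autonomy(Enum):
    FULL = "full-autonomous"
    HALF = "half-autonomous"
    MIXED = "mixed-initiative"


# Hypothetical set of action kinds treated as critical or irreversible.
CRITICAL = {"execute_code", "delete_file"}


def needs_approval(action_kind: str, mode: Autonomy) -> bool:
    """Assumed policy: full never asks, half asks only for critical
    actions, and mixed-initiative asks for every action."""
    if mode is Autonomy.FULL:
        return False
    if mode is Autonomy.HALF:
        return action_kind in CRITICAL
    return True


def guarded_run(action_kind: str, run: Callable[[], str], mode: Autonomy,
                ask_user: Callable[[str], bool]) -> str:
    # Ask before running when the policy requires it; skip on rejection.
    if needs_approval(action_kind, mode) and not ask_user(action_kind):
        return "rejected"
    return run()
```

For example, `guarded_run("execute_code", run_script, Autonomy.HALF, prompt_user)` would call the hypothetical `prompt_user` before running the equally hypothetical `run_script`, while a non-critical action in the same mode would run without a prompt.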

Case Studies

We present five case studies of varying complexity that span three canonical SciVis stages — data exploration, data visualization, and scientific insight. Each example illustrates how mixed-initiative interaction, provenance-aware reuse, and learned knowledge support real SciVis workflows. The full recorded interactions for each task are shown in the videos at the top of this page.

Foot — Basic Action Task

A CT scan of a human foot (125×255×183). HiLSVA generates an initial isosurface visualization that separates bone from skin and soft tissue, retrieves a labeled anatomy reference from the web, and then supports user-driven refinement through direct GUI interaction to reveal the metatarsals and phalanges — demonstrating data exploration through iterative refinement and contextual information retrieval.

Hurricane — Basic Action Task

An atmospheric simulation of Hurricane Isabel (500×500×100). HiLSVA constructs a horizontal slice of the temperature field at z = 49.5 with the Cool to Warm colormap, analyzes the value distribution via a histogram, and overlays contour lines from linearly sampled values to better reveal the temperature structure and gradients of the storm.
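One way to obtain linearly sampled contour values is to space them evenly across the scalar range. The helper below is an assumption about what "linearly sampled" could mean in practice: the function name, the choice to exclude the range endpoints, and the temperature range are all illustrative rather than taken from the case study.

```python
from typing import List


def linear_contour_values(vmin: float, vmax: float, count: int) -> List[float]:
    """Return `count` evenly spaced values strictly inside (vmin, vmax)."""
    if count < 1 or vmax <= vmin:
        raise ValueError("need count >= 1 and vmax > vmin")
    step = (vmax - vmin) / (count + 1)
    return [vmin + step * (i + 1) for i in range(count)]


# Hypothetical temperature range, for illustration only.
values = linear_contour_values(-80.0, 30.0, 4)
print(values)  # [-58.0, -36.0, -14.0, 8.0]
```

Excluding the endpoints avoids degenerate contours at the extreme values, where an isoline typically collapses to isolated points.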

Tornado — Visualization Workflow Task

A simulated tornado vector field (64×64×64). For uncertain seeding parameters, HiLSVA runs three Stream Tracer trials with point-cloud seeding, lets the user roll back to the preferred trial via the workflow monitor, then uses knowledge-guided glyph visualization — arrows oriented by the velocity vector and scaled by magnitude — followed by a self-reflection that suggests stream tubes for clarity.

Combustion — Visualization Workflow Task

A multivariate, time-varying turbulent reacting flow (240×360×60, 33 timesteps). The file surfer identifies the number of timesteps, then HiLSVA volume-renders the time-varying Y_OH field — reusing a learned colormap configuration from the knowledge base — and exports an animation, before extending the analysis to an isosurface animation of the mixfrac field, highlighting iterative refinement on multivariate time-varying data.
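Reusing a learned configuration from the knowledge base can be sketched as a tagged store with retrieval. This is a deliberately simple stand-in: the real system's storage and retrieval method are not described here, so the tag-overlap ranking, the `approved` flag, the class names, and the example configurations are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    tags: Set[str]
    config: Dict[str, object]
    approved: bool            # whether the user confirmed this knowledge


class KnowledgeBase:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def learn(self, tags: Set[str], config: Dict[str, object],
              approved: bool) -> None:
        # Store copies so later edits by the caller do not alter the entry.
        self.entries.append(Entry(set(tags), dict(config), approved))

    def retrieve(self, query: Set[str]) -> Optional[Entry]:
        best: Optional[Entry] = None
        best_overlap = 0
        for entry in self.entries:
            overlap = len(entry.tags & query)
            # Only reuse approved knowledge; later entries win ties.
            if entry.approved and overlap > 0 and overlap >= best_overlap:
                best, best_overlap = entry, overlap
        return best


kb = KnowledgeBase()
kb.learn({"volume", "Y_OH"}, {"colormap": "example-preset"}, approved=True)
kb.learn({"isosurface", "mixfrac"}, {"isovalue": 0.5}, approved=False)
hit = kb.retrieve({"volume", "Y_OH", "animation"})
```

Gating retrieval on user approval reflects the idea that human feedback, not just agent self-reflection, decides what becomes reusable knowledge.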

Half-cylinder — Scientific Analysis Task

Incompressible 3D flow around a half-cylinder (Re = 640, 640×240×80). Starting from a glyph-based visualization of the vector field, HiLSVA computes derived vortex-identification fields — the Q-criterion and λ₂ — to reveal coherent vortex structures, saves the successful workflow as reusable knowledge, and supports extending the analysis to additional Reynolds numbers. This analysis-driven exploration exemplifies human-guided scientific discovery.
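For reference, the Q-criterion has a standard pointwise definition: with velocity gradient J, strain-rate tensor S = (J + Jᵀ)/2, and rotation tensor Ω = (J − Jᵀ)/2, Q = ½(‖Ω‖² − ‖S‖²) using the Frobenius norm, and Q > 0 marks regions where rotation dominates strain. The sketch below evaluates this formula for a single 3×3 gradient. It illustrates the definition only and makes no claim about how HiLSVA derives the field in ParaView; the function name and test matrices are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def q_criterion(grad: Matrix) -> float:
    """Q = 0.5 * (||Omega||_F^2 - ||S||_F^2) for a 3x3 velocity gradient,
    where grad[i][j] is the derivative of u_i with respect to x_j."""
    q = 0.0
    for i in range(3):
        for j in range(3):
            s = 0.5 * (grad[i][j] + grad[j][i])       # strain-rate part
            omega = 0.5 * (grad[i][j] - grad[j][i])   # rotation part
            q += 0.5 * (omega * omega - s * s)
    return q


# Rigid rotation about z: u = (-y, x, 0).
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Irrotational strain: u = (x, -y, 0).
strain = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]

print(q_criterion(rotation))  # 1.0
print(q_criterion(strain))    # -1.0
```

The two test flows bracket the behavior: pure rotation gives a positive Q, pure strain a negative one, which is why an isosurface of Q at a positive threshold highlights vortex cores.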

User Study

We conducted a controlled user study with 12 participants across three expertise levels — three SciVis experts, four domain scientists with limited visualization-tool experience, and five novices with neither domain nor visualization expertise. Each participant used HiLSVA across three autonomy modes (full-autonomous, half-autonomous, and mixed-initiative with LTT) on four visualization tasks spanning basic operations and more complex workflow construction.

Key Findings

Expertise-independent success: All participants completed the tasks, and accuracy on domain-specific interpretation questions averaged 11.75 out of 12. HiLSVA enabled non-experts to achieve expert-comparable performance, with no data hallucinations (fabricated values or non-existent fields) observed in the logged sessions.

Efficiency–oversight tradeoff: Execution time was lowest in full-autonomous mode and highest in mixed-initiative mode. This reflects a tradeoff rather than a reliable speed difference: greater user involvement adds interaction time but enables verification and intervention.

No single mode fits all: Given a free choice on the final task, seven participants selected full-autonomous mode, two half-autonomous, and three mixed-initiative. Many preferred more automation once they trusted the system, while a notable subset still valued greater human control.

Strong usability and transparency: Post-study questionnaire ratings averaged 4.66 / 5 across all questions, with particularly high scores for the ability to revisit previous steps, the usefulness of stepwise provenance, and clarity of system feedback.

Survey responses to 14 post-study questions, along with mean scores (on a 1–5 scale) and standard deviations.
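The participant counts and final-task mode choices reported above can be cross-checked with a few lines of arithmetic. The percentages below are derived here from the stated counts and are not separately reported figures; the variable names are illustrative.

```python
# Counts as stated in the study description and key findings.
participants = {"experts": 3, "domain scientists": 4, "novices": 5}
final_task_choice = {
    "full-autonomous": 7,
    "half-autonomous": 2,
    "mixed-initiative": 3,
}

total = sum(participants.values())
# Both breakdowns should account for all twelve participants.
assert total == sum(final_task_choice.values()) == 12

# Share of participants choosing each mode, in percent.
shares = {mode: round(100 * n / total, 1)
          for mode, n in final_task_choice.items()}
# Mean interpretation accuracy (11.75 of 12) as a percentage.
accuracy_pct = round(100 * 11.75 / 12, 1)

print(shares)        # {'full-autonomous': 58.3, 'half-autonomous': 16.7, 'mixed-initiative': 25.0}
print(accuracy_pct)  # 97.9
```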

Overall, the study demonstrates that HiLSVA enables users at all levels of expertise, including novices, to complete non-trivial SciVis tasks, while mixed-initiative interaction provides a flexible balance between efficiency and control. Full study materials and the detailed protocol are available in the User Study Document.

BibTeX

@article{ai2026hilsva,
  author={Ai, Kuangshi and Do, Patrick Phuoc and Wang, Chaoli},
  title={{HiLSVA}: Design and Evaluation of a Human-in-the-Loop Agentic System for Scientific Visualization},
  journal={arXiv preprint arXiv:2606.26614},
  year={2026},
  note={In Proceedings of IEEE Transactions on Visualization and Computer Graphics (IEEE VIS 2026)}
}