License: apache-2.0
Model Description
Holo3.1 is our latest family of Vision-Language Models (VLMs) for computer use agents. Building on Holo3, it expands support beyond browser and desktop automation to mobile environments, introduces native function-calling support for seamless integration with agent frameworks, and enables local deployment through optimized quantized checkpoints.
The Holo3.1 family spans model sizes from 0.8B to 35B-A3B parameters. Across computer use, UI grounding, mobile automation, and business workflows, Holo3.1 delivers strong performance while improving deployment flexibility and cost efficiency.
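To make the function-calling integration concrete, here is a minimal sketch of how an agent framework might declare a tool and validate a tool call returned by a model. The `click` tool, its parameters, the `parse_tool_call` helper, and the example model output are all hypothetical illustrations in the common JSON-Schema "function" style. They are not Holo3.1's documented action space or output format.

```python
import json

# Hypothetical tool schema in the common JSON-Schema "function" style.
# The tool name and parameters are illustrative assumptions only.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at pixel coordinates on the current screenshot.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Check a JSON tool call against CLICK_TOOL and return (name, arguments)."""
    call = json.loads(raw)
    spec = CLICK_TOOL["function"]
    if call.get("name") != spec["name"]:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    missing = [k for k in spec["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call["name"], args


# Made-up model output, used only to exercise the parser.
name, args = parse_tool_call('{"name": "click", "arguments": {"x": 412, "y": 96}}')
print(name, args)
```

Validating the call before executing it keeps a malformed or unexpected action from reaching the browser, desktop, or mobile environment.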
- Developed by: H Company
- Model type: Vision-Language Models for Navigation and Computer Use Agents
- Available models: Holo3.1-0.8B, Holo3.1-4B, Holo3.1-9B, Holo3.1-35B-A3B
- Base models: Qwen 3.5 family
- Supported environments: Web, Desktop, Mobile
- Available quantizations for Holo3.1-35B-A3B: BF16, FP8, NVFP4, Q4 GGUF
- Blog Post: hcompany.ai/holo3.1
- Quickstart: hub.hcompany.ai/quickstart
- License: Apache 2.0
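As a rough guide to what the listed quantizations mean for local deployment, the sketch below estimates the weight-only memory footprint of Holo3.1-35B-A3B. It assumes that "35B" denotes about 35 billion total parameters, and it uses nominal bit widths (16, 8, 4, and 4 bits per weight). Real checkpoints carry extra overhead from scaling factors and layers kept at higher precision, and inference also needs memory for activations and the KV cache, so these figures are lower bounds rather than measured sizes.

```python
# Nominal bits per weight for each listed format (assumed, ignoring
# scaling factors and any layers kept at higher precision).
NOMINAL_BITS = {"BF16": 16, "FP8": 8, "NVFP4": 4, "Q4 GGUF": 4}

# Assumption: "35B" in the model name means ~35e9 total parameters.
TOTAL_PARAMS = 35e9


def weight_footprint_gb(total_params: float, bits_per_weight: int) -> float:
    """Weight-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


for fmt, bits in NOMINAL_BITS.items():
    print(f"{fmt:8s} ~{weight_footprint_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights need roughly 70 GB at BF16, 35 GB at FP8, and 17.5 GB at a nominal 4 bits per weight.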
Performance vs Cost
The figure below compares the overall performance and inference cost of the Holo3.1 and Qwen 3.5 families. Overall performance averages computer use, mobile automation, enterprise workflows, and UI grounding benchmarks.
Holo3.1 establishes a strong Pareto frontier across model sizes, from lightweight local agents to state-of-the-art enterprise deployments.
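The comparison described above can be sketched in a few lines: average the four benchmark categories into one overall score, then keep the models that no other model beats on both cost and score. Everything below is an assumption for illustration. The model names, costs, and scores are placeholders, not values from the figure, and an unweighted mean is only one plausible reading of "averages".

```python
from statistics import mean


def overall_score(category_scores: dict[str, float]) -> float:
    """Unweighted mean over benchmark categories (assumed aggregation)."""
    return mean(list(category_scores.values()))


def pareto_frontier(points: list[tuple[str, float, float]]) -> list[str]:
    """Return names of points not dominated on (lower cost, higher score).

    Each point is (name, cost, score). Sorting by cost, then by descending
    score, lets a single sweep keep only strict score improvements.
    """
    frontier = []
    best = float("-inf")
    for name, cost, score in sorted(points, key=lambda p: (p[1], -p[2])):
        if score > best:
            frontier.append(name)
            best = score
    return frontier


# Placeholder category scores for one hypothetical model.
example = {"computer_use": 40.0, "mobile": 50.0, "enterprise": 60.0, "grounding": 50.0}
print(overall_score(example))

# Placeholder (name, cost, overall score) points, not taken from the figure.
models = [("a", 1.0, 40.0), ("b", 2.0, 35.0), ("c", 3.0, 60.0), ("d", 4.0, 55.0)]
print(pareto_frontier(models))
```

In this toy data, "b" and "d" drop out because a cheaper model already scores higher, which is what it means for a model to sit off the frontier.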
Benchmark Results
Holo3.1 delivers strong performance across computer use, mobile automation, enterprise workflows, and UI grounding benchmarks.
Table 1: Evaluation results across computer use, mobile automation, enterprise workflows, and grounding benchmarks.
Get Started
Explore our Quickstart guide to learn how to integrate Holo3.1 into your applications, deploy local agents, or run optimized inference on NVIDIA hardware.
Citation
@misc{hai2026holo31,
  title={Holo3.1: Fast & Local Computer Use Agents},
  author={H Company},
  year={2026},
  url={https://huggingface.co/Hcompany/Holo3.1-35B-A3B},
}
Model provider: Hcompany
Model tree: Base: Qwen/Qwen3.5-0.8B; Quantized: this model
Input modalities: Video, Text, Image
Output modalities: Text