- April 8, 2026
- 6 min read
Running OpenClaw with NemoClaw and FriendliAI
- OpenClaw enables powerful autonomous agents, but introduces new security and control challenges.
- NemoClaw adds a sandboxed, policy-driven runtime that isolates agents and manages external interactions safely.
- FriendliAI fits into this stack as the inference layer, providing fast and scalable open-weight model execution without changing agent code.

OpenClaw is an open-source agent framework built for always-on AI assistants: agents that operate across messaging platforms, interact with files, and execute real tasks autonomously. Unlike traditional chat interfaces that wait for user input, OpenClaw agents run continuously and take actions in the world.
That autonomy is powerful, but it also creates real security challenges. Agents that can call external APIs, handle credentials, and run code indefinitely are difficult to control. Prompt injection, malicious tool use, and unintended side effects are no longer theoretical when agents interact with live systems around the clock. NemoClaw is NVIDIA’s answer: a controlled runtime that wraps OpenClaw in a sandboxed, policy-enforced environment without changing how the agent itself behaves.
At the same time, many teams are increasingly turning to open models, such as GLM-5 or Minimax M2.5, when building agent systems. While cost is often the primary driver, open models also provide more control over behavior and deployment, which becomes important when agents run continuously and interact with external systems.
This post walks through how NemoClaw is structured and, more importantly, how FriendliAI plugs into that stack as the inference layer powering open model execution, with optimized performance and zero changes to your agent code.
Why NemoClaw Matters for OpenClaw
OpenClaw’s strength is precisely what makes it hard to run safely. Agents that operate autonomously (calling tools, making API requests, executing code) do so without a human in the loop. That creates an attack surface that traditional request-response systems never had to worry about.
This introduces a few practical concerns:
- Uncontrolled network access. OpenClaw agents can call any external API or internal endpoint without restriction. In practice, this means an agent handling a routine task could trigger downstream services, exfiltrate data, or rack up unexpected costs, all without explicit user intent.
- Credential exposure risks. When API keys are passed into or stored inside the agent runtime, they become accessible to any tool the agent uses, including malicious ones injected through prompt injection attacks. A compromised agent context means compromised credentials.
- Long-running execution risks. Agents that never stop are harder to audit and contain. State can accumulate, context can drift, and a single compromised action early in a session can affect everything that follows, with no natural checkpoint to catch it.
NemoClaw addresses these concerns by wrapping OpenClaw in a more controlled runtime environment.
At its core, NemoClaw is a system that runs OpenClaw agents inside a sandboxed and policy-controlled environment, while routing model access and external interactions through a managed layer.
Importantly, NemoClaw does not replace OpenClaw. It is an open source reference stack designed to make OpenClaw safer to run in real environments.

The diagram above shows how NemoClaw manages the lifecycle of an OpenClaw agent.
On the host side, NemoClaw starts with an onboarding process that sets up the environment using a predefined blueprint. This blueprint acts as a configuration plan that defines how the agent should run, including sandbox settings, inference routing, and security policies.
Once initialized, the OpenClaw agent runs inside an OpenShell sandbox, where its execution is isolated from the host system. Within this environment:
- Model inference is routed through a managed layer instead of direct external calls
- Network access is governed by default policies
- Filesystem access is restricted to a controlled scope
In other words, NemoClaw does not change how the agent behaves. It changes how and where the agent runs, introducing a controlled layer between the agent and the outside world.
This creates a clear boundary between agent logic and external resources. Instead of allowing direct access to models or services, interactions are mediated through policy and routing layers. This enables safer execution by enforcing boundaries on network access and credentials, while still allowing agents to use tools and models through a controlled interface.
Learn more about NemoClaw’s Architecture:
https://docs.nvidia.com/nemoclaw/latest/reference/architecture.html
Where FriendliAI Fits In
Under NemoClaw, inference requests from the agent never leave the sandbox directly. OpenShell intercepts them, manages credentials externally, and forwards requests to the configured upstream provider. This is where FriendliAI comes in. It is not a generic placeholder, but an inference layer purpose-built for demanding workloads: low latency, high throughput, and support for a broad catalog of open models, all through an OpenAI-compatible API that requires no changes to your agent code.
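Because the API is OpenAI-compatible, the agent side only ever produces standard chat-completions requests, regardless of which provider sits upstream. A minimal sketch of that request shape (the endpoint and model name below are illustrative, not taken from a real deployment):

```python
import json

# OpenClaw needs no FriendliAI-specific code: it speaks the standard
# OpenAI chat-completions format to a local endpoint, and OpenShell
# forwards the request upstream to the configured provider.
BASE_URL = "https://inference.local/v1"  # OpenShell-managed local endpoint

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }

# Model ID is illustrative; any model in the FriendliAI catalog works.
body = build_chat_request("zai-org/GLM-5", "Summarize my unread messages.")
print(json.dumps(body, indent=2))
```

Swapping providers means changing only where OpenShell routes this request, never the request itself.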

This routing layer is the key integration point between FriendliAI and NemoClaw.
Inside a NemoClaw setup:
- The OpenClaw agent sends requests to a local endpoint such as `inference.local`, which is managed by OpenShell.
- OpenShell forwards these requests to the configured upstream provider, using credentials such as API keys that are securely stored and managed outside the agent runtime.
- The upstream provider processes the request and returns the response.
Configuring FriendliAI as the provider means you don't have to choose between security and performance. NemoClaw controls what the agent can access, while FriendliAI ensures efficient and reliable inference.
Integrating FriendliAI with NemoClaw
In this section, we will walk through configuring FriendliAI as the inference backend in a NemoClaw environment.
1. Create a FriendliAI API Key
First, create a FriendliAI account and generate an API key.
- Sign up at friendli.ai
- Navigate to the Friendli Suite dashboard.
- Generate a FRIENDLI_API_KEY and copy it.
You will use this token when registering FriendliAI as an inference provider.
2. Install NemoClaw
If you have not installed NemoClaw yet, run the following command:
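The distribution channel may vary by release; one plausible form, assuming NemoClaw ships as an npm package (the package name is illustrative):

```bash
npm install -g nemoclaw
```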
This installs NemoClaw along with its CLI tools.
After installation, you will be guided through the onboarding wizard.
During onboarding:
- You can either select a provider directly or follow the default setup
- You will be asked to define a sandbox name
3. Register Friendli as an Inference Provider
Now, configure OpenShell to use FriendliAI as the upstream inference provider.
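Subcommand and flag names below are illustrative; consult the OpenShell CLI reference for the exact syntax:

```bash
openshell inference set provider friendliai \
  --api-key "$FRIENDLI_API_KEY"
```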
Then set the model:
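For example (the model ID is illustrative; any model from the FriendliAI catalog works):

```bash
openshell inference set model zai-org/GLM-5
```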
You can verify the configuration with:
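One plausible form (the exact subcommand is illustrative):

```bash
openshell inference show
```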
At this point, OpenShell will route all inference requests to FriendliAI.
4. Verify the Configuration Inside the Sandbox
Now connect to your sandbox:
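Assuming the sandbox name chosen during onboarding was `my-agent` (both the subcommand and the name are illustrative):

```bash
openshell connect my-agent
```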
Once inside, check the OpenClaw configuration:
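For example (the file lives inside the sandbox; its exact path may differ in your setup):

```bash
cat openclaw.json
```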
You will see something like:
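A minimal illustrative snippet; the `model` field reflects whatever you configured in step 3:

```json
{
  "model": "zai-org/GLM-5",
  "baseUrl": "https://inference.local/v1"
}
```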
This is an important detail: the `baseUrl` is set to `https://inference.local/v1`, which is the OpenShell endpoint. OpenClaw sends requests to this local endpoint, and OpenShell then forwards them to FriendliAI.
This abstraction enables secure credential handling by keeping API keys outside the agent runtime, while routing all requests through a centralized layer.
5. Run OpenClaw
You are now ready to start your agent:
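Assuming the CLI entrypoint is simply `openclaw`:

```bash
openclaw
```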
This launches an interactive interface directly in your terminal, where you can run an agent enhanced with NemoClaw’s security, routing, and sandboxing capabilities.
Notes
The OpenClaw UI and the actual inference layer operate independently by design. The UI reflects the local configuration in `openclaw.json` inside the sandbox, so even if you change the provider or model using `openshell inference set`, the UI may still display the previous model name. OpenShell handles the actual routing behind the scenes. If you want the UI to reflect the current model, update the model entry in `openclaw.json` manually. Inference itself is always routed correctly regardless of what the UI shows.
Putting It All Together
NemoClaw solves a real problem: how to run autonomous agents in production without giving them unchecked access to your systems and credentials. FriendliAI fits precisely into that architecture as the inference layer, handling open model execution with the performance that agent workloads demand.
The integration is simple. Whether you're running a proof of concept today or preparing for a production deployment, the same configuration scales with you.
If you are interested in exploring FriendliAI further:
- Serverless models we support: https://friendli.ai/model?products=SERVERLESS
- Integration with OpenClaw: https://friendli.ai/blog/integrating-friendliai-with-openclaw
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference allows you to squeeze more tokens-per-second out of every GPU. Because you need fewer GPUs to serve the same load, the true metric—tokens per dollar—comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 530,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page in one click. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for that key issue that is slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.

