README

License: apache-2.0

What ships here

  • train_stage1.py for the Stage 1 GPT baseline
  • train_llama_stage2.py for LoRA fine-tuning on LLaMA
  • train_lora.py as a compatibility entry point for the LoRA path
  • train_pipeline.py to run both stages in sequence
  • build_datasets.py to generate topic-sharded synthetic datasets
  • data/ with example JSONL training sets
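build_datasets.py is described only as producing topic-sharded datasets plus aggregate JSONL files. The following is a minimal sketch of that idea, not the script itself. It assumes each record is a dict carrying a hypothetical "topic" key whose value is safe to use as a filename. It writes one <topic>.jsonl shard per topic and one aggregate file. The function name shard_by_topic and the default all.jsonl filename are illustrative only.

```python
import json
from collections import defaultdict
from pathlib import Path


def shard_by_topic(records, output_dir, aggregate_name="all.jsonl"):
    """Write one JSONL shard per topic plus an aggregate file.

    Hypothetical sketch: assumes every record has a filename-safe "topic" key.
    Returns a mapping of topic -> number of records written to that shard.
    """
    records = list(records)  # allow generators; we iterate twice below
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group records by their (assumed) topic label, preserving input order.
    shards = defaultdict(list)
    for record in records:
        shards[record["topic"]].append(record)

    # JSON Lines: one JSON object per line, one file per topic.
    for topic, rows in shards.items():
        with open(out / f"{topic}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # Aggregate file containing every record in its original order.
    with open(out / aggregate_name, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    return {topic: len(rows) for topic, rows in shards.items()}
```

Keeping shards and the aggregate in the same format lets either one be passed to a training script without conversion.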

Model Direction

The project is intended to produce two artifacts:

  • a small internal GPT baseline from Stage 1
  • a higher-quality assistant from Stage 2, released as a LoRA adapter or as a merged checkpoint

The Stage 2 model is the release artifact for normal inference.
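A LoRA adapter and a merged checkpoint give the same outputs at inference. In standard LoRA, each adapted weight W (d_out × d_in) receives a scaled low-rank update from B (d_out × r) and A (r × d_in). Merging folds that update in: W_merged = W + (alpha / r) · B · A. The pure-Python sketch below checks this equivalence on toy matrices. It is an illustration of the arithmetic, not merge_stage2_lora.py. Real merges operate on model tensors, for example through a library such as PEFT.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A): the standard LoRA merge."""
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def apply(w, x):
    """Compute W @ x for a vector x given as a flat list."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Toy shapes: d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
A = [[1.0, 2.0, 0.0]]   # r x d_in
B = [[0.5], [-1.0]]     # d_out x r
alpha, r = 2.0, 1
x = [1.0, 1.0, 1.0]

merged = merge_lora(W, A, B, alpha, r)

# Unmerged path: base output plus the scaled adapter output.
adapter_out = apply(matmul(B, A), x)
unmerged = [base + (alpha / r) * upd for base, upd in zip(apply(W, x), adapter_out)]

print(apply(merged, x), unmerged)  # prints [6.0, -4.5] [6.0, -4.5]
```

Merging removes the extra adapter matmul at serving time. Keeping the adapter separate keeps the release artifact small and the base weights reusable.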

Base Model

Stage 2 currently targets:

  • meta-llama/Llama-3.1-8B-Instruct

The earlier LoRA entry point, train_lora.py, also supports the same model family.

Training Flow

Recommended flow:

  1. Run python build_datasets.py --output_dir data/generated --include_docs to generate topic-specific shards and aggregate JSONL files.
  2. Optionally, run python train_stage1.py to train the small Stage 1 GPT baseline on data/generated/stage1_sft.jsonl.
  3. Train Stage 2 on data/generated/stage2_conrad_sft.jsonl with python train_llama_stage2.py, or run python train_pipeline.py --include_docs to perform the build step and both training stages automatically.
  4. Merge the Stage 2 LoRA adapter into the base model with python merge_stage2_lora.py.
  5. Sync the merged checkpoint into the model repo with python sync_checkpoint.py.
  6. Publish the merged checkpoint.
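The recommended flow can be made explicit with a small driver that prints each step and, on request, runs it. This is a sketch, not train_pipeline.py. According to the README, that script already runs both stages in sequence. The sketch uses only the flags the README documents (those of build_datasets.py). It assumes every other script finds its inputs through its own defaults. The --skip-stage1 and --run options belong to this hypothetical driver, not to the repo. Publishing is left as a manual step.

```python
import subprocess
import sys


def build_steps(include_stage1=True):
    """Return the training flow as argv lists, in order.

    Sketch only: flags are limited to those documented in the README;
    the other scripts are assumed to use their built-in defaults.
    """
    py = sys.executable
    steps = [[py, "build_datasets.py", "--output_dir", "data/generated", "--include_docs"]]
    if include_stage1:
        steps.append([py, "train_stage1.py"])  # optional GPT baseline
    steps += [
        [py, "train_llama_stage2.py"],  # LoRA fine-tuning on LLaMA
        [py, "merge_stage2_lora.py"],   # merge the adapter into the base model
        [py, "sync_checkpoint.py"],     # copy the merged checkpoint into the model repo
    ]
    return steps


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError and halts the flow on failure.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical driver flags: default is a dry run that only prints commands.
    run_steps(build_steps(include_stage1="--skip-stage1" not in sys.argv),
              dry_run="--run" not in sys.argv)
```

Defaulting to a dry run makes it cheap to confirm the ordering before launching GPU jobs.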

Intended Use

This project is designed for:

  • conversational assistants
  • documentation assistants
  • support routing
  • enterprise workflows
  • knowledge assistants
  • internal tooling
  • structured response generation

Notes

  • Stage 1 and Stage 2 are separate training jobs.
  • Stage 1 is not a wrapper around LLaMA.
  • Stage 2 is the higher-quality assistant tuning step.
  • The repo includes example datasets only; real training data is still required.
  • Set CONRAD_ENDPOINT_URL and HF_TOKEN in the Space secrets or environment to enable production chat.
  • If the endpoint is unavailable, the Space will show a clear fallback instead of generating from the raw checkpoint.
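The endpoint notes imply a simple rule: chat is served only when both CONRAD_ENDPOINT_URL and HF_TOKEN are set and the endpoint answers. Otherwise the Space falls back instead of generating from the raw checkpoint. Below is a minimal sketch of that logic, assuming hypothetical names load_endpoint_config, respond, and FALLBACK_MESSAGE. The HTTP call is injected as a send callable so the sketch does not depend on any particular client library.

```python
import os

# Hypothetical fallback text; the Space's actual wording is not documented here.
FALLBACK_MESSAGE = (
    "The Conrad endpoint is not configured or unavailable; "
    "chat is disabled rather than generating from the raw checkpoint."
)


def load_endpoint_config(env=None):
    """Return (url, token) if both secrets are set and non-empty, else None."""
    env = os.environ if env is None else env
    url = env.get("CONRAD_ENDPOINT_URL", "").strip()
    token = env.get("HF_TOKEN", "").strip()
    if not url or not token:
        return None
    return url, token


def respond(prompt, env=None, send=None):
    """Route a prompt to the endpoint, or return the fallback message.

    `send` is a hypothetical callable (url, token, prompt) -> str that
    performs the actual request.
    """
    config = load_endpoint_config(env)
    if config is None or send is None:
        return FALLBACK_MESSAGE
    url, token = config
    try:
        return send(url, token, prompt)
    except OSError:
        # Connection errors and timeouts are OSError subclasses; fall back on them.
        return FALLBACK_MESSAGE
```

Reading the secrets at request time rather than at import time means a Space picks up newly added secrets after a restart without code changes.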

Model provider

deep-conrad

Modalities

Input: Text

Output: Text

Pricing

Dedicated Endpoints

Supported Functionality

Model APIs

Dedicated Endpoints

Container