License: apache-2.0
What ships here
- `train_stage1.py` for the Stage 1 GPT baseline
- `train_llama_stage2.py` for LoRA fine-tuning on LLaMA
- `train_lora.py` as a compatibility entry point for the LoRA path
- `train_pipeline.py` to run both stages in sequence
- `build_datasets.py` to generate topic-sharded synthetic datasets
- `data/` with example JSONL training sets
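Before training on any of the JSONL files, it can help to check what a file actually contains. The following is a minimal sketch, not part of the repo: `summarize_jsonl` is a hypothetical helper, and because the record schema is not documented here, it only counts records and top-level keys rather than assuming any field names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count records and how often each top-level key appears in a JSONL file."""
    records = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            row = json.loads(line)
            records += 1
            if isinstance(row, dict):
                keys.update(row.keys())
    return records, dict(keys)
```

Calling `summarize_jsonl` on one of the example files under `data/` returns the record count and a key-frequency mapping, which makes missing or inconsistent fields easy to spot.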
Model Direction
The project is intended to produce two artifacts:
- a meaningful internal baseline from Stage 1
- a higher-quality assistant checkpoint or adapter from Stage 2
The Stage 2 model is the release artifact for normal inference.
Base Model
Stage 2 currently targets:
meta-llama/Llama-3.1-8B-Instruct
The earlier LoRA script in this repo also supports the same model family.
Training Flow
Recommended flow:
- Run `python build_datasets.py --output_dir data/generated --include_docs` to generate topic-specific shards and aggregate JSONL files.
- Train Stage 1 with `data/generated/stage1_sft.jsonl` if you want the GPT baseline.
- Run `python train_stage1.py` to build the small baseline.
- Train Stage 2 with `data/generated/stage2_conrad_sft.jsonl`, or let `python train_pipeline.py --include_docs` do the build step automatically.
- Merge the Stage 2 adapter with `python merge_stage2_lora.py`.
- Sync the merged checkpoint into the model repo with `python sync_checkpoint.py`.
- Publish the merged checkpoint.
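The steps above can be chained in a small driver script. This is a minimal sketch, assuming the scripts are run from the repo root and need no flags beyond those shown in this README; `flow_commands` and `run_flow` are hypothetical names, not files that ship here. It uses `train_pipeline.py --include_docs` so that the dataset build and both training stages happen in one step, and it leaves out publishing because no command is given for that.

```python
import subprocess
import sys

# Commands taken from the recommended flow. train_pipeline.py covers the
# dataset build plus Stage 1 and Stage 2 training.
FLOW = [
    ["train_pipeline.py", "--include_docs"],
    ["merge_stage2_lora.py"],
    ["sync_checkpoint.py"],
]


def flow_commands(python=sys.executable):
    """Return the flow as argv lists, one per step, in order."""
    return [[python, *step] for step in FLOW]


def run_flow(dry_run=True):
    """Print each step; execute it as well when dry_run is False."""
    for argv in flow_commands():
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the flow at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_flow(dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to confirm the order before committing GPU time.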
Intended Use
This project is designed for:
- conversational assistants
- documentation assistants
- support routing
- enterprise workflows
- knowledge assistants
- internal tooling
- structured response generation
Notes
- Stage 1 and Stage 2 are separate training jobs.
- Stage 1 is not a wrapper around LLaMA.
- Stage 2 is the higher-quality assistant tuning step.
- The repo includes example datasets only; real training data is still required.
- Set `CONRAD_ENDPOINT_URL` and `HF_TOKEN` in the Space secrets or environment to enable production chat.
- If the endpoint is unavailable, the Space will show a clear fallback instead of generating from the raw checkpoint.
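The endpoint configuration and fallback described in the last two notes might look roughly like this. It is a sketch under stated assumptions, not the Space's actual code: `resolve_chat_backend` and the fallback wording are hypothetical, and the function only covers missing settings, not an endpoint that is configured but unreachable at request time.

```python
import os


def resolve_chat_backend(env=None):
    """Return ("endpoint", url) if both settings exist, else ("fallback", message)."""
    env = os.environ if env is None else env
    url = env.get("CONRAD_ENDPOINT_URL", "").strip()
    token = env.get("HF_TOKEN", "").strip()
    if url and token:
        return "endpoint", url
    # Name the missing settings so the fallback message is actionable.
    settings = (("CONRAD_ENDPOINT_URL", url), ("HF_TOKEN", token))
    missing = [name for name, value in settings if not value]
    return "fallback", "Production chat is disabled; missing: " + ", ".join(missing)
```

Returning a fallback message instead of raising keeps the decision in one place, so the UI can show the notice without ever loading the raw checkpoint.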
Model provider
deep-conrad