Training Configuration
| Parameter | Value |
|---|---|
| Training Mode | SFT |
| Base Model | Jackrong/Qwopus3.5-9B-Coder |
| Learning Rate | 0.0002 |
| Epochs | 2 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 16 |
| Max Sequence Length | 4096 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Seed | 42 |
| LoRA Rank (r) | 256 |
| LoRA Alpha | 256 |
| LoRA Dropout | 0.05 |
| Target Modules | k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
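The derived values in the table can be checked with a few lines of arithmetic. The sketch below assumes a single GPU (the table lists one RTX A6000), the conventional LoRA scaling factor alpha / r, and a linear-warmup-then-cosine schedule in the style of common training libraries. The function names and the total of 1,000 optimizer steps are hypothetical; the source does not state how many steps the run took.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(alpha, rank):
    """Conventional LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.05):
    """Learning rate under linear warmup followed by cosine decay to zero.

    total_steps is an assumption supplied by the caller; it depends on
    dataset size, which the model card does not report.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(2, 8))   # 16, matching the table
print(lora_scaling(256, 256))       # 1.0
print(lr_at_step(500, 1000))        # somewhere between 0 and the peak of 2e-4
```

With rank and alpha both set to 256, the scaling factor is 1.0, so the adapter update is applied without extra amplification or damping.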
Datasets
Trained on 3 concatenated datasets:
- hemlang/Hemlock2-DPO (split: train)
- hemlang/hemlock-formulary-SFT (split: train)
- hemlang/hemlock-codex-SFT (split: train)
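The shared configuration maps the two SFT datasets' `instruction` and `output` columns onto `prompt` and `chosen`, so that all three sources share one schema before concatenation. The following is a minimal sketch of that idea, not Merlina's actual implementation: the helper names and the sample rows are hypothetical, and it assumes each mapping reads as source column to target column.

```python
def apply_column_mapping(row, mapping):
    """Rename the keys of one row; unmapped keys are kept unchanged."""
    return {mapping.get(key, key): value for key, value in row.items()}


def concatenate_sources(sources):
    """Concatenate rows from several sources after renaming their columns.

    sources is a list of (rows, mapping) pairs, where mapping may be None
    for a source whose columns already match the target schema.
    """
    combined = []
    for rows, mapping in sources:
        for row in rows:
            combined.append(apply_column_mapping(row, mapping or {}))
    return combined


# Hypothetical sample rows standing in for the real datasets.
dpo_rows = [{"prompt": "Q1", "chosen": "good", "rejected": "bad"}]
sft_rows = [{"instruction": "Q2", "output": "answer"}]
sft_mapping = {"instruction": "prompt", "output": "chosen"}

merged = concatenate_sources([(dpo_rows, None), (sft_rows, sft_mapping)])
for row in merged:
    print(row["prompt"], "->", row["chosen"])
```

After mapping, every row exposes `prompt` and `chosen`, which is all an SFT run needs; the `rejected` column from the DPO dataset is simply carried along in this sketch.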
Reproduce this training run
This model was trained with Merlina. Save the JSON below to data/configs/<name>.json (or import it via the Load Configuration dialog) to reproduce the exact training setup. Credentials are not included — Merlina will use your own HF_TOKEN and WANDB_API_KEY from .env or the form.
```json
{
  "_metadata": {
    "name": "Hemlock-Qwopus3.5-9B-Coder",
    "description": "Training configuration shared from a Merlina-trained model.",
    "tags": [],
    "schema": "merlina/training-config",
    "schema_version": 1,
    "merlina_version": "2.0.1"
  },
  "base_model": "Jackrong/Qwopus3.5-9B-Coder",
  "output_name": "Hemlock-Qwopus3.5-9B-Coder",
  "use_lora": true,
  "lora_r": 256,
  "lora_alpha": 256,
  "lora_dropout": 0.05,
  "target_modules": ["k_proj", "o_proj", "q_proj", "v_proj", "down_proj", "gate_proj", "up_proj"],
  "modules_to_save": [],
  "lora_task_type": "CAUSAL_LM",
  "learning_rate": 0.0002,
  "num_epochs": 2,
  "batch_size": 2,
  "gradient_accumulation_steps": 8,
  "max_length": 4096,
  "max_prompt_length": 1024,
  "model_type": "auto",
  "training_mode": "sft",
  "beta": 0.1,
  "label_smoothing": 0.0,
  "gamma": 0.5,
  "vision_model_id": null,
  "stage": null,
  "unfreeze_vision_top_n": null,
  "image_token_id": null,
  "min_pixels": null,
  "max_pixels": null,
  "image_column": null,
  "caption_column": null,
  "instruction": null,
  "streaming": null,
  "model_name": null,
  "image_resolution": 1024,
  "lora_rank": 32,
  "lora_target_modules": null,
  "lora_use_dora": false,
  "mid_training_samples": true,
  "dataset_jsonl_path": null,
  "dataset_name": null,
  "dataset_split": null,
  "sample_prompts": null,
  "sample_num_steps": null,
  "dataset": {
    "source": {
      "source_type": "huggingface",
      "repo_id": "hemlang/Hemlock2-DPO",
      "split": "train",
      "file_path": null,
      "file_format": null,
      "dataset_id": null,
      "streaming": false,
      "streaming_batch_size": 10000,
      "column_mapping": null
    },
    "additional_sources": [
      {
        "source_type": "huggingface",
        "repo_id": "hemlang/hemlock-formulary-SFT",
        "split": "train",
        "file_path": null,
        "file_format": null,
        "dataset_id": null,
        "streaming": false,
        "streaming_batch_size": 10000,
        "column_mapping": {"instruction": "prompt", "output": "chosen"}
      },
      {
        "source_type": "huggingface",
        "repo_id": "hemlang/hemlock-codex-SFT",
        "split": "train",
        "file_path": null,
        "file_format": null,
        "dataset_id": null,
        "streaming": false,
        "streaming_batch_size": 10000,
        "column_mapping": {"instruction": "prompt", "output": "chosen"}
      }
    ],
    "format": {
      "format_type": "tokenizer",
      "custom_templates": null,
      "enable_thinking": true
    },
    "model_name": "Jackrong/Qwopus3.5-9B-Coder",
    "column_mapping": {"prompt": "prompt", "chosen": "chosen", "rejected": "rejected"},
    "convert_messages_format": true,
    "deduplicate": false,
    "dedupe_strategy": "prompt_chosen",
    "test_size": 0.01,
    "max_samples": null,
    "system_prompt": null,
    "system_prompt_mode": "fill_empty",
    "training_mode": "sft"
  },
  "seed": 42,
  "max_grad_norm": 1.0,
  "warmup_ratio": 0.05,
  "eval_steps": 0.2,
  "use_4bit": true,
  "use_wandb": true,
  "push_to_hub": true,
  "merge_lora_before_upload": true,
  "hf_hub_private": true,
  "export_gguf": false,
  "gguf_quant_types": ["Q4_K_M"],
  "keep_gguf_fp16": false,
  "shuffle_dataset": true,
  "weight_decay": 0.01,
  "lr_scheduler_type": "cosine",
  "gradient_checkpointing": true,
  "logging_steps": 1,
  "optimizer_type": "paged_adamw_8bit",
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "adam_epsilon": 1e-08,
  "adafactor_relative_step": false,
  "adafactor_scale_parameter": false,
  "adafactor_warmup_init": false,
  "adafactor_decay_rate": -0.8,
  "adafactor_beta1": null,
  "adafactor_clip_threshold": 1.0,
  "attn_implementation": "sdpa",
  "use_liger": true,
  "torch_compile": false,
  "neftune_alpha": null,
  "eval_on_start": false,
  "gpu_ids": null,
  "multi_gpu_strategy": "auto",
  "wandb_project": null,
  "wandb_run_name": null,
  "wandb_tags": null,
  "wandb_notes": null
}
```

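Saving the configuration to `data/configs/<name>.json` can be scripted rather than done by hand. This is a minimal sketch using only the Python standard library. The helper names, the config name `hemlock-sft`, and the temporary directory standing in for a Merlina checkout are all hypothetical, and the dictionary is an abbreviated subset of the full configuration for illustration only.

```python
import json
import tempfile
from pathlib import Path


def save_config(config, name, root="."):
    """Write config as JSON to <root>/data/configs/<name>.json."""
    path = Path(root) / "data" / "configs" / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


def load_config(path):
    """Read a JSON configuration back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Abbreviated subset of the shared configuration (illustration only).
abbreviated = {
    "base_model": "Jackrong/Qwopus3.5-9B-Coder",
    "training_mode": "sft",
    "lora_r": 256,
    "batch_size": 2,
    "gradient_accumulation_steps": 8,
}

# A temporary directory stands in for the Merlina project root.
with tempfile.TemporaryDirectory() as root:
    saved_path = save_config(abbreviated, "hemlock-sft", root)
    reloaded = load_config(saved_path)

print(saved_path.name)
print(reloaded["base_model"])
```

To reproduce the run itself, write the complete JSON rather than this subset, and point `root` at the actual project directory.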
Model provider
nbeerbower
Model tree
Base
Jackrong/Qwopus3.5-9B-Coder
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text