Model Details
Model Description
- Developed by: Orange
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): Wolof
- License: [More Information Needed]
- Finetuned from model [optional]: Orange/Qwen3-8B
- Date [optional]: 2026-06-13 12:10:53
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
This model can be used with the transformers library through its pipeline abstraction, as follows:
python
import torch
from transformers import pipeline

model_id = "Orange/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are chatbot specialized on Wolof language."},
    {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
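When the pipeline is called with a list of chat messages, `generated_text` holds the whole conversation and its last element is the newly generated assistant message, which is why the snippet above indexes `[-1]`. The following is a minimal sketch of extracting only the reply text. The `last_reply` helper and the `mock_outputs` value are illustrative assumptions (a hand-written stand-in for the pipeline result), not part of this model card or of the transformers API.

```python
def last_reply(outputs):
    """Return the text of the final assistant message in a chat-style pipeline result."""
    conversation = outputs[0]["generated_text"]
    final_message = conversation[-1]
    if final_message.get("role") != "assistant":
        raise ValueError("last message is not from the assistant")
    return final_message["content"]


# Hypothetical stand-in for what pipe(messages, ...) returns; the reply is a placeholder.
mock_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are chatbot specialized on Wolof language."},
            {"role": "user", "content": "Can you give a sample of your specialized knowledge?"},
            {"role": "assistant", "content": "<model reply>"},
        ]
    }
]

print(last_reply(mock_outputs))
```

With the real pipeline, `last_reply(outputs)` would be called on the value returned by `pipe(...)` instead of the mock.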
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
Training Details
This model was fine-tuned with Orange's internal fine-tuning tools, using the Docker image tagged 0.2.6t5alpha in the registry and the following configuration file:
yaml
data:
  add_dataset_name: false
  dataset_name:
    train:
    - path: wolof-lm/en_function_calling-instructions
      revision: v10
      split: train
    - path: wolof-lm/alpaca_translated_into_wolof-instructions
      revision: 2025.02.28
      split: train
    - path: wolof-lm/argilla_databricks-dolly-15k-curated-multilingual_en_translated_into_wolof-instructions
      revision: 2025.02.28
      split: train
    - path: wolof-lm/samsung_samsum_instruct_dialogues_translated_in_wolof-instructions
      revision: 2025.02.28
      split: train
    - path: wolof-lm/alwaly_french_wolof-instructions
      split: train
    - path: wolof-lm/alwaly_wolof_to_french-instructions
      split: train
    - path: wolof-lm/claire_masking_multilogue-instructions
      split: train
    - path: wolof-lm/huggingfaceh4_helpful-anthropic-raw_train_translated_in_wolof-instructions
      split: train
    - path: wolof-lm/lodrick-the-lafted-hermes-217k-instructions
      split: train
    - path: wolof-lm/wisenut-nlp-team_translated_in_wolof-instructions
      split: train
    - path: wolof-lm/claire_one_to_one_dialogue-instructions
      revision: 2026.01.16
      split: train
    - path: wolof-lm/wol-sonatel-2025-conversations
      revision: 2026.02.18
      split: train
    validation_conversational_eng:
    - path: wolof-lm/oasst1-en-instructions
      revision: 2026.03.19
      split: validation
    validation_conversational_fra:
    - path: wolof-lm/oasst1-fr-instructions
      revision: 2026.03.19
      split: validation
    - path: wolof-lm/synthetic-fr-instructions
      revision: 2026.03.19
      split: validation
    validation_conversational_wol:
    - path: wolof-lm/claire_one_to_one_dialogue-instructions
      revision: 2026.01.16
      split: validation
    - path: wolof-lm/wol-sonatel-2025-conversations
      revision: 2026.02.18
      split: validation
    validation_fc:
    - path: wolof-lm/en_function_calling-instructions
      revision: v10
      split: validation
    validation_if:
    - path: wolof-lm/tulu-3-sft-personas-instruction-following
      split: validation
    validation_it-v2:
    - path: wolof-lm/alpaca_translated_into_wolof-instructions
      revision: 2025.02.28
      split: validation
    - path: wolof-lm/argilla_databricks-dolly-15k-curated-multilingual_en_translated_into_wolof-instructions
      revision: 2026.03.02
      split: validation
    - path: wolof-lm/samsung_samsum_instruct_dialogues_translated_in_wolof-instructions
      revision: 2025.02.28
      split: validation
    - path: wolof-lm/alwaly_french_wolof-instructions
      split: validation
    - path: wolof-lm/alwaly_wolof_to_french-instructions
      split: validation
    - path: wolof-lm/claire_masking_multilogue-instructions
      split: validation
    - path: wolof-lm/huggingfaceh4_helpful-anthropic-raw_train_translated_in_wolof-instructions
      split: validation
    - path: wolof-lm/lodrick-the-lafted-hermes-217k-instructions
      split: validation
    - path: wolof-lm/wisenut-nlp-team_translated_in_wolof-instructions
      split: validation
  debug: false
  implementation_name: conversations
description:
  contributors:
  - email: claire.perroux@orange.com
    first_name: Claire
    last_name: Perroux
  - email: pierre.adam@orange.com
    first_name: Pierre
    last_name: Adam
  domain: Wolof
  languages:
  - wo
  model_name: Orange/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1
image:
  name: registry.gitlab.tech.orange/nepal/knowledge/orangelm/lm-adaptation/lm-adaptation
  version: 0.2.6t5alpha
model:
  attn_implementation: flash_attention_2
  chat_template_tokenizer: Orange/Qwen3-8B
  model_name_or_path: Orange/Qwen3-8B
  trust_remote_code: true
software_info:
  git:
    branch: 119-cuda-13-and-transformers-5
    commit_date: '2026-06-10 15:41:20+02:00'
    commit_hash: 314a8efb580bf36117958e80dc33d7e221aa7d30
    commit_message: 'build: set same version for kernels library in pyproject
      as in requirements.txt'
    is_dirty: false
    short_hash: 314a8ef
  parameters:
    accelerate_file: null
    clean_output: false
    cluster: marcel
    codecarbon_log_level: warning
    collator_stats: false
    config_file: resources/configs/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1.yml
    constraints: queue:name=gpu_mem_80,gpu:2,partition:name=multigpu,partition:name=All
    data_seed: 42
    debug: false
    deepspeed_file: resources/deepspeed/zero3.json
    dry_run: false
    environment_variable:
      TOKENIZERS_PARALLELISM: 'false'
    load_only: null
    log_level: INFO
    log_output: null
    merge_peft: false
    no_requeue: false
    output_dir: Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28-2gpu
    profiler_config: null
    push_to_hub: false
    qat_config: null
    requeue_n_epochs: 1
    time_out: 72h
    time_requeue: false
    use_qat: false
  version: 0.21.2.dev33+gb8ccade7f.d20260610
training:
  assistant_only_loss: true
  bf16: true
  dataloader_num_workers: 4
  dataloader_persistent_workers: false
  dataloader_pin_memory: false
  dataloader_prefetch_factor: 2
  deepspeed: /config/zero3.json
  disable_tqdm: true
  eval_accumulation_steps: 1
  eval_steps: 10
  eval_strategy: steps
  fp16: false
  gradient_accumulation_steps: 32
  gradient_checkpointing: true
  learning_rate: 2.0e-05
  log_level: warning
  logging_dir: /outputs/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28-2gpu/logs
  logging_steps: 1
  lr_scheduler_type: cosine
  max_grad_norm: 1.0
  max_length: 2048
  max_steps: -1
  num_train_epochs: 3
  optim: paged_adamw_32bit
  output_dir: /outputs/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28-2gpu
  per_device_eval_batch_size: 16
  per_device_train_batch_size: 16
  push_to_hub: false
  report_to: tensorboard
  save_steps: 0
  save_strategy: epoch
  save_total_limit: 1
  seed: 42
  torch_compile: false
  training_type: instruct-tuning
  use_liger_kernel: true
  warmup_ratio: 0.05
  weight_decay: 0.1
The model was trained on 2 GPUs, each with at least 80 GB of memory.
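The configuration above implies an effective batch size that is not stated directly. As a rough sketch, assuming each of the 2 GPUs processes its own micro-batches (plain data parallelism), the number of sequences per optimizer step and an upper bound on tokens per step can be worked out as follows. The token figure is only a ceiling, since sequences shorter than `max_length` contribute fewer tokens.

```python
# Values copied from the training configuration above.
per_device_train_batch_size = 16
gradient_accumulation_steps = 32
max_length = 2048

# Stated in the text: the run used 2 GPUs.
num_gpus = 2

# Sequences consumed before each optimizer update (assumes data parallelism).
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per update: every sequence at the maximum length.
max_tokens_per_step = sequences_per_step * max_length

print(sequences_per_step)   # 1024
print(max_tokens_per_step)  # 2097152
```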
The model was trained using DeepSpeed with the following configuration file:
json
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": false
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": "1e9",
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": "1e9",
    "stage3_max_reuse_distance": "1e9",
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
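The `"auto"` entries in this file are placeholders that the Hugging Face Trainer's DeepSpeed integration fills in from the training arguments. The sketch below illustrates that mapping for the entries that depend only on hyperparameters listed in this card. `resolve_auto` is a hypothetical helper written for illustration, not the library's code, and the bucket-size entries are left out because they depend on the model's hidden size, which this card does not state.

```python
def resolve_auto(training_args, world_size):
    """Illustrative mapping from training arguments to DeepSpeed "auto" entries."""
    micro_batch = training_args["per_device_train_batch_size"]
    accumulation = training_args["gradient_accumulation_steps"]
    return {
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": accumulation,
        # Global batch size = micro-batch x accumulation steps x number of processes.
        "train_batch_size": micro_batch * accumulation * world_size,
        "gradient_clipping": training_args["max_grad_norm"],
        "bf16": {"enabled": training_args["bf16"]},
        "fp16": {"enabled": training_args["fp16"]},
    }


# Values taken from the training section of this card.
training_args = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 32,
    "max_grad_norm": 1.0,
    "bf16": True,
    "fp16": False,
}

resolved = resolve_auto(training_args, world_size=2)
for key, value in resolved.items():
    print(key, value)
```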
Training Data
This model was trained on the following datasets:
yaml
- path: wolof-lm/en_function_calling-instructions
  revision: v10
  split: train
- path: wolof-lm/alpaca_translated_into_wolof-instructions
  revision: 2025.02.28
  split: train
- path: wolof-lm/argilla_databricks-dolly-15k-curated-multilingual_en_translated_into_wolof-instructions
  revision: 2025.02.28
  split: train
- path: wolof-lm/samsung_samsum_instruct_dialogues_translated_in_wolof-instructions
  revision: 2025.02.28
  split: train
- path: wolof-lm/alwaly_french_wolof-instructions
  split: train
- path: wolof-lm/alwaly_wolof_to_french-instructions
  split: train
- path: wolof-lm/claire_masking_multilogue-instructions
  split: train
- path: wolof-lm/huggingfaceh4_helpful-anthropic-raw_train_translated_in_wolof-instructions
  split: train
- path: wolof-lm/lodrick-the-lafted-hermes-217k-instructions
  split: train
- path: wolof-lm/wisenut-nlp-team_translated_in_wolof-instructions
  split: train
- path: wolof-lm/claire_one_to_one_dialogue-instructions
  revision: 2026.01.16
  split: train
- path: wolof-lm/wol-sonatel-2025-conversations
  revision: 2026.02.18
  split: train
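Each entry above carries a `path`, a `split`, and sometimes a pinned `revision`. The sketch below turns such an entry into keyword arguments of the shape accepted by `datasets.load_dataset`. It rests on two assumptions that this card does not confirm: that the paths are Hugging Face Hub dataset repositories, and that the internal tooling loads them this way. `to_load_kwargs` is a hypothetical helper, and the actual loading call is left as a comment because it needs access to the repositories.

```python
def to_load_kwargs(entry):
    """Build load_dataset-style keyword arguments from one dataset entry."""
    kwargs = {"path": entry["path"], "split": entry["split"]}
    # Only pass a revision when the entry pins one; otherwise the default is used.
    if "revision" in entry:
        kwargs["revision"] = entry["revision"]
    return kwargs


# Two entries copied from the list above: one pinned, one unpinned.
entries = [
    {"path": "wolof-lm/en_function_calling-instructions", "revision": "v10", "split": "train"},
    {"path": "wolof-lm/alwaly_french_wolof-instructions", "split": "train"},
]

for entry in entries:
    kwargs = to_load_kwargs(entry)
    print(kwargs)
    # With repository access (assumption): datasets.load_dataset(**kwargs)
```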
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: This model was trained with the following hyperparameters for SFTTrainer; all other parameters were left at their defaults:
yaml
assistant_only_loss: true
bf16: true
dataloader_num_workers: 4
dataloader_persistent_workers: false
dataloader_pin_memory: false
dataloader_prefetch_factor: 2
deepspeed: /config/zero3.json
disable_tqdm: true
eval_accumulation_steps: 1
eval_steps: 10
eval_strategy: steps
fp16: false
gradient_accumulation_steps: 32
gradient_checkpointing: true
learning_rate: 2.0e-05
log_level: warning
logging_dir: /outputs/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28-2gpu/logs
logging_steps: 1
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_length: 2048
max_steps: -1
num_train_epochs: 3
optim: paged_adamw_32bit
output_dir: /outputs/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28-2gpu
per_device_eval_batch_size: 16
per_device_train_batch_size: 16
push_to_hub: false
report_to: tensorboard
save_steps: 0
save_strategy: epoch
save_total_limit: 1
seed: 42
torch_compile: false
use_liger_kernel: true
warmup_ratio: 0.05
weight_decay: 0.1
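The combination of `lr_scheduler_type: cosine`, `warmup_ratio: 0.05`, and `learning_rate: 2.0e-05` describes a linear warmup followed by a cosine decay to zero. The sketch below reproduces that shape under stated assumptions: the total number of optimizer steps is not given in this card, so `total_steps = 1000` is a hypothetical value, and rounding the warmup step count up is an assumption as well.

```python
import math


def lr_at(step, total_steps, base_lr=2.0e-05, warmup_ratio=0.05):
    """Learning rate at a given step: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real step count depends on dataset size

for step in (0, 25, 50, 525, 1000):
    print(step, lr_at(step, total_steps))
```

With these assumed values the rate peaks at the base learning rate once warmup ends (step 50) and falls back to zero at the final step.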
Number of Tokens vs Steps

Learning Rate Curve

Training Loss

Validation Loss

Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: CPUs: AMD EPYC 9124 16-Core Processor; GPUs: 2 x NVIDIA H100 NVL
- Hours used: 74:02:30
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: 2.6 kg CO2eq; detailed emissions can be found in emissions.csv (emissions were computed using codecarbon)
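The two figures above can be combined into an average emission rate. This is a back-of-the-envelope sketch that assumes "Hours used" is an HH:MM:SS duration covering the same period as the reported emissions; the card does not say whether it is wall-clock time or something else. `duration_to_hours` is a helper introduced here for illustration.

```python
def duration_to_hours(hms):
    """Convert an 'HH:MM:SS' string to a number of hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


hours_used = duration_to_hours("74:02:30")  # from "Hours used"
carbon_kg = 2.6                             # from "Carbon Emitted"

# Average emission rate over the run, in grams of CO2eq per hour.
rate_g_per_hour = carbon_kg * 1000 / hours_used

print(round(hours_used, 2))       # 74.04
print(round(rate_g_per_hour, 1))  # 35.1
```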
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
Thanks to Claire Perroux and Pierre Adam for adding this model.
Model provider
ldp72
Model tree
Base
Orange/Qwen3-8B
Fine-tuned
this model
Modalities
Input
Text
Output
Text