SOULAMA

qwen2.5-coder-ft

Model Details

Model Description

This model was fine-tuned with Low-Rank Adaptation (LoRA), and the adapter was then merged into the base weights and saved in 16-bit (float16) precision. It is tuned to act as a strict code assistant that returns programming solutions with minimal conversational text.

Developed by: Soulama Haicanama Ismael
Model type: Causal Language Model (Transformer Architecture)
Language(s) (NLP): English, Python
License: Apache 2.0 (inherited from Qwen base model)
Finetuned from model: Qwen/Qwen2.5-Coder-1.5B-Instruct

Model Sources

Repository: SOULAMA/qwen2.5-coder-ft

Uses

Direct Use

This model is intended for direct code generation and for answering programming questions. It expects prompts in the ChatML chat template (<|im_start|> and <|im_end|> markers) with a system prompt that instructs it to return only Python code.
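Even with a code-only system prompt, a reply may still arrive wrapped in a Markdown fence. The following is a minimal post-processing sketch under that assumption; `extract_python_code` is a hypothetical helper name and is not part of the model repository.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built programmatically for readability

# Matches a fenced block with an optional "python" or "py" language tag.
_FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)

def extract_python_code(reply: str) -> str:
    """Return the body of the first fenced block, or the stripped reply if none is found."""
    match = _FENCE_RE.search(reply)
    if match:
        return match.group(1).strip()
    return reply.strip()

reply = FENCE + "python\ndef add(c, d):\n    return c + d\n" + FENCE
print(extract_python_code(reply))
```

Replies that contain no fence are returned unchanged apart from surrounding whitespace, so the helper is safe to apply to every response.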

Out-of-Scope Use

The model should not be used for general non-coding tasks (such as creative writing, general chat, or translation): fine-tuning adapted its attention and MLP projection layers toward code structure and programming vocabulary.

Bias, Risks, and Limitations

Because the model has only 1.5B parameters, it can fall into repetition loops when generation is not stopped explicitly. Pass the stop token (<|im_end|>) as eos_token_id in the generation call and cap the output with max_new_tokens so that generation ends reliably.
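As a second line of defense, the decoded text can be cut at the first chat marker it contains. This is a minimal sketch that assumes the output was decoded with skip_special_tokens=False, so the markers are still visible; `truncate_at_stop` and `STOP_MARKERS` are hypothetical names introduced for illustration.

```python
STOP_MARKERS = ("<|im_end|>", "<|im_start|>")

def truncate_at_stop(text: str, markers=STOP_MARKERS) -> str:
    """Cut decoded text at the earliest stop marker, if any is present."""
    cut = len(text)
    for marker in markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()

# Illustrative output in which the model kept going after its first answer.
raw = "def add(c, d):\n    return c + d\n<|im_end|>\n<|im_start|>assistant\ndef add(c, d):"
print(truncate_at_stop(raw))
```

This does not replace a correct eos_token_id setting; it only trims text that was generated past the stop token.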

Recommendations

Use a low generation temperature (≤ 0.2) and give the model clear, self-contained system instructions so that outputs stay consistent between runs. For fully deterministic output, disable sampling (greedy decoding).
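One way to keep these settings in a single place is a small helper that builds the keyword arguments for `model.generate`. This is a sketch, not the author's configuration: `generation_kwargs` is a hypothetical name, and treating a temperature of 0 as "greedy decoding" is a convention chosen here for illustration.

```python
def generation_kwargs(temperature: float = 0.0, max_new_tokens: int = 256) -> dict:
    """Build keyword arguments for `model.generate`.

    A temperature of 0 selects greedy decoding; a positive value enables sampling.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature == 0:
        kwargs["do_sample"] = False  # greedy decoding, repeatable output
    else:
        kwargs["do_sample"] = True   # temperature only takes effect when sampling
        kwargs["temperature"] = temperature
    return kwargs

print(generation_kwargs())
print(generation_kwargs(temperature=0.2))
```

The result can be unpacked into the generation call, for example `model.generate(**inputs, **generation_kwargs(0.2))`.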

How to Get Started with the Model

Use the code below to get started with the model. It sets the stop token and an output length limit explicitly:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "SOULAMA/qwen2.5-coder-ft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # places the weights on the GPU when one is available
)

question = "Write a Python function that takes two values c and d and returns c+d."

def build_prompt(question: str) -> str:
    """Build a ChatML prompt that ends with the assistant header."""
    return (
        "<|im_start|>system\n"
        # System prompt (French): "You are a programming expert. Write only
        # the Python code that solves the problem."
        "Tu es un expert en programmation. Écris uniquement le code Python qui résout le problème.\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt(question)

# The prompt already contains the chat markers and the assistant header,
# so it is tokenized as plain text and moved to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,  # unpack input_ids and attention_mask
        max_new_tokens=256,
        do_sample=True,  # temperature is ignored unless sampling is enabled
        temperature=0.1,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
prompt_length = inputs["input_ids"].shape[1]
new_tokens = output_ids[0][prompt_length:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Training Details

Training Data

The model was trained on a custom instruction dataset containing coding exercises, software engineering questions, and structured Python scripts.

Training Procedure

Preprocessing

Training Hyperparameters

Training regime: PEFT (LoRA), followed by merge_and_unload() to fold the adapter into the base weights, saved in float16 precision.
Base model precision: 4-bit quantized base model during training (BitsAndBytes).
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj.
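To illustrate what the merge step does numerically, the sketch below applies the standard LoRA update W' = W + (alpha / r) · B · A to a toy layer in plain Python. The matrices, `alpha`, and `rank` are illustrative values only and are not the hyperparameters used for this model; `matmul` and `merge_lora` are hypothetical helper names.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy 2x2 layer with a rank-1 adapter; all numbers are illustrative.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape (rank, in_features)
lora_b = [[0.5], [1.0]]  # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the layer is a single dense matrix again, which is why the merged checkpoint loads with plain Transformers and does not require PEFT at inference time.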

Speeds, Sizes, Times

Checkpoint size: ~3.09 GB (Full Safetensors model)
Adaptation layer size: ~73.9 MB (LoRA Weights)
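The checkpoint size is consistent with 16-bit storage of the full weights. A rough back-of-the-envelope check, assuming a parameter count of about 1.54 billion (the figure listed for the 1.5B Qwen2.5 base models, not a number taken from this card); `checkpoint_size_gb` is a hypothetical helper:

```python
def checkpoint_size_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate size of the stored weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Assumption: roughly 1.54e9 parameters; float16 uses 2 bytes per parameter.
print(round(checkpoint_size_gb(1.54e9), 2))
```

The estimate lands within rounding distance of the reported ~3.09 GB; the small gap comes from the rounded parameter count and file metadata.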

Technical Specifications

Model Architecture and Objective

Based on the dense Qwen2.5-Coder architecture, a causal Transformer that uses Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE). The model is trained for causal language modeling on source code and coding instructions.

Compute Infrastructure

Hardware

GPU Type: 1 x NVIDIA Tesla T4 (via Google Colab)

Software

Libraries: PyTorch, Transformers, PEFT, BitsAndBytes, TRL.

Model Card Authors

Soulama Haicanama Ismael

Model Card Contact

[More Information Needed]


