mohar07

qwen3-0.6b-kg-triplets

Motivation

Most instruction-tuned LLMs can extract entities and relations, but their outputs are difficult to ingest directly into graph databases because of:

inconsistent entity naming
out-of-schema relations
poorly calibrated confidence scores
inconsistent JSON formatting

This model was finetuned specifically to produce:

ontology-constrained outputs
normalized entity names
calibrated relation confidence weights
graph-ingestable JSON

Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/qwen3-0.6b |
| Finetuning | LoRA |
| Rank (r) | 32 |
| Alpha | 32 |
| Context Length | 2048 |
| Epochs | 5 |
| Optimizer | AdamW 8-bit |
| Framework | Unsloth + TRL |
| Training Type | Instruction Finetuning |
| License | MIT |

Training Configuration

python
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=32,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
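As a rough check on what this configuration implies, the sketch below computes the LoRA scaling factor (`lora_alpha / r`, assuming standard rather than rank-stabilized scaling) and estimates the number of trainable adapter weights. The layer dimensions are assumptions based on the published Qwen3-0.6B architecture and are not stated in this card; verify them against the model's `config.json` before relying on the figure.

```python
# Assumed Qwen3-0.6B dimensions (not stated in this card; check config.json).
HIDDEN = 1024
LAYERS = 28
Q_DIM = 16 * 128   # query heads * head_dim
KV_DIM = 8 * 128   # key/value heads * head_dim
MLP = 3072

R = 32
LORA_ALPHA = 32

# (in_features, out_features) for each adapted projection in one layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(shapes, r):
    """Each adapter adds A (r x in) and B (out x r): r * (in + out) weights."""
    return sum(r * (n_in + n_out) for n_in, n_out in shapes.values())


scaling = LORA_ALPHA / R
per_layer = lora_params(SHAPES, R)
total = per_layer * LAYERS

print(f"scaling factor: {scaling}")
print(f"adapter weights per layer: {per_layer:,}")
print(f"adapter weights in total: {total:,}")
```

With `r` equal to `lora_alpha`, the adapter update is applied at a scale of 1.0.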

| Parameter | Value |
|---|---|
| Batch size | 2 |
| Gradient accumulation | 4 |
| Learning rate | 5e-5 |
| Epochs | 5 |
| Warmup steps | 50 |
| Max sequence length | 2048 |
| Optimizer | AdamW 8-bit |
| Seed | 42 |
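To make the training budget concrete, the arithmetic below derives the effective batch size and an approximate optimizer step count from these hyperparameters and the 2575 training examples listed under Dataset Statistics. It assumes a single device (`NUM_DEVICES` is a hypothetical name, not from the source); the exact step count depends on how the trainer handles the last partial batch.

```python
import math

TRAIN_EXAMPLES = 2575
BATCH_SIZE = 2
GRAD_ACCUM = 4
EPOCHS = 5
WARMUP_STEPS = 50
NUM_DEVICES = 1  # assumption: single-GPU training

# Examples consumed per optimizer update.
effective_batch = BATCH_SIZE * GRAD_ACCUM * NUM_DEVICES

# Approximate updates per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps (approx.): {total_steps}")
print(f"warmup share of training: {warmup_fraction:.1%}")
```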

Dataset

The model was trained on a custom instruction dataset for structured knowledge graph extraction.

Corpus Sources

Wikipedia
arXiv papers

Dataset Statistics

| Split | Examples |
|---|---|
| Train | 2575 |
| Validation | 75 |
| Test | 700 |
| Total | 3350 |

Additional properties:

20 ontology relations
15% hard negatives in training
Entity-level train/test decontamination
Curriculum ordering (easy → hard)
Zero schema errors

Relation Schema

The model predicts only the following ontology:

markdown
implements
trained_on
evaluates
part_of
introduces
extends
depends_on
contrasts_with
applied_to
measured_by
founded_by
developed_by
defined_as
consists_of
is_type_of
based_on
used_for
created_by
located_in
predecessor_of

Relations outside this ontology are intentionally not generated.
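A downstream consumer can enforce the same constraint before writing to a graph store. The following is a minimal sketch, not the author's validation code: `ONTOLOGY` lists the 20 relations above, and `out_of_schema` is a hypothetical helper that reports any relation type outside that set.

```python
ONTOLOGY = frozenset({
    "implements", "trained_on", "evaluates", "part_of", "introduces",
    "extends", "depends_on", "contrasts_with", "applied_to", "measured_by",
    "founded_by", "developed_by", "defined_as", "consists_of", "is_type_of",
    "based_on", "used_for", "created_by", "located_in", "predecessor_of",
})


def out_of_schema(triplets):
    """Return the sorted relation types that are not in the ontology."""
    seen = {t["relation"]["type"] for t in triplets}
    return sorted(seen - ONTOLOGY)


sample = [
    {"relation": {"type": "part_of", "weight": 0.92}},
    {"relation": {"type": "released", "weight": 0.8}},
]
print(out_of_schema(sample))
```

An empty result means every relation in the batch is safe to ingest under this schema.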

Dataset Creation Pipeline

The training corpus was built using a multi-stage pipeline:

Corpus collection from Wikipedia and arXiv
Language and quality filtering
MinHash deduplication
LLM triplet generation using DeepSeek V4-Flash
Schema validation
Semantic validation
Hard negative generation
Curriculum ordering
Entity-level train/test decontamination
Train / Validation / Test split
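The MinHash deduplication step can be illustrated with a small pure-Python sketch. This is not the pipeline's actual code: the word 3-gram shingles, 64 hash functions, 0.8 threshold, and pairwise comparison are all illustrative assumptions, and a real corpus would normally use a locality-sensitive hashing index instead of comparing every pair.

```python
import hashlib


def shingles(text, k=3):
    """Set of lowercase word k-grams; short texts become a single shingle."""
    tokens = text.lower().split()
    if len(tokens) <= k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(seed, item):
    """Seeded 64-bit hash built from SHA-1."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(items, num_hashes=64):
    """Signature: the minimum hash of the set under each seeded hash."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The share of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(passages, threshold=0.8):
    """Keep a passage only if it is not a near-duplicate of one already kept."""
    kept, signatures = [], []
    for passage in passages:
        sig = minhash(shingles(passage))
        if all(estimated_jaccard(sig, other) < threshold for other in signatures):
            kept.append(passage)
            signatures.append(sig)
    return kept


passages = [
    "September is the ninth month of the Gregorian calendar",
    "september is the ninth month of the gregorian calendar",
    "LoRA adapts a frozen model with low-rank update matrices",
]
print(len(deduplicate(passages)))
```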

Training Example

Input:

json
{
  "role": "user",
  "content": "Extract knowledge graph triplets..."
}

Output:

json
[
  {
    "source": {
      "title": "September",
      "type": "entity"
    },
    "relation": {
      "type": "part_of",
      "weight": 0.92
    },
    "target": {
      "title": "Gregorian calendar",
      "type": "entity"
    }
  },
  {
    "source": {
      "title": "September",
      "type": "entity"
    },
    "relation": {
      "type": "defined_as",
      "weight": 0.92
    },
    "target": {
      "title": "ninth month",
      "type": "concept"
    }
  }
]
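Output in this shape maps directly onto graph edges. The sketch below parses a response into `(source, relation, target, weight)` tuples and optionally drops low-confidence edges. The `Edge` type, the `parse_edges` helper, the `min_weight` parameter, and the assumption that weights lie in [0, 1] are illustrative and not part of the model card.

```python
import json
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str
    weight: float


def parse_edges(raw: str, min_weight: float = 0.0) -> list[Edge]:
    """Parse model output into edges, keeping those with weight >= min_weight."""
    edges = []
    for item in json.loads(raw):
        weight = float(item["relation"]["weight"])
        if not 0.0 <= weight <= 1.0:  # assumed valid range
            raise ValueError(f"weight out of range: {weight}")
        if weight >= min_weight:
            edges.append(Edge(
                item["source"]["title"],
                item["relation"]["type"],
                item["target"]["title"],
                weight,
            ))
    return edges


raw = """[
  {"source": {"title": "September", "type": "entity"},
   "relation": {"type": "part_of", "weight": 0.92},
   "target": {"title": "Gregorian calendar", "type": "entity"}},
  {"source": {"title": "September", "type": "entity"},
   "relation": {"type": "defined_as", "weight": 0.92},
   "target": {"title": "ninth month", "type": "concept"}}
]"""

for edge in parse_edges(raw):
    print(edge)
```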

Evaluation

Evaluation used a custom triplet extraction benchmark on the 700 held-out test examples. Predicted and reference triplets were aligned with Hungarian bipartite matching before scoring.
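The alignment step pairs each predicted triplet with at most one reference triplet so that total similarity is maximized, which is the assignment problem the Hungarian algorithm solves. The sketch below finds the same optimum by exhaustive search, which is only practical for a handful of triplets. The exact-match `similarity` function is an assumption; the benchmark's real scoring function is not described here.

```python
from itertools import permutations


def similarity(pred, gold):
    """Fraction of the three fields (source, relation, target) that match."""
    return sum(p == g for p, g in zip(pred, gold)) / 3


def best_alignment(preds, golds):
    """Return (pairs, total) for the highest-scoring one-to-one alignment."""
    n, m = len(preds), len(golds)
    if n <= m:
        candidates = (list(zip(range(n), perm)) for perm in permutations(range(m), n))
    else:
        candidates = (list(zip(perm, range(m))) for perm in permutations(range(n), m))
    best_pairs, best_total = [], 0.0
    for pairs in candidates:
        total = sum(similarity(preds[i], golds[j]) for i, j in pairs)
        if total > best_total or not best_pairs:
            best_pairs, best_total = pairs, total
    return best_pairs, best_total


preds = [
    ("September", "part_of", "Gregorian calendar"),
    ("September", "defined_as", "ninth month"),
]
golds = [
    ("September", "defined_as", "ninth month"),
    ("September", "part_of", "Gregorian calendar"),
]
pairs, total = best_alignment(preds, golds)
print(pairs, total)
```

For realistic batch sizes, a polynomial-time solver for the same assignment problem would replace the exhaustive loop.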

Metrics

| Metric | Score | Weight |
|---|---|---|
| Schema score | 1.000 | 0.30 |
| Entity F1 | 0.179 | 0.25 |
| Relation accuracy | 0.680 | 0.20 |
| Grounding | 0.969 | 0.15 |
| Weight score | 0.526 | 0.10 |
| Triplet F1 (info only) | 0.122 | n/a |

Composite Score: 0.6583
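The weights in the table sum to 1.0, so a composite can be formed as a weighted sum of the five scored metrics. The sketch below applies that formula to the rounded aggregate scores. It yields about 0.679 rather than the reported 0.6583; the source does not explain the difference. One possibility, which is an assumption here, is that the composite is computed per example and then averaged.

```python
# (score, weight) pairs copied from the metrics table; Triplet F1 is excluded
# because it is reported for information only.
METRICS = {
    "schema_score": (1.000, 0.30),
    "entity_f1": (0.179, 0.25),
    "relation_accuracy": (0.680, 0.20),
    "grounding": (0.969, 0.15),
    "weight_score": (0.526, 0.10),
}

weight_total = sum(weight for _, weight in METRICS.values())
weighted_sum = sum(score * weight for score, weight in METRICS.values())

print(f"weights sum to {weight_total:.2f}")
print(f"weighted sum of aggregate scores: {weighted_sum:.4f}")
```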

What Finetuning Fixed

Finetuning addressed three major failure modes of the base model.

1. Entity Normalization

Input passage:

Studies of the Cambrian period document the rapid diversification of animal life and the emergence of most major animal phyla, with some researchers proposing that a celestial body impact may have triggered the extinction events that preceded this radiation.

Base entity title extracted:

markdown
After a thorough research on the circumstantial changes and the great evolution of life in the Cambrian period

Finetuned entity title extracted:

markdown
Celestial body impact hypothesis

The finetuned model learns reusable and atomic graph nodes rather than copying passage fragments.

2. Schema Adherence

Base relations generated:

markdown
released
benefited_from

Finetuned relations generated:

markdown
based_on
used_for
applied_to
introduces

All generated relations belong to the predefined ontology.

3. Confidence Calibration

Base weights:

markdown
0.8
0.8
0.8
0.8

Finetuned weights:

markdown
0.23
0.41
0.59
0.77

The base model assigns the same 0.8 to every relation, whereas the finetuned model spreads its weights across the range, so relations it treats as stronger receive higher scores.
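A quick way to spot the constant-weight failure mode is to measure the spread of the weights in a response. The sketch below compares the two weight lists above using the population standard deviation. Spread is only a sanity check: it shows the weights are not degenerate, not that they are calibrated against ground truth.

```python
from statistics import pstdev

base_weights = [0.8, 0.8, 0.8, 0.8]
finetuned_weights = [0.23, 0.41, 0.59, 0.77]

# A spread of zero means every relation received the same confidence.
base_spread = pstdev(base_weights)
finetuned_spread = pstdev(finetuned_weights)

print(f"base spread: {base_spread:.3f}")
print(f"finetuned spread: {finetuned_spread:.3f}")
```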

Intended Use

This model is intended for:

Knowledge Graph Construction
GraphRAG pipelines
Structured Information Extraction
Entity-Relation Extraction
Automated KG population
Document-to-Graph conversion

Limitations

While the model demonstrates strong schema adherence and grounding, several limitations remain.

Shallow Entity Abstraction

The model favors concise and reusable entities but may miss deeper semantic abstractions or hierarchical entity relationships.

Limited Recall

The model prioritizes schema correctness and grounded extraction over exhaustive triplet recall. Entity F1 of 0.179 reflects strict Hungarian-matching alignment on a 20-relation ontology-constrained task; recall is intentionally traded for precision and schema adherence.

English-Centric Training

Training was primarily conducted on English Wikipedia and arXiv passages.

Ontology Constrained

Only the predefined 20 relation types are supported.

Model Size Constraints

Despite the relatively small size (0.6B parameters) and a modest training corpus (~3K examples), the model learns stable ontology-constrained extraction behavior. Larger models may achieve deeper entity understanding and broader relation coverage.

Repository

Evaluation Pipeline: https://github.com/mohar-xe/HGR-finetuned-model-evaluation-pipeline

Model: https://huggingface.co/mohar07/qwen3-0.6b-kg-triplets

Citation

bibtex
@misc{das2026qwenkgtriplets,
  title={Qwen3-0.6B-KG-Triplets},
  author={Mohar Das},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/mohar07/qwen3-0.6b-kg-triplets}
}
