Known Contract: Wikipedia Style
The adapter was trained on REBEL, so its output follows a Wikipedia-style contract:
- Entity surface forms follow Wikipedia conventions
- Relations follow Wikidata definitions and granularity
This is the contract the model is built against. Inputs and evaluation should stay inside it: domain-specific terminology, informal text, and alternate relation ontologies are out of scope and will degrade quality.
The model emits structured JSON:
{
  "entities": ["Entity A", "Entity B"],
  "relations": [
    {"head": "Entity A", "relation": "relation_type", "tail": "Entity B"}
  ]
}
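A minimal sketch of checking a reply against the shape shown above. The helper names `validate_graph` and `dangling_endpoints` are hypothetical and not part of the model or any library. The second check (every `head` and `tail` should also appear under `entities`) is an assumption about the intended output, not something the contract states.

```python
import json

REQUIRED_KEYS = {"head", "relation", "tail"}


def validate_graph(payload: str) -> dict:
    """Parse a JSON string and check it has the documented top-level shape."""
    graph = json.loads(payload)
    entities = graph["entities"]
    relations = graph["relations"]
    if not all(isinstance(name, str) for name in entities):
        raise ValueError("entities must be a list of strings")
    for rel in relations:
        missing = REQUIRED_KEYS - set(rel)
        if missing:
            raise ValueError(f"relation is missing keys: {sorted(missing)}")
    return graph


def dangling_endpoints(graph: dict) -> list:
    """Assumed extra check: relation endpoints not listed under "entities"."""
    known = set(graph["entities"])
    return [
        name
        for rel in graph["relations"]
        for name in (rel["head"], rel["tail"])
        if name not in known
    ]


example = (
    '{"entities": ["Entity A", "Entity B"], "relations": '
    '[{"head": "Entity A", "relation": "relation_type", "tail": "Entity B"}]}'
)
graph = validate_graph(example)
print(dangling_endpoints(graph))  # prints []
```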
Usage
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="rst0070/tiny-graph-extractor-qwen3.5-0.8b-lora",
    max_new_tokens=1024,
)
messages = [
    {
        "role": "system",
        "content": (
            "You are a knowledge graph extraction assistant. "
            "Given a text, extract all entities and their relations as JSON. "
            "Output only valid JSON with no additional text."
        ),
    },
    {
        "role": "user",
        "content": "Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976. Steve Jobs served as the CEO of Apple.",
    },
]
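import json
import re
result = pipe(messages)
# With chat-style input, generated_text holds the whole conversation; the last message is the model's reply.
response = result[0]["generated_text"][-1]
print(response)
# Pull the JSON payload out of the Markdown code fence in the reply, if there is one.
fence_re = re.compile(r"```(?:json)?\s*\n(.*?)\n```", re.DOTALL)
match = fence_re.search(response["content"])
if match:
    print(json.loads(match.group(1)))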
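The snippet above prints nothing when the reply has no code fence. A slightly more defensive variant might look like the sketch below. `parse_reply` is a hypothetical helper, and the fallback to parsing the whole reply as bare JSON is an assumption about how the model may respond, not documented behavior.

```python
import json
import re

# Same fence pattern as in the usage snippet above.
FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*?)\n```", re.DOTALL)


def parse_reply(content: str):
    """Return the parsed graph, or None if no valid JSON can be found."""
    match = FENCE_RE.search(content)
    # Assumption: if there is no fence, the reply may be bare JSON.
    candidate = match.group(1) if match else content.strip()
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


fenced = "```json\n{\"entities\": [\"A\"], \"relations\": []}\n```"
bare = '{"entities": [], "relations": []}'
print(parse_reply(fenced))
print(parse_reply(bare))
print(parse_reply("no json here"))  # prints None
```

With the pipeline output from the usage snippet, this would be called as `parse_reply(response["content"])`.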
Training
The training target was constructed to deviate as little as possible from the base model's natural output, so SFT only has to close the smallest possible gap:
- Observe how the base model formats answers on REBEL inputs with no fine-tuning.
- Align the SFT target to match that observed format/phrasing where possible, while staying factually correct.
- Train with QLoRA SFT against that aligned target.
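To make the idea of an aligned target concrete, here is an illustrative sketch of assembling one chat-format training example. This is not the actual training code: `build_sft_example` is a hypothetical helper, and both the fenced-JSON target and the reuse of the system prompt from the Usage section are assumptions about what the base model's observed format looks like.

```python
import json

# Assumption: training used the same system prompt as the Usage section.
SYSTEM_PROMPT = (
    "You are a knowledge graph extraction assistant. "
    "Given a text, extract all entities and their relations as JSON. "
    "Output only valid JSON with no additional text."
)


def build_sft_example(text, triples, fenced=True):
    """Build one chat example from (head, relation, tail) triples."""
    # Collect entities in order of first appearance, without duplicates.
    entities = []
    for head, _, tail in triples:
        for name in (head, tail):
            if name not in entities:
                entities.append(name)
    target = json.dumps(
        {
            "entities": entities,
            "relations": [
                {"head": h, "relation": r, "tail": t} for h, r, t in triples
            ],
        },
        indent=2,
        ensure_ascii=False,
    )
    # Assumption: the base model tends to wrap JSON in a Markdown fence,
    # so the target mirrors that instead of fighting it.
    if fenced:
        target = "```json\n" + target + "\n```"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
        {"role": "assistant", "content": target},
    ]


example = build_sft_example(
    "Apple was founded by Steve Jobs.",
    [("Apple", "founded by", "Steve Jobs")],
)
print(example[-1]["content"])
```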
Limitations
- English / Wikipedia distribution only. Performance on other languages, domains (medical, legal, financial), or informal text is unknown and likely poor.
- Wikidata relation ontology. Relations outside the Wikidata vocabulary will not be produced reliably.
- Sentence-level inputs. Trained on REBEL sentence-level examples; not validated on long documents.
- No factual grounding. The model extracts what it reads; it does not verify claims.
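Given the sentence-level limitation, one possible workaround for longer text is to split it into sentences, run the model on each, and merge the results. The sketch below shows only the splitting and merging; it has not been validated with this model, and `split_sentences` and `merge_graphs` are hypothetical helpers. The regex splitter is deliberately naive and will break on abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def merge_graphs(graphs):
    """Union per-sentence graphs, keeping order and dropping exact duplicates."""
    entities, relations, seen = [], [], set()
    for graph in graphs:
        for name in graph.get("entities", []):
            if name not in entities:
                entities.append(name)
        for rel in graph.get("relations", []):
            key = (rel["head"], rel["relation"], rel["tail"])
            if key not in seen:
                seen.add(key)
                relations.append(rel)
    return {"entities": entities, "relations": relations}


text = (
    "Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976. "
    "Steve Jobs served as the CEO of Apple."
)
sentences = split_sentences(text)
print(len(sentences))  # prints 2
```

Each sentence would then go through the pipeline separately, with the parsed outputs passed to `merge_graphs`. Relations that span sentence boundaries are lost with this approach.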
License
Apache 2.0, inheriting from the base model and dataset license terms.