License: apache-2.0
What is Exegetical Generation?
Unlike translation (1:1 semantic mapping), exegetical commentary expands a source text 5–20× by recovering:
- Implicit definitions of technical terms
- Philosophical context and doctrinal significance
- Cross-references to other texts in the tradition
- The commentator's interpretive framework
This model learns to produce such commentary in the style of Mark Dyczkowski's Kashmir Śaivism scholarship.
Experiment Results
| System | Info Gain (G) | Expansion Ratio | Notes |
|---|---|---|---|
| B2: Claude Haiku zero-shot | 77 | 128.6× | Fluent but ungrounded |
| B3.1: Claude RAG hybrid | 94 | 132.6× | MW dictionary grounding, best overall |
| B4-fs: Nova Micro few-shot | 25 | 81.8× | Small model baseline |
| B4-ft: This model | 17 | 56.0× | Style transfer works, metric undercounts Devanāgarī |
The low G score is partly an artifact — the information gain metric undercounts inline Devanāgarī terms (e.g., "Kubjikā (कुब्जिका)") which are a distinctive feature of Dyczkowski's style that this model successfully reproduces.
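The Expansion Ratio column can be read as commentary length divided by source length. The card does not state the unit of length, so the sketch below assumes whitespace-separated tokens; `expansion_ratio` is a hypothetical helper, not the evaluation code behind the table.

```python
def expansion_ratio(source: str, commentary: str) -> float:
    """Length of the commentary divided by the length of the source.

    Assumption: length is counted in whitespace-separated tokens; the
    card does not say which unit the reported ratios use.
    """
    source_len = len(source.split())
    if source_len == 0:
        raise ValueError("source text must contain at least one token")
    return len(commentary.split()) / source_len


source = "śaktipāta (शक्तिपात)"
commentary = (
    "Śaktipāta (शक्तिपात) is the descent of grace, "
    "which occurs through the guru's transmission."
)
print(f"{expansion_ratio(source, commentary):.1f}x")
```

A character- or subword-based count would give different absolute values, so ratios are only comparable when computed with the same unit.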
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B-Instruct |
| Method | QLoRA (4-bit NF4) |
| Rank / Alpha | 16 / 32 |
| Trainable params | 40M / 7.6B total (0.53%) |
| Training data | 704 pairs (28 OCR + 676 lecture) |
| Validation | 88 pairs |
| Test | 89 pairs |
| Epochs | 3 (264 steps) |
| Training time | 10 min 15 sec on 1× H100 80GB |
| Train loss | 1.626 |
| Eval loss | 1.67 |
| Token accuracy | 65.7% |
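Some quantities implied by the table can be derived with simple arithmetic. The sketch below assumes that each of the 264 steps is one optimizer update and that no partial batch was dropped; under that assumption the effective batch size works out to 8, a value the card itself does not state.

```python
# Figures copied from the Training Details table.
train_pairs = 704
val_pairs = 88
test_pairs = 89
epochs = 3
total_steps = 264
trainable_params = 40e6  # "40M"
total_params = 7.6e9     # "7.6B total"

# Assumption: one optimizer update per step, no dropped partial batch.
steps_per_epoch = total_steps / epochs
effective_batch_size = train_pairs / steps_per_epoch

trainable_fraction = trainable_params / total_params
total_pairs = train_pairs + val_pairs + test_pairs
train_share = train_pairs / total_pairs

print(f"steps per epoch:      {steps_per_epoch:.0f}")
print(f"effective batch size: {effective_batch_size:.0f}")
print(f"trainable fraction:   {trainable_fraction:.2%}")
print(f"train share of data:  {train_share:.1%} of {total_pairs} pairs")
```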
Training Data Sources
- OCR-extracted pairs (28) — Verse-commentary alignments from Tantrāloka Volume 1, pages 15-94. Extracted via Chandra OCR-2 (5.3B) on SageMaker.
- Lecture term-explanation pairs (676) — Sanskrit terms with contextual explanations from 24 Kubjikā/Paścimāmnāya lectures. Transcribed via Whisper large-v3, cleaned, and corrected with Claude Haiku 4.5.
Training data derived from Anamavajra-Labs/tantraloka-dyczkowski-raw.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model in 4-bit precision.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
# Attach the LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(base, "Anamavajra-Labs/exegen-qwen14b-lora")
tokenizer = AutoTokenizer.from_pretrained("Anamavajra-Labs/exegen-qwen14b-lora")

prompt = """<|im_start|>system
You are a scholar of Kashmir Śaivism and Sanskrit philosophy specializing in the Tantrāloka of Abhinavagupta. Generate detailed exegetical commentary that explains the philosophical significance, technical terminology, and doctrinal context.<|im_end|>
<|im_start|>user
Generate exegetical commentary for: śaktipāta (शक्तिपात)<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.15,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
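To query other terms without editing the prompt string by hand, the ChatML layout used above can be wrapped in a small function. This is a minimal sketch: `build_prompt` and `SYSTEM_PROMPT` are hypothetical names, and the newline placement follows the usual ChatML layout of Qwen2.5 chat models rather than anything this card specifies.

```python
SYSTEM_PROMPT = (
    "You are a scholar of Kashmir Śaivism and Sanskrit philosophy "
    "specializing in the Tantrāloka of Abhinavagupta. Generate detailed "
    "exegetical commentary that explains the philosophical significance, "
    "technical terminology, and doctrinal context."
)


def build_prompt(term: str, devanagari: str = "") -> str:
    """Return a ChatML prompt asking for commentary on one term.

    Hypothetical helper; not part of the released code.
    """
    # Append the Devanāgarī form in parentheses when one is given.
    subject = f"{term} ({devanagari})" if devanagari else term
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\nGenerate exegetical commentary for: {subject}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("kuṇḍalinī", "कुण्डलिनी")
print(prompt)
```

The returned string can be passed to the tokenizer exactly like the hand-written prompt.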
Hardware requirements: ~12 GB VRAM (4-bit quantized). Runs on a single RTX 3090/4090 or T4.
Example Output
Input: kuṇḍalinī (कुण्डलिनी)
Output (excerpt):
Kuṇḍalinī (कुण्डलिनी) is the coiled energy that resides at the base of the central channel (suṣumnā). In the Kubjikā (कुब्जिका) tradition of the Paścimāmnāya (पश्चिमाम्नाय), she is identified with the goddess herself — the crooked one, bent with the weight of manifestation. Abhinavagupta in the Tantrāloka (तन्त्रालोक) describes her awakening through śaktipāta (शक्तिपात), the descent of grace, which occurs through the guru's transmission...
Note the characteristic Dyczkowski-style inline Devanāgarī annotations and cross-references to Kubjikā and Paścimāmnāya traditions.
Qualitative Observations
The model successfully learns:
- Inline Devanāgarī — "mantra (मन्त्र)", "Kubjikā (कुब्जिका)" style annotations
- Tradition-specific framing — References to Kubjikā, Kula, Paścimāmnāya, Krama
- Commentarial voice — Adopts Dyczkowski's oral teaching register
- Cross-referencing — Spontaneous references to related texts and practices
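The inline Devanāgarī pattern is easy to check mechanically. The following sketch, which is an illustration and not part of the project's evaluation code, finds "term (देवनागरी)" annotations by matching a word followed by a parenthesised run of characters from the Devanāgarī Unicode block; `find_inline_annotations` is a hypothetical helper.

```python
import re
import unicodedata

# A run of Unicode letters (so IAST diacritics are included), optional
# whitespace, then a parenthesised run of code points from the Devanāgarī
# block (U+0900 to U+097F).
INLINE_ANNOTATION = re.compile(r"([^\W\d_]+)\s*\(([\u0900-\u097F]+)\)")


def find_inline_annotations(text: str) -> list[tuple[str, str]]:
    """Return (romanised term, Devanāgarī form) pairs found in text."""
    # NFC keeps letters such as "ṇ" as single code points, so a word is
    # not split at a combining mark.
    return INLINE_ANNOTATION.findall(unicodedata.normalize("NFC", text))


sample = (
    "Kuṇḍalinī (कुण्डलिनी) is the coiled energy. In the Kubjikā (कुब्जिका) "
    "tradition of the Paścimāmnāya (पश्चिमाम्नाय), she is the goddess herself."
)
pairs = find_inline_annotations(sample)
for roman, devanagari in pairs:
    print(roman, devanagari)
```

Counting these pairs per output gives a rough measure of how consistently the annotation style is reproduced.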
Known Limitations
- Small training set (704 pairs): repetition loops on some inputs (use repetition_penalty=1.15)
- Narrow domain: primarily Kashmir Śaivism / Kubjikā tradition; limited coverage of other darśanas
- Hallucinated terms: occasionally generates plausible but incorrect Sanskrit compounds
- No retrieval: pure generation without grounding; B3.1 (RAG) scores higher on factual accuracy
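Repetition loops can be flagged after generation with a simple heuristic. The sketch below measures how many word n-grams in an output repeat an earlier one; `repeated_ngram_ratio` is a hypothetical helper, and any cutoff for what counts as a loop would need tuning on real outputs.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in text.

    0.0 means every n-gram is unique; values near 1.0 suggest a loop.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Each n-gram seen c times contributes c - 1 repeats.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


clean = "Kuṇḍalinī is the coiled energy at the base of the central channel"
looping = "the descent of grace " * 5

print(f"clean:   {repeated_ngram_ratio(clean):.2f}")
print(f"looping: {repeated_ngram_ratio(looping):.2f}")
```

An output with a high ratio could be regenerated, possibly with a larger repetition_penalty.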
Next Steps
119 additional Tantrāloka lectures (chapters 1-3) have been transcribed and are available in the dataset. Extracting verse-commentary pairs from them could expand the training set from 704 to potentially 2000+ pairs for the next fine-tuning iteration.
Citation
```bibtex
@misc{exegen-qwen14b-lora,
  title={ExeGen: Exegetical Generation as a New NLP Task},
  author={Ovcharov, Vladimir and Tatarchenko, Igor},
  year={2026},
  publisher={Anamavajra Labs},
  url={https://huggingface.co/Anamavajra-Labs/exegen-qwen14b-lora}
}
```
Organization
Anamavajra Labs — Sanskrit NLP & Contemplative Studies