mohitskaushal
gemma3-legal-sparsegpt-30
README
License: other
Pruning Details
- Pruning method: SparseGPT-style one-shot pruning
- Target sparsity: 30%
- Calibration dataset: mohitskaushal/InLegalLaySum-Phi4-Train-16K
- Calibration samples: 32
- Sequence length used for calibration: 2048
- Pruned modules: decoder Linear layers
- Excluded from pruning: lm_head
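One way to check these settings against a loaded checkpoint is to count exactly-zero weights per Linear layer. The following is a minimal sketch, not part of the original pruning pipeline: the helper name `linear_sparsity` and the toy module are hypothetical, and it assumes pruned weights are stored as exact zeros and that the excluded head appears under the module name `lm_head`.

```python
import torch
from torch import nn


def linear_sparsity(model, skip=("lm_head",)):
    """Return {module_name: fraction of exactly-zero weights} for nn.Linear modules."""
    report = {}
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        # Skip modules excluded from pruning (lm_head on this card).
        if any(part in skip for part in name.split(".")):
            continue
        weight = module.weight.detach()
        report[name] = (weight == 0).sum().item() / weight.numel()
    return report


# Toy stand-in for a decoder; pass the real loaded model to inspect it instead.
toy = nn.ModuleDict({"proj": nn.Linear(10, 10), "lm_head": nn.Linear(10, 10)})
with torch.no_grad():
    toy["proj"].weight[:3].zero_()  # zero 3 of 10 rows, i.e. 30 of 100 weights

for layer, fraction in linear_sparsity(toy).items():
    print(f"{layer}: {fraction:.1%} zeros")
```

On the real model, per-layer values near 30% for the decoder layers would be consistent with the target sparsity listed above.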
Important Note
Pruned weights are set to zero, but the tensors keep their dense shape. The .safetensors file is therefore not necessarily smaller than the unpruned checkpoint, because dense tensor storage writes every zero as a full-width value.
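The point about file size can be illustrated with a small standard-library sketch. The weight values below are made-up data standing in for a tensor, and `dense_nbytes` and `sparsity` are hypothetical helpers, not part of safetensors or transformers.

```python
from array import array


def dense_nbytes(values):
    """Bytes needed to hold the values in a dense single-precision buffer."""
    return len(array("f", values).tobytes())


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


# Made-up weights; zero the first 3 of 10 to mimic 30% sparsity.
weights = [0.5, -1.25, 2.0, 0.75, -0.1, 1.5, -2.5, 0.3, 0.9, -0.6]
pruned = [0.0 if i < 3 else w for i, w in enumerate(weights)]

print(sparsity(pruned))                                 # 0.3
print(dense_nbytes(weights) == dense_nbytes(pruned))    # True: zeros still take space
```

Reducing the on-disk size would need a sparse or compressed storage format, which this checkpoint does not use.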
Intended Use
This model is intended for research experiments on sparse legal summarization models, especially for comparing dense vs SparseGPT-pruned variants.
Example Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "mohitskaushal/gemma3-legal-sparsegpt-30"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,  # named torch_dtype in older transformers releases
    device_map="auto",
    trust_remote_code=True,
)

prompt = """You are a legal summarization assistant. Summarize the following legal text in simple layman language.

Legal text:
The appellant challenged the judgment of the High Court on the ground that the conviction was based on insufficient evidence and that material witnesses were not examined.

Layman summary:
"""

# Move the tokenized prompt to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with repetition controls; no gradients are needed for inference.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=False,
        repetition_penalty=1.15,
        no_repeat_ngram_size=4,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Model provider: mohitskaushal
Model tree
- Base: mohitskaushal/gemma-3-1b-it-inlegal-merged-fp16-16k
- Fine-tuned: this model
Modalities
- Input: Text
- Output: Text