EllaPriest45
Nanbeige4.1
README
License: apache-2.0
Quickstart
For inference hyperparameters, we recommend the following settings:
- Temperature: 0.6
- Top-p: 0.95
- Repeat penalty: 1.0
- Max New Tokens: 131072
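As a minimal sketch, the recommended settings can be collected into keyword arguments for `model.generate`. The helper name `recommended_generation_kwargs` is hypothetical, and mapping "Repeat penalty" to the `repetition_penalty` argument is an assumption. `do_sample=True` is added because temperature and top-p only take effect when sampling is enabled.

```python
# Sketch only: the recommended hyperparameters as generate() keyword arguments.
# Assumption: "Repeat penalty" corresponds to `repetition_penalty`.

def recommended_generation_kwargs(max_new_tokens: int = 131072) -> dict:
    """Return sampling settings matching the recommended hyperparameters."""
    return {
        "do_sample": True,          # required for temperature/top_p to apply
        "temperature": 0.6,
        "top_p": 0.95,
        "repetition_penalty": 1.0,  # 1.0 means no penalty is applied
        "max_new_tokens": max_new_tokens,
    }


# A smaller token budget is often practical for quick local tests.
gen_kwargs = recommended_generation_kwargs(max_new_tokens=1024)
print(gen_kwargs)

# With an already loaded model (not executed here):
# output_ids = model.generate(input_ids, eos_token_id=166101, **gen_kwargs)
```

The default of 131072 new tokens is an upper bound, so lowering it for short interactive prompts only limits how long a response may run.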
For the chat scenario:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model.
tokenizer = AutoTokenizer.from_pretrained(
    'Nanbeige/Nanbeige4.1-3B',
    use_fast=False,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    'Nanbeige/Nanbeige4.1-3B',
    torch_dtype='auto',
    device_map='auto',
    trust_remote_code=True
)

messages = [
    {'role': 'user', 'content': 'Which number is bigger, 9.11 or 9.8?'}
]
# Render the conversation into a prompt string with the chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)
input_ids = tokenizer(prompt, add_special_tokens=False, return_tensors='pt').input_ids
output_ids = model.generate(input_ids.to('cuda'), eos_token_id=166101)
# Decode only the newly generated tokens, skipping the prompt.
resp = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)
print(resp)
```
For the tool use scenario:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model.
tokenizer = AutoTokenizer.from_pretrained(
    'Nanbeige/Nanbeige4.1-3B',
    use_fast=False,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    'Nanbeige/Nanbeige4.1-3B',
    torch_dtype='auto',
    device_map='auto',
    trust_remote_code=True
)

messages = [
    {'role': 'user', 'content': 'Help me check the weather in Beijing now'}
]
# Tool definitions passed to the chat template.
tools = [{
    'type': 'function',
    'function': {
        'name': 'SearchWeather',
        'description': 'Find out the current weather in a place on a certain day.',
        'parameters': {
            'type': 'dict',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'A city in China.'
                },
                'required': ['location']
            }
        }
    }
}]
# Render the conversation and the tool definitions into a prompt string.
prompt = tokenizer.apply_chat_template(
    messages,
    tools,
    add_generation_prompt=True,
    tokenize=False
)
input_ids = tokenizer(prompt, add_special_tokens=False, return_tensors='pt').input_ids
output_ids = model.generate(input_ids.to('cuda'), eos_token_id=166101)
# Decode only the newly generated tokens, skipping the prompt.
resp = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)
print(resp)
```
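The tool use example stops at printing `resp`. The sketch below shows one way the response might then be handled. It assumes the model emits each call as JSON with `name` and `arguments` fields wrapped in `<tool_call>` tags, which this README does not state, so check the chat template for the actual format. `search_weather`, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical names introduced for illustration.

```python
import json
import re

# Assumption: tool calls appear as JSON between <tool_call> tags.
# The real output format depends on the model's chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def search_weather(location: str) -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"(stub) weather report for {location}"


# Maps tool names from the `tools` definition to local Python callables.
TOOL_REGISTRY = {"SearchWeather": search_weather}


def run_tool_calls(resp: str) -> list:
    """Extract tool calls from a response and run the ones we know about."""
    results = []
    for match in TOOL_CALL_RE.finditer(resp):
        call = json.loads(match.group(1).strip())
        func = TOOL_REGISTRY.get(call.get("name"))
        if func is None:
            continue  # skip tools that have no local implementation
        results.append(func(**call.get("arguments", {})))
    return results


# Hand-written sample in the assumed format, not real model output.
sample = (
    '<tool_call>\n'
    '{"name": "SearchWeather", "arguments": {"location": "Beijing"}}\n'
    '</tool_call>'
)
print(run_tool_calls(sample))
```

In a full loop, each result would be appended to `messages` as a tool message and the model called again to produce the final answer.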
For the deep-search scenario:
- Inference Framework: miroflow-framework
- Switch tokenizer configuration to tokenizer_config_search.json
- Tools Configuration:
| Server | Description | Tools Provided |
|---|---|---|
| tool-python | Execution environment and file management (E2B sandbox) | create_sandbox, run_command, run_python_code, upload_file_from_local_to_sandbox, download_file_from_sandbox_to_local, download_file_from_internet_to_sandbox |
| search_and_scrape_webpage | Google search via Serper API | google_search |
| jina_scrape_llm_summary | Web scraping with LLM-based information extraction with Jina | scrape_and_extract_info |
- Summary model: Qwen3-14B-thinking
- Temperature: 1.0
- Note: access to HuggingFace has been explicitly disabled in these tools.
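The tokenizer switch in the list above could be done on a local copy of the model roughly as follows. This sketch assumes that "switching" means replacing `tokenizer_config.json` with `tokenizer_config_search.json` in the model directory, which the README does not spell out. The helper name `use_search_tokenizer_config` and the backup filename `tokenizer_config.default.json` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def use_search_tokenizer_config(model_dir: str) -> Path:
    """Copy tokenizer_config_search.json over tokenizer_config.json.

    Assumption: the tokenizer loads `tokenizer_config.json` from this folder.
    The previous config is kept once as tokenizer_config.default.json.
    """
    root = Path(model_dir)
    default_cfg = root / "tokenizer_config.json"
    search_cfg = root / "tokenizer_config_search.json"
    if not search_cfg.is_file():
        raise FileNotFoundError(f"{search_cfg} not found")
    backup = root / "tokenizer_config.default.json"  # hypothetical backup name
    if default_cfg.is_file() and not backup.exists():
        shutil.copy2(default_cfg, backup)
    shutil.copy2(search_cfg, default_cfg)
    return default_cfg


# Demonstration on a temporary folder with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer_config.json").write_text('{"mode": "default"}')
    (Path(tmp) / "tokenizer_config_search.json").write_text('{"mode": "search"}')
    active = use_search_tokenizer_config(tmp)
    active_text = active.read_text()
    backup_text = (Path(tmp) / "tokenizer_config.default.json").read_text()

print(active_text)
```

Keeping a backup makes it easy to return to the default chat configuration after a deep-search run.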
Limitations
Although we place great emphasis on safety during training and strive to ensure that the model's outputs align with ethical and legal requirements, the model's size and probabilistic nature mean that it may still generate unexpected outputs. These outputs may include harmful content such as bias or discrimination. Please do not propagate such content. We assume no responsibility for any consequences resulting from the dissemination of inappropriate information.
Contact
If you have any questions, please raise an issue or contact us at nanbeige@kanzhun.com.
Model provider: EllaPriest45
Model tree: base model Nanbeige/Nanbeige4-3B-Base; this model is fine-tuned from it
Modalities: text input, text output