June 2, 2024

How TUNiB easily manages and scales their emotional chatbot service

Summary

TUNiB, a generative AI startup, leveraged FriendliAI's Friendli Dedicated Endpoints to efficiently manage and scale their emotional chatbot service, Dearmate.

Introduction to TUNiB

TUNiB is a generative AI startup specializing in conversational AI chatbots, language models, and NLP APIs. Their chatbots engage users in natural conversations powered by NLP technology. Committed to innovation and advanced technologies, TUNiB delivers tailored solutions swiftly to meet the personalization needs of each customer.

Challenges

As TUNiB's Dearmate chatbot service grew, managing and serving large language models (LLMs) became increasingly challenging. Serving LLMs requires specialized skills in deep learning, distributed systems, and DevOps. Additionally, building and maintaining the infrastructure for serving LLMs at scale is a complex engineering undertaking that demands substantial resources. TUNiB's main challenge was to reduce the engineering effort and operational costs of maintaining LLMs for their chatbot service.

FriendliAI's Role

To address the challenges of managing and serving LLMs for their Dearmate chatbot service, TUNiB turned to Friendli Dedicated Endpoints. This solution allowed them to easily create high-performance, scalable, cost-effective inference endpoints for their chatbot models. FriendliAI's comprehensive support for custom LLMs enabled TUNiB to deploy models tailored to their specific requirements and domain-specific challenges.

Benefits of Friendli Dedicated Endpoints

Comprehensive LLM Support: FriendliAI offers support for open-source and custom LLMs, enabling TUNiB to deploy custom models tailored to their unique requirements.

Dedicated GPU Instances: Friendli Dedicated Endpoints provide dedicated GPU instances, ensuring consistent access to computing resources without contention or performance fluctuations. By eliminating resource sharing, TUNiB could rely on predictable performance levels for their LLM inference tasks, enhancing productivity and reliability across their chatbot services.

Streamlined LLM Serving Process: FriendliAI provided a robust and scalable infrastructure for serving TUNiB's LLMs. It handled resource allocation, scaling, and load balancing across different deployment environments. Additionally, FriendliAI handled tasks like model optimization, parallelization, and efficient resource utilization for TUNiB's LLMs, allowing them to focus on their core chatbot applications.

Consistent Performance: The dedicated GPU instances and the underlying Friendli Engine ensured consistent performance. This enabled TUNiB to focus on delivering high-quality chatbot experiences to their customers without worrying about resource contention or performance fluctuations.

Monitoring Tools: FriendliAI simplified tracking LLM performance, identifying issues, and troubleshooting for TUNiB. It offered insights into resource utilization, latency, and other vital metrics.

Beyond dedicated GPU instances, Friendli Dedicated Endpoints provided automatic recovery from failures and auto-scaling to handle varying input traffic. This eliminated the need for TUNiB to manage the underlying infrastructure, allowing them to focus on developing their LLMs.
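
To make the auto-scaling idea concrete, here is a minimal sketch of a generic target-tracking rule: run enough replicas that each one handles at most a target number of concurrent requests. This is an illustration only, not FriendliAI's actual scaling policy. The function name `desired_replicas` and all of its parameters and default values are hypothetical.

```python
import math


def desired_replicas(in_flight_requests: int,
                     target_per_replica: int,
                     min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Hypothetical target-tracking rule (not FriendliAI's policy).

    Returns the number of replicas needed so that each replica serves at
    most `target_per_replica` concurrent requests, clamped to the
    configured minimum and maximum.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(in_flight_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# Example: 25 concurrent requests with a target of 10 per replica.
print(desired_replicas(25, 10))
```

With these assumed defaults, idle traffic keeps one replica warm, and a traffic spike is capped at eight replicas so costs stay bounded.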

Results

By leveraging Friendli Dedicated Endpoints, TUNiB experienced a convenient, reliable, and cost-efficient service without the burden of self-management. They achieved uninterrupted service and could onboard their chatbot service in less than 20 minutes, enabling them to focus on enhancing their generative AI applications.

About the Friendli Dedicated Endpoints Solution

FriendliAI is a leader in accelerating generative AI applications. Friendli Engine supports custom and open-source LLMs and introduces advanced features like its optimized DNN library, native quantization, iteration batching, token caching, and multi-LoRA serving. These technologies enable efficient model deployment on fewer GPUs without sacrificing performance or accuracy.
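
Of the features listed above, iteration batching is the easiest to illustrate in a few lines. The toy simulation below is a sketch under simplifying assumptions, not the Friendli Engine implementation: every request produces one token per decoding step, prompt processing and memory limits are ignored, and the function names are hypothetical. It compares request-level batching, where a batch occupies the GPU until its longest request finishes, with iteration-level batching, where finished requests leave and waiting requests join after every step.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Request-level batching: each batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def iteration_batching_steps(lengths, batch_size):
    """Iteration-level batching: the batch is re-formed after every step."""
    waiting = deque(lengths)   # output lengths of requests not yet started
    running = []               # tokens still to generate per in-flight request
    steps = 0
    while waiting or running:
        # Fill any free slots with waiting requests before the next step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decoding step: every running request emits one token,
        # and requests with nothing left to generate leave the batch.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


# Four requests with output lengths of 2, 8, 2, and 8 tokens, batch size 2.
lengths = [2, 8, 2, 8]
print(static_batching_steps(lengths, 2))     # 16
print(iteration_batching_steps(lengths, 2))  # 12
```

In this toy workload, short requests no longer wait for long ones in the same batch, so the same work finishes in 12 decoding steps instead of 16. Real gains depend on the traffic mix and on engine details that this sketch leaves out.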

Friendli Dedicated Endpoints uses the Friendli Engine to serve generative AI inference efficiently, significantly reducing costs. In addition, the cloud service makes it easy to create, deploy, and serve generative AI models without managing any infrastructure.

Sign up to serve your LLM inference faster and without any hassle.

For inquiries, contact us at sales@friendli.ai.