Storm Reply improves LLM performance with EC2 C7i instances and Intel® oneAPI

Storm Reply, a leader in cloud solutions, has chosen Amazon EC2 C7i instances, powered by 4th Gen Intel® Xeon® Scalable processors, to optimize its large language models (LLMs). By leveraging Intel® oneAPI tools, Storm Reply has achieved GPU-level performance while keeping costs in check.

Storm Reply specializes in helping clients deploy generative AI and LLM solutions. To meet the needs of a large energy company, Storm Reply needed an affordable, highly available hosting solution for its LLM workloads.

Following an in-depth evaluation, Storm Reply selected Amazon EC2 C7i instances, powered by 4th Gen Intel® Xeon® Scalable processors. This infrastructure proved ideal for LLM workloads, particularly thanks to its integration with Intel libraries and the Intel® GenAI framework.

Thanks to optimizations from Intel® Extension for PyTorch and the oneAPI toolkit, Storm Reply not only improved its models' performance but also significantly reduced costs. Testing showed that LLM inference on Intel Xeon Scalable processors achieved a response time of 92 seconds, compared to 485 seconds without the Intel optimizations. Storm Reply's results show that this CPU-based solution rivals GPU environments in price-performance.
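
For illustration, here is a minimal sketch of the kind of CPU inference optimization described above, using Intel® Extension for PyTorch with a Hugging Face model. The model name, prompt, and generation settings are placeholders, not Storm Reply's actual configuration.

```python
# Hypothetical sketch: bfloat16 LLM inference on a Xeon CPU with
# Intel Extension for PyTorch (ipex). Not Storm Reply's production code.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# ipex.optimize applies operator fusion and bfloat16 kernels that exploit
# Intel AMX on 4th Gen Xeon Scalable processors (as in EC2 C7i instances).
model = ipex.optimize(model, dtype=torch.bfloat16)

prompt = "Summarize the quarterly energy report:"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```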

The EC2 C7i instances enable Storm Reply to offer its customers robust AI solutions while keeping resource usage and costs optimized.
