Since meeting their team virtually at DockerCon 2023, we have seen OctoAI as one of the most developer-friendly AIaaS platforms for AI inference and media generation on the market. Their platform treated open models such as LLaMA and Stable Diffusion as first-class citizens, a refreshingly different approach from the hyperscalers, who were mainly interested in pushing OpenAI GPT and DALL-E, Anthropic Claude, or Google Gemini. OctoAI’s presence in the market at this early stage of the generative AI revolution undoubtedly pushed Azure AI Studio, Amazon Bedrock, and Google Vertex AI to give developers the choice of integrating their applications with hosted, open models on the public cloud.
We would like to congratulate the OctoAI team on their reported $165M exit to NVIDIA – an impressive recognition of their technology platform by the current leader of the artificial intelligence market.
If you are an OctoAI customer, you will have received an email with the subject “Wind down of OctoAI Services – ACTION NEEDED by 31 October 2024.” An excerpt from the email follows:
We have made the strategic decision to initiate the wind down of the commercial availability of our services. As such, effective October 31, 2024, we are terminating your access to all OctoAI services and deactivating your account as permitted under the OctoAI Services Terms of Use. Until then our team will be available to answer questions and help you transition to another inference provider.
So, it is “adios” to OctoAI as we know it. The company will shut down and be merged into NVIDIA – an acqui-hire of the team’s talent and an acquisition of the intellectual property to improve NVIDIA’s own products. It is clear that NVIDIA wants to preserve the strong lead that its CUDA platform has over AMD ROCm, so it is increasingly pushing towards becoming a fully integrated AI company rather than just a chip designer.
It is unfortunate, but understandable, that NVIDIA was primarily interested in the OctoStack IP for running inference on private clouds, and not in the customers using OctoAI’s public endpoints. It is possible that NVIDIA will use OctoAI’s technology to extend another platform it recently acquired, Brev.dev, to provide public endpoints to individual developers. Currently, Brev is similar to DigitalOcean’s Paperspace, allowing developers to easily host AI models packaged as NVIDIA NIM microservices on public cloud GPU instances at an hourly rate.
But why is OctoStack so important to NVIDIA? NVIDIA NIM is part of the NVIDIA AI Enterprise offering, a Kubernetes-based platform for enterprises that already own large clusters of NVIDIA GPUs and want to host their own private endpoints. By absorbing OctoStack into its product portfolio, NVIDIA will be able to improve the inference performance of NVIDIA NIM, particularly with open models like LLaMA or Mistral.
What Developers and IT Pros Need to Do Now
So what should individual developers and teams using OctoAI’s inference as a service do? This announcement affects customers who have deployed open source projects such as LibreChat or the Nextcloud AI Assistant using models hosted by OctoAI. Fortunately, there are alternatives to OctoAI where you can continue to use the same open models and pay only for the tokens that you use.
For inference providers such as Amazon Bedrock or Google Vertex AI, which expose a different API than the standard OpenAI API, a middleware proxy such as LiteLLM can translate your application’s OpenAI-compatible calls into the API specification required – making migration easy.
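As a rough illustration, here is a minimal sketch using the LiteLLM Python SDK to send an OpenAI-format chat request to a model hosted on Amazon Bedrock. The model ID, region, and credentials are placeholders – substitute whichever model you have actually enabled in your own AWS account.

```python
# Minimal sketch: routing an OpenAI-format chat request to Amazon Bedrock via LiteLLM.
# The model ID, region, and credentials below are placeholders – use the model you
# have enabled in your own AWS account.
import os
from litellm import completion

os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY"      # or rely on an IAM role / AWS profile
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_KEY"
os.environ["AWS_REGION_NAME"] = "us-east-1"

response = completion(
    model="bedrock/meta.llama3-8b-instruct-v1:0",  # example Bedrock model ID for Llama 3 8B Instruct
    messages=[{"role": "user", "content": "Summarize why we are migrating off OctoAI."}],
)
print(response.choices[0].message.content)
```

LiteLLM can also be deployed as a standalone proxy server that exposes an OpenAI-compatible endpoint, which is useful when the calling application – LibreChat or the Nextcloud AI Assistant, for example – only lets you swap out a base URL and API key.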
This list is not an endorsement or recommendation by Autoize – it is intended as a starting point for research into alternative providers, as OctoAI’s inference service will be sunsetting at the end of October 2024. Please contact our AI & Data Consultants for specific advice.
AI Text Generation Services – Alternatives to OctoAI Text Gen Service
- Amazon Bedrock – https://aws.amazon.com/bedrock/
- Azure AI Studio – https://ai.azure.com/
- Google Vertex AI – https://cloud.google.com/model-garden?hl=en
- Hugging Face – https://huggingface.co/inference-api/serverless
- OVHcloud AI Endpoints (EU) – https://labs.ovhcloud.com/en/ai-endpoints/
- Scaleway Managed Inference (EU) – https://www.scaleway.com/en/inference/
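Several of the providers listed above expose OpenAI-compatible endpoints, in which case migrating an existing integration can be as simple as swapping the base URL and API key your OpenAI client points at. The sketch below is generic – the endpoint URL and model name are hypothetical placeholders, so check each provider’s documentation for the actual values.

```python
# Minimal sketch: reusing the standard OpenAI Python client against another
# provider's OpenAI-compatible endpoint. The base_url, api_key, and model name
# are placeholders, not real values for any specific provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical OpenAI-compatible endpoint
    api_key="YOUR_PROVIDER_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",     # example open model name; naming varies by provider
    messages=[{"role": "user", "content": "Hello from a workload migrated off OctoAI!"}],
)
print(response.choices[0].message.content)
```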
AI Image Generation Services – Alternatives to OctoAI Media Gen Service
- Beam – https://www.beam.cloud/
- Civitai – https://civitai.com/
- Fal.AI – https://fal.ai/
- Fireworks.AI – https://fireworks.ai/
- Replicate – https://replicate.com/
- Together.AI – https://www.together.ai/
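On the image generation side, most of the services above are driven through a lightweight SDK or HTTP API. As one example, the sketch below uses Replicate’s Python client to run a Stable Diffusion-family model; the model slug is illustrative, and some models on Replicate require an explicit version pin, so confirm the exact identifier on the model’s page.

```python
# Minimal sketch: generating an image with Replicate's Python client.
# Requires the REPLICATE_API_TOKEN environment variable, and the model slug
# below is only an example – check replicate.com for the exact identifier
# (and version pin, if required) of the model you want to run.
import replicate

output = replicate.run(
    "stability-ai/sdxl",  # example model slug
    input={"prompt": "an astronaut riding a horse, photorealistic"},
)

# Image models typically return one or more output files or URLs.
for image in output:
    print(image)
```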