Private Enterprise GPT on Any Cloud with Inference APIs

Are your employees using the consumer versions of ChatGPT or Copilot (formerly Bing Chat) without your knowledge? Especially with hybrid work arrangements, this could be happening surreptitiously on employees’ mobile phones or personal laptops – even if company devices are properly monitored and managed. It’s no longer enough to trust users’ goodwill in following your AI Tool Usage Policy. The branding of the consumer and work versions is so confusingly similar that users can easily be lulled into a false sense of security simply because they’re on a paid plan with OpenAI or Microsoft.

In this article, we explain the privacy risks that arise from confusing the personal and business versions of the ChatGPT and Copilot products, and present a better alternative for letting your employees chat with their code and documents while preventing data leakage.

In fact, with our open source ChatGPT alternative, you will probably even wind up saving money compared to subscribing to Microsoft 365 Copilot for all of your users, as our enterprise GPT stack lets you create an unlimited number of users with no per-user subscription.

ChatGPT Plus ≠ ChatGPT Team or Enterprise

If you didn’t already know, what differentiates the personal version of ChatGPT (including ChatGPT Plus) from its business counterparts, ChatGPT Team and Enterprise, is OpenAI’s data sharing policy. On personal accounts, data sharing for model training is on by default (opt out), while on the Team and Enterprise plans it is off by default (opt in). It’s important to point out that paying for the $20/mo Plus version of ChatGPT does not automatically switch off data sharing.

Office 365 (Personal or Family) and Copilot Pro ≠ Microsoft 365 Copilot

Similarly, Microsoft draws the line between personal and business use for Copilot based on whether the signed-in Microsoft account is an Entra ID (formerly Azure AD) login with an active Microsoft 365 subscription. If users are signed in with their personal Microsoft accounts (with Outlook or Hotmail addresses), their interactions with Copilot can be retained by Microsoft. This is especially problematic if a user shares private or confidential information with Copilot, as it may resurface in future chats, and even in chats with other users. All the free email addresses belong to the same Microsoft-administered Entra ID tenant (live.com). In Microsoft’s own words, the Copilot Pro subscription is “not intended for use with work data or files” – even if it is associated with a paid Office 365 Personal or Family subscription.

Only work or school accounts associated with an Entra ID directory and a Microsoft 365 license are covered by enterprise-grade data protection, where the inputs and outputs are never used to train the foundation AI models. This is why only Microsoft 365 Copilot can ground its outputs in your OneDrive for Business files, Outlook emails, and Teams conversations: the data is kept strictly within the same M365 tenancy. The work version of Copilot also supports tagging documents in OneDrive for Business and SharePoint with “sensitivity labels” to exclude sensitive or confidential documents from retrieval augmented generation (RAG).

Microsoft Copilot Individual vs. Organization Comparison

Due to the exceptionally high privacy risks, the personal version of Copilot does not automatically use files in OneDrive, Outlook, or Teams for RAG. However, it’s possible for users signed in with their Outlook or Hotmail accounts to leak data by highlighting and sharing it through the “Ask Copilot” context menu, or by copying & pasting it into Copilot outright.  

The most private way to access GPT models — through an inference API

Believe it or not, there is a third approach that organizations can choose to access the latest AI models (Claude, Gemini, GPT) which is even more secure, and potentially more cost effective than ChatGPT Enterprise or Microsoft 365 Copilot. This is through integrating open source software with inference APIs such as the Azure OpenAI Service API, and its counterparts including Bedrock on AWS and Vertex AI on Google Cloud.  
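
As a minimal sketch of what this looks like in practice, here is how a chat completion request can be sent to a private Azure OpenAI deployment with the official openai Python SDK. The endpoint and deployment names below are placeholders for your own resource, not values from our stack:

```python
import os

from openai import AzureOpenAI

# Connect to a private Azure OpenAI resource. The endpoint URL and
# deployment name are placeholders -- substitute your own.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint="https://my-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # the Azure *deployment* name, not the model family
    messages=[{"role": "user", "content": "Summarize our Q3 sales report."}],
)
print(response.choices[0].message.content)
```

Because the request goes to your own Azure resource rather than api.openai.com, the traffic, logging, and data handling all stay under your tenant’s control.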

The Autoize™ Enterprise GPT Stack allows you to deploy your own ChatGPT-style frontend and RAG database in an Azure VNET leveraging the Azure AI API — providing even stronger security & controls than Microsoft 365 Copilot, through features including:  

  • Data residency – inference endpoints are deployed in specific regions (e.g. Canada Central), guaranteeing the data residency of your information
  • VNET/VPC isolation – chats & embeddings are stored on databases in your own VNET/VPC
  • Single sign-on (SSO) – with LDAP/AD, Entra ID (formerly Azure AD), or OAuth 2.0
  • External logging – to Langfuse, S3, Sentry, etc. to audit AI usage against AUPs
  • Moderation – to refuse inappropriate requests using models such as Llama Guard (additional licensing is required for moderation)
  • Rate limiting & quotas – to constrain model usage costs by organization, department, or team
  • PII masking – through a local NER (Named Entity Recognition) model and/or the Azure AI Language service, so inputs are redacted before they are even sent to the completions model (see the sketch after the figure below)
Enterprise GPT Stack – PII Redaction with LibreChat, LiteLLM, and Microsoft Presidio
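
To illustrate the PII masking step shown in the figure, here is a minimal sketch using Microsoft Presidio’s analyzer and anonymizer engines to redact entities from a prompt. The sample text is fabricated for illustration:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

# A fabricated prompt containing PII that should never reach the model.
prompt = "Email Jane Doe at jane.doe@example.com about invoice 4521."

# Detect PII entities, then replace them with placeholder tokens.
results = analyzer.analyze(text=prompt, language="en")
redacted = anonymizer.anonymize(text=prompt, analyzer_results=results)

print(redacted.text)
# e.g. "Email <PERSON> at <EMAIL_ADDRESS> about invoice 4521."
```

In the full stack, this redaction runs as middleware in front of the completions model, so the raw PII never leaves your VNET/VPC.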

What is the Autoize™ Enterprise GPT Stack on Azure or AWS?

Our enterprise GPT stack is based on open source technologies including LibreChat (the ChatGPT-style frontend), LiteLLM (the OpenAI API-compatible middleware), and Microsoft Presidio (for PII redaction).

Compared to developing a custom AI application or integrating open source software on your own, using our pre-integrated stack accelerates delivery of a private and compliant AI workspace to your users. These architectural diagrams, prepared by our AI and cloud architects, illustrate sample deployments of the enterprise GPT stack in an Azure VNET or an AWS VPC:

Enterprise GPT architecture in Azure VNET
Enterprise GPT Stack in AWS VPC

What’s more, through Microsoft’s investment in OpenAI, Azure is the only cloud that may host the proprietary GPT models on its own infrastructure, ensuring that your data is never accessed by OpenAI itself. This “ring fencing” protects your information far more strongly than using the OpenAI API directly at api.openai.com.

The Product Terms clearly state that the prompts & embeddings you use with Azure OpenAI Service are never used by Microsoft to train foundation models, nor to otherwise develop their own products. For a use case like this one where you’re building an internal AI tool for your employees and not a public-facing SaaS, you can also opt out of “abuse monitoring” to further limit Microsoft from temporarily retaining any data about your usage. 

AWS has a similar arrangement with Anthropic for the Claude family of models, and unequivocally states that “Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties.” When a model provider like Anthropic delivers a model to Amazon, the AWS Bedrock team spins up its own infrastructure in the AWS region to run a deep copy of the model. The model provider cannot access this AWS infrastructure at all, creating a clear separation of concerns unlike, for instance, using the Anthropic API directly through api.anthropic.com.
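
For illustration, here is a minimal sketch of calling a Claude model through Amazon Bedrock’s Converse API with boto3. The model ID and region are examples; use whichever Claude version and region your account has access to:

```python
import boto3

# The Bedrock runtime client talks only to AWS infrastructure --
# never to Anthropic's own API endpoints.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Draft a privacy notice."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```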

Microsoft 365 Copilot vs. Self-Hosted Enterprise GPT Stack – Costs

The Copilot add-on to a Microsoft 365 license is a flat $30.00/user/month, paid annually, based on the number of seats in your M365 subscription. With our enterprise GPT stack, which accesses the GPT models through the Azure OpenAI Service API, you pay only for the input and output tokens used, with no per-user monthly fee. Create as many users as you need, and even integrate with Entra ID for SSO (or OAuth 2.0 if you use Google Workspace).

Based on the published pricing of $0.15/million tokens (input) and $0.60/million tokens (output) for gpt-4o mini (a model comparable to gpt-3.5-turbo), it would be quite uncommon for the average user’s token usage to exceed the cost of a Copilot license. As a rule of thumb, 100 tokens represent about 75 English words, so one million tokens is equivalent to roughly “20 novels or 1000 legal case briefs” according to this Medium article.
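
To put that in perspective, here is a back-of-the-envelope calculation using the gpt-4o mini prices above. The usage figures are deliberately generous assumptions, not measurements:

```python
# Rough per-user cost with gpt-4o mini pricing ($ per token).
INPUT_PRICE = 0.15 / 1_000_000
OUTPUT_PRICE = 0.60 / 1_000_000

words_per_day = 5_000                   # assumption: a generously heavy user
tokens_per_day = words_per_day / 0.75   # ~100 tokens per 75 words
monthly_tokens = tokens_per_day * 22    # working days per month

# Assume symmetric input/output usage for simplicity.
cost = monthly_tokens * (INPUT_PRICE + OUTPUT_PRICE)
print(f"~${cost:.2f}/user/month vs. $30.00/user/month for a Copilot seat")
```

Even at this usage level, the per-user inference cost works out to pennies per month, a tiny fraction of a Copilot seat.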

With our enterprise GPT stack, you could potentially save a lot in per-user costs ($360/user/year for Copilot), and only pay for the inferencing that you actually use. 

A Multi-Cloud, Multi-Model, Multi-Inference Provider AI Stack

Furthermore, the Autoize™ Enterprise GPT Stack can be extended for use with other proprietary and open foundation models like Claude, LLaMA, or Mistral, and with alternate inference providers including Amazon Bedrock and Google Vertex AI. The OpenAI API-compatible middleware makes it simple to add new models and switch between inference providers in the future, providing the ultimate in flexibility with no lock-in. For open models such as LLaMA, you could even move from a managed inference service like Azure AI to provisioned GPU infrastructure using LiteLLM and Ollama (or TGI or vLLM).
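
For example, with LiteLLM the same completion call can be routed to any of these providers just by changing the model string. The model names below are illustrative; match them to your own deployments, with credentials supplied through the usual environment variables:

```python
import litellm

# One call signature, many providers: swap the model string to route the
# request to Azure OpenAI, Amazon Bedrock, or a self-hosted Ollama server.
for model in (
    "azure/gpt-4o-mini",                               # Azure OpenAI deployment
    "bedrock/anthropic.claude-3-haiku-20240307-v1:0",  # Amazon Bedrock
    "ollama/llama3",                                   # open model via Ollama
):
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Summarize our AI usage policy."}],
    )
    print(model, "->", response.choices[0].message.content[:80])
```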

If your organization’s preferred cloud vendor is not Azure, the enterprise GPT stack can be deployed at any compatible Linux cloud provider which supports containers. Some customers require not only that the region of the datacenter is in the EU, but that the legal jurisdiction of the parent company also is. For these customers, we can recommend cloud providers such as OVHcloud (FR), Scaleway (FR), 1&1 IONOS (DE), and Infomaniak (CH) who have launched their own AI inference services in Europe. 

AI Inference Services in Europe for Private Enterprise GPT

Some of these European providers offer provisioned deployment of AI models on GPU instances, managed endpoints for on-demand (per-token) inference, or both. The benefit of provisioning a dedicated GPU instance is typically that you can run your own custom-trained models rather than a stock foundation model. Our AI consultants would be pleased to help you choose the right option.

Also, note that the inference providers do not necessarily need to be in the same cloud as the rest of your infrastructure. For example, you could mix and match an enterprise GPT infrastructure hosted in Azure with Amazon Bedrock for access to the Claude models, or Vertex AI for the Gemini models. We can architect a custom solution on your behalf that incorporates all the models you would like in the LibreChat ChatGPT-style interface, and even integrate it with other software such as Nextcloud Assistant.

Features such as Context Chat and Context Write provide similar functionality to Microsoft 365 Copilot by allowing you to select documents from your Nextcloud shares to use with AI models. Specifically, Context Chat lets you get answers from the information in a document by asking questions in natural language, and Context Write can adapt your writing to the style of a selected document.

It’s time to get serious about giving your employees the modern AI tools that they need to be more productive, while mitigating the risk that they use unauthorized AI apps such as ChatGPT, Copilot in Bing and Edge, or Office 365 Personal in conjunction with your corporate data. Deploying a private AI workspace in a (virtual) private cloud is a win-win decision for both employees and the business. CIOs & executives can rest assured that the organization’s data is being properly governed, while employee satisfaction increases with the belief that IT is enabling them with the latest tools.
