Building AI systems that handle Protected Health Information (PHI) requires careful architectural decisions from the very beginning. Retrofitting compliance is expensive, time-consuming, and often means starting over. This guide covers the key technical decisions you need to get right.
The Foundation: Cloud Architecture
Choosing Your Cloud Provider
All three major cloud providers (AWS, Azure, GCP) offer HIPAA-eligible services, but not all services within each provider are covered. Key considerations:
- AWS: Sign a BAA that covers specific services. Use AWS GovCloud for extra isolation. Key services: SageMaker, Bedrock, S3, RDS, Lambda (all HIPAA-eligible).
- Azure: Microsoft's Healthcare APIs and Azure AI services are built with HIPAA in mind. Azure's HITRUST certification provides additional assurance.
- GCP: Google Cloud Healthcare API provides FHIR, HL7v2, and DICOM support natively. Vertex AI is HIPAA-eligible.

Network Architecture
Your VPC design matters enormously:
- Isolated subnets for PHI processing: no internet egress without an explicit proxy
- Private endpoints for all cloud services: no data traversing the public internet
- Network segmentation between training and inference workloads
- VPN or Direct Connect for on-premises EHR connectivity

Data Pipeline Design
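The network rules listed under Network Architecture can be checked mechanically before anything is deployed. The sketch below is a minimal illustration, assuming the network layout is available as plain Python objects; `Subnet`, `ServiceEndpoint`, and `find_violations` are hypothetical names, not part of any cloud SDK.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subnet:
    name: str
    handles_phi: bool
    workloads: Set[str] = field(default_factory=set)  # e.g. {"training"}
    direct_internet_route: bool = False  # True if traffic can bypass the proxy


@dataclass
class ServiceEndpoint:
    service: str
    private: bool  # True if the service is reached over a private endpoint


def find_violations(subnets: List[Subnet], endpoints: List[ServiceEndpoint]) -> List[str]:
    """Return human-readable violations of the network rules (hypothetical helper)."""
    violations = []
    for subnet in subnets:
        # PHI subnets may only reach the internet through an explicit proxy.
        if subnet.handles_phi and subnet.direct_internet_route:
            violations.append(f"{subnet.name}: PHI subnet has a direct internet route")
        # Training and inference workloads should live in separate segments.
        if {"training", "inference"} <= subnet.workloads:
            violations.append(f"{subnet.name}: training and inference share a subnet")
    for endpoint in endpoints:
        if not endpoint.private:
            violations.append(f"{endpoint.service}: endpoint is not private")
    return violations
```

A check like this could run in CI against the infrastructure definition, so a misconfigured subnet is caught before PHI ever reaches it.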
Ingestion
PHI must be encrypted in transit (TLS 1.2+) and at rest (AES-256). But encryption alone isn't enough:
- Implement **field-level encryption** for sensitive identifiers
- Use **tokenization** to replace PHI with reversible tokens during processing
- Build **data lineage tracking** so you can answer "where did this patient's data go?" at any time

De-identification for Model Training
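To make the tokenization step from the ingestion list concrete, here is a minimal sketch of reversible tokenization. It assumes an in-memory dictionary standing in for a secured, access-controlled token vault; `TokenVault` and `tokenize_record` are hypothetical names, and a real vault would be a separate, encrypted service.

```python
import secrets
from typing import Dict, Set


class TokenVault:
    """In-memory stand-in for a secured token vault (illustrative only)."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one identifier always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_urlsafe(16)  # random, carries no PHI
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Reversal should be restricted to authorized callers in practice.
        return self._token_to_value[token]


def tokenize_record(record: dict, phi_fields: Set[str], vault: TokenVault) -> dict:
    """Return a copy of the record with the named PHI fields replaced by tokens."""
    return {
        key: vault.tokenize(value) if key in phi_fields else value
        for key, value in record.items()
    }
```

Because the tokens are random rather than derived from the value, downstream systems that only see tokens cannot recover the identifier without access to the vault.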
Whenever possible, train models on de-identified data. Follow the HIPAA Safe Harbor method (remove all 18 identifier types) or the Expert Determination method:
- Use NLP-based PII detection as a first pass
- Apply rule-based scrubbing for structured fields
- Validate with statistical re-identification risk analysis
- Maintain a **de-identification audit log**

Feature Engineering
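As an illustration of the rule-based scrubbing and audit-log steps in the de-identification list, the sketch below redacts three identifier formats with regular expressions. It is deliberately incomplete: the patterns are assumptions that cover only a few US-style formats, not all 18 Safe Harbor identifier types, and `PATTERNS` and `scrub` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import List

# Illustrative patterns only; real pipelines need far broader coverage.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scrub(text: str, audit_log: List[dict]) -> str:
    """Replace matched identifiers with placeholders and record what was done."""
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            # Log the type and count only, never the removed value itself.
            audit_log.append({
                "identifier_type": label,
                "replacements": count,
                "at": datetime.now(timezone.utc).isoformat(),
            })
    return text
```

Keeping the removed values out of the audit log matters: a log that stores what was scrubbed would itself become a PHI store.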
When building features from clinical data:
- Never use raw PHI as model features
- Aggregate and bin continuous values (age ranges, not exact DOB)
- Use clinical concept embeddings rather than raw text
- Document every feature's PHI lineage

Model Training and Deployment
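The binning advice in the feature engineering list (age ranges, not exact DOB) can be sketched as follows. The ten-year band width is an arbitrary assumption; the top band at 90 follows the Safe Harbor rule that ages 90 and over are aggregated into a single category. `age_in_years` and `age_band` are hypothetical helper names.

```python
from datetime import date


def age_in_years(dob: date, as_of: date) -> int:
    """Whole years between dob and as_of."""
    # Subtract one if this year's birthday has not happened yet.
    before_birthday = (as_of.month, as_of.day) < (dob.month, dob.day)
    return as_of.year - dob.year - int(before_birthday)


def age_band(age: int, width: int = 10, cap: int = 90) -> str:
    """Map an exact age to a coarse band such as '40-49' or '90+'."""
    if age >= cap:
        return f"{cap}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

For example, a patient born on 1980-06-15 and evaluated on 2024-06-14 is 43 years old, so the feature stored is `"40-49"` and the exact date of birth never enters the feature table.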
Training Environment
- Use **ephemeral compute**: training instances should be destroyed after each run
- Store models in **versioned, encrypted repositories**
- Log all training runs with full hyperparameter and data provenance
- Implement **model cards** documenting intended use, limitations, and bias evaluations

Inference Architecture
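For the training-run logging item in the Training Environment list, one possible shape for a provenance record is sketched below. It assumes the dataset is described by a JSON-serializable manifest (for example, file names and checksums, with no PHI in it); `fingerprint` and `training_run_record` are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj) -> str:
    """SHA-256 of a canonical JSON encoding, stable across dict key order."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def training_run_record(run_id: str, hyperparameters: dict,
                        dataset_manifest: dict, code_version: str) -> dict:
    """Build a log entry tying a run to its exact inputs."""
    return {
        "run_id": run_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "hyperparameters": hyperparameters,
        # Store a fingerprint rather than the manifest to keep the log compact.
        "dataset_fingerprint": fingerprint(dataset_manifest),
        "code_version": code_version,
    }
```

Two runs with the same dataset fingerprint, hyperparameters, and code version used the same inputs, which is the property an auditor usually needs to establish.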
- Deploy behind **API gateways** with authentication and rate limiting
- Implement **input/output logging** for audit trails (encrypted, access-controlled)
- Use **model versioning** so you can trace any prediction back to a specific model version
- Build **circuit breakers** that fail safely when the model is unavailable

Audit and Monitoring
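The circuit breaker item in the Inference Architecture list is easiest to see in code. The following is a minimal single-threaded sketch, not a production implementation; `CircuitBreaker`, its parameters, and the fallback value are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Stops calling a failing model and returns a safe fallback instead."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, predict: Callable[[Any], Any], payload: Any, fallback: Any) -> Any:
        if self._opened_at is not None:
            # Open: skip the model until the cool-down has elapsed.
            if self._clock() - self._opened_at < self.reset_after_s:
                return fallback
            # Half-open: allow one trial call; a single failure reopens.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = predict(payload)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback
        self._failures = 0  # a success closes the breaker fully
        return result
```

In a clinical setting the fallback would typically be an explicit "model unavailable, route to manual review" result rather than a default prediction, so downstream users are never shown a guess dressed up as model output.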
Access Controls
- Implement **role-based access control (RBAC)** at every layer
- Use **attribute-based access control (ABAC)** for fine-grained PHI access
- Log every access to PHI data: who, what, when, why
- Implement **break-the-glass** procedures for emergency access

Continuous Monitoring
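The access control items above (RBAC, logging who/what/when/why, and break-the-glass) can be combined in a single authorization function. This is a simplified sketch: the roles and permissions in `ROLE_PERMISSIONS` are invented for illustration, and `authorize` is a hypothetical name rather than a library call.

```python
from datetime import datetime, timezone
from typing import List

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "clinician": {"read_phi"},
    "ml_engineer": {"read_deidentified"},
    "auditor": {"read_audit_log"},
}


def authorize(user: str, role: str, action: str, resource: str, reason: str,
              audit_log: List[dict], break_glass: bool = False) -> bool:
    """Decide on an access request and log it, whether granted or denied."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    # Break-the-glass overrides a denial only when a reason is supplied.
    emergency = (not allowed) and break_glass and bool(reason.strip())
    granted = allowed or emergency
    audit_log.append({
        "who": user,
        "role": role,
        "what": f"{action} {resource}",
        "when": datetime.now(timezone.utc).isoformat(),
        "why": reason,
        "granted": granted,
        "break_glass": emergency,  # flag for mandatory after-the-fact review
    })
    return granted
```

Denied requests are logged as well as granted ones, since a pattern of denials is often the first visible sign of misuse.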
- Monitor for **data drift** that could indicate PHI leakage
- Alert on **unusual access patterns** (potential insider threats)
- Run regular **penetration testing** on AI endpoints
- Conduct **quarterly access reviews**

The Compliance Checklist
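One simple way to approach the unusual-access-pattern alert from the Continuous Monitoring list is to compare each user's daily PHI access count with that user's own history. The sketch below uses a z-score with an assumed threshold of 3; the threshold, the floor on the standard deviation, and the function name `unusual_access` are illustrative choices, not recommendations.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def unusual_access(history: Dict[str, List[int]], today: Dict[str, int],
                   z_threshold: float = 3.0, min_std: float = 1.0) -> List[Tuple[str, int, str]]:
    """Return (user, count, reason) for users whose access count looks abnormal."""
    alerts = []
    for user, count in today.items():
        past = history.get(user, [])
        if not past:
            # A user with no baseline is worth a look on first access.
            alerts.append((user, count, "no baseline"))
            continue
        baseline = mean(past)
        # Floor the spread so very steady users do not trigger on tiny changes.
        spread = max(pstdev(past), min_std)
        z = (count - baseline) / spread
        if z > z_threshold:
            alerts.append((user, count, f"z-score {z:.1f}"))
    return alerts
```

Per-user baselines matter here because a volume that is routine for a records clerk may be highly unusual for a data scientist.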
Before going to production, verify:
- BAA signed with all cloud providers and subprocessors
- PHI data flow diagram documented and reviewed
- Encryption at rest and in transit for all PHI
- Access controls tested and audit logs verified
- De-identification pipeline validated
- Incident response plan that covers AI-specific scenarios
- Model monitoring for bias and drift
- Documentation sufficient for an OCR (HHS Office for Civil Rights) audit

Key Takeaway
HIPAA-compliant AI architecture isn't about adding security on top — it's about designing the system so that compliance is the default state. Every architectural decision should make it harder to accidentally expose PHI, not easier.