FREE!! Open Source - A Technical Recruiter's Daily Wiki
Technical Recruiter Reference · Cloud Technologies · 2025–2026

Amazon AWS · Microsoft Azure · Google Cloud: Complete Cloud Guide

A plain-English explainer of every major cloud technology, service, and concept appearing in cloud engineering job descriptions — written for technical recruiters, not engineers.

50+ Cloud Services · 3 Cloud Providers · 60+ Interview Qs · 80+ Glossary Terms
☁️

Cloud Computing 101

// everything a recruiter needs to understand before the technical details

What is "The Cloud"? · Foundation
// the foundation of all modern IT infrastructure
🗣️ In plain English: The cloud is simply someone else's computer, accessed over the internet. Instead of buying expensive physical servers, a company rents computing power, storage, and software from massive data centers owned by Amazon, Microsoft, or Google. You pay only for what you use, scale up or down on demand, and never have to buy or maintain hardware. The cloud is the backbone of most modern apps: Netflix, Spotify, Instagram, and your company's own internal tools all run in the cloud.

Cloud computing is described by three service models that every recruiter should know: IaaS (Infrastructure as a Service: you get virtual machines, storage, and networking; you manage the OS and apps); PaaS (Platform as a Service: the cloud manages the infrastructure and OS; you just deploy your code); and SaaS (Software as a Service: a fully managed app accessed via browser, like Gmail or Salesforce). The "Big Three" providers (AWS at about 30% market share, Azure at 21–25%, and Google Cloud at 11–13%) together account for roughly two-thirds of global cloud infrastructure spending.

// the cloud service model pyramid: who manages what
IaaS (VMs · Storage · Networking): AWS EC2 · Azure VMs · GCE. You manage: OS, app, data.
PaaS (managed platform; you just deploy code): Heroku · App Engine · Azure App Service. You manage: app + data.
SaaS (complete managed app, accessed via browser): Gmail · Salesforce · Office 365 · Slack · Zoom · Workday. You manage: nothing.
Public Cloud Model
// AWS · Azure · Google Cloud
🗣️ In plain English: Public cloud is like renting an apartment in a big apartment building. Amazon, Microsoft, or Google owns the building (data centers), and thousands of companies rent space on the same shared infrastructure. You get your own private virtual space, but you don't see your neighbors. It's cost-effective, instantly scalable, and maintained by the cloud provider 24/7.

The public cloud is owned and operated by a third-party provider (AWS, Azure, GCP) and delivered over the internet. Resources are shared across multiple customers ("multi-tenant") but logically isolated for security. Key advantages: no upfront hardware investment, pay only for what you use, global availability in minutes, automatic updates and patching. Over 95% of organizations use public cloud in some capacity. AWS S3, Azure Virtual Machines, and Google BigQuery are examples of public cloud services.

Private Cloud Model
// VMware · OpenStack · on-premises data center
🗣️ In plain English: Private cloud is like owning a house instead of renting an apartment. The company owns the hardware (or rents dedicated servers) and runs cloud-like software on top. Only that company uses the infrastructure; nothing is shared. Banks, hospitals, defense contractors, and governments often use private cloud because regulations require keeping data in-house or on dedicated hardware.

Private cloud gives organizations full control over their infrastructure, data residency, and security posture, at a higher cost. Hardware is either owned on-premises or colocated in a third-party data center. Technologies: VMware vSphere, OpenStack, Microsoft Azure Stack HCI, and Red Hat OpenShift. Key use cases: highly regulated industries (healthcare/HIPAA, finance/PCI-DSS, government/FedRAMP), data sovereignty requirements, and ultra-low-latency applications. Disadvantages: significant capital expenditure and the need for specialized staff.

Hybrid & Multi-Cloud Model
// Azure Arc · AWS Outposts · Google Anthos
🗣️ In plain English: Hybrid cloud is having both a house (private cloud) and an apartment (public cloud), using each for what it's best at. Sensitive payroll data stays in your own data center; burst traffic from a viral marketing campaign spills into AWS. Multi-cloud means using more than one public cloud provider simultaneously, for example AWS for some workloads and Azure for others, to avoid dependence on a single vendor.

Hybrid cloud is increasingly the enterprise norm — over 80% of large organizations run hybrid environments. It enables "cloud bursting" (overflow to public cloud when private capacity is exceeded) and data residency compliance. Multi-cloud (using AWS + Azure + GCP simultaneously) prevents vendor lock-in and lets organizations use "best of breed" services from each provider. Tools: AWS Outposts (AWS hardware in your data center), Azure Arc (manage any infrastructure from Azure), Google Anthos (Kubernetes anywhere). Multi-cloud increases complexity — a senior cloud architect who manages multi-cloud is highly valued.

Core Cloud Concepts · Foundation
// concepts that appear in every cloud job description
🗣️ In plain English: Before hiring cloud engineers, a recruiter needs to know these universal terms. They appear across all three clouds, and understanding them helps you evaluate any cloud résumé, regardless of provider preference.

Region: a physical geographic area with data centers (e.g., "US East").
Availability Zone (AZ): one or more physically separate data centers within a region, used for redundancy.
Virtual Machine (VM): a software-defined computer running on the provider's physical hardware.
Container: a lightweight package containing an app and its dependencies.
Kubernetes (K8s): the industry-standard system for running many containers at scale.
Serverless: code runs in response to events, with no servers for you to manage.
IAM: Identity and Access Management (who can do what).
VPC/VNet: a private network within the cloud (called a VPC on AWS and GCP, a VNet on Azure).
DevOps: combining development and operations for faster, automated releases.
CI/CD: automated pipelines that test and deploy code continuously.

Cloud Certifications · Career
// AWS SAA · AZ-900 · GCP ACE · professional certs
🗣️ In plain English: Cloud certifications are like professional licenses that prove a person has passed an official exam from Amazon, Microsoft, or Google. They're a strong screening tool because they verify a baseline level of knowledge that self-described experience alone can't. For entry-level cloud roles, a certification is often required; for senior roles, certifications combined with real project experience are the gold standard.

AWS: Cloud Practitioner (CLF-C02), entry level; Solutions Architect Associate (SAA-C03), the most popular; Solutions Architect Professional (SAP-C02); DevOps Engineer Professional.
Azure: AZ-900 Fundamentals, entry level; AZ-104 Administrator; AZ-305 Architect; AZ-400 DevOps.
GCP: Cloud Digital Leader, entry level; Associate Cloud Engineer (ACE), the most common; Professional Cloud Architect.
Holding certifications from all three providers (a "trifecta") signals exceptional breadth. AWS SAA is among the most frequently requested cloud certifications in job postings worldwide.

🟠

Amazon Web Services (AWS)

// the world's largest cloud platform · 30%+ market share · 200+ services · launched 2006

Amazon Web Services — The Cloud Pioneer

AWS is the oldest and largest cloud provider, with the broadest service catalog (200+ services), 38 global regions, and the largest ecosystem of third-party tools. It is the default choice for many startups and enterprises (Netflix and Airbnb are well-known customers) thanks to its proven reliability, massive community, and wide selection of compute, storage, AI/ML, and database services. An AWS-fluent engineer typically commands a premium salary and has the broadest employability of any cloud specialty.

🏆 Market Leader: ~30% global share · 38 regions · 120+ AZs worldwide · 200+ cloud services · Best for: broadest requirements
Amazon EC2 · Compute
// Elastic Compute Cloud · virtual machines in the cloud
🗣️ In plain English: EC2 is the virtual computer you rent from Amazon. Instead of buying a physical server, you pick a size (from a tiny 1-CPU test machine up to machines with hundreds of virtual CPUs), launch it in seconds, install whatever you want, and pay only while it runs. When you're done, you stop it and stop paying for compute (attached storage is still billed). Netflix, NASA, and thousands of other organizations run large parts of their infrastructure on EC2.

EC2 is the foundational AWS service — nearly every AWS architecture uses it. Instance types: t3/t4g (general purpose, cheap), c6i/c7g (compute-optimized), r6i (memory-optimized), p3/p4d (GPU for AI/ML). Pricing: On-Demand (pay per hour), Reserved Instances (1–3 year discount up to 72%), Spot Instances (spare capacity, up to 90% discount). Auto Scaling groups automatically add/remove EC2 instances based on demand. A cloud engineer who can't configure and manage EC2 is a junior hire at best.
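
To make the three pricing models concrete, here is a minimal Python sketch of one instance's monthly bill. The hourly rate is a made-up placeholder rather than a real AWS price, and the discounts are the "up to" ceilings quoted above (72% for a Reserved Instance, 90% for Spot), so real savings are usually smaller and vary by instance type, region, and term.

```python
# Best-case monthly cost of one always-on EC2 instance under each pricing model.
# ON_DEMAND_HOURLY is a hypothetical placeholder, NOT a real AWS price.
ON_DEMAND_HOURLY = 0.10   # USD per hour (assumed)
HOURS_PER_MONTH = 730     # common approximation: 24 * 365 / 12

# "Up to" discount ceilings from the text; actual discounts are often lower.
MAX_DISCOUNT = {
    "on_demand": 0.00,
    "reserved": 0.72,   # Reserved Instance, best case
    "spot": 0.90,       # spare capacity that AWS can reclaim
}


def monthly_cost(model: str, hours: float = HOURS_PER_MONTH) -> float:
    """Return the best-case monthly cost in USD, rounded to cents."""
    discount = MAX_DISCOUNT[model]
    return round(ON_DEMAND_HOURLY * (1 - discount) * hours, 2)


if __name__ == "__main__":
    for model in MAX_DISCOUNT:
        print(f"{model:>10}: ${monthly_cost(model):,.2f}")
```

Spot's discount only makes sense for work that can tolerate interruption: AWS gives a two-minute warning before reclaiming a Spot Instance, which is why Spot suits batch jobs rather than a customer-facing database.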

// how EC2 instances fit into a web application
Diagram: Users → Internet → Elastic Load Balancer (ELB) → auto-scaling EC2 instances (web-01, web-02, web-03) → Amazon RDS database (managed PostgreSQL)
Amazon S3 · Storage
// Simple Storage Service · object storage · 99.999999999% durability
🗣️ In plain English: S3 is Amazon's virtually unlimited online file cabinet. You store any type of file (images, videos, backups, code, documents) in "buckets" (top-level containers) and, with the right permissions, access them from anywhere in the world via URL. It almost never loses data: Amazon designs S3 for 11 nines of durability (99.999999999%). You pay per gigabyte stored. Nearly every AWS architecture uses S3, and it is one of the most widely used AWS services.

S3 stores "objects" (files) in "buckets" (containers). Storage classes optimize cost: S3 Standard (frequently accessed data), S3 Intelligent-Tiering (automatically moves objects between access tiers), and S3 Glacier (archive storage at a fraction of a cent per GB per month). S3 is used for static website hosting, data lake foundations, ML training data, application backups, and user-uploaded content. Key features: versioning, lifecycle policies, cross-region replication, bucket policies/ACLs (access control), and S3 Transfer Acceleration. S3 underpins a large share of the internet: Netflix, Airbnb, and millions of other apps rely on it as their primary file store.
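
What does 11 nines actually mean? The arithmetic is simple enough to sketch. Treating the durability figure as an independent annual loss probability per object is a simplifying assumption, but it reproduces the illustration AWS itself gives: store ten million objects and you can expect to lose a single object about once every 10,000 years.

```python
from decimal import Decimal

# S3 Standard is designed for 99.999999999% ("11 nines") annual durability.
# Decimal avoids floating-point rounding when working with this many nines.
DESIGN_DURABILITY = Decimal("0.99999999999")


def expected_annual_losses(num_objects: int) -> Decimal:
    """Expected number of objects lost per year, assuming independent losses."""
    loss_probability = Decimal(1) - DESIGN_DURABILITY  # 1e-11 per object per year
    return num_objects * loss_probability


def years_between_losses(num_objects: int) -> Decimal:
    """Average number of years before a single object is lost."""
    return Decimal(1) / expected_annual_losses(num_objects)


if __name__ == "__main__":
    n = 10_000_000
    print(f"expected losses per year: {expected_annual_losses(n)}")
    print(f"years between single-object losses: {int(years_between_losses(n)):,}")
```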

AWS Lambda · Serverless
// event-driven serverless functions · pay per invocation
🗣️ In plain English: Lambda runs code on demand without any server management. You write a function (a small piece of code), upload it to Lambda, and Amazon runs it whenever it's triggered: by a web request, a new file in S3, a schedule, or many other events. You pay only for the milliseconds your code actually runs. No servers to manage, no maintenance, no idle costs.

Lambda is the cornerstone of "serverless architecture" on AWS. Supported languages include Python, Node.js, Java, Go, C#, and Ruby. Common patterns: image processing (triggered by an S3 upload), REST APIs (API Gateway + Lambda), scheduled jobs (an Amazon EventBridge schedule, formerly CloudWatch Events, triggering Lambda), and data pipeline transforms. Lambda functions scale automatically from zero to thousands of concurrent executions. Cold starts (a brief delay when a new execution environment is first invoked) are the main drawback. SAM (Serverless Application Model) and the Serverless Framework simplify Lambda development. Lambda on a résumé signals modern cloud-native development experience.
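
As an illustration of the image-processing-on-upload pattern, here is a minimal Python handler. `lambda_handler(event, context)` follows the conventional shape of a Python Lambda entry point (the function name is configurable); the bucket name is hypothetical, and the sample event is trimmed to the bucket and key fields that S3 notifications carry. The sketch only reports which objects arrived, where a real function would download and transform them.

```python
import json
import urllib.parse


def lambda_handler(event, context):
    """Called by Lambda with a batch of S3 upload notifications.

    Sketch only: it lists which objects were uploaded. A real image-processing
    function would download each object, transform it, and write the result.
    """
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 events (a space becomes "+"), so decode.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    print(json.dumps({"processed": uploaded}))  # stdout ends up in CloudWatch Logs
    return {"processed": uploaded}


if __name__ == "__main__":
    # Local test with a trimmed sample event; real events carry many more fields.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-uploads"},
                    "object": {"key": "photos/cat+picture.jpg"}}}
        ]
    }
    lambda_handler(sample_event, context=None)
```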

Amazon RDS & Aurora · Database
// managed relational databases · auto-patching · backups
🗣️ In plain English: RDS is a managed database service: Amazon runs, patches, backs up, and monitors the database for you. Instead of a team of DBAs maintaining database servers, you click a few buttons and get a fully managed PostgreSQL, MySQL, or SQL Server database. Aurora is Amazon's own database engine; AWS says it delivers up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL, with far less effort than self-managing.

RDS supports: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server. Amazon Aurora is MySQL/PostgreSQL-compatible and scales storage automatically to 128 TB. Key features: Multi-AZ deployments (automatic failover if a data center fails), Read Replicas (horizontal scaling for read-heavy apps), automated backups and point-in-time recovery. DynamoDB is AWS's proprietary NoSQL database — massively scalable key-value store used by Amazon.com itself. A cloud engineer who has worked with RDS + DynamoDB covers both the relational and NoSQL database needs of most enterprise applications.
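
Read Replicas only help if the application actually sends reads to them. The sketch below shows the idea with hypothetical endpoint names: writes go to the primary, and SELECT statements rotate round-robin across replicas. Classifying SQL by its first keyword is a deliberate simplification (it would misroute a `SELECT ... FOR UPDATE`, for example), and Aurora also offers a managed reader endpoint that balances connections across replicas for you.

```python
import itertools

# Hypothetical endpoints: one writer (primary) and two read replicas.
PRIMARY = "mydb-primary.example.internal"
REPLICAS = ["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"]


class ReadWriteRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replica_cycle is not None:
            return next(self._replica_cycle)  # round-robin across replicas
        return self.primary  # writes, and reads when no replicas exist


if __name__ == "__main__":
    router = ReadWriteRouter(PRIMARY, REPLICAS)
    for statement in ["SELECT * FROM orders", "INSERT INTO orders VALUES (1)", "SELECT 1"]:
        print(f"{statement!r:35} -> {router.endpoint_for(statement)}")
```

Because RDS read replicas are updated asynchronously, a read sent to a replica right after a write may not see that write yet; reads that must be current belong on the primary.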

AWS VPC & Networking · Networking
// Virtual Private Cloud · subnets · security groups · Route 53
🗣️ In plain English: A VPC (Virtual Private Cloud) is your private, fenced-off section of Amazon's cloud. Inside your VPC, you control the network: which servers can talk to which, what traffic comes in from the internet, and what stays completely private. It's like having your own private data center inside Amazon's massive infrastructure, with your own IP addresses, firewalls, and routing rules.

VPC fundamentals: Subnets (public subnets face the internet; private subnets are isolated); Security Groups (virtual firewalls for individual resources — stateful); NACLs (Network Access Control Lists — subnet-level firewall, stateless); Internet Gateway (connects VPC to internet); NAT Gateway (lets private subnet resources access internet without being publicly accessible); Route 53 (AWS DNS — domain name management and traffic routing); CloudFront (CDN — delivers content globally from edge locations near users); Direct Connect (dedicated private fiber from on-premises to AWS).
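
Subnet planning is plain IP arithmetic, so Python's standard `ipaddress` module can sketch it. The 10.0.0.0/16 range and the two-AZ layout below are hypothetical choices; the one AWS-specific detail is that AWS reserves five addresses in every subnet (the first four and the last), so a /24 offers 251 usable addresses rather than 256.

```python
import ipaddress
from itertools import islice

# Hypothetical VPC range: a private /16 gives 65,536 addresses to divide up.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Take the first four /24 subnets (256 addresses each): one public and one
# private subnet in each of two Availability Zones.
public_a, public_b, private_a, private_b = islice(vpc.subnets(new_prefix=24), 4)
plan = {
    "public-az-a": public_a,
    "public-az-b": public_b,
    "private-az-a": private_a,
    "private-az-b": private_b,
}

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_hosts(subnet: ipaddress.IPv4Network) -> int:
    """Addresses left for your resources after AWS's reservations."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for name, net in plan.items():
        print(f"{name:<13} {net}  usable={usable_hosts(net)}")
```

Note that the address range alone doesn't make a subnet "public": it becomes public when its route table sends internet-bound traffic to an Internet Gateway, while private subnets reach out through a NAT Gateway.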

AWS IAM & Security · Security
// Identity & Access Management · least privilege · KMS · GuardDuty
🗣️ In plain English: IAM is the permissions system for your entire AWS account. It controls who (users, services, applications) can do what (read, write, delete) on which resources (S3 buckets, EC2 instances, databases). Misconfigured access permissions are among the leading causes of cloud security breaches, so an engineer who deeply understands IAM is invaluable. "Least privilege" means giving each user or service only the minimum permissions it needs, nothing more.

IAM concepts: Users, Groups, Roles (identities that can be assumed by services or users), and Policies (JSON documents defining permissions). Key services: KMS (Key Management Service: creates and controls the encryption keys used to protect data at rest); Secrets Manager (stores API keys and passwords securely); GuardDuty (threat detection that monitors for anomalous activity); Security Hub (aggregates security findings); and WAF (Web Application Firewall: blocks common web attacks). The Security Pillar of the AWS Well-Architected Framework is the standard reference. Under the AWS shared responsibility model, Amazon secures the cloud infrastructure, and customers secure what they put in the cloud.
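
Because IAM policies are JSON documents, the core evaluation rule is easy to sketch: anything not explicitly allowed is denied, and an explicit Deny overrides any Allow. The policy and bucket name below are hypothetical, and the evaluator is heavily simplified; real IAM also weighs resource policies, permission boundaries, organization SCPs, and conditions, and it matches action names case-insensitively.

```python
import re

# A hypothetical least-privilege policy in IAM's JSON shape:
# read-only access to one bucket, with any delete explicitly denied.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-reports", "arn:aws:s3:::example-reports/*"],
        },
        {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"},
    ],
}


def _matches(pattern: str, value: str) -> bool:
    """IAM-style wildcard match: '*' is any run of characters, '?' is one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified evaluation: explicit Deny wins, then Allow, else implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        hit = any(_matches(a, action) for a in _as_list(stmt["Action"])) and any(
            _matches(r, resource) for r in _as_list(stmt["Resource"])
        )
        if hit and stmt["Effect"] == "Deny":
            return False
        if hit and stmt["Effect"] == "Allow":
            allowed = True
    return allowed


if __name__ == "__main__":
    obj = "arn:aws:s3:::example-reports/q1.csv"
    for action in ("s3:GetObject", "s3:PutObject", "s3:DeleteObject"):
        print(action, is_allowed(POLICY, action, obj))
```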

AWS AI/ML: SageMaker & Bedrock · AI/ML
// machine learning platform · generative AI · model deployment
🗣️ In plain English: SageMaker is Amazon's all-in-one machine learning workbench. Data scientists use it to build, train, and deploy AI models without managing servers. Amazon Bedrock is newer: it gives developers access to powerful foundation models (like Anthropic's Claude, Meta's Llama, and Amazon's Titan) through a simple API, so they can build AI-powered apps without training models from scratch.

SageMaker Studio is the integrated ML development environment. SageMaker provides data labeling (Ground Truth), model training at scale, automated ML (AutoML), model deployment (endpoints), MLOps pipelines, and model monitoring. Amazon Bedrock (launched 2023) is one of AWS's fastest-growing services; it provides access to foundation models through an API. Rekognition (image/video analysis), Polly (text-to-speech), Lex (chatbots), and Comprehend (NLP) are pre-trained AI services that require no ML expertise. An AWS cloud engineer who also has SageMaker/Bedrock experience commands some of the highest salaries in the market.

CloudFormation & Terraform · DevOps
// Infrastructure as Code · CloudFormation · CDK · Terraform
🗣️ In plain English: Infrastructure as Code (IaC) means describing your entire cloud infrastructure (servers, databases, networking) in code files, then running that code to create everything automatically. Instead of clicking through the AWS console to set up 50 resources, an engineer writes a template that creates all 50 in one run, the same way every time. CloudFormation is Amazon's own IaC tool; Terraform is a vendor-neutral alternative used across all three clouds and is widely regarded as the most popular IaC tool in the industry.

CloudFormation templates define AWS infrastructure in JSON or YAML. AWS CDK (Cloud Development Kit) lets engineers write CloudFormation using Python, TypeScript, or Java — increasingly preferred over raw YAML. Terraform (by HashiCorp) uses HCL (HashiCorp Configuration Language) and works across AWS, Azure, and GCP — a cloud engineer fluent in Terraform is valuable to any organization regardless of which cloud they use. IaC enables: version-controlled infrastructure, reproducible environments, disaster recovery automation, and team collaboration on infrastructure.
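
The appeal of tools like the CDK is that ordinary code can generate consistent infrastructure definitions. The sketch below imitates that idea in plain Python rather than using the CDK's own library: it builds a CloudFormation template (JSON) containing one identically configured, versioned S3 bucket per environment. The environment names and logical IDs are hypothetical; `AWSTemplateFormatVersion`, `AWS::S3::Bucket`, and `VersioningConfiguration` are real CloudFormation names.

```python
import json


def versioned_bucket() -> dict:
    """One S3 bucket resource, in CloudFormation's template format, with versioning on."""
    return {
        "Type": "AWS::S3::Bucket",
        "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
    }


def build_template(environments: list[str]) -> dict:
    """Generate a template with one identically configured bucket per environment."""
    resources = {f"{env.capitalize()}DataBucket": versioned_bucket() for env in environments}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: one versioned data bucket per environment",
        "Resources": resources,
    }


if __name__ == "__main__":
    template = build_template(["dev", "staging", "prod"])
    print(json.dumps(template, indent=2))
```

Saved to a file, a template like this could be deployed with `aws cloudformation deploy --template-file template.json --stack-name demo-buckets`. Because every bucket comes from the same function, they cannot drift apart the way hand-clicked console resources can.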

🔵

Microsoft Azure

// the enterprise cloud · 21–25% market share · 100+ compliance offerings · launched 2010

Microsoft Azure — The Enterprise Cloud

Azure is the cloud for organizations already invested in Microsoft technology. Its tight integration with Windows Server, Active Directory, SQL Server, Office 365, and Teams makes it the natural choice for large enterprises, government agencies, healthcare systems, and financial institutions. Microsoft reports more than 100 compliance offerings for Azure, among the broadest of any cloud provider, which makes it a preferred choice for regulated industries. Azure's partnership with OpenAI lets enterprises use GPT-4o and other cutting-edge OpenAI models within Azure's security boundary.

🏢 Enterprise Leader: ~21–25% share · 60+ global regions · 100+ compliance certifications · Best for: Microsoft shops, regulated industries
Azure Virtual Machines · Compute
// Azure's equivalent of AWS EC2 · Windows + Linux VMs
🗣️ In plain English: Azure VMs are the cloud computers you rent from Microsoft. Unlike AWS (which historically favored Linux), Azure has always had exceptional Windows Server support, which is critical for enterprises running Microsoft workloads like .NET applications, SQL Server, and Active Directory. Azure VMs are the usual starting point for migrating existing Windows infrastructure to the cloud.

Azure VM series: B-series (burstable/dev/test), D-series (general purpose), E-series (memory-optimized), F-series (compute-optimized), N-series (GPU — AI/ML). Azure Hybrid Benefit: customers running Windows Server or SQL Server on-premises can reuse their existing licenses on Azure VMs — saving up to 49% compared to paying list price. Azure VM Scale Sets automatically adjust the number of VMs based on demand. Azure Spot VMs provide up to 90% discount for interruptible workloads.

Microsoft Entra ID (Azure AD) · Identity
// enterprise identity · SSO · MFA · conditional access
🗣️ In plain English: Microsoft Entra ID (formerly Azure Active Directory) is the login system for corporate Microsoft environments. When an employee opens their laptop and logs in with their company email, that's Entra ID verifying who they are and what they're allowed to access. It's like a corporate keycard system that works for apps (Office 365, Salesforce, Slack) as well as computers: one login gets you into everything you're authorized for.

Entra ID is Azure's #1 differentiator for enterprise customers: most large organizations already run Active Directory on-premises, and Entra ID extends that identity system to the cloud. Key features: SSO (Single Sign-On: one login across many apps); MFA (Multi-Factor Authentication); Conditional Access (for example, blocking sign-ins from unknown locations); PIM (Privileged Identity Management: just-in-time elevated access); and B2C (customer identity for consumer-facing apps). Seeing "Azure AD/Entra ID" on a résumé signals enterprise identity expertise, one of the most in-demand Azure skills.
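
Microsoft describes Conditional Access policies as if-then statements: if a sign-in matches certain signals, then enforce a control. A minimal sketch of that logic follows. The trusted countries, app names, and rule order are hypothetical, and real policies are configured in the Entra admin center rather than written as code.

```python
from dataclasses import dataclass

# Hypothetical policy inputs; not Entra's actual configuration format.
TRUSTED_COUNTRIES = {"US", "CA"}
SENSITIVE_APPS = {"payroll", "admin-portal"}


@dataclass
class SignIn:
    user: str
    country: str
    app: str
    mfa_completed: bool


def evaluate(sign_in: SignIn) -> str:
    """Return 'block', 'require_mfa', or 'allow' for one sign-in attempt."""
    if sign_in.country not in TRUSTED_COUNTRIES:
        return "block"  # IF unexpected location THEN block access
    if sign_in.app in SENSITIVE_APPS and not sign_in.mfa_completed:
        return "require_mfa"  # IF sensitive app THEN require a second factor
    return "allow"


if __name__ == "__main__":
    print(evaluate(SignIn("ana", "US", "payroll", mfa_completed=False)))
```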

Azure DevOps & GitHub Actions · DevOps
// CI/CD · Repos · Pipelines · Boards · Artifacts
🗣️ In plain English: Azure DevOps is Microsoft's all-in-one development platform: it combines a code repository (like GitHub), a task board (like Jira), automated build/test/deploy pipelines, and package management. When developers commit code, Azure Pipelines automatically runs tests and deploys to production. GitHub (owned by Microsoft since 2018) and GitHub Actions are increasingly the preferred choice for new work, while Azure DevOps remains common in enterprises with existing Microsoft investments.

Azure DevOps components: Boards (Agile project management); Repos (Git code hosting); Pipelines (YAML-based CI/CD automation); Artifacts (package management); and Test Plans (QA management). GitHub Actions is increasingly chosen over Azure Pipelines for new projects. Azure DevOps + Azure Kubernetes Service (AKS) is a common enterprise pattern for deploying containerized applications. A candidate listing "Azure DevOps Pipelines" signals professional CI/CD experience in the Microsoft ecosystem.
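
At its core, a CI/CD pipeline is an ordered list of stages in which a failure stops everything downstream, so broken code never reaches production. The sketch below models only that control flow in Python. The stage names and functions are hypothetical; in Azure Pipelines or GitHub Actions the stages would be declared in a YAML file and run on build agents.

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]


def build() -> bool:
    print("build: compiling and packaging the app")
    return True


def run_tests() -> bool:
    print("test: running the automated test suite")
    return True


def deploy() -> bool:
    print("deploy: releasing the package to production")
    return True


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that succeeded.
    """
    completed = []
    for name, stage in stages:
        if not stage():
            print(f"stage '{name}' failed; later stages are skipped")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_pipeline([("build", build), ("test", run_tests), ("deploy", deploy)])
```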

Azure SQL & Cosmos DB · Database
// managed SQL Server · globally distributed NoSQL
🗣️ In plain English: Azure SQL is SQL Server in the cloud: Microsoft manages all the infrastructure, patches, and backups. For organizations already running SQL Server on-premises, migrating to Azure SQL is the most natural cloud journey with minimal code changes. Cosmos DB is Microsoft's planet-scale NoSQL database, distributed across data centers worldwide, capable of serving data with millisecond response times regardless of where users are located.

Azure SQL family: Azure SQL Database (fully managed), Azure SQL Managed Instance (near-100% SQL Server compatibility), and Azure Database for PostgreSQL/MySQL (open-source options). Cosmos DB is a multi-model database supporting multiple APIs: SQL, MongoDB, Cassandra, Gremlin, and Table. It's a strong choice for globally distributed apps that need consistently low latency. Key Azure data services: Synapse Analytics (data warehouse + analytics), Data Factory (ETL pipelines), and Azure Databricks (big data + ML). Azure Data Engineer is one of the most in-demand cloud roles, driven by data governance and compliance requirements.

Azure OpenAI & AI Services · AI/ML
// GPT-4o · Copilot · Azure AI Foundry · Cognitive Services
🗣️ In plain English: Azure OpenAI Service gives enterprises access to the same AI models that power ChatGPT (GPT-4o, DALL-E 3), but hosted within Microsoft's secure, compliant Azure environment. This is one of Azure's fastest-growing and most differentiating services: regulated industries (banking, healthcare, government) that can't send data to OpenAI's public API can use Azure OpenAI within their own compliance boundary. Microsoft's Copilot is built on this foundation.

Azure AI portfolio: Azure OpenAI Service (GPT-4o, DALL-E, Whisper within Azure); Azure AI Foundry (build/fine-tune/deploy AI models at scale); Azure Machine Learning (MLOps, model training, deployment); Cognitive Services (pre-built AI — vision, speech, language, translation); Azure AI Search (intelligent search with RAG/embeddings). Microsoft Copilot integrations: Copilot for Microsoft 365, Copilot Studio (build custom AI assistants), GitHub Copilot. Azure AI cloud engineers are among the highest-compensated cloud specialists in 2025–2026.
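
The RAG/embeddings idea behind Azure AI Search can be sketched without any cloud service: represent documents and a question as vectors, retrieve the closest documents by cosine similarity, and place them in the prompt sent to the model. The three-dimensional vectors below are invented for illustration (real embedding models produce hundreds or thousands of dimensions), and nothing here calls Azure AI Search or Azure OpenAI.

```python
import math

# Toy "embeddings" (hypothetical). Real ones come from an embedding model.
DOCUMENTS = {
    "vacation-policy": [0.9, 0.1, 0.0],
    "expense-policy": [0.1, 0.9, 0.1],
    "security-policy": [0.0, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means same direction (very similar meaning); near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector: list[float], k: int = 1) -> list[str]:
    """Retrieval step: names of the k documents most similar to the query."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda name: cosine_similarity(query_vector, DOCUMENTS[name]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, query_vector: list[float]) -> str:
    """Augmentation step: put the retrieved context in front of the question."""
    context = ", ".join(retrieve(query_vector, k=1))
    return f"Answer using only these documents: {context}\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How many vacation days do I get?", [0.8, 0.2, 0.1]))
```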

Azure Arc & Hybrid Cloud · Hybrid
// manage any infrastructure from Azure · Azure Stack HCI
🗣️ In plain English: Azure Arc lets companies manage servers, Kubernetes clusters, and databases wherever they run (on-premises, AWS, GCP, or edge locations), all from the Azure portal. It's like having a single remote control for your entire IT estate regardless of where things live. This makes Azure a leading choice for enterprises managing hybrid environments during their cloud migration journey.

Azure's hybrid dominance stems from: Azure Arc (unified management), Azure Stack HCI (run Azure services on your own hardware), Azure Migrate (assessment and migration tools), Azure Site Recovery (disaster recovery). Microsoft's enterprise relationships mean most large organizations that aren't "cloud-native" chose Azure for their hybrid journey. An Azure specialist with hybrid cloud experience — particularly Azure Arc and Azure Stack — is extremely valuable to large enterprises mid-migration.

🔴

Google Cloud Platform (GCP)

// the AI-native cloud · 11–13% market share · fastest growing · BigQuery · Kubernetes · Vertex AI

Google Cloud Platform — The AI & Data Cloud

Google Cloud is the cloud for AI-first and data-driven organizations. Google researchers introduced the Transformer architecture (2017) that underpins today's large language models, including ChatGPT, and that lineage shows: GCP's AI tooling (Vertex AI with Gemini) is deeply integrated across the platform. BigQuery (Google's data warehouse) is widely considered best-in-class for large-scale analytics, scanning terabytes of data in seconds. Google's private fiber network, one of the largest in the world, gives GCP a strong performance and latency advantage. GCP is among the fastest-growing of the Big Three, fueled by enterprise AI adoption.

🤖 AI-Native: Gemini + Vertex AI · 40 regions worldwide · One of the world's largest private networks · Best for: AI, data analytics, Kubernetes
Google Compute Engine · Compute
// GCP's virtual machines · sustained use discounts
🗣️ In plain English: Google Compute Engine (GCE) is Google's virtual machine service, the GCP equivalent of AWS EC2 and Azure VMs. Google's key differentiator: Sustained Use Discounts (SUDs) automatically apply up to 30% off for eligible VMs that run for more than 25% of a month. No contract is required, unlike AWS and Azure Reserved Instances, which require 1–3 year commitments.

GCE machine types: E2 (general purpose, cheapest), N2/N2D (balanced general purpose), C2 (compute-optimized), M2/M3 (memory-optimized), and A2/A3 (GPU-accelerated, for AI). Custom machine types let you specify exact CPU/RAM ratios so you only pay for what you need. Google Cloud Spot VMs (the equivalent of AWS Spot) offer 60–91% discounts for interruptible workloads. Network performance is notable: VM-to-VM communication within a region is extremely fast thanks to Google's private backbone. App Engine (PaaS) lets developers deploy apps without managing VMs at all.
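
Sustained use discounts are easiest to understand as tiers. For N1 machine types, Google bills each successive quarter of the month at a lower fraction of the base rate (100%, 80%, 60%, then 40%), which averages out to the 30% maximum for a full month. The sketch below assumes those N1 tiers; other eligible families use different tiers, and some, including E2, receive no sustained use discount at all.

```python
# Assumed N1 sustained-use tiers: fraction of the base price charged for each
# successive quarter of the month that the VM runs.
N1_TIERS = [1.00, 0.80, 0.60, 0.40]


def effective_multiplier(fraction_of_month: float) -> float:
    """Average fraction of the base price paid for a VM run this share of the month."""
    if not 0 < fraction_of_month <= 1:
        raise ValueError("fraction_of_month must be in (0, 1]")
    billed = 0.0
    remaining = fraction_of_month
    for rate in N1_TIERS:
        usage_in_tier = min(remaining, 0.25)  # each tier covers a quarter month
        billed += usage_in_tier * rate
        remaining -= usage_in_tier
        if remaining <= 0:
            break
    return billed / fraction_of_month


if __name__ == "__main__":
    for share in (0.25, 0.50, 0.75, 1.00):
        discount = 1 - effective_multiplier(share)
        print(f"running {share:.0%} of the month -> {discount:.0%} discount")
```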

BigQuery · Analytics
// serverless data warehouse · petabyte-scale SQL analytics
🗣️ In plain English: BigQuery is Google's data warehouse, a database that can analyze billions of rows of data in seconds using regular SQL. Unlike traditional databases that require extensive setup and tuning, BigQuery is serverless: you just run your query and, under on-demand pricing, pay per terabyte processed. Data analysts and data scientists use it to answer business questions like "how many users purchased in the last 30 days, broken down by country, device, and age group" across billions of events in seconds.

BigQuery is widely considered the best data warehouse in the cloud market. Its advantages include serverless operation (no cluster management), columnar storage (extremely fast for analytics queries), automatic query optimization, built-in machine learning (BigQuery ML: run ML models directly in SQL), real-time streaming ingest, and BigQuery Omni (query data stored in AWS S3 and Azure Blob Storage without moving it). BigQuery often performs strongly against Amazon Redshift and Azure Synapse in published benchmarks, though results depend on the workload. A GCP Data Engineer without BigQuery experience has a major gap.
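
Columnar storage and per-terabyte pricing work together: because BigQuery stores each column separately, an on-demand query is billed for the bytes in the columns it references, not the whole table. The sketch below uses a hypothetical table and an assumed on-demand rate of $6.25 per TiB, which should be checked against current pricing; it ignores details such as free-tier allowances and per-query minimums.

```python
# Hypothetical table: bytes stored per column. In a columnar store each column
# is kept separately, so a query only reads the columns it references.
COLUMN_BYTES = {
    "event_id": 80 * 10**9,
    "user_country": 40 * 10**9,
    "device": 30 * 10**9,
    "event_payload": 2_000 * 10**9,  # large raw JSON column
}

PRICE_PER_TIB = 6.25  # assumed on-demand USD rate; verify against current pricing
TIB = 2**40           # bytes in one tebibyte


def bytes_scanned(columns: list[str]) -> int:
    """Bytes an on-demand query is billed for, given the columns it reads."""
    return sum(COLUMN_BYTES[c] for c in columns)


def on_demand_cost(columns: list[str]) -> float:
    """Estimated query cost in USD under the assumed rate."""
    return bytes_scanned(columns) / TIB * PRICE_PER_TIB


if __name__ == "__main__":
    narrow = ["user_country", "device"]  # SELECT user_country, device FROM ...
    wide = list(COLUMN_BYTES)            # SELECT * FROM ...
    print(f"narrow query: ${on_demand_cost(narrow):.2f}")
    print(f"SELECT *:     ${on_demand_cost(wide):.2f}")
```

Here the narrow query reads about 3% of the bytes that `SELECT *` reads, which is why selecting only the needed columns is a basic BigQuery cost-control habit.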

Vertex AI & Gemini · AI/ML
// Google's unified AI platform · Gemini · MLOps
🗣️ In plain English: Vertex AI is Google's all-in-one AI/ML platform. It's where data scientists build, train, and deploy machine learning models at Google scale. Gemini (Google's most powerful AI model family) is accessible through Vertex AI; it's Google's answer to OpenAI's ChatGPT and Microsoft's Azure OpenAI Service. Google researchers introduced the Transformer architecture that modern large language models are built on, and Vertex AI puts that research directly in developers' hands.

Vertex AI provides AutoML (train models without code), custom model training, model evaluation and deployment, Model Garden (access to 130+ foundation models, including Gemini), Pipelines (MLOps), Feature Store (shared feature engineering), and Colab Enterprise (Jupyter notebooks with enterprise features). Google's AI advantages include TPUs (Tensor Processing Units: custom chips optimized for ML that can outperform GPUs on certain workloads), Gemini 1.5 Pro (a context window of 1 million tokens or more, among the longest in the industry), and AI Search (RAG-powered enterprise search). GCP AI engineers are among the most in-demand cloud professionals globally.

Google Kubernetes Engine (GKE) · Containers
// Google created Kubernetes · best-in-class K8s service
🗣️ In plain English: Kubernetes is the industry-standard system for running and managing containerized applications at scale. Think of it as an intelligent traffic director that automatically decides where to run thousands of containers across a fleet of servers. Google open-sourced Kubernetes in 2014 and donated it to the Cloud Native Computing Foundation (CNCF) in 2015. GKE (Google Kubernetes Engine) is Google's managed Kubernetes service: Google runs the Kubernetes control plane so engineers can focus on their applications, not infrastructure.

All three clouds offer managed Kubernetes: AWS EKS, Azure AKS, Google GKE. GKE is widely considered the most mature and production-ready managed Kubernetes service, with features like Autopilot (Google manages all nodes), automatic security patching, and native integration with Google's private network. Kubernetes skills are cloud-agnostic and transfer between all providers — a K8s expert is valuable regardless of which cloud an organization uses. Cloud Run is GCP's serverless container service — deploy containers without managing Kubernetes, similar to AWS Fargate.
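
The "traffic director" role described above is the Kubernetes scheduler, which places each pod in two phases: filtering out nodes that can't fit the pod's resource requests, then scoring the nodes that remain. The sketch below is a heavily simplified model of that flow. Node and pod names are hypothetical, and its scoring (prefer the node left with the most free capacity) is only similar in spirit to Kubernetes' "least allocated" strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float       # vCPUs
    mem_capacity_gib: float
    cpu_used: float = 0.0
    mem_used_gib: float = 0.0


@dataclass
class Pod:
    name: str
    cpu: float                # requested vCPUs
    mem_gib: float            # requested memory


def fits(node: Node, pod: Pod) -> bool:
    """Filter phase: does the node have room for the pod's requests?"""
    return (node.cpu_used + pod.cpu <= node.cpu_capacity
            and node.mem_used_gib + pod.mem_gib <= node.mem_capacity_gib)


def score(node: Node, pod: Pod) -> float:
    """Score phase: average fraction of CPU and memory left free after placement."""
    cpu_free = (node.cpu_capacity - node.cpu_used - pod.cpu) / node.cpu_capacity
    mem_free = (node.mem_capacity_gib - node.mem_used_gib - pod.mem_gib) / node.mem_capacity_gib
    return (cpu_free + mem_free) / 2


def schedule(pod: Pod, nodes: list[Node]) -> Optional[str]:
    """Place the pod on the best-scoring feasible node; None means it stays Pending."""
    feasible = [n for n in nodes if fits(n, pod)]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: score(n, pod))
    best.cpu_used += pod.cpu
    best.mem_used_gib += pod.mem_gib
    return best.name


if __name__ == "__main__":
    cluster = [Node("node-a", 4, 16), Node("node-b", 8, 32)]
    for p in (Pod("web", 2, 4), Pod("batch", 7, 8), Pod("small", 1, 1)):
        print(p.name, "->", schedule(p, cluster))
```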

Cloud Storage & GCP Networking · Storage
// Cloud Storage · VPC · Cloud CDN · Cloud Armor
🗣️ In plain English: Google Cloud Storage is the GCP equivalent of AWS S3: virtually unlimited object storage for files, images, backups, and data. Google's private global fiber network is one of its most significant competitive advantages: data travels on Google's own cables rather than the public internet for much of its journey, resulting in lower latency and higher reliability. When a YouTube video plays smoothly for one viewer in Tokyo and another in São Paulo at the same time, that's Google's network doing the work.

GCP storage: Cloud Storage (object), Persistent Disk (block storage, including SSDs, for VMs), and Filestore (managed NFS file storage). Networking: Global VPC (a single VPC spans all regions, unique to Google), Cloud CDN (content delivery on Google's global edge), Cloud Load Balancing (global anycast load balancing that routes users to the nearest healthy backend automatically), Cloud Armor (DDoS protection and WAF), and Cloud Interconnect (dedicated fiber to GCP). Google's private network, one of the largest in the world, carries a significant share of global internet traffic, and GCP customers benefit from this infrastructure directly.
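
Cloud Load Balancing exposes one anycast IP address worldwide and sends each user to the nearest backend that is passing health checks. The sketch below captures only that final choice. The region names are real GCP regions, but the latency numbers and health states are hypothetical, and the real service also weighs backend capacity and utilization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    region: str
    healthy: bool        # result of the most recent health check
    latency_ms: float    # from the user's location (hypothetical numbers)


def pick_backend(backends: list[Backend]) -> Optional[str]:
    """Route to the lowest-latency backend that is passing health checks."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None  # nothing can serve the request
    return min(healthy, key=lambda b: b.latency_ms).region


if __name__ == "__main__":
    tokyo_user = [
        Backend("asia-northeast1", True, 5.0),
        Backend("us-central1", True, 140.0),
        Backend("southamerica-east1", True, 260.0),
    ]
    print(pick_backend(tokyo_user))
```

If the Tokyo backend fails its health check, the same user is transparently sent to the next-closest healthy region, which is the failover behavior the text describes.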

Google Anthos & Multi-Cloud · Multi-Cloud
// run workloads on any cloud · Kubernetes everywhere
🗣️ In plain English: Anthos is Google's platform for running applications consistently on any cloud or on-premises environment. A company using Anthos can run the same containerized app on GCP, AWS, Azure, and its own data center, all managed and monitored through a single Google Cloud console. It's Google's answer to Azure Arc, and it's particularly powerful for organizations standardizing on Kubernetes across a multi-cloud strategy.

Anthos features: multi-cluster Kubernetes management, service mesh (Istio) for microservices communication, centralized policy enforcement across clouds, unified monitoring and logging. Google's differentiation in multi-cloud: as a smaller market-share provider, Google has been aggressive about supporting interoperability — Anthos, BigQuery Omni, and AlloyDB Omni all run outside GCP. A GCP engineer with Anthos experience is rare and valuable. For organizations hedging on vendor lock-in, GCP + Anthos is a compelling architectural choice.

⚖️

AWS vs. Azure vs. Google Cloud — Side by Side

// the definitive comparison for recruiters evaluating multi-cloud candidates

Category | 🟠 Amazon AWS | 🔵 Microsoft Azure | 🔴 Google Cloud
Market Share (2025) | ~30% (clear leader) | ~21–25% (#2, fast growing) | ~11–13% (fastest growth)
Best For | Broadest requirements, startups, general enterprise | Microsoft shops, regulated industries, hybrid cloud | AI/ML, data analytics, Kubernetes-first
Compute (VMs) | EC2 (200+ instance types) | Azure Virtual Machines (+ Hybrid Benefit) | GCE (sustained use discounts auto-apply)
Object Storage | S3 (the gold standard) | Azure Blob Storage | Cloud Storage
Serverless | Lambda (most mature) | Azure Functions | Cloud Functions / Cloud Run
Managed Kubernetes | EKS (Elastic Kubernetes Service) | AKS (Azure Kubernetes Service) | GKE (Google created K8s; most mature)
Relational Database | RDS / Aurora | Azure SQL / SQL Managed Instance | Cloud SQL / AlloyDB
NoSQL Database | DynamoDB | Cosmos DB (multi-model) | Firestore / Bigtable
Data Warehouse | Redshift | Azure Synapse Analytics | BigQuery (widely considered #1)
AI/ML Platform | SageMaker + Bedrock | Azure ML + Azure OpenAI (GPT-4o) | Vertex AI + Gemini (AI-native)
Identity & Access | IAM + Cognito | Microsoft Entra ID (enterprise leader) | Cloud IAM + Identity Platform
Hybrid Cloud | AWS Outposts / ECS Anywhere | Azure Arc + Azure Stack (market leader) | Google Anthos
CDN | CloudFront | Azure Front Door / CDN | Cloud CDN (Google's global network)
DevOps / CI/CD | CodePipeline / CodeBuild | Azure DevOps + GitHub Actions | Cloud Build / Cloud Deploy
IaC (Infrastructure) | CloudFormation / CDK | ARM Templates / Bicep | Deployment Manager / Terraform
Monitoring | CloudWatch + X-Ray | Azure Monitor + Application Insights | Cloud Monitoring + Cloud Trace
Network | VPC + Route 53 | VNet + Azure DNS | Global VPC (a single VPC spans all regions)
Pricing Model | On-Demand, Reserved (1–3 yr), Spot | PAYG + Reserved + Hybrid Benefit | PAYG + Sustained Use (automatic)
Compliance Certs | 100+ (FedRAMP, HIPAA, SOC) | 100+ (most for regulated industries) | 100+ (strong for GDPR, ISO)
Entry Cert | AWS Cloud Practitioner (CLF-C02) | Azure Fundamentals (AZ-900) | Cloud Digital Leader
Key Clients | Netflix, NASA, Airbnb, Capital One | BMW, H&M, NASDAQ, NHS | Spotify, PayPal, Twitter/X, HSBC

Service Name Cross-Reference — The Same Thing, Different Names

What It Does | 🟠 AWS Name | 🔵 Azure Name | 🔴 GCP Name
Virtual Machines | EC2 | Virtual Machines (VM) | Compute Engine (GCE)
Object Storage | S3 | Blob Storage | Cloud Storage (GCS)
Private Cloud Network | VPC | Virtual Network (VNet) | Virtual Private Cloud (VPC)
Serverless Functions | Lambda | Azure Functions | Cloud Functions
Managed Kubernetes | EKS | AKS | GKE
Managed SQL DB | RDS | Azure SQL Database | Cloud SQL
DNS Service | Route 53 | Azure DNS | Cloud DNS
IAM / Access Control | IAM | Azure RBAC / Entra ID | Cloud IAM
Load Balancer | ELB (ALB/NLB) | Azure Load Balancer | Cloud Load Balancing
CDN | CloudFront | Azure Front Door / CDN | Cloud CDN
Monitoring/Logging | CloudWatch | Azure Monitor | Cloud Monitoring / Logging
Messaging (Queues, Pub/Sub) | SQS / SNS | Service Bus / Event Grid | Pub/Sub
AI/ML Platform | SageMaker | Azure Machine Learning | Vertex AI
IaC (Infrastructure) | CloudFormation | ARM Templates / Bicep | Deployment Manager
📖

Cloud Technologies Glossary

// 80+ cloud terms decoded for non-technical recruiters

Availability Zone (AZ): One or more physically separate data centers within a cloud region, with independent power, cooling, and networking. Running across multiple AZs protects applications if one AZ fails.
Auto Scaling: Automatically adds or removes cloud resources (VMs, containers) based on demand — more traffic = more servers, less traffic = fewer servers and lower cost.
CDN (Content Delivery Network): A network of servers distributed globally that deliver web content from locations close to users — reducing load times dramatically. AWS CloudFront, Azure Front Door, Google Cloud CDN.
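CI/CD: Short for Continuous Integration / Continuous Delivery (or Deployment). Automated pipelines that build, test, and deploy code changes with little or no manual intervention; continuous delivery usually keeps a manual approval step before production, while continuous deployment releases automatically. The backbone of modern DevOps.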
Cloud Migration: Moving an organization's applications, data, and infrastructure from on-premises data centers to cloud providers. The "lift-and-shift" approach is fastest; "cloud-native rewrite" is most optimized.
Cloud-Native: Applications designed and built specifically to run in the cloud — using microservices, containers, and serverless rather than adapting existing on-premises software.
Container: A lightweight package containing an application and all its dependencies, so it runs the same way on any machine with a compatible container runtime. Docker popularized the format (now standardized by the Open Container Initiative); Kubernetes orchestrates many containers.
CapEx vs OpEx: Capital Expenditure (buying servers upfront) vs. Operational Expenditure (paying monthly for cloud). Cloud converts CapEx to OpEx — a key financial benefit for CFOs.
DevOps: A cultural and technical practice combining software development and IT operations to deliver applications faster and more reliably. Cloud engineers and DevOps engineers often overlap.
DevSecOps: DevOps with security integrated from the start — security is "shifted left" into the development process rather than added at the end.
Docker: The most popular container platform — packages applications into portable containers. Every cloud engineer should know Docker. Kubernetes orchestrates these containers at scale.
Edge Computing: Processing data close to where it's generated (at the "edge") rather than sending it to a central data center. Critical for IoT, autonomous vehicles, and real-time applications.
Elasticity: The ability to automatically scale resources up or down based on demand — a core cloud advantage. "Elastic" in Amazon EC2 and Elastic Load Balancer reflects this property.
FaaS (Function as a Service): Serverless computing where individual functions run in response to events. AWS Lambda, Azure Functions, and Google Cloud Functions are the major examples.
FinOps: The practice of managing and optimizing cloud spending. As cloud costs grow, FinOps engineers help organizations understand and reduce their cloud bills — a fast-growing specialty.
GCP: Google Cloud Platform — Google's cloud computing service. Often shortened to "GCP" or "Google Cloud."
High Availability (HA): Designing systems to minimize downtime, typically targeting 99.9% uptime (about 8.76 hours of downtime per year) or 99.99% (about 52.6 minutes per year). Achieved through redundancy across multiple AZs.
IAM: Identity and Access Management — controls who can do what on which cloud resources. The #1 security configuration every cloud engineer must master.
IaaS: Infrastructure as a Service — cloud provides virtual machines, storage, and networking. You manage the OS and everything above. EC2, Azure VMs, GCE are examples.
Infrastructure as Code (IaC): Defining cloud infrastructure in code files (Terraform, CloudFormation, ARM Templates) rather than clicking through a console. Enables version control, repeatability, and automation.
Instance: A single virtual machine (server) running in the cloud. An EC2 instance, Azure VM instance, or GCE instance all mean the same thing — a rented computer.
Kubernetes (K8s): The industry-standard container orchestration system — automatically deploys, scales, and manages containerized applications across many servers. Originally created by Google in 2014.
Latency: The delay between a request and its response — measured in milliseconds. Cloud region selection dramatically affects latency for end users.
Load Balancer: Distributes incoming traffic across multiple servers so no single server is overwhelmed. Essential for high-availability applications. AWS ELB, Azure Load Balancer, GCP Cloud Load Balancing.
Microservices: Breaking a large application into small, independently deployable services — each doing one job. Cloud platforms are ideal for microservices because each service can scale independently.
Multi-Cloud: Using more than one cloud provider simultaneously — e.g., AWS for compute, Azure for Active Directory, GCP for analytics. Avoids vendor lock-in but adds complexity.
Object Storage: Stores unstructured data (files, images, videos) as "objects" in "buckets." AWS S3 is the original; Azure Blob Storage and GCS are equivalents. Virtually unlimited capacity.
On-Demand Pricing: Pay by the hour (or second) for cloud resources with no commitment. The most flexible but most expensive pricing model — compare with Reserved Instances and Spot pricing.
On-Premises (On-Prem): Infrastructure physically owned and located in the company's own building or rented data center — as opposed to hosted in the cloud. The "old way" of IT before cloud.
PaaS: Platform as a Service — cloud manages the OS and middleware; you deploy your code. AWS Elastic Beanstalk, Azure App Service, Google App Engine are examples.
Private Endpoint: A private connection to a cloud service (like a database) that uses private IP addresses — traffic never traverses the public internet. Critical for security-sensitive workloads.
Region: A physical geographic area containing multiple data centers (grouped into Availability Zones). As of 2025, AWS has roughly 38 regions, Azure 60+, and GCP about 40; these counts change often. Running in multiple regions provides global coverage and disaster recovery.
Reserved Instances / Reserved Capacity: Committing to use a cloud resource for 1 or 3 years in exchange for discounts up to 72% (AWS) compared to On-Demand pricing.
SaaS: Software as a Service — a fully managed application accessed via browser. Gmail, Salesforce, Slack, Zoom, and Microsoft 365 are SaaS products.
Security Group: A virtual firewall for cloud resources (VMs, databases) — controls which traffic can reach the resource. In AWS it's called Security Group; Azure calls them NSGs (Network Security Groups).
Serverless: Run code without managing servers — the cloud provider handles all infrastructure. AWS Lambda, Azure Functions, and GCP Cloud Functions/Cloud Run are serverless compute services.
Shared Responsibility Model: The division of security between the cloud provider and the customer. AWS/Azure/GCP secure the underlying infrastructure; customers secure their data, access controls, and configurations.
SLA (Service Level Agreement): The uptime guarantee from the cloud provider — typically 99.9% to 99.99% for compute services. Understanding SLAs is critical for enterprise contract negotiations.
Spot Instance / Spot VM: Cloud capacity available at deep discount (up to 90%) but can be interrupted with short notice. Ideal for batch processing, dev/test, and fault-tolerant workloads.
Subnet: A segment of a VPC/VNet's IP address space. Public subnets are internet-accessible; private subnets are isolated. Proper subnet design is fundamental to cloud security.
Terraform: The most popular cloud-agnostic Infrastructure as Code tool — works with AWS, Azure, and GCP. A cloud engineer who knows Terraform is valuable regardless of which cloud a company uses.
VPC / VNet: Virtual Private Cloud (AWS/GCP) or Virtual Network (Azure) — an isolated private network within the cloud where your resources live. The foundation of cloud networking and security.
Vendor Lock-In: The risk of becoming too dependent on a single cloud provider's proprietary services — making it difficult or expensive to switch. Multi-cloud and open standards (Kubernetes, Terraform) reduce lock-in.
Well-Architected Framework: AWS, Azure, and GCP each publish a framework of cloud architecture best practices organized into pillars such as Security, Reliability, Performance Efficiency, Cost Optimization, and Operational Excellence (AWS added a sixth pillar, Sustainability, in 2021).
Zero Trust: A security model where no user, device, or service is trusted by default — every access request is verified regardless of network location. Replaces the old "inside the firewall = trusted" model.
WAF (Web Application Firewall): Protects web applications from common internet attacks (SQL injection, XSS). AWS WAF, Azure WAF, and Google Cloud Armor are cloud-native WAF services.
SSO (Single Sign-On): Log in once and access many applications, with no separate login per app. Microsoft Entra ID is the enterprise SSO leader; AWS IAM Identity Center (formerly AWS SSO) and Google Cloud Identity / Google Workspace provide this for their ecosystems.
TPU (Tensor Processing Unit): Google's custom AI processor, designed specifically for machine learning workloads. Can outperform general-purpose GPUs on certain large-scale training and inference tasks. Cloud TPUs are available only on GCP.
SageMaker: AWS's fully managed machine learning platform, covering the full ML lifecycle from data preparation to model training to deployment. One of the most widely used enterprise ML platforms.
BigQuery: Google's serverless, scalable data warehouse — analyze petabytes of data using standard SQL in seconds. Widely considered the best cloud data warehouse for analytics.
DynamoDB: Amazon's proprietary NoSQL database — massively scalable key-value and document store. Used by Amazon.com itself. Single-digit millisecond response at any scale.
Cosmos DB: Microsoft's globally distributed, multi-model NoSQL database on Azure. Supports multiple APIs (NoSQL, MongoDB, Cassandra, Gremlin, Table). Its SLA guarantees under 10 ms read and write latency at the 99th percentile within a region.
Pub/Sub (Publish-Subscribe): A messaging pattern where publishers send messages to a topic and subscribers receive them asynchronously. GCP Pub/Sub, AWS SNS/SQS, and Azure Service Bus implement this pattern.
RAG (Retrieval-Augmented Generation): An AI technique that gives LLMs access to your private data without retraining: the application retrieves relevant documents from your data store and passes them to the LLM along with the question. Amazon Bedrock, Azure AI Search, and GCP Vertex AI all support RAG.
MLOps: Machine Learning Operations — applying DevOps principles to ML model development and deployment. Includes automated training pipelines, model monitoring, and version control for models.
Data Lake: A centralized repository for storing raw data in its original format — structured, semi-structured, and unstructured. AWS S3, Azure Data Lake Storage, and GCS are used as data lake foundations.
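ETL / ELT: Extract, Transform, Load (or Extract, Load, Transform), the process of moving data from source systems into a data warehouse. ETL transforms data before loading it; ELT loads raw data first and transforms it inside the warehouse. AWS Glue, Azure Data Factory, and GCP Dataflow are managed data-integration services.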
Compliance (HIPAA, PCI-DSS, GDPR, FedRAMP): Regulatory frameworks that cloud deployments must adhere to. HIPAA (healthcare data), PCI-DSS (payment card data), GDPR (EU personal data), FedRAMP (US government). All three clouds have compliance programs for these standards.
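The uptime percentages in the High Availability entry convert into yearly downtime budgets with simple arithmetic. A minimal Python sketch, assuming a 365-day year (8,760 hours) and ignoring leap years:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_per_year_hours(availability_pct: float) -> float:
    """Return the maximum yearly downtime (in hours) allowed by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    hours = downtime_per_year_hours(pct)
    print(f"{pct}% uptime -> {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

For 99.9% this works out to about 8.76 hours a year, and for 99.99% about 52.6 minutes, which is why each extra "nine" is so much harder to deliver.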
💬

Cloud Engineer Recruiter Interview Cheat Sheet

// 60+ questions for AWS, Azure, and GCP cloud engineering roles

📌 How to Use This Section

You don't need to understand the technology to evaluate answer quality. Listen for real service names (not just "we used the cloud"), scale and complexity (how many users? how many services?), tradeoff thinking, and depth when pushed. Each question shows Strong ✓, Average ≈, and Weak ✗ answer patterns.

☁️

Universal Cloud Questions

ALL CLOUD CANDIDATES
Opener
"Describe the most complex cloud architecture you've designed or managed. Walk me through the services used and why you chose them."
Strong: Names specific services (e.g., "EC2 Auto Scaling group behind an ALB, RDS Aurora Multi-AZ, ElastiCache Redis, S3 with CloudFront, monitored with CloudWatch alarms feeding PagerDuty"). Explains the business problem and why each service was chosen. Mentions scale (users, requests/sec, data volume).

Average: Can describe the architecture but vague on service names or scale.

Weak: "We used the cloud for our app" — no specifics, no architectural thinking. Shows surface-level cloud exposure only.
IaaS vs PaaS
"When would you choose IaaS (virtual machines) versus PaaS (App Service/Elastic Beanstalk/App Engine) for deploying an application?"
Strong: IaaS for: full OS control, custom software, compliance requiring specific configurations, cost optimization through right-sizing. PaaS for: faster deployment, less operational overhead, standard web apps where you just want to push code. Mentions that PaaS costs more per hour but saves engineering time — the total cost of ownership consideration.

Average: "PaaS is easier, IaaS gives more control" — correct but no nuance.

Weak: Can't explain the difference. A cloud professional must understand these fundamental service models.
Cost Optimization
"Cloud costs are growing faster than expected. What's your process for identifying and reducing cloud waste?"
Strong: Starts with visibility by enabling Cost Explorer (AWS), Azure Cost Management, or Google Cloud Billing reports. Identifies: idle resources (VMs running but unused), overprovisioned instances (c5.4xlarge doing c5.large work), unattached EBS volumes/managed disks, forgotten old snapshots. Implements: Reserved Instances/Savings Plans for predictable workloads, Spot Instances for flexible jobs, auto-shutdown of dev environments on nights/weekends, rightsizing recommendations. Sets budget alerts. Shows FinOps discipline.

Average: "Turn off what you don't use" — correct instinct but no process or tooling.

Weak: "Cloud is expensive, we just need more budget" — no optimization mindset. A red flag for any cloud role.
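The triage step in the strong answer (idle, overprovisioned, and orphaned resources) can be sketched as a simple classification over utilization data. A minimal Python sketch: the `Resource` fields, the inventory, and the 5% / 20% CPU thresholds are all hypothetical; real teams would pull these numbers from Cost Explorer, Azure Cost Management, or provider monitoring tools.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str                  # "vm" or "disk"
    avg_cpu_pct: float = 0.0   # average CPU over the review window (VMs only)
    attached: bool = True      # disks only: is it attached to a VM?


def find_waste(resources, idle_cpu_pct=5.0, oversized_cpu_pct=20.0):
    """Flag idle VMs, overprovisioned VMs, and unattached disks.

    The thresholds are illustrative assumptions, not provider recommendations.
    """
    findings = []
    for r in resources:
        if r.kind == "disk" and not r.attached:
            findings.append((r.name, "unattached disk: snapshot if needed, then delete"))
        elif r.kind == "vm" and r.avg_cpu_pct < idle_cpu_pct:
            findings.append((r.name, "idle VM: stop it or schedule auto-shutdown"))
        elif r.kind == "vm" and r.avg_cpu_pct < oversized_cpu_pct:
            findings.append((r.name, "overprovisioned VM: rightsize to a smaller type"))
    return findings


# Hypothetical inventory, e.g. exported from a billing or monitoring report.
inventory = [
    Resource("web-1", "vm", avg_cpu_pct=55.0),
    Resource("dev-box", "vm", avg_cpu_pct=1.5),
    Resource("batch-xl", "vm", avg_cpu_pct=12.0),
    Resource("old-data-disk", "disk", attached=False),
]

for name, action in find_waste(inventory):
    print(f"{name}: {action}")
```

A busy VM like `web-1` is left alone; the other three each map to one of the cleanup actions a strong candidate would name.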
Security Basics
"What's the Shared Responsibility Model in cloud security, and what does that mean for what your team must protect?"
Strong: Cloud providers (AWS/Azure/GCP) secure the physical infrastructure, networking, and hypervisor. Customers secure everything built on top: IAM configurations, data encryption, network security groups, patch management for OS in IaaS, application code security, and data access controls. For SaaS, the vendor handles almost everything; for IaaS, customers have the most responsibility. Mentions least-privilege IAM as the #1 customer responsibility.

Average: "The cloud provider handles security" — a dangerous oversimplification.

Weak: Unaware of the model. Many cloud security breaches start with someone who assumed "the cloud is secure, we don't need to worry."
Disaster Recovery
"If your primary cloud region has an outage, how would your architecture recover? What are RPO and RTO?"
Strong: RPO (Recovery Point Objective) = how much data loss is acceptable (last backup from 1 hour ago = 1 hour RPO). RTO (Recovery Time Objective) = how long to restore service (4 hours RTO = back up within 4 hours). DR strategies: backup/restore (cheapest, slowest), pilot light (minimal standby infrastructure), warm standby (partially running replica), multi-site active/active (most expensive, ~zero downtime). Chooses based on business impact of downtime. References specific services: RDS cross-region replicas, S3 cross-region replication, Route 53 health check failover.

Average: "We have backups" — backups alone aren't DR.

Weak: No awareness of RPO/RTO or multi-region design.
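The RPO/RTO definitions in the strong answer lend themselves to a quick check of a backup/restore plan. A minimal Python sketch, assuming the worst case where a failure strikes just before the next backup runs (so data loss equals the full backup interval); the function name and numbers are illustrative:

```python
def meets_dr_targets(backup_interval_h, restore_time_h, rpo_h, rto_h):
    """Check a backup/restore plan against RPO and RTO targets (all values in hours).

    Worst case, the failure happens just before the next backup, so the data
    lost equals the whole backup interval. Recovery time is the end-to-end
    time needed to restore service.
    """
    worst_case_data_loss_h = backup_interval_h
    return {
        "meets_rpo": worst_case_data_loss_h <= rpo_h,
        "meets_rto": restore_time_h <= rto_h,
    }


# Nightly backups with a 3-hour restore, checked against a 1-hour RPO and 4-hour RTO.
print(meets_dr_targets(backup_interval_h=24, restore_time_h=3, rpo_h=1, rto_h=4))
```

Nightly backups fail a 1-hour RPO even though the restore is fast enough, which is the point behind "backups alone aren't DR": meeting a tight RPO usually needs continuous replication (such as cross-region replicas) rather than periodic backups.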
Kubernetes
"What problem does Kubernetes solve, and have you managed it in production? What was the hardest part?"
Strong: Kubernetes solves deploying, scaling, and managing many containers across many servers automatically. Without K8s, you'd manually manage which servers run which containers. Hardest parts from experience: networking (services, ingress, CNI), storage (persistent volumes for stateful apps), RBAC configuration, resource quotas, debugging pod crashes (kubectl logs, describe, events), and managing cluster upgrades without downtime.

Average: Knows Kubernetes is for containers but hasn't run it in production.

Weak: "Docker is the same thing as Kubernetes" — fundamentally incorrect and a knowledge gap for any mid-level cloud role.
🟠

AWS Engineer Questions

EC2 · S3 · LAMBDA · RDS · IAM · VPC
IAM Security
"What does 'least privilege' mean in AWS IAM, and how do you implement it in practice?"
Strong: Least privilege = grant only the minimum permissions needed to do a job. In practice: start with no permissions, add only what's needed; never use root account for daily work; use IAM Roles for services (EC2 instances, Lambda functions) instead of embedding access keys in code; use IAM Permission Boundaries for delegated admin; regularly audit and remove unused permissions with IAM Access Analyzer; enable MFA for all human users. Gives example of EC2 role having only s3:GetObject for one specific bucket, not s3:* for all buckets.

Average: "Give people only what they need" — correct concept but no implementation detail.

Weak: "We give admin access to developers so they can get things done quickly" — a critical security failure.
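The strong answer's example (an EC2 role allowed only `s3:GetObject` on one bucket) corresponds to an IAM policy document like the one below. This Python sketch just builds and prints the JSON; the bucket name is hypothetical, and in practice the policy would be attached to the instance's IAM role rather than to a user.

```python
import json

BUCKET = "example-app-assets"  # hypothetical bucket name

least_privilege_policy = {
    "Version": "2012-10-17",  # current version of the IAM policy language
    "Statement": [
        {
            "Sid": "ReadObjectsInOneBucket",
            "Effect": "Allow",
            "Action": "s3:GetObject",                # read objects only, not s3:*
            "Resource": f"arn:aws:s3:::{BUCKET}/*",  # objects in this one bucket only
        }
    ],
}

print(json.dumps(least_privilege_policy, indent=2))
```

Note the `/*` suffix: `s3:GetObject` acts on objects, so the resource ARN names the objects inside the bucket rather than the bucket itself.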
Architecture
"Design a highly available web application on AWS that can handle traffic spikes during a product launch."
Strong: Route 53 (DNS with health-check failover) → CloudFront (CDN + AWS WAF) → ALB (Application Load Balancer) across multiple AZs → EC2 Auto Scaling Group (min=2, max=20) → RDS Aurora Multi-AZ (primary) + Read Replicas → ElastiCache Redis (session cache + frequent queries) → S3 (static assets, user uploads) → CloudWatch (alarms → SNS → PagerDuty). Database: use read replicas for product catalog reads; write to the primary. Auto Scaling scales out at 70% CPU and scales in at 30%.

Average: "EC2 with a load balancer and database" — correct foundation but missing redundancy, caching, and auto-scaling specifics.

Weak: A single EC2 instance or no awareness of load balancing and auto-scaling.
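The scaling rule in the strong answer (min 2, max 20, scale out at 70% CPU, scale in at 30%) can be simulated to show how capacity tracks a launch-day spike. This is a simplified Python sketch that adds or removes one instance per check; real AWS Auto Scaling policies (target tracking or step scaling) also apply cooldowns and warm-up periods, which are omitted here.

```python
def desired_capacity(current, avg_cpu_pct, min_size=2, max_size=20,
                     scale_out_at=70.0, scale_in_at=30.0):
    """Return the next instance count under a simple threshold rule."""
    if avg_cpu_pct >= scale_out_at:
        target = current + 1  # busy: add an instance
    elif avg_cpu_pct <= scale_in_at:
        target = current - 1  # quiet: remove an instance to save money
    else:
        target = current      # inside the comfortable band: hold steady
    return max(min_size, min(max_size, target))  # never leave the min/max bounds


# Hypothetical average-CPU readings taken every few minutes around a launch.
readings = [45, 75, 85, 90, 60, 25, 20, 15]
capacity = 2
history = []
for cpu in readings:
    capacity = desired_capacity(capacity, cpu)
    history.append(capacity)
print(history)
```

Capacity climbs while CPU stays above 70%, holds in the middle band, and drains back toward the minimum of 2 once traffic falls, so the group never runs below its redundancy floor.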
S3 vs EBS vs EFS
"What's the difference between S3, EBS, and EFS? When do you use each?"
Strong: S3 (object storage) — virtually unlimited files accessed via URL/API; not attached to any specific EC2 instance; great for backups, static websites, data lakes, large files. EBS (block storage) — a virtual hard drive attached to a single EC2 instance in the same AZ; persistent, like an SSD. Use for OS volumes and databases running on EC2. EFS (Elastic File System) — a shared NFS file system mountable by many EC2 instances simultaneously; good for shared application storage and CMS media files. Uses a simple analogy: S3 = files you access via API, EBS = one server's disk, EFS = a shared drive for many servers.

Average: Knows S3 is for files and EBS is for EC2 but unclear on EFS.

Weak: "S3 is like a hard drive for EC2" — fundamentally incorrect. S3 is not mounted to EC2 like a disk.
Lambda
"What is a Lambda cold start and how do you minimize its impact?"
Strong: Cold start = the delay (50ms–10s) when Lambda must initialize a new execution environment for the first time (or after a period of inactivity). It includes: downloading the function code, initializing the runtime, and running the initialization code outside the handler. Mitigation: Provisioned Concurrency (pre-warm a fixed number of environments — costs money), using smaller deployment packages (less to download), choosing runtimes with lower cold starts (Python/Node.js vs. Java/C#), keeping Lambda functions warm with scheduled pings. Tradeoff: Provisioned Concurrency eliminates cold starts but you pay even when idle.

Average: "Lambda can be slow to start sometimes" without knowing why or the solutions.

Weak: Unaware of cold starts — a fundamental Lambda production concern.
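The "initialization code outside the handler" in the strong answer refers to how a Python Lambda function is structured: module-level code runs once per execution environment (during the cold start), while the handler runs on every invocation. A minimal sketch; `lambda_handler` is the conventional handler name (it is configurable), and `_CONFIG` is a hypothetical stand-in for expensive setup such as creating SDK clients.

```python
# Module-level code runs once per execution environment, during the cold start.
# Expensive setup (SDK clients, config, connections) belongs here so warm
# invocations reuse it instead of paying the cost again.
_CONFIG = {"table_name": "example-table"}  # hypothetical stand-in for real setup
_invocation_count = 0


def lambda_handler(event, context):
    """Called by Lambda on every invocation, cold or warm."""
    global _invocation_count
    _invocation_count += 1
    return {
        "cold_start": _invocation_count == 1,  # first call in this environment
        "invocation": _invocation_count,
        "table": _CONFIG["table_name"],
    }


if __name__ == "__main__":
    # Local simulation: two calls in one process behave like a cold invocation
    # followed by a warm invocation in the same execution environment.
    print(lambda_handler({}, None))
    print(lambda_handler({}, None))
```

Keeping the handler itself thin is what makes the mitigations in the answer (smaller packages, Provisioned Concurrency) pay off: only the module-level setup is repeated on a cold start.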
CloudFormation
"Have you used Infrastructure as Code? What's the difference between CloudFormation, CDK, and Terraform?"
Strong: All three create AWS infrastructure via code. CloudFormation: AWS-native, YAML/JSON templates, no coding knowledge needed, deeply integrated, sometimes verbose. CDK (Cloud Development Kit): write CloudFormation using Python/TypeScript/Java — much more expressive for complex infrastructure with loops and conditions. Terraform: cloud-agnostic HCL language, multi-cloud (AWS + Azure + GCP in one codebase), huge community, most popular IaC tool overall. Chooses based on: team skills, multi-cloud needs, existing investment. Mentions state files (Terraform) and stack management (CloudFormation).

Average: Has used one of the three but can't compare them.

Weak: Still clicking through the AWS console for production infrastructure — unscalable and error-prone for any serious environment.
VPC Design
"Design a VPC for a three-tier web application (web, app, database layers) with proper security isolation."
Strong: VPC with CIDR 10.0.0.0/16. Two public subnets (web tier — ALB + NAT Gateways) across 2 AZs. Two private subnets (app tier — EC2 Auto Scaling) across 2 AZs. Two isolated subnets (database tier — RDS Multi-AZ) across 2 AZs. Security Groups: ALB SG (allow 80/443 from 0.0.0.0/0); App SG (allow 8080 from ALB SG only); DB SG (allow 5432 from App SG only). NAT Gateway allows private subnets to reach internet for updates without being publicly accessible. No direct route from database subnet to internet gateway.

Average: Public/private subnet concept but not the full three-tier isolation.

Weak: "Put everything in the same subnet" — no security segmentation is a production risk.
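The addressing and security-group chain in the strong answer can be sketched with Python's standard `ipaddress` module. The /24 subnet size and tier names are assumptions (the answer only fixes the /16), and the rule matching is deliberately simplified: real security groups also evaluate protocols, port ranges, and CIDR containment.

```python
import ipaddress
from itertools import islice

# Carve the 10.0.0.0/16 VPC (65,536 addresses) into /24 subnets (256 each).
# AWS reserves 5 addresses in every subnet, so each /24 has 251 usable IPs.
vpc = ipaddress.ip_network("10.0.0.0/16")
tiers = ["public-a", "public-b", "app-a", "app-b", "db-a", "db-b"]
subnets = dict(zip(tiers, islice(vpc.subnets(new_prefix=24), len(tiers))))

# Inbound rules per security group: (allowed source, port).
# A source is either a CIDR range or another security group's name.
SECURITY_GROUPS = {
    "alb-sg": [("0.0.0.0/0", 80), ("0.0.0.0/0", 443)],  # internet -> load balancer
    "app-sg": [("alb-sg", 8080)],                       # only the ALB reaches the app
    "db-sg": [("app-sg", 5432)],                        # only the app reaches PostgreSQL
}


def is_allowed(target_sg, source, port):
    """Return True if target_sg has an inbound rule matching exactly (source, port)."""
    return (source, port) in SECURITY_GROUPS[target_sg]


for tier, net in subnets.items():
    print(f"{tier:9} {net}")
print(is_allowed("db-sg", "app-sg", 5432))     # app tier -> database
print(is_allowed("db-sg", "0.0.0.0/0", 5432))  # internet -> database
```

Referencing security groups as sources (rather than IP ranges) is what lets each tier accept traffic only from the tier in front of it, even as Auto Scaling changes the instances' IP addresses.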
🔵

Azure Engineer Questions

VMs · ENTRA ID · AKS · SQL · DEVOPS · ARM
Identity
"Explain Azure RBAC and how you'd set up access for a team of developers who need to deploy to Azure but not manage billing or security settings."
Strong: Azure RBAC (Role-Based Access Control) assigns roles at a scope (management group → subscription → resource group → resource). Developers need "Contributor" on their resource group (deploy and manage resources, but can't change access control), NOT "Owner" (which grants permission to modify access). Use Microsoft Entra ID (formerly Azure AD) security groups: assign the group to the role, not individual users (easier to manage as the team grows). Use Custom Roles for precise permission sets. Enable Privileged Identity Management (PIM) for just-in-time elevated access rather than permanent permissions.

Average: Knows RBAC exists and the Owner/Contributor/Reader roles but no advanced implementation.

Weak: "Give everyone Owner access to avoid permission issues" — creates security and compliance failures.
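The scope inheritance and Contributor-versus-Owner distinction in the strong answer can be modeled in a few lines. A simplified Python sketch: the subscription ID, resource groups, and security group names are hypothetical, and the role table reduces each built-in role to two capabilities (in Azure, Contributor's NotActions block writes to `Microsoft.Authorization`, which is why it cannot grant access).

```python
# Simplified view of three built-in roles.
ROLES = {
    "Owner": {"manage_resources": True, "manage_access": True},
    "Contributor": {"manage_resources": True, "manage_access": False},
    "Reader": {"manage_resources": False, "manage_access": False},
}

# Hypothetical role assignments: (security group, role, scope).
ASSIGNMENTS = [
    ("dev-team", "Contributor", "/subscriptions/0000-demo/resourceGroups/team-rg"),
    ("platform-admins", "Owner", "/subscriptions/0000-demo"),
]


def scope_covers(assignment_scope, resource_id):
    """Assignments inherit downward: a scope covers itself and everything beneath it."""
    a = assignment_scope.rstrip("/").lower()  # Azure resource IDs are case-insensitive
    r = resource_id.rstrip("/").lower()
    return r == a or r.startswith(a + "/")


def can(user_groups, resource_id, capability):
    """True if any of the user's groups holds a role granting capability at a covering scope."""
    return any(
        group in user_groups
        and scope_covers(scope, resource_id)
        and ROLES[role][capability]
        for group, role, scope in ASSIGNMENTS
    )


web_app = "/subscriptions/0000-demo/resourceGroups/team-rg/providers/Microsoft.Web/sites/shop"
other_app = "/subscriptions/0000-demo/resourceGroups/finance-rg/providers/Microsoft.Web/sites/ledger"
print(can({"dev-team"}, web_app, "manage_resources"))    # deploy in their own resource group
print(can({"dev-team"}, web_app, "manage_access"))       # change who has access
print(can({"dev-team"}, other_app, "manage_resources"))  # another team's resource group
```

Developers can deploy inside `team-rg` but can neither grant access nor touch `finance-rg`, while the Owner assignment at subscription scope flows down to every resource group.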
Hybrid
"A company has 200 Windows servers on-premises and wants to start using Azure. What's your migration approach?"
Strong: Phase 1: Assessment using Azure Migrate (discovers on-premises VMs, assesses compatibility and sizing, estimates cloud costs). Phase 2: Connect networks — Azure VPN Gateway or ExpressRoute for private connection. Phase 3: Enable Azure Arc to manage on-premises servers from Azure portal (non-intrusive, immediate cloud benefits). Phase 4: Lift-and-shift priority workloads using Azure Migrate (converts VMs to Azure format). Phase 5: Optimize — Azure Hybrid Benefit (use existing Windows Server licenses), right-sizing, reserved instances. Phased approach reduces risk. Mentions Azure Site Recovery for minimal-downtime migrations.

Average: "Move everything to Azure" without a phased approach or assessment first.

Weak: No migration methodology — "just upload the VMs" shows no enterprise migration experience.
AKS
"What's the difference between AKS (Azure Kubernetes Service) and Azure Container Apps? When would you choose each?"
Strong: AKS: full Kubernetes control — you choose everything (node sizes, networking, storage, ingress controller). Best for teams with K8s expertise who need customization, or complex microservices architectures. Container Apps: a serverless container platform where Microsoft manages the entire underlying cluster. Built on Kubernetes + KEDA (event-driven auto-scaling) + Dapr (a distributed application runtime for service-to-service calls, state, and pub/sub). Much simpler: just deploy your container image. Best for: simple microservices, event-driven workers, teams without K8s expertise. Container Apps = "Kubernetes without the PhD." Azure Functions is even simpler, for single-function event processing.

Average: Knows AKS but not Container Apps (relatively new service).

Weak: "We just use VMs for containers" — no container orchestration knowledge for a container role.
DevOps Pipeline
"Walk me through the CI/CD pipeline you'd set up for a .NET application deploying to Azure App Service."
Strong: Git push to main branch → Azure Pipelines (or GitHub Actions) triggers: 1) Restore NuGet packages, 2) Build .NET app, 3) Run unit tests (fail if tests fail), 4) Run SAST security scan, 5) Publish artifact, 6) Deploy to Staging slot (App Service deployment slots for zero-downtime), 7) Run smoke tests, 8) Swap staging slot to production (zero-downtime swap). Monitoring: Application Insights for errors, Azure Monitor alerts, auto-rollback if error rate spikes. Secrets in Azure Key Vault, referenced from pipeline.

Average: "Push code, Azure deploys it" — aware of automation but no structured pipeline design.

Weak: Manual deployment processes — "I copy files to the server" is a dealbreaker for any DevOps role.
🔴

Google Cloud Engineer Questions

GCE · BIGQUERY · VERTEX AI · GKE · IAM
BigQuery
"A data analyst complains that their BigQuery query is costing $500 every time they run it. How do you fix this?"
Strong: Under on-demand pricing, BigQuery charges per TB of data scanned. Fixes: 1) Select only the needed columns instead of SELECT * (columnar storage reads only the selected columns); 2) Partition tables by date and filter on the partition column (scans only the relevant date ranges, not the entire table); 3) Cluster tables on frequently filtered columns; 4) Preview data volume before running, using dry run mode; 5) Use BigQuery BI Engine for repeated dashboard queries; 6) Set cost controls (custom daily query quotas, or a maximum-bytes-billed limit so an oversized query fails instead of running). Explains the difference between storage cost (cheap) and query cost (per TB scanned).

Average: "Optimize the SQL" without understanding BigQuery's columnar/partition cost model.

Weak: No awareness of BigQuery pricing model — a very common but expensive mistake in GCP data environments.
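The per-TB pricing logic behind fixes 1 and 2 can be shown with back-of-the-envelope arithmetic. A Python sketch under stated assumptions: the table size, column count, and partition count are hypothetical, columns are assumed to be equal in size, and the $6.25-per-TB rate is an assumed on-demand price chosen so a full scan matches the $500 in the question; check Google's current pricing page before quoting real numbers.

```python
PRICE_PER_TB = 6.25   # assumed on-demand price per TB scanned
TABLE_TB = 80.0       # hypothetical table size
NUM_COLUMNS = 40      # hypothetical; columns assumed equal in size
DAYS_OF_DATA = 365    # hypothetical; table partitioned by day


def query_cost(tb_scanned, price_per_tb=PRICE_PER_TB):
    """On-demand BigQuery cost is proportional to the bytes a query reads."""
    return tb_scanned * price_per_tb


# SELECT * with no filter reads every column of every partition.
full_scan_tb = TABLE_TB
# Selecting 3 columns reads only those columns (columnar storage).
three_columns_tb = TABLE_TB * 3 / NUM_COLUMNS
# Filtering on the partition column to the last 7 days prunes the other partitions.
three_columns_week_tb = three_columns_tb * 7 / DAYS_OF_DATA

for label, tb in [
    ("SELECT * (full scan)", full_scan_tb),
    ("3 columns", three_columns_tb),
    ("3 columns, last 7 days", three_columns_week_tb),
]:
    print(f"{label:24} {tb:8.3f} TB  ${query_cost(tb):,.2f}")
```

Under these assumptions, the same question answered from 3 columns and one week of partitions costs well under a dollar instead of $500.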
Vertex AI
"An executive wants to build an internal AI assistant that answers questions using the company's private documents. How would you architect this on GCP?"
Strong: RAG (Retrieval-Augmented Generation) architecture: 1) Store documents in GCS or BigQuery; 2) Use Document AI or a chunking pipeline to split and process documents; 3) Generate embeddings using a Vertex AI text embedding model (e.g., text-embedding-gecko); 4) Store embeddings in Vertex AI Vector Search (or AlloyDB pgvector); 5) When a user asks a question: embed the query → search the vector store for relevant documents → pass documents + query to Gemini 1.5 Pro via Vertex AI → return a grounded answer. Uses Agent Builder (Vertex AI Search and Conversation) for faster implementation. Keeps data within GCP for security/compliance.

Average: "Connect ChatGPT to our documents" without understanding RAG or data privacy concerns.

Weak: "Send company documents to OpenAI's API" with no thought for data handling or contracts — a likely compliance and data privacy violation for most enterprises.
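The retrieve-then-generate flow in step 5 of the strong answer can be illustrated without any cloud services. This Python sketch is a toy stand-in: word-count vectors replace a real embedding model, a dictionary replaces Vertex AI Vector Search, and the documents are hypothetical. It stops at building the grounded prompt that would be sent to the LLM.

```python
import math
import re
from collections import Counter

# Hypothetical private documents.
DOCS = {
    "pto-policy": "Employees accrue 20 days of paid time off per year.",
    "expense-policy": "Submit travel expenses within 30 days with receipts.",
    "security-policy": "Laptops must use full disk encryption and a password manager.",
}


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Stands in for a vector store: document ID -> embedding.
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}


def retrieve(question, k=1):
    """Return the IDs of the k documents most similar to the question."""
    q = embed(question)
    return sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)[:k]


def build_prompt(question, k=1):
    """Combine retrieved context and the question into a grounded prompt for the LLM."""
    context = "\n".join(f"[{d}] {DOCS[d]}" for d in retrieve(question, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


print(build_prompt("How many days of paid time off do employees get?"))
```

The model never needs retraining: only the retrieved snippets travel with the question, which is also why keeping the vector store and model inside one cloud boundary matters for compliance.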
GCP IAM
"What's the difference between GCP Service Accounts and IAM roles? How do you securely allow a GCE VM to access a GCS bucket?"
Strong: IAM Roles define what permissions are granted (e.g., "storage.objectViewer" — read GCS objects). Service Accounts are identities for machines/services (not humans). To allow a VM to access GCS: 1) Create a service account; 2) Grant it "Storage Object Viewer" role on the specific bucket (principle of least privilege — not the entire project); 3) Attach the service account to the VM at creation; 4) The VM's code uses Application Default Credentials — no API keys needed. Never embed service account JSON key files in code or store in environment variables — use Workload Identity (preferred) or Secret Manager.

Average: Knows service accounts exist but embeds API key JSON in application code.

Weak: "Store the API key in a config file" — hardcoded credentials are a critical security vulnerability.
Data Pipeline
"Design a real-time data pipeline on GCP that ingests 1 million events per minute from IoT sensors into BigQuery for analytics."
Strong: IoT devices → Cloud Pub/Sub (managed message queue that handles 1M+ events/min, is durable, and decouples producers from consumers) → Dataflow (managed Apache Beam streaming pipeline that transforms and cleanses data in real time) → BigQuery (streaming inserts, so data is queryable within seconds) OR GCS + BigQuery scheduled loads (for batch, cheaper). Monitoring: Cloud Monitoring for Pub/Sub subscription backlog and Dataflow job metrics. Error handling: a dead-letter topic for failed messages. Alternative for simple cases without transformation: a Pub/Sub BigQuery subscription (writes messages straight into a table) or the BigQuery Storage Write API.

Average: "Use Pub/Sub and BigQuery" — correct services but no pipeline architecture or scaling considerations.

Weak: "Write directly from devices to BigQuery" — at 1M events/min this creates pressure on BigQuery's streaming quota and lacks error handling.
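Two details in the strong answer are easy to make concrete: the sustained rate behind "1 million events per minute", and what a dead-letter path does. A minimal Python sketch; the sensor message schema is hypothetical, and in a real pipeline Pub/Sub's dead-letter topic and Dataflow would handle this rather than a plain loop.

```python
import json

EVENTS_PER_MINUTE = 1_000_000
print(f"Sustained rate: ~{EVENTS_PER_MINUTE / 60:,.0f} events per second")


def transform(raw):
    """Parse and validate one sensor reading (hypothetical schema)."""
    event = json.loads(raw)
    return {"sensor_id": str(event["sensor_id"]), "temp_c": float(event["temp_c"])}


def process(messages):
    """Transform a batch; route malformed messages to a dead-letter list instead of failing."""
    rows, dead_letter = [], []
    for raw in messages:
        try:
            rows.append(transform(raw))
        except (ValueError, KeyError, TypeError):  # bad JSON, missing field, wrong type
            dead_letter.append(raw)
    return rows, dead_letter


batch = [
    '{"sensor_id": 17, "temp_c": 21.5}',
    '{"sensor_id": 18}',                   # missing temp_c
    "not json at all",
    '{"sensor_id": 19, "temp_c": "hot"}',  # wrong type
]
rows, dead = process(batch)
print(f"{len(rows)} rows for BigQuery, {len(dead)} messages to the dead-letter topic")
```

The design choice is that one bad message never stalls the stream: good rows keep flowing to BigQuery, and the rejects are parked where an engineer can inspect and replay them.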

🚩 Universal Cloud Engineering Red Flags

Warning signs that should prompt deeper questioning regardless of résumé claims

Click-ops only: Engineer configures everything manually via the cloud console — no Terraform, CloudFormation, or ARM Templates. Unscalable, error-prone, and not reproducible. A mid-senior cloud role requires IaC.
No security mindset: "The cloud handles security" or giving admin access liberally. Customer misconfiguration, not the cloud provider, is a leading cause of cloud breaches. An engineer who doesn't think about least-privilege IAM is a liability.
Single-region design: Production architectures should at minimum survive the loss of one Availability Zone, and critical systems need a plan for a full regional outage. An engineer who's never designed for multi-AZ or cross-region failover hasn't built production-grade systems.
Hardcoded credentials: API keys, database passwords, or service account keys embedded in code or config files. Every major cloud offers secrets management (AWS Secrets Manager, Azure Key Vault, Google Secret Manager); not knowing to use it is a critical security gap.
No cost awareness: Can't estimate cloud costs or has no experience with Reserved Instances, right-sizing, or Auto Scaling. Cloud engineers who don't manage costs create budget overruns — FinOps literacy is now required.
Certification without experience: Has AWS/Azure/GCP cert but can't answer architectural questions or describe real projects. Always pair certification screening with scenario-based questions. "What did you actually build with this?" reveals the truth.