Azure Error Guide: 'VMExtensionProvisioningError' Extension Failed to Provision
Fix the Azure VMExtensionProvisioningError when a VM extension or custom script returns a non-zero exit code, with diagnostics and a step-by-step resolution.
- #azure
- #troubleshooting
- #errors
- #compute
Exact Error Message
When a VM extension fails to provision, Azure surfaces an error like this from az vm create, az vm extension set, or a Terraform/ARM/Bicep deployment:
(VMExtensionProvisioningError) VM has reported a failure when processing
extension 'customScript' (publisher 'Microsoft.Azure.Extensions' and type
'CustomScript'). Error message: 'Enable failed: failed to execute command:
command terminated with exit status=1
[stdout]
Reading package lists...
[stderr]
E: Unable to locate package nginx-full
'. More information on troubleshooting is available at
https://aka.ms/VMExtensionCSELinuxTroubleshoot.
Code: VMExtensionProvisioningError
You may also see a timeout variant when the extension never reports completion within its allotted window:
(VMExtensionProvisioningTimeout) Provisioning of VM extension 'customScript'
has timed out. Extension provisioning has taken too long to complete. The
extension did not report a status within the time limit.
Code: VMExtensionProvisioningTimeout
What the Error Means
VMExtensionProvisioningError means the underlying virtual machine booted successfully, but the extension handler running inside the guest reported a failure. The Azure control plane runs the extension (for example Custom Script, the Azure Monitor Agent, or a configuration tool like Chef/Puppet), waits for a status file, and reads the exit code. A non-zero exit code from the extension’s command bubbles up as this error.
The important distinction: the VM itself is usually healthy. This is a guest-side failure, not a fabric or hardware problem. The ProvisioningState of the extension goes to Failed, and depending on how you deployed, the whole VM resource may report Failed as well, even though it is running and reachable.
The VMExtensionProvisioningTimeout variant means the handler never wrote a terminal status file at all — the script is hung, waiting on input, or running longer than the extension’s timeout allows.
Common Causes
- Custom script returns a non-zero exit code. The most common cause. A
command terminated with exit status=1means a command in your script failed and the script did not handle it (orset -eaborted it). - Script download fails. The
fileUrispoint to blob storage with an expired SAS token, a private container without credentials, or a host that is unreachable from the VM’s subnet (NSG/firewall/no outbound route). - Package repository unreachable.
apt-get update,yum install, ordnfcannot reach the repo — outbound 443/80 is blocked, DNS is broken, or the mirror is down. - Dependency not ready. The script assumes a disk is mounted, a service is up, cloud-init finished, or another extension already ran. Race conditions are frequent.
- Timeout. A long compile, large download, or
aptlock wait exceeds the extension window. - Wrong settings vs. protectedSettings. Secrets (SAS tokens, passwords) placed in
settingsinstead ofprotectedSettings, or a malformedcommandToExecute. - Conflicting extensions. Two extensions touch the same package manager or config at once, or you assigned the same extension twice with different settings.
- OS / extension version mismatch. The extension version does not support the distro/version (for example a new Ubuntu release before the handler supports it).
How to Reproduce the Error
A reliable way to reproduce it is to ship a Custom Script that references a package that does not exist, with strict error handling enabled:
# install.sh — fails because the package name is wrong and set -e aborts
set -euo pipefail
apt-get update
apt-get install -y nginx-full # not a valid package name -> exit 1
systemctl enable --now nginx
az vm extension set \
--resource-group rg-demo \
--vm-name web-01 \
--name customScript \
--publisher Microsoft.Azure.Extensions \
--version 2.1 \
--settings '{"fileUris":["https://demostore.blob.core.windows.net/scripts/install.sh"]}' \
--protected-settings '{"commandToExecute":"bash install.sh"}'
The CLI returns VMExtensionProvisioningError with the failing stderr embedded.
Diagnostic Commands
These commands are read-only — they inspect state without changing anything.
# 1. List all extensions on the VM and their provisioning state
az vm extension list \
--resource-group rg-demo \
--vm-name web-01 \
--query "[].{name:name, state:provisioningState, type:typePropertiesType}" \
-o table
# 2. Show full config and status of the failing extension
az vm extension show \
--resource-group rg-demo \
--vm-name web-01 \
--name customScript \
-o json
# 3. Read the per-extension status from the instance view (has the message)
az vm get-instance-view \
--resource-group rg-demo \
--name web-01 \
--query "instanceView.extensions[].{name:name, statuses:statuses[].message}" \
-o json
# 4. Pull the serial/boot log (boot diagnostics must be enabled)
az vm boot-diagnostics get-boot-log \
--resource-group rg-demo \
--name web-01
The most detailed output lives on the VM in the extension’s own logs. SSH in (the VM is running) and read:
# Linux — Custom Script and handler logs
sudo cat /var/log/azure/custom-script/handler.log
sudo cat /var/log/azure/Microsoft.Azure.Extensions.CustomScript/*/extension.log
sudo cat /var/lib/waagent/custom-script/download/0/stderr
These files capture the exact stdout/stderr and exit code the handler saw — far more than the truncated control-plane message.
Step-by-Step Resolution
-
Read the instance view first. Run diagnostic command #3 above. The
statuses[].messagefield tells you whether it was a non-zero exit, a download failure, or a timeout. This points you at the right fix before you touch anything. -
Read the on-VM extension logs. SSH in and inspect
/var/log/azure/.../extension.logand thestderrfile. The actual failing command and its output are here. The control-plane message is often truncated. -
Fix the script for idempotency and correct exit codes. Make the script safe to re-run and explicit about failures:
set -euo pipefail export DEBIAN_FRONTEND=noninteractive # wait for the apt lock instead of failing while fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; do sleep 5; done apt-get update -y apt-get install -y nginx # correct package name systemctl enable --now nginxAlways use the real package name, guard against locks, and let the script exit 0 only on genuine success.
-
Validate downloads and SAS tokens. Confirm every URI in
fileUrisis reachable from the VM and not expired. From the VM:curl -fsSL "<fileUri>" -o /tmp/test.sh. If it 403s, regenerate the SAS token (longer expiry) or use a managed identity instead of SAS. -
Increase the timeout for slow scripts. If you hit
VMExtensionProvisioningTimeout, addtimestampto force a re-run and ensure the script does not block on prompts. Background nothing that must finish — the handler needs the foreground process to exit so it can write the status file. -
Re-run the extension. Update the settings (changing
timestampforces re-execution) and re-apply:az vm extension set \ --resource-group rg-demo \ --vm-name web-01 \ --name customScript \ --publisher Microsoft.Azure.Extensions \ --version 2.1 \ --settings '{"fileUris":["https://demostore.blob.core.windows.net/scripts/install.sh"],"timestamp":123456790}' \ --protected-settings '{"commandToExecute":"bash install.sh"}' -
Confirm success. Re-run diagnostic command #1; the
provisioningStateshould now readSucceeded.
Prevention and Best Practices
- Make scripts idempotent. Extensions can re-run on reboot or model change. Re-running must not break a working host.
- Handle errors explicitly. Use
set -euo pipefail, check return codes, and retry transient network operations with backoff. - Keep secrets in
protectedSettings. SAS tokens and passwords belong there;settingsis logged in plaintext. - Prefer managed identity over SAS for
fileUrisso you never ship an expiring token. - Test the script standalone on the same base image before wiring it into an extension.
- Set realistic timeouts and avoid interactive prompts (
DEBIAN_FRONTEND=noninteractive,-yflags). - Pin and validate extension versions against your OS image, and avoid two extensions racing on the same package manager.
- Enable boot diagnostics up front so logs are available the moment something fails.
For a broader treatment of guarding cloud automation against partial failures, see our DevOps troubleshooting guides.
Related Errors
- VMExtensionProvisioningTimeout — the handler never reported a terminal status in time. Usually a hung script or a blocking prompt. Fix the script to exit; raise the timeout.
- VMExtensionHandlerNonTransientError — the extension handler itself failed in a way Azure will not retry (bad version, unsupported OS, corrupt handler). Often needs a version change or image fix.
- ProvisioningState/failed — the generic rollup state. A failed extension frequently drives the parent VM into this state even though the VM is running.
- OSProvisioningTimedOut — different layer: the guest agent (waagent / Windows provisioning) never reported the OS as ready. Tied to a broken image or cloud-init, not your extension.
- ImagePullBackOff (AKS) — the Kubernetes analogue. A pod cannot pull its container image (bad tag, private registry, no credentials). Same theme of a workload failing to start, but at the cluster level.
Frequently Asked Questions
Where do extension logs live on Linux vs. Windows?
On Linux: /var/log/azure/<extension-name>/ plus the guest agent logs in /var/log/waagent.log, and downloaded scripts under /var/lib/waagent/custom-script/download/. On Windows: C:\WindowsAzure\Logs\Plugins\<Publisher>.<Type>\<version>\ and C:\WindowsAzure\Logs\WaAppAgent.log.
Does a failed extension fail the whole VM?
The VM almost always keeps running and stays reachable — only the extension’s provisioningState is Failed. But the parent VM resource can roll up to Failed, which breaks downstream automation and Terraform plans. The compute is fine; the deployment status is not.
How do I retry the extension without recreating the VM?
Re-apply it with az vm extension set using a new timestamp value (or any settings change) to force re-execution. You do not need to delete the VM — only the extension re-runs.
Why does the error message look truncated?
The control-plane message caps the embedded stdout/stderr. The complete output is in the on-VM extension.log and stderr files. Always read those for the real root cause.
Download the Free 500-Prompt DevOps AI Toolkit
500 battle-tested, copy-paste AI prompts engineered by a senior systems engineer — every one with fill-in placeholders and safety/back-out notes. Drop your email and it's yours.
- 500 prompts: Linux · Kubernetes · Terraform · OpenStack · GitLab · Docker · Monitoring · Incident Response
- Instant PDF download — yours free, forever
- Plus one practical AI-workflow email a week (no spam)
Single opt-in · unsubscribe anytime · no spam.