The Implementation Of A Virtual Machine For Smart Contracts: Evm Bytecode Execution And Gas Metre

A comprehensive technical exploration of the implementation of a virtual machine for smart contracts: evm bytecode execution and gas metre, covering key concepts, practical implementations, and real-world applications.
Contents
The Heart of the Blockchain Engine: A Deep Dive into EVM Bytecode Execution and the Gas Metre
Imagine, for a moment, you’re sending a friend a few dollars via a traditional banking app. Behind that single, simple tap on your screen lies a staggering cascade of centralized infrastructure: secure servers, database validations, fraud detection algorithms, and the eventual clearinghouse settlement. Now, imagine you’re doing something far more radical. Instead of sending simple currency, you’re deploying a self-executing contract—a piece of code that will, without a lawyer, a judge, or a bank, autonomously trade a digital asset, based on a data feed from the real world, and then distribute the proceeds to thousands of anonymous participants across the globe. This isn’t a science fiction fantasy; it is the daily reality of decentralized finance (DeFi), and it is made possible by a single, elegant, and fiercely deterministic piece of engineering: the Ethereum Virtual Machine (EVM).
The EVM is the beating heart of the most prominent smart contract ecosystem. It is the abstract, sandboxed runtime environment that gives birth to the “world computer.” But for most developers and users, it remains a black box—a mystical engine that somehow processes our Solidity code, stores our state, and, most importantly, charges us a variable and often frustrating fee known as “gas.” We write code, we deploy, we pay. But how does it actually work? How does a network of thousands of globally distributed nodes, each running a different operating system on different hardware, arrive at the exact same result from the same input? The answer lies not in Solidity, or Vyper, or any high-level language, but in the raw, unforgiving language of the machine itself: EVM bytecode.
Understanding the bytecode execution model and the economic mechanism of the gas metre is not merely an academic exercise. It is the key to writing efficient, secure, and cost-effective smart contracts. It unlocks the ability to debug low-level failures, design gas-efficient algorithms, and truly appreciate the fundamental trade-offs that make decentralized computation viable. In this deep dive, we will strip away the abstractions layer by layer. We will start with the EVM’s architecture—its stack, memory, storage, and call context—and then follow the precise choreography of opcodes as they execute. We will dissect how each instruction consumes gas, why some operations are vastly more expensive than others, and how the Ethereum protocol has evolved to mitigate abuse. Along the way, we will write raw bytecode, compile simple Solidity snippets, and trace the gas costs with surgical precision.
By the end, you will see the Ethereum blockchain not as a vague cloud of transactions, but as a deterministic state machine where every instruction is accounted for, every storage slot is measured, and every drop of gas is a precious resource. Let’s begin.
1. The Architecture of the EVM: A Minimalist Stack Machine
The EVM is not a general-purpose computer like your laptop. It is a stack-based virtual machine with a very narrow instruction set (around 140 opcodes as of the Shanghai upgrade). It has no registers, no arbitrary memory access, and no multi-threading. Instead, it operates on three distinct data areas:
- Stack: A last-in-first-out (LIFO) data structure. All arithmetic, logical, and control-flow operations work by popping operands from the top and pushing results. The stack can hold up to 1024 elements, each 256 bits (32 bytes) wide. This 256-bit word size is no accident—it aligns with the cryptographic primitives (hashing, elliptic curve operations) that underpin Ethereum.
- Memory: A linear byte array that grows dynamically. It is volatile and cleared between external calls. Memory is used for temporary data like arrays, function arguments, and return values. It is word-addressable (32-byte aligned) but can be accessed at byte granularity.
- Storage: A persistent key-value store, where both keys and values are 256-bit words. Storage is the contract’s long-term memory—it persists across transactions and even across contract upgrades. Unlike memory, storage is expensive to write because it changes the global state trie and must be agreed upon by all nodes.
Additionally, the EVM has a program counter (PC) that points to the current bytecode instruction, and a gas counter that decrements with every operation. It also has access to contextual information: the caller’s address, the current contract’s balance, block data, transaction origin, and more—all exposed through special opcodes like CALLER, BALANCE, TIMESTAMP, and GASLIMIT.
The EVM is deterministic: given the same initial state and the same bytecode, every node in the network must compute the exact same final state. This is enforced by specifying the exact gas cost for every opcode, the exact order of stack operations, and the exact semantics of each instruction. There is no room for ambiguity or implementation-specific behavior. That is why the EVM is specified in the Ethereum Yellow Paper with mathematical precision.
2. Bytecode and Opcodes: The Language of the Machine
High-level languages like Solidity are compiled down to EVM bytecode, a sequence of bytes where each byte (or pair of bytes) represents an opcode or immediate data. For example, the opcode STOP (0x00) halts execution. ADD (0x01) pops two 256-bit values from the stack, adds them modulo 2^256, and pushes the result. SSTORE (0x55) pops a key and a value from the stack and writes the value into the contract’s storage at that key.
Let’s look at a trivial example: a program that adds 2 and 3 and stores the result in storage slot 0.
In raw bytecode:
PUSH1 0x03 (opcode 0x60, followed by 0x03)
PUSH1 0x02 (opcode 0x60, followed by 0x02)
ADD (opcode 0x01)
PUSH1 0x00 (opcode 0x60, followed by 0x00)
SSTORE (opcode 0x55)
STOP (opcode 0x00)
The byte representation is:
60 03 60 02 01 60 00 55 00
But the EVM reads this as a stream. The program counter starts at byte 0 (the first PUSH1). PUSH1 tells the EVM to read the next byte (0x03) and push it onto the stack. Then PC moves to byte 2 (the next PUSH1), pushes 0x02, moves to byte 4, executes ADD, which pops 0x02 and 0x03, computes 0x05, pushes it. Then PUSH1 0x00 pushes 0, then SSTORE pops the key (0) and value (5) and writes to storage. Then STOP.
This is the fundamental execution cycle: fetch opcode, decode, execute, consume gas, advance PC.
2.1 Opcode Families
Opcodes are grouped by purpose:
- Arithmetic:
ADD,SUB,MUL,DIV,SDIV,MOD,SMOD,ADDMOD,MULMOD,EXP,SIGNEXTEND - Comparison & Bitwise:
LT,GT,SLT,SGT,EQ,ISZERO,AND,OR,XOR,NOT,BYTE,SHL,SHR,SAR - Memory & Stack:
POP,MLOAD,MSTORE,MSTORE8,MSIZE,JUMP,JUMPI,PC,MSIZE - Storage:
SLOAD,SSTORE - Environment:
ADDRESS,BALANCE,ORIGIN,CALLER,CALLVALUE,CALLDATALOAD,CALLDATASIZE,CALLDATACOPY,CODESIZE,CODECOPY,GASPRICE,EXTCODESIZE,EXTCODECOPY,RETURNDATASIZE,RETURNDATACOPY,EXTCODEHASH - Block:
BLOCKHASH,COINBASE,TIMESTAMP,NUMBER,DIFFICULTY,GASLIMIT,CHAINID,SELFBALANCE - Calls:
CALL,CALLCODE,DELEGATECALL,STATICCALL,RETURN,REVERT,SELFDESTRUCT,CREATE,CREATE2 - Logging:
LOG0throughLOG4 - System:
STOP,INVALID,SELFDESTRUCT,REVERT,RETURN,INVALID
Each opcode has a fixed upfront gas cost plus sometimes a dynamic cost based on the data being processed. For example, SLOAD costs 2100 gas (post-EIP-2929 for a cold slot) and SSTORE varies from 20000 to 2900 depending on whether it is a fresh write, a change, or a reset to zero (which yields a gas refund). Understanding these costs is crucial for gas optimization.
3. The Gas Metre: Why Does Computation Cost Gas?
The concept of gas is the linchpin of Ethereum’s economic security. Without gas, a malicious or buggy contract could run an infinite loop, consuming all computational resources of every node in the network—a classic denial-of-service attack. Gas introduces a market for computation: every transaction specifies a gas limit (the maximum computational units it can consume) and a gas price (the amount of ETH per unit the sender is willing to pay). Miners choose transactions with higher gas prices to maximize their fee revenue.
The gas cost of an opcode is designed to reflect the real-world cost of performing that operation on hardware. For example, a simple arithmetic operation like ADD (3 gas) is cheap because it’s just a CPU instruction. But an SSTORE operation that modifies persistent storage is expensive because it involves updating the Merkle Patricia trie—a data structure that requires O(log n) hashing and disk I/O across thousands of nodes. Similarly, the EXP opcode (exponentiation) costs a variable gas that scales with the exponent’s byte size, because modular exponentiation is computationally heavy.
Gas also prevents infinite loops by imposing a finite budget. If a contract runs out of gas mid-execution, all state changes are reverted (except the gas fee paid to the miner). This “all-or-nothing” atomicity ensures that a failed transaction cannot leave a corrupted state.
3.1 Gas Cost Tables (Simplified)
| Opcode | Gas Cost | Notes |
|---|---|---|
| STOP | 0 | Halts execution |
| ADD/SUB | 3 | Arithmetic |
| MUL/DIV | 5 | More expensive than ADD |
| EXP | 10 + (50 * byte_len(exponent)) | Variable; can be very high |
| SLOAD | 2100 (cold) / 100 (warm) | EIP-2929 introduced warm/cold distinction |
| SSTORE (set to non-zero from zero) | 22100 | Creates a new storage slot (cold write) |
| SSTORE (change non-zero to non-zero) | 2900 (if warm) or 5000 (cold) | Overwrites existing slot |
| SSTORE (clear to zero) | 5000 (warm) + refund 4800 | But refund capped at 20% of total gas |
| CALL | 700 (warm) + 2300 (reserved for callee) + value transfer cost | Complex; includes gas stipend |
| MLOAD | 3 | Memory read |
| MSTORE | 3 | Memory write (plus memory expansion cost) |
| JUMPDEST | 1 | Marks valid jump destination |
| LOG0 | 375 + 8 * data_byte_len | Logging to blockchain |
| CREATE | 32000 + 200 * code_byte_len | Contract creation |
The exact gas schedule has been tweaked many times. The most impactful changes were EIP-150 (introducing “call” gas cost quadratic scaling for certain operations), EIP-160 (exp gas cost), and EIP-2929 (cold/warm storage access). These improvements were responses to observed abuse patterns, such as the Shanghai attacks where attackers used inexpensive SLOAD and SSTORE to bloat state.
4. Execution Model: Step by Step
Let’s walk through the execution of a real Ethereum transaction, from the moment the EVM receives the input to the final state transition.
- Transaction starts: The sender specifies the destination contract address, call data (ABI-encoded function selector + arguments), gas limit, and gas price. The node validates the signature and sender’s balance.
- Gas prepayment: The total possible gas cost (gas limit * gas price) is deducted from the sender’s balance.
- EVM instantiation: The EVM creates a new execution context. It loads the contract’s bytecode from the state trie, sets the program counter to 0, initializes an empty stack and memory, sets the gas counter to the transaction’s gas limit, and copies the call data into a separate read-only area (the calldata space).
- Execution loop: At each step, the EVM reads the opcode at the current PC, increments the PC (unless the opcode modifies it, like
JUMP), deducts the gas cost for that opcode from the gas counter, and executes it. If the gas counter falls below the required cost, execution halts with an “Out of Gas” exception and all state changes are reverted. - Return data: If the transaction is a contract creation, the EVM runs the initialization bytecode, which returns the runtime bytecode (the code that will be stored on-chain). The returned bytecode is then stored at the new contract’s address.
- Gas refund and payment: After execution, any unused gas is refunded to the sender. The miner receives the gas used multiplied by the gas price, plus any gas refunds are netted (capped at 20% of the total gas used). The state diff is applied.
4.1 Memory Expansion
Memory is dynamic. When you MSTORE at an offset beyond the current memory size, the EVM expands memory. The cost of memory expansion is quadratic in the number of 32-byte words used. The formula is:
memory_gas_cost = (a * a + a) / 512
where a = ceil(memory_byte_size / 32). This incentivizes minimal memory usage. For example, storing at offset 0 is free (already zero size), but storing at offset 31 expands memory to 32 bytes (cost 3 gas, because a=1 gives (1+1)/512=~0 but there is a minimum of 3 gas for MSTORE). However, if you store at offset 10,000, you’ll pay a lot more.
4.2 Stack Depth Limit
The EVM stack can hold at most 1024 elements. If a PUSH would exceed that, execution halts with a “Stack Overflow” exception. Similarly, POP on an empty stack causes a “Stack Underflow.” This limits recursion depth and enforces a fixed upper bound on combinatorial complexity.
5. From Solidity to Bytecode: A Compilation Walkthrough
To truly understand gas, let’s compile a simple Solidity contract and examine its bytecode.
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Counter {
uint256 public count;
function increment() public {
count += 1;
}
}
Using solc 0.8.20, we compile with --asm or --opcodes to see the intermediate assembly. The runtime bytecode (the deployed code) starts after the constructor code. Let’s look at the increment() function.
The function selector for increment() is the first 4 bytes of keccak256("increment()"). That hash is 0x27c33cd4 (just an example; actual might differ). The Solidity compiler generates a dispatcher that reads the first 4 bytes of calldata, compares to known selectors, and jumps to the appropriate code. Suppose the jump table is set up.
The core of increment():
; Load storage slot 0
PUSH1 0x00
SLOAD ; push current value of count onto stack
; Add 1
PUSH1 0x01
ADD
; Store back to slot 0
PUSH1 0x00
SSTORE
; Return nothing (void function)
STOP
But the compiler also inserts checks for overflow? In Solidity 0.8+, overflow checks are automatic. So it uses ADD with overflow check? Actually ADD wraps around, so Solidity uses checked_add which is implemented via the add opcode followed by an overflow check using LT and ISZERO? No, Solidity 0.8+ adds an overflow check by using the addmod opcode? Actually, for uint256, the checked addition in EVM uses the fact that if the result is less than either operand, overflow occurred. So the compiled code might be:
PUSH1 0x00
SLOAD
PUSH1 0x01
DUP3 ; duplicate the original value
ADD
DUP2
LT ; check if result < original? Actually careful: a + b < a if overflow (since b>0)
ISZERO
PUSH1 overflow_label?
JUMPI
; then store
PUSH1 0x00
SSTORE
...
But the exact details are intricate. For simplicity, assume the overflow check adds extra gas.
Now, let’s compute gas for this increment() call:
- Dispatcher jump: The calldata is loaded, selector matched. This involves
CALLDATALOAD(3 gas),PUSHs,EQ(3),JUMPI(10). Rough estimate: ~30 gas. - Storage read:
SLOADcosts 2100 (cold slot) or 100 (warm). If this is the first time the contract is called in a transaction, it’s cold. So 2100. - Arithmetic:
PUSH1(3 each) x3 = 9,ADD(3),DUP2(3),LT(3),ISZERO(3),JUMPI(10) -> about 31 gas for overflow check (plus onePUSHfor destination). So 31. - Storage write:
SSTOREfor changing count from 0 to 1: 22100 (cold write, since slot was zero). If count was already 1, changing to 2 costs 2900 (warm) or 5000 (cold). Also note that a zero-to-nonzero write costs 22100, but a nonzero-to-nonzero change costs 5000 (if cold) or 2900 (if warm). This is a huge difference. - Cleanup:
STOPcosts 0.
So total gas for first increment: ~30 + 2100 + 31 + 22100 = 24261 gas. The base fee for a transaction (21,000 gas) is separate for the transaction itself (covering call data bytes, signature verification, etc.). So total transaction gas would be about 21,000 + 24,261 = 45,261 gas.
But if we call increment() again in the same transaction, the storage slot is warm from the first SLOAD, so the second SLOAD costs 100, and the second SSTORE (changing 1 to 2) costs 2900 (if warm) or 5000 (if cold slot but warm from previous write? Actually, after a cold SLOAD, the slot becomes warm for the rest of the transaction. The second SLOAD costs 100. The SSTORE sees the slot is warm (since it was accessed) and it changes from non-zero to non-zero, so 2900. That’s a significant saving: 2100 -> 100 and 22100/5000 -> 2900. This illustrates why batching operations in a single transaction is more efficient than multiple transactions.
6. Advanced Topics: Precompiles, CREATE2, and Gas Refunds
6.1 Precompiled Contracts
Ethereum includes a set of precompiled contracts at specific addresses (1 to 9, plus later additions) that implement cryptographic primitives directly in the client software, bypassing the EVM bytecode interpretation. These precompiles are highly optimized native functions that cost a fixed gas per operation. Examples:
- ecrecover (0x01): Recovers an Ethereum address from an ECDSA signature. Cost: 3000 gas.
- sha256 (0x02): SHA-256 hash. Cost: 60 + 12 * data_word_len.
- ripemd160 (0x03): RIPEMD-160 hash. Cost: 600 + 120 * data_word_len.
- identity (0x04): Copies data. Used for cheap memory-to-memory copying.
- modexp (0x05): Modular exponentiation (used in BLS signatures, RSA). Cost: highly variable, defined by EIP-198.
- ecadd, ecmul, ecpairing (0x06, 0x07, 0x08): Elliptic curve operations on the bn128 curve (used for zk-SNARKs).
- blake2f (0x09): Blake2 compression function (for zk-STARKs).
These precompiles are called via the CALL opcode with a specific address. The gas cost is deducted from the calling contract’s gas, and the precompile runs in constant time (or predictable time) to prevent timing attacks.
6.2 Contract Creation: CREATE and CREATE2
Creating a new contract requires a special transaction (to address zero) or internal call using CREATE or CREATE2. The CREATE opcode takes as arguments: value (ETH to send), memory offset, byte length of the initialization code. It returns the address of the new contract, computed as keccak256(rlp([sender_nonce])). CREATE2 uses a user-provided salt, allowing deterministic contract addresses regardless of nonce.
The gas cost for CREATE is 32,000 base gas plus 200 gas per byte of the deployed code (the runtime bytecode). Additionally, the initialization code consumes gas as it runs. The total gas for contract creation can be substantial. For example, deploying a simple Counter contract with 200 bytes of runtime code might cost: 32,000 + 200*200 = 72,000 gas for the creation itself, plus the gas for running the constructor (which often just returns the runtime code). So a typical ERC20 token deployment can easily exceed 200,000 gas.
6.3 Gas Refunds
When you clear a storage slot (set it to zero), you get a gas refund of 4,800 gas (EIP-3529, reduced from 15,000). Also, when you SELFDESTRUCT a contract, you get a refund of 24,000 gas (but only if the contract is not already in the process of being destroyed). However, refunds are capped at 20% of the total gas used. The rationale: refunds incentivize state cleanup, but absolute refunds could be gamed to create cheap transactions that bloat state temporarily. The cap prevents abuse.
For example, if you execute a transaction that clears 10 storage slots, you’d be entitled to a refund of 48,000 gas. But if the total gas used is 100,000, the refund is capped at 20% of 100,000 = 20,000. So you only get 20,000 back. This encourages users to clean up state but not excessively.
7. Real-World Gas Optimization Strategies
Understanding bytecode empowers developers to write cheaper contracts. Here are some proven strategies:
7.1 Packing Variables in Storage
Storage slots are 256 bits. If you have multiple small unsigned integers (e.g., uint64, uint128), the Solidity compiler can pack them into a single slot. This reduces the number of SSTORE and SLOAD operations. For example:
struct Data {
uint128 a;
uint128 b;
}
This uses one slot for both a and b. Reading both requires one SLOAD (2100) instead of two (4200). Writing both requires one SSTORE (5000 or 22100) instead of two.
7.2 Using calldata Instead of memory
In Solidity, function parameters declared as memory are copied to memory, which costs gas for memory expansion and copying. Declaring them as calldata avoids the copy. For example:
function process(uint256[] calldata data) external {
// data is read directly from calldata, no copy
}
This saves gas for large arrays.
7.3 Using unchecked Blocks for Arithmetic
As of Solidity 0.8.0, arithmetic overflow checks are enabled by default. For loops where overflow cannot happen (e.g., i++ bounded by array length), you can wrap the operation in an unchecked block to skip the overflow check, saving a few gas per iteration:
for (uint256 i = 0; i < arr.length; ) {
// ...
unchecked { i++; }
}
7.4 Batched Operations and Warm Storage
As we saw, repeated access to the same storage slot within one transaction is cheaper due to warm/cold semantics. Therefore, batch all updates to the same variable in a single transaction. For example, if you have a withdrawal function that updates a user’s balance, structure the logic so that the storage slot is read once and written once, rather than multiple times.
7.5 Using require Instead of assert
require consumes all remaining gas on failure (reverting), while assert used to consume all gas (in older versions) but now also reverts with INVALID opcode (costing all remaining gas). Actually, assert in Solidity 0.8+ uses REVERT as well, but it’s still better to use require for input validation because require’s error message can be short while assert is for invariants. The gas difference is negligible, but best practice is require for external conditions.
8. Security Implications of Bytecode
Understanding the EVM’s execution model is crucial for writing secure contracts. Many famous hacks exploit subtle bytecode-level behaviors.
8.1 Reentrancy on the Bytecode Level
The 2016 DAO hack exploited a reentrancy vulnerability where the attacker’s fallback function (called via a CALL in the victim’s code) re-entered the victim before its state variable was updated. From a bytecode perspective, the sequence was:
SLOAD(balance of sender) -> check > 0PUSH ... CALL(send ether to sender) -> the sender’s fallback can callwithdrawagainSSTORE(deduct balance) -> happens after the call
Because SSTORE hadn’t been executed yet, the second withdraw call saw the old balance. The fix: use the “Checks-Effects-Interactions” pattern, which ensures all state updates (SSTORE) happen before any external call (CALL). The bytecode ordering enforces this.
8.2 Short Address Attacks (Abi Encoded Parameters)
Before Solidity 0.5.0, the ABI encoder did not properly pad dynamic arrays. An attacker could craft a malicious calldata that omitted the final bytes of an address, causing the EVM to read from the stack or memory incorrectly. This was a bytecode-level exploit in the calldata parsing. Newer compilers include extra checks.
8.3 Immutables and Constant Propagation
Solidity’s immutable variables are compiled into PUSH (immediate) values stored in the bytecode, rather than storage reads. This makes them cheaper to read (3 gas for PUSH vs 2100 for SLOAD). But if an attacker can tamper with the bytecode (impossible on mainnet), they could change those values. For contracts, immutable is secure because the bytecode is committed at deployment.
9. The Future: EVM Upgrades and Gas Evolution
Ethereum development continues. Upcoming or recent upgrades that affect gas include:
- EIP-1559: Redesigned fee market with base fee burned and priority tip. Gas price is now dynamic per block.
- EIP-2929: Introduced cold/warm storage tiers (as discussed). Reduced the cost of repeated storage accesses.
- EIP-3529: Reduced gas refund for
SSTOREclearing to 4800 and capped at 20% of total gas (reduced from 15,000 and 50% cap). - EIP-4488: Proposed to reduce calldata cost from 16 gas per byte to 3 gas per byte (to help rollups). Not yet live.
- Account Abstraction (ERC-4337): Will allow smart contract wallets to manage gas payments, potentially enabling batch transactions with single gas payment.
The EVM is also being extended with EOF (EVM Object Format) to allow more efficient bytecode compression and validation. The gas model will continue to be refined to match real-world computation costs while preventing abuse.
Conclusion: From Black Box to Open Book
We have journeyed from the high-level concept of a “world computer” down to the bare metal of stack operations, gas tables, and storage trie modifications. The Ethereum Virtual Machine is no longer a black box. It is a carefully engineered, deterministic stack machine where every opcode has a purpose and a price. Gas is not an arbitrary fee; it is the economic binding that aligns incentives, prevents infinite loops, and makes decentralized execution economically viable.
For developers, understanding EVM bytecode and gas is the difference between writing naive contracts that cost users a fortune and crafting efficient, secure, and trustless applications that scale. It allows you to read assembly output from the compiler, spot inefficiencies, and appreciate why some optimizations work. For users, it demystifies why gas prices spike and why certain transactions are cheaper than others.
The blockchain is not magic. It is a symphony of precise mathematical rules, executed by thousands of nodes in perfect harmony. At the heart of that symphony lies the EVM, ticking through instructions, one bytecode at a time, each step metered by the timeless dance of gas. Now you can hear its rhythm.
Note: This article assumes knowledge of basic blockchain concepts. Gas costs referenced are from the London/EIP-1559 era. Always check the latest Ethereum specification for current costs. The EVM is a living standard; upgrade with the chain.