Advanced Computer Architecture Unit 3 – Advanced Pipelining: Techniques & Hazards
Advanced pipelining techniques are crucial for enhancing processor performance. By dividing instruction execution into stages and employing strategies like superpipelining and out-of-order execution, processors can achieve higher clock frequencies and increased throughput.
However, these advanced techniques introduce challenges such as data and control hazards. To mitigate these issues, processors employ sophisticated methods like forwarding, branch prediction, and speculative execution, balancing performance gains with the complexities of managing dependencies and resource conflicts.
Pipelining Basics Recap
Pipelining improves processor performance by overlapping the execution of multiple instructions
Divides instruction execution into stages (fetch, decode, execute, memory access, write back)
Each stage operates concurrently on different instructions
Enables higher clock frequencies and increased throughput
Requires careful management of dependencies and hazards to ensure correct execution
Data dependencies occur when an instruction relies on the result of a previous instruction
Control dependencies arise from branch instructions that alter the program flow
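The throughput benefit of overlapping instructions can be made concrete with the usual ideal-pipeline timing model. The sketch below assumes a k-stage pipeline with no stalls and one cycle per stage; the function names are illustrative and not taken from the guide.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to finish n instructions on an ideal k-stage pipeline (no stalls).

    The first instruction needs k cycles to drain through the pipeline;
    each later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def ideal_speedup(num_instructions: int, num_stages: int) -> float:
    """Speedup over an unpipelined design that spends k cycles per instruction."""
    unpipelined_cycles = num_instructions * num_stages
    return unpipelined_cycles / pipeline_cycles(num_instructions, num_stages)


if __name__ == "__main__":
    print(pipeline_cycles(5, 5))       # 9 cycles for 5 instructions, 5 stages
    print(ideal_speedup(1000, 5))      # approaches 5 as n grows
```

For long instruction streams the speedup approaches the number of stages, which is why hazards (which add stall cycles to the denominator) matter so much in practice.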
Advanced Pipeline Stages
Superpipelining increases the number of pipeline stages to achieve higher clock frequencies
Superpipelined processors such as the Intel Pentium 4 (20 stages in the original design, 31 in the later Prescott core) have much deeper pipelines than the classic five-stage design
Splitting complex stages (execute) into multiple substages allows for shorter clock cycles
Additional stages may include address generation, register read/write, and branch resolution
Introduces more opportunities for hazards and requires advanced techniques to mitigate them
Forwarding paths become longer and more complex
Branch prediction accuracy becomes critical to avoid pipeline stalls
Careful balancing of stage latencies is necessary to prevent performance bottlenecks
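Why splitting a slow stage shortens the clock cycle, and why balance matters, can be shown with a small model. This is a sketch under simple assumptions: the clock period equals the slowest stage latency plus a fixed pipeline-register (latch) overhead, and all latency values are made up for illustration.

```python
def clock_period_ns(stage_latencies_ns, latch_overhead_ns=0.1):
    """Clock period is set by the slowest stage plus register overhead."""
    return max(stage_latencies_ns) + latch_overhead_ns


def split_stage(stage_latencies_ns, index, parts):
    """Return a new stage list with one stage split into equal substages."""
    latency = stage_latencies_ns[index]
    substages = [latency / parts] * parts
    return stage_latencies_ns[:index] + substages + stage_latencies_ns[index + 1:]


if __name__ == "__main__":
    # Hypothetical latencies: fetch, decode, execute, memory, write back.
    base = [1.0, 1.0, 2.0, 1.0, 1.0]
    deeper = split_stage(base, index=2, parts=2)   # split the execute stage

    print(len(base), clock_period_ns(base))        # 5 stages, limited by execute
    print(len(deeper), clock_period_ns(deeper))    # 6 stages, shorter period
```

Splitting an already balanced pipeline further gives diminishing returns in this model, because the fixed latch overhead becomes a larger share of each cycle.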
Instruction-Level Parallelism
ILP refers to the ability to execute multiple independent instructions simultaneously
Pipelining exploits ILP by overlapping the execution of instructions
Out-of-order execution allows instructions to be executed in a different order than the program sequence
Requires hardware to track dependencies and reorder instructions
Enables better utilization of pipeline resources and higher ILP
Superscalar architectures issue multiple instructions per clock cycle to multiple execution units
Very Long Instruction Word (VLIW) architectures bundle multiple operations into a single instruction
ILP is limited by data dependencies, control dependencies, and resource constraints
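The limit that data dependencies place on ILP can be estimated from the dependency graph of a code sequence. The sketch below assumes unit-latency instructions and unlimited execution resources, so the only constraint is the longest dependency chain; the representation (a dict from instruction index to the indices it depends on) is an illustrative choice.

```python
def critical_path_length(num_instructions, deps):
    """Length of the longest dependency chain, in instructions.

    deps maps an instruction index to the list of earlier instruction
    indices whose results it needs. Instructions are in program order,
    so every producer index is smaller than its consumer index.
    """
    depth = [0] * num_instructions
    for i in range(num_instructions):
        producers = deps.get(i, [])
        depth[i] = 1 + max((depth[p] for p in producers), default=0)
    return max(depth, default=0)


def available_ilp(num_instructions, deps):
    """Average instructions per cycle if only data dependencies limit issue."""
    length = critical_path_length(num_instructions, deps)
    return num_instructions / length if length else 0.0


if __name__ == "__main__":
    # Six instructions: 2 needs 0 and 1, 3 needs 2, 5 needs 4.
    deps = {2: [0, 1], 3: [2], 5: [4]}
    print(critical_path_length(6, deps))  # 3 (chain 0 -> 2 -> 3)
    print(available_ilp(6, deps))         # 2.0
```

Control dependencies and resource constraints reduce the achievable figure further; this model captures only the data-dependency bound.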
Data Hazards and Forwarding
Data hazards occur when an instruction depends on the result of a previous instruction still in the pipeline
Read After Write (RAW) hazards are the most common type of data hazard
Forwarding (bypassing) is a technique used to mitigate RAW hazards
Forwards the result of an instruction directly to the dependent instruction, bypassing pipeline stages
Requires additional forwarding paths and control logic
Load-use hazards occur when an instruction uses the result of a load immediately after it
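Forwarding alone cannot hide the delay, because the loaded value is available only after the memory access stage, so a classic five-stage pipeline still stalls for one cycle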
Can be mitigated using delayed load or load forwarding techniques
Compiler optimization techniques (instruction scheduling) can help reduce data hazards
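The effect of forwarding and instruction scheduling on RAW stalls can be illustrated with a small in-order model. This is a sketch that assumes the classic five-stage pipeline, a register file that writes in the first half of a cycle and reads in the second, and forwarding from the end of execute (ALU results) or the end of memory access (loads). The `Instr` type and the register names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # "alu" or "load"
    dest: Optional[str]      # destination register, if any
    srcs: Tuple[str, ...]    # source registers


def count_stall_cycles(program, forwarding):
    """Count RAW stall cycles for an in-order five-stage pipeline."""
    ready = {}        # register -> earliest cycle a consumer may be in decode
    prev_decode = 0   # decode cycle of the previous instruction
    stalls = 0
    for instr in program:
        earliest = prev_decode + 1
        needed = max((ready.get(r, 0) for r in instr.srcs), default=0)
        decode = max(earliest, needed)
        stalls += decode - earliest
        if instr.dest is not None:
            if not forwarding:
                delay = 3      # wait until the producer reaches write back
            elif instr.op == "load":
                delay = 2      # value forwarded after memory access
            else:
                delay = 1      # value forwarded after execute
            ready[instr.dest] = decode + delay
        prev_decode = decode
    return stalls


# lw r1, 0(r2); add r3, r1, r4; sub r5, r3, r6
program = [
    Instr("load", "r1", ("r2",)),
    Instr("alu", "r3", ("r1", "r4")),
    Instr("alu", "r5", ("r3", "r6")),
]

# Same code with an independent instruction scheduled after the load.
scheduled = [
    Instr("load", "r1", ("r2",)),
    Instr("alu", "r7", ("r8", "r9")),
    Instr("alu", "r3", ("r1", "r4")),
    Instr("alu", "r5", ("r3", "r6")),
]

if __name__ == "__main__":
    print(count_stall_cycles(program, forwarding=False))   # 4
    print(count_stall_cycles(program, forwarding=True))    # 1 (load-use)
    print(count_stall_cycles(scheduled, forwarding=True))  # 0
```

With forwarding, only the load-use pair still stalls, and filling the slot after the load with independent work removes that last bubble.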
Control Hazards and Branch Prediction
Control hazards occur due to branch instructions that alter the program flow
Without prediction, the pipeline stalls until the branch is resolved; with a wrong prediction, the instructions fetched down the wrong path must be discarded
Branch prediction techniques are used to mitigate control hazards
Static branch prediction uses heuristics (backward taken, forward not taken) to predict branch outcomes
Dynamic branch prediction uses runtime information to make more accurate predictions
Branch history tables (BHTs) store the history of branch outcomes
Two-level adaptive predictors (local and global history) improve prediction accuracy
Branch delay slots allow useful instructions to be executed while a branch is being resolved
Speculative execution fetches and executes instructions based on predicted branch outcomes
Requires mechanisms to discard speculative results if the prediction is incorrect
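A dynamic predictor built on a branch history table can be sketched in a few lines. The example below uses 2-bit saturating counters, one common scheme that the guide does not name explicitly; the table size, the indexing by low PC bits, and the initial counter state are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.counters = [1] * num_entries   # start at "weakly not taken"

    def _index(self, pc):
        # Drop the two low bits (word-aligned instructions assumed).
        return (pc >> 2) % self.num_entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a trace of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # A loop branch at a hypothetical address: taken 9 times, then exits.
    one_loop = [(0x40, True)] * 9 + [(0x40, False)]
    print(accuracy(TwoBitPredictor(), one_loop * 2))  # 0.85
```

The second counter bit keeps a single loop exit from flipping the prediction, so the second pass through the loop mispredicts only once instead of twice.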
Pipeline Stalls and Bubbles
Pipeline stalls occur when an instruction cannot proceed to the next stage due to a hazard or resource conflict
Stalls introduce bubbles (empty stages) in the pipeline, reducing performance
Data hazards can cause stalls if the required data is not available in time
Forwarding (bypassing) helps reduce data hazard stalls
Control hazards lead to stalls when a branch is mispredicted, and the pipeline needs to be flushed
Accurate branch prediction minimizes control hazard stalls
Structural hazards arise when multiple instructions compete for the same hardware resource
Sufficient hardware replication (multiple execution units) can alleviate structural hazards
Out-of-order execution and dynamic scheduling help reduce stalls by allowing independent instructions to proceed
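The performance cost of bubbles is usually summarized as extra cycles per instruction (CPI). The sketch below assumes stall sources are independent and simply additive; every parameter value in the example is made up for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate,
                  mispredict_penalty, load_use_fraction=0.0,
                  load_use_penalty=1):
    """Base CPI plus average stall cycles per instruction.

    branch_fraction:    fraction of instructions that are branches
    mispredict_rate:    fraction of branches that are mispredicted
    mispredict_penalty: cycles lost per misprediction (pipeline flush)
    load_use_fraction:  fraction of instructions that incur a load-use stall
    load_use_penalty:   cycles lost per load-use stall
    """
    branch_stalls = branch_fraction * mispredict_rate * mispredict_penalty
    data_stalls = load_use_fraction * load_use_penalty
    return base_cpi + branch_stalls + data_stalls


if __name__ == "__main__":
    cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                        mispredict_rate=0.1, mispredict_penalty=15,
                        load_use_fraction=0.1)
    print(round(cpi, 2))  # 1.4
```

Because the misprediction penalty grows with pipeline depth, the same misprediction rate costs a deep pipeline more cycles than a shallow one.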
Performance Optimization Techniques
Instruction prefetching fetches instructions ahead of time to reduce cache misses and pipeline stalls
Branch prediction and speculative execution optimize the handling of control hazards
Out-of-order execution and dynamic scheduling maximize resource utilization and minimize stalls
Register renaming eliminates false dependencies (WAR and WAW hazards) by using a larger set of physical registers
Superscalar and VLIW architectures exploit ILP by issuing multiple instructions per cycle
Compiler optimizations (loop unrolling, software pipelining) help expose more ILP and reduce hazards
Cache optimization techniques (prefetching, cache hierarchies) reduce memory access latencies
Multithreading allows multiple threads to share pipeline resources, hiding latencies and improving throughput
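How register renaming removes WAR and WAW hazards can be sketched with a simple map table. This is a simplified model: it assumes an unlimited supply of physical registers and never frees them, whereas real hardware uses a finite free list and reclaims registers when instructions commit. The instruction format and register names are hypothetical.

```python
def rename_registers(program):
    """Rename architectural registers to fresh physical registers.

    program is a list of (dest, srcs) pairs in program order.
    Each destination gets a new physical register, so later writes to the
    same architectural register no longer conflict with earlier readers
    (WAR) or earlier writers (WAW). True RAW dependencies are preserved
    because sources read the current mapping.
    """
    mapping = {}       # architectural register -> current physical register
    next_phys = 0
    renamed = []

    def fresh():
        nonlocal next_phys
        name = f"p{next_phys}"
        next_phys += 1
        return name

    for dest, srcs in program:
        for s in srcs:
            if s not in mapping:
                mapping[s] = fresh()     # value live on entry to the sequence
        new_srcs = tuple(mapping[s] for s in srcs)
        mapping[dest] = fresh()
        renamed.append((mapping[dest], new_srcs))
    return renamed


if __name__ == "__main__":
    program = [
        ("r1", ("r2", "r3")),   # r1 = r2 + r3
        ("r4", ("r1", "r5")),   # reads r1 (RAW on the first instruction)
        ("r1", ("r6", "r7")),   # rewrites r1 (WAW with 1st, WAR with 2nd)
        ("r8", ("r1", "r4")),   # must read the newer r1
    ]
    for dest, srcs in rename_registers(program):
        print(dest, srcs)
```

After renaming, the two writes to r1 target different physical registers, so the third instruction can execute before the second without corrupting the value the second one reads.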
Real-world Pipeline Implementations
Intel Core microarchitecture (Skylake) uses a 14-19 stage pipeline with advanced features
Out-of-order execution, branch prediction, and speculative execution
Supports hyper-threading (simultaneous multithreading) for improved throughput
ARM Cortex-A series processors employ deep pipelines and advanced hazard mitigation techniques
Cortex-A77 has a 13-stage pipeline with out-of-order execution and speculation
IBM Power processors (Power9) feature a deep pipeline and aggressive optimization techniques
Out-of-order execution, branch prediction, and simultaneous multithreading
AMD Zen microarchitecture (Ryzen) utilizes a high-performance pipeline with advanced features
Perceptron-based branch prediction, micro-op cache, and large instruction windows
RISC-V is an instruction set architecture rather than a single design, so pipeline organization varies by implementation
Range from simple in-order pipelines to advanced out-of-order designs with speculation