
Advanced Computer Architecture Unit 3 – Advanced Pipelining: Techniques & Hazards

Advanced pipelining techniques are crucial for enhancing processor performance. By dividing instruction execution into stages and employing strategies like superpipelining and out-of-order execution, processors achieve higher clock frequencies and greater throughput. However, deeper and wider pipelines also magnify data and control hazards. To manage them, processors employ techniques such as forwarding, branch prediction, and speculative execution, balancing performance gains against the complexity of tracking dependencies and resource conflicts.

Pipelining Basics Recap

  • Pipelining improves processor performance by overlapping the execution of multiple instructions
  • Divides instruction execution into stages (fetch, decode, execute, memory access, write back)
  • Each stage operates concurrently on a different instruction (the timing sketch after this list shows the overlap)
  • Enables higher clock frequencies and increased throughput
  • Requires careful management of dependencies and hazards to ensure correct execution
    • Data dependencies occur when an instruction relies on the result of a previous instruction
    • Control dependencies arise from branch instructions that alter the program flow
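
The overlap is easiest to see in a timing diagram. The short Python sketch below is a toy model, assuming the classic five stages, single issue, and no hazards; the stage names and the timing_diagram helper are illustrative rather than tied to any particular ISA.

```python
# Toy pipeline timing diagram: which stage each instruction occupies per cycle.
# Assumes a classic 5-stage pipeline, single issue, and no hazards or stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timing_diagram(num_instructions):
    depth = len(STAGES)
    total_cycles = num_instructions + depth - 1
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters the pipeline at cycle i
            row.append(STAGES[stage] if 0 <= stage < depth else ".")
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))

timing_diagram(4)
# I0:  IF  ID  EX MEM  WB   .   .   .
# I1:   .  IF  ID  EX MEM  WB   .   .
# I2:   .   .  IF  ID  EX MEM  WB   .
# I3:   .   .   .  IF  ID  EX MEM  WB
```

Four instructions finish in 8 cycles instead of the 20 a non-pipelined design would need, which is the throughput gain described above.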

Advanced Pipeline Stages

  • Superpipelining increases the number of pipeline stages to achieve higher clock frequencies
  • Superpipelined processors (e.g., the Intel Pentium 4) use deeper pipelines with more, shorter stages
  • Splitting complex stages (such as execute) into multiple substages allows shorter clock cycles
  • Additional stages may include address generation, register read/write, and branch resolution
  • Introduces more opportunities for hazards and requires advanced techniques to mitigate them
    • Forwarding paths become longer and more complex
    • Branch prediction accuracy becomes critical to avoiding pipeline stalls (the rough calculation after this list illustrates the trade-off)
  • Careful balancing of stage latencies is necessary to prevent performance bottlenecks
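
To see why stage count cannot grow without limit, the back-of-the-envelope model below trades a shorter cycle against a larger flush penalty on each misprediction. Every number in it is an illustrative assumption, not a measurement of any real processor.

```python
# Back-of-the-envelope model of pipeline depth vs. misprediction penalty.
# Assumptions: 5 ns of logic delay split evenly across stages, 0.1 ns latch
# overhead per stage, 20% branch frequency, 5% misprediction rate, and a
# flush penalty of (stages - 1) cycles.
def avg_time_per_instruction(stages, logic_ns=5.0, latch_ns=0.1,
                             branch_freq=0.20, mispredict_rate=0.05):
    cycle_ns = logic_ns / stages + latch_ns     # deeper pipeline -> shorter cycle
    flush_penalty = stages - 1                  # deeper pipeline -> bigger flush
    cpi = 1.0 + branch_freq * mispredict_rate * flush_penalty
    return cycle_ns * cpi                       # average ns per instruction

for depth in (5, 10, 20, 40):
    print(f"{depth:>2} stages: {avg_time_per_instruction(depth):.3f} ns/instruction")
```

With these made-up numbers the cycle time keeps shrinking, but the misprediction term grows with depth, so the returns diminish; that tension is exactly why predictor accuracy and careful stage balancing matter in superpipelined designs.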

Instruction-Level Parallelism

  • ILP refers to the ability to execute multiple independent instructions simultaneously
  • Pipelining exploits ILP by overlapping the execution of instructions
  • Out-of-order execution allows instructions to be executed in a different order than the program sequence
    • Requires hardware (such as reservation stations and a reorder buffer) to track dependencies and reorder instructions; a toy scheduler is sketched after this list
    • Enables better utilization of pipeline resources and higher ILP
  • Superscalar architectures issue multiple instructions per clock cycle to multiple execution units
  • Very Long Instruction Word (VLIW) architectures bundle multiple operations into a single instruction
  • ILP is limited by data dependencies, control dependencies, and resource constraints
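
A toy scheduler makes the contrast concrete. The sketch below assumes single issue, a fixed two-cycle latency for every operation, unlimited execution units, and a made-up (destination, [sources]) instruction format; real out-of-order hardware is far more involved.

```python
# Toy comparison of in-order vs. out-of-order single-issue scheduling.
LATENCY = 2                             # assumed latency of every operation

def schedule(instrs, out_of_order):
    ready_at = {}                       # register -> cycle its value becomes ready
    issued, cycle, pending = [], 0, list(instrs)
    while pending:
        for idx, (dest, srcs) in enumerate(pending):
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + LATENCY
                issued.append((cycle, dest))
                pending.pop(idx)        # issue at most one instruction per cycle
                break
            if not out_of_order:
                break                   # in-order: cannot skip a stalled instruction
        cycle += 1
    return issued

prog = [("r1", ["r0"]),   # r1 depends on r0
        ("r2", ["r1"]),   # RAW dependence on r1
        ("r5", ["r4"]),   # independent chain
        ("r6", ["r5"])]   # RAW dependence on r5
print("in-order:    ", schedule(prog, out_of_order=False))
print("out-of-order:", schedule(prog, out_of_order=True))
# in-order:     [(0, 'r1'), (2, 'r2'), (3, 'r5'), (5, 'r6')]
# out-of-order: [(0, 'r1'), (1, 'r5'), (2, 'r2'), (3, 'r6')]
```

The out-of-order schedule finishes issuing two cycles earlier because the independent r5/r6 chain slips into the slots where the in-order machine stalls.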

Data Hazards and Forwarding

  • Data hazards occur when an instruction depends on the result of a previous instruction still in the pipeline
  • Read After Write (RAW) hazards are the most common type of data hazard
  • Forwarding (bypassing) is a technique used to mitigate RAW hazards
    • Routes a result from the pipeline register where it is produced directly to the input of the dependent instruction, instead of waiting for write-back
    • Requires additional forwarding paths and control logic (the forwarding-unit sketch after this list shows the selection conditions)
  • Load-use hazards occur when an instruction uses the result of a load immediately after it
    • Even with forwarding, the loaded value is not available until after the memory-access stage
    • Typically handled with a one-cycle stall plus forwarding, or by compiler scheduling (and, in some older ISAs, architectural load delay slots) that fills the slot with an independent instruction
  • Compiler optimization techniques (instruction scheduling) can help reduce data hazards
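
The sketch below mirrors the textbook forwarding-unit conditions for a MIPS-style five-stage pipeline in the spirit of Hennessy and Patterson: an EX-stage operand is taken from the EX/MEM latch if the newest producer matches, otherwise from MEM/WB, otherwise from the register file. The dictionary fields are an illustrative stand-in for the pipeline registers, not a real hardware interface.

```python
# Textbook forwarding-unit conditions for a MIPS-style 5-stage pipeline.
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick the source for an EX-stage operand that reads src_reg."""
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"        # newest value: ALU result produced last cycle
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"        # older result or a value just loaded from memory
    return "REGFILE"           # no hazard: the register file already has the value

ex_mem = {"reg_write": True, "rd": 3}   # e.g. add r3, r1, r2  currently in MEM
mem_wb = {"reg_write": True, "rd": 5}   # e.g. lw  r5, 0(r9)   currently in WB
print(forward_select(3, ex_mem, mem_wb))   # EX/MEM
print(forward_select(5, ex_mem, mem_wb))   # MEM/WB
print(forward_select(7, ex_mem, mem_wb))   # REGFILE
```

The EX/MEM check is tried first so that the most recent writer of a register wins when two in-flight instructions both target it.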

Control Hazards and Branch Prediction

  • Control hazards occur due to branch instructions that alter the program flow
  • The pipeline can stall while a branch is being resolved, and any instructions fetched down the wrong path must be discarded (flushed)
  • Branch prediction techniques are used to mitigate control hazards
    • Static branch prediction uses heuristics (backward taken, forward not taken) to predict branch outcomes
    • Dynamic branch prediction uses runtime information to make more accurate predictions
      • Branch history tables (BHTs) store recent branch outcomes, commonly as 2-bit saturating counters (sketched after this list)
      • Two-level adaptive predictors (local and global history) improve prediction accuracy
  • Branch delay slots allow useful instructions to be executed while a branch is being resolved
  • Speculative execution fetches and executes instructions based on predicted branch outcomes
    • Requires mechanisms to discard speculative results if the prediction is incorrect
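
The sketch below implements the most common BHT organization: a direct-mapped table of 2-bit saturating counters indexed by low-order PC bits. The table size, indexing, and the example branch stream are assumptions chosen purely for illustration.

```python
# Minimal dynamic branch predictor: a table of 2-bit saturating counters.
class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.table = [1] * entries           # 0-1 predict not taken, 2-3 predict taken

    def _index(self, pc):
        return (pc >> 2) % len(self.table)   # drop byte offset, use low PC bits

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)

bp = TwoBitPredictor()
# A loop-style branch: taken 8 times, falls through once, then taken 8 more times.
outcomes = [True] * 8 + [False] + [True] * 8
correct = 0
for outcome in outcomes:
    correct += (bp.predict(0x400) == outcome)
    bp.update(0x400, outcome)
print(f"{correct}/{len(outcomes)} predictions correct")   # 15/17 with this stream
```

The only mispredictions are the initial warm-up and the single loop exit; a 1-bit scheme would mispredict both at the exit and on re-entry, which is the hysteresis advantage of the 2-bit counter.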

Pipeline Stalls and Bubbles

  • Pipeline stalls occur when an instruction cannot proceed to the next stage due to a hazard or resource conflict
  • Stalls introduce bubbles (empty stages) in the pipeline, reducing performance
  • Data hazards can cause stalls if the required data is not available in time
    • Forwarding (bypassing) removes most data hazard stalls, though load-use hazards still require a bubble (see the sketch after this list)
  • Control hazards cause stalls when a branch is mispredicted and the pipeline must be flushed
    • Accurate branch prediction minimizes control hazard stalls
  • Structural hazards arise when multiple instructions compete for the same hardware resource
    • Sufficient hardware replication (multiple execution units) can alleviate structural hazards
  • Out-of-order execution and dynamic scheduling help reduce stalls by allowing independent instructions to proceed
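
The load-use case is the classic stall that forwarding alone cannot remove. The sketch below checks whether the instruction in decode needs the destination of a load still in execute and, if so, freezes fetch and injects a NOP bubble; the field names and pipeline-register layout are simplified assumptions in the style of a MIPS five-stage design.

```python
# Sketch of the classic load-use hazard check and bubble insertion.
def load_use_stall(id_ex, if_id):
    """True if the instruction in decode needs the result of a load still in execute."""
    return id_ex["mem_read"] and id_ex["rd"] in (if_id["rs"], if_id["rt"])

def advance(if_id, id_ex):
    """One cycle of hazard handling at the ID -> EX boundary."""
    if load_use_stall(id_ex, if_id):
        bubble = {"mem_read": False, "rd": 0, "op": "nop"}
        return if_id, bubble, True       # freeze IF/ID, send a bubble into EX
    return None, if_id, False            # normal flow: decoded instruction moves to EX

id_ex = {"mem_read": True, "rd": 2, "op": "lw"}                       # lw  r2, 0(r1)
if_id = {"rs": 2, "rt": 3, "rd": 4, "mem_read": False, "op": "add"}   # add r4, r2, r3
held, entering_ex, stalled = advance(if_id, id_ex)
print("stall inserted:", stalled, "| entering EX next cycle:", entering_ex["op"])
# stall inserted: True | entering EX next cycle: nop
```

After the one-cycle bubble the load result can be forwarded, so the dependent add proceeds without reading a stale register value.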

Performance Optimization Techniques

  • Instruction prefetching fetches instructions ahead of time to reduce cache misses and pipeline stalls
  • Branch prediction and speculative execution optimize the handling of control hazards
  • Out-of-order execution and dynamic scheduling maximize resource utilization and minimize stalls
  • Register renaming eliminates false dependencies (WAR and WAW hazards) by mapping architectural registers onto a larger set of physical registers (sketched after this list)
  • Superscalar and VLIW architectures exploit ILP by issuing multiple instructions per cycle
  • Compiler optimizations (loop unrolling, software pipelining) help expose more ILP and reduce hazards
  • Cache optimization techniques (prefetching, cache hierarchies) reduce memory access latencies
  • Multithreading allows multiple threads to share pipeline resources, hiding latencies and improving throughput
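
Register renaming is worth seeing concretely. The sketch below assumes eight architectural registers, a 32-entry physical register file, and a toy (destination, [sources]) format; it maintains a register alias table so that repeated writes to the same architectural register land in different physical registers.

```python
# Sketch of register renaming through a register alias table (RAT).
def rename(instrs, num_phys=32):
    rat = {f"r{i}": f"p{i}" for i in range(8)}       # architectural -> physical map
    free = [f"p{i}" for i in range(8, num_phys)]     # free physical registers
    renamed = []
    for dest, srcs in instrs:
        phys_srcs = [rat[s] for s in srcs]           # true (RAW) dependencies survive
        new_dest = free.pop(0)                       # fresh destination removes WAW/WAR
        rat[dest] = new_dest                         # later readers see the new name
        renamed.append((new_dest, phys_srcs))
    return renamed

prog = [("r1", ["r2", "r3"]),   # r1 = r2 + r3
        ("r2", ["r4", "r5"]),   # WAR: rewrites r2 after the read above
        ("r1", ["r6", "r7"])]   # WAW: rewrites r1
for entry in rename(prog):
    print(entry)
# ('p8', ['p2', 'p3'])
# ('p9', ['p4', 'p5'])
# ('p10', ['p6', 'p7'])
```

Because each write receives its own physical register, the three instructions keep only their true dependencies, and in this example can execute fully in parallel.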

Real-world Pipeline Implementations

  • Intel Core microarchitecture (Skylake) uses a 14-19 stage pipeline with advanced features
    • Out-of-order execution, branch prediction, and speculative execution
    • Supports hyper-threading (simultaneous multithreading) for improved throughput
  • ARM Cortex-A series processors employ deep pipelines and advanced hazard mitigation techniques
    • Cortex-A77 has a 13-stage pipeline with out-of-order execution and speculation
  • IBM Power processors (Power9) feature a deep pipeline and aggressive optimization techniques
    • Out-of-order execution, branch prediction, and simultaneous multithreading
  • AMD Zen microarchitecture (Ryzen) utilizes a high-performance pipeline with advanced features
    • Perceptron-based branch prediction, a micro-op cache, and large instruction windows (a simplified perceptron predictor is sketched after this list)
  • RISC-V processors vary widely in pipeline design from one implementation to another
    • Designs range from simple in-order pipelines to advanced out-of-order cores with speculation
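
As a flavor of what "perceptron-based" prediction means, the sketch below follows the published Jiménez-Lin perceptron predictor idea: each branch hashes to a weight vector, the dot product with the global history gives the prediction, and weights are trained only on mispredictions or low-confidence outcomes. The history length, table size, threshold, and training pattern are assumptions; AMD's actual Zen predictor is proprietary and far more elaborate.

```python
# Simplified perceptron branch predictor (Jimenez-Lin style), for illustration only.
class PerceptronPredictor:
    def __init__(self, history_len=16, entries=128, threshold=37):
        self.weights = [[0] * (history_len + 1) for _ in range(entries)]
        self.history = [1] * history_len          # +1 = taken, -1 = not taken
        self.threshold = threshold                # train on wrong or low-confidence outcomes

    def _row(self, pc):
        return self.weights[(pc >> 2) % len(self.weights)]

    def predict(self, pc):
        w = self._row(pc)
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
        return y, y >= 0                          # bias + dot product with global history

    def update(self, pc, taken):
        y, predicted_taken = self.predict(pc)
        t = 1 if taken else -1
        w = self._row(pc)
        if predicted_taken != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi                # strengthen weights that correlate
        self.history = self.history[1:] + [t]     # shift in the newest outcome

bp = PerceptronPredictor()
hits = 0
for n in range(300):                 # repeating taken, taken, not-taken pattern
    outcome = (n % 3) != 2
    hits += (bp.predict(0x1000)[1] == outcome)
    bp.update(0x1000, outcome)
print(f"{hits}/300 correct")         # accuracy climbs once the weights learn the pattern
```

The appeal over counter tables is that the hardware cost grows linearly with history length rather than exponentially, letting the predictor exploit correlation with much longer histories.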


© 2024 Fiveable Inc. All rights reserved.