Date	31.12.2021
(Lecture Notes in Computer Science 10793) Mladen Berekovic, Rainer Buchty, Heiko Hamann, Dirk Koch, Thilo Pionteck - Architecture of Computing Systems – ARCS

Document Outline

Preface
Organization
Biologically-Inspired Massively-Parallel Computation (Keynote Talk)
Contents
Embedded Systems
Trade-Off Between Performance, Fault Tolerance and Energy Consumption in Duplication-Based Taskgraph Scheduling
- 1 Introduction
- 2 The Trade-Off Problem
- 3 Fault Tolerant and Energy Efficient Scheduling
  - 3.1 Previous Approach
  - 3.2 Extensions
- 4 Runtime System
  - 4.1 System Check Tool
  - 4.2 Scheduler and User Preferences
  - 4.3 Runtime System
- 5 Power Model
  - 5.1 Model Validation
  - 5.2 Real-World Evaluation
- 6 Experimental Results
- 7 Conclusions
- References
Lipsi: Probably the Smallest Processor in the World
- 1 Introduction
- 2 Related Work
- 3 The Lipsi Design
  - 3.1 The Datapath
  - 3.2 The Instruction Set
  - 3.3 Implementation and Assembly in Hardware
  - 3.4 Simulation and Testing
  - 3.5 Developing a Processor
- 4 Evaluation and Discussion
  - 4.1 Resource Consumption
  - 4.2 The Smallest Processor?
  - 4.3 A Lipsi Manycore Processor
  - 4.4 Lipsi in Teaching
  - 4.5 Source Access
- 5 Conclusion
- References
Superlinear Scalability in Parallel Computing and Multi-robot Systems: Shared Resources, Collaboration, and Network Topology
- 1 Introduction
  - 1.1 Superlinear Performance in Multi-robot Systems
  - 1.2 Universal Scalability Law
- 2 Unified Interpretation Across Fields of Research
- 3 Results
  - 3.1 Stick Pulling: Shared Resources and Collaboration
  - 3.2 Parallel Optimization: Network Topologies and Information Flow
- 4 Discussion and Conclusion
- References
Multicore Systems
Closed Loop Controller for Multicore Real-Time Systems
- 1 Introduction
- 2 Related Work
- 3 Closed Performance Control Loop
  - 3.1 Basic Fingerprinting
  - 3.2 Pulse Width Modulated Interferences
  - 3.3 Closed Loop Controller
- 4 Evaluation
  - 4.1 PWM Effectiveness
  - 4.2 Closed Loop Controller
- 5 Conclusion
- References
Optimization of the GNU OpenMP Synchronization Barrier in MPSoC
- 1 Introduction
- 2 Related Work
- 3 The GNU OpenMP Synchronization Barrier Mechanism
  - 3.1 Code Parallelization and Synchronization
  - 3.2 Active Wait and GNU OpenMP Policy
- 4 Experimentation Environment
  - 4.1 TSAR Manycore Architecture
  - 4.2 Evaluation Platform
  - 4.3 A Non Intrusive Measurement Tool Chain
- 5 Active Wait Optimization for GNU OpenMP Synchronization Barrier
  - 5.1 Barrier Mechanism Measurements and Study
  - 5.2 Optimization Proposal
  - 5.3 Micro-benchmark Results
  - 5.4 Performances Evaluation on the NAS Benchmark IS Application
- 6 Conclusion
- References
Analysis and Optimization
Ampehre: An Open Source Measurement Framework for Heterogeneous Compute Nodes
- 1 Introduction
- 2 Architecture and Components of Ampehre
  - 2.1 Extended PAPI Library
  - 2.2 Ampehre Library API
  - 2.3 Ampehre Tools
- 3 Example: Measuring Energy on CPU and GPU
- 4 Balancing Accuracy and Overhead
- 5 Availability and Extensibility of Ampehre
- 6 Conclusion
- References
A Hybrid Approach for Runtime Analysis Using a Cycle and Instruction Accurate Model
- 1 Introduction
- 2 Related Work
- 3 Proposed Methodology
  - 3.1 Analyzing the Program
  - 3.2 Running the Simulation
- 4 Evaluation
  - 4.1 Metric
  - 4.2 Results
- 5 Conclusion
- References
On-chip and Off-chip Networks
A CAM-Free Exascalable HPC Router for Low-Energy Communications
- 1 Introduction
- 2 Related Work
- 3 ExaNeSt System Architecture
  - 3.1 Router Architecture
  - 3.2 Routing Algorithms
- 4 Evaluation
  - 4.1 Experimental Setup
  - 4.2 Area
  - 4.3 Power Consumption
  - 4.4 Performance
- 5 Conclusions and Future Work
- References
Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips
- 1 Introduction
- 2 Related Work and Background
- 3 Synchronization Concept
- 4 Hardware Supported Ready Synchronization
  - 4.1 Hardware Implementation
  - 4.2 New Instructions
  - 4.3 Impact of Ready Synchronization on Hardware Size
- 5 Evaluation
  - 5.1 Comparison of Ready Synchronization in Software and Hardware
  - 5.2 Execution Times
  - 5.3 Impact on Hardware Costs
- 6 Conclusion
- References
Network Optimization for Safety-Critical Systems Using Software-Defined Networks
- 1 Introduction
- 2 Related Work
- 3 Problem Formulation
- 4 Experimental Setup
  - 4.1 Assumptions
  - 4.2 Baseline
- 5 Numerical Results and Discussion
  - 5.1 Standard Networks
  - 5.2 Critical Networks
- 6 Conclusion and Future Work
- References
CaCAO: Complex and Compositional Atomic Operations for NoC-Based Manycore Platforms
- 1 Introduction
- 2 Related Work
- 3 Complex and Compositional Atomic Operations
  - 3.1 Comparison of the Synchronization Primitives () and ()
  - 3.2 CaCAO Approach ()
- 4 Implementation Aspects
- 5 Experimental Setup and Results
- 6 Conclusion and Future Work
- References
Memory Models and Systems
Redundant Execution on Heterogeneous Multi-cores Utilizing Transactional Memory
- 1 Introduction
- 2 Related Work
- 3 Transaction-Based Redundant Execution Model
  - 3.1 Loosely-Coupled Redundancy with Checkpoints
  - 3.2 Extension of HTM to Support Fault Tolerance
  - 3.3 Heterogeneous Redundant Systems
- 4 Evaluation
- 5 Conclusion
- References
Improving the Performance of STT-MRAM LLC Through Enhanced Cache Replacement Policy
- 1 Introduction
- 2 Related Work
- 3 Motivation and Approach
  - 3.1 Motivational Example
  - 3.2 Write Operations at Last-Level Cache
  - 3.3 Cache Replacement Policy
- 4 Experimental Results
  - 4.1 Environment Setup
  - 4.2 Results
- 5 Conclusion and Perspectives
- References
On Automated Feedback-Driven Data Placement in Multi-tiered Memory
- 1 Introduction
- 2 Related Work
- 3 Feedback-Driven Data Placement for Hybrid Memories
  - 3.1 Allocation Site Partitioning
  - 3.2 Profile-Guided Management
- 4 Implementation Details
  - 4.1 Associating Memory Usage Profiles with Program Allocation Sites
  - 4.2 Hybrid Memory Management
- 5 Experimental Framework
  - 5.1 Simulation Platform
  - 5.2 Benchmarks Description
- 6 Evaluation
  - 6.1 Baseline Configurations
  - 6.2 Static Application Guidance
  - 6.3 Adaptive Application Guidance
  - 6.4 Comparison with OS/Architectural Reactive Profiling
  - 6.5 Performance Summary
- 7 Conclusions and Future Work
- References
Operational Characterization of Weak Memory Consistency Models
- 1 Introduction
- 2 Related Work
- 3 View-Based Definitions of Memory Consistency Models
  - 3.1 Local Consistency
  - 3.2 Cache Consistency (CC)
  - 3.3 Pipelined-RAM (PRAM) Consistency
  - 3.4 Sequential Consistency (SC)
- 4 Operational Definitions of Memory Consistency Models
  - 4.1 Basic Components
  - 4.2 Reference Machine for Local Consistency
  - 4.3 Reference Machine for Cache Consistency
  - 4.4 Reference Machine for PRAM Consistency
  - 4.5 Reference Machine for Sequential Consistency
  - 4.6 Implementation of Reference Machines
- 5 Conclusions and Future Work
- References
Energy Efficient Systems
A Tightly Coupled Heterogeneous Core with Highly Efficient Low-Power Mode
- 1 Introduction
- 2 Existing TCHC Architecture
  - 2.1 Composite Core
  - 2.2 Front-End Execution Architecture
- 3 Dual-Mode Front-End Execution Architecture
  - 3.1 Implementation of LP Mode
  - 3.2 Switching from HP to LP Mode
  - 3.3 Switching from LP to HP Mode
  - 3.4 Execution Correctness
  - 3.5 LP Mode Utilization
  - 3.6 Hardware Cost
- 4 Evaluation
  - 4.1 Evaluation Environment
  - 4.2 Evaluation Results
- 5 Related Work
- 6 Conclusion
- References
Performance-Energy Trade-off in CMPs with Per-Core DVFS
- 1 Introduction
- 2 Related Work
- 3 Model Construction Methodology
  - 3.1 Contention Metrics
  - 3.2 Data Collection
  - 3.3 Building the Model
  - 3.4 Application of the Model
- 4 Comparison of Machine Learning Algorithms
- 5 Evaluation
  - 5.1 Evaluation Setup
  - 5.2 Analysis of the Results
- 6 Conclusion
- References
Towards Fine-Grained DVFS in Embedded Multi-core CPUs
- 1 Introduction
- 2 Related Works
- 3 Fine-Grained DVFS
  - 3.1 DVFS Points Extension
  - 3.2 Overhead Characterization
- 4 Experimental Results
  - 4.1 DVFS Points Extension
  - 4.2 Overhead Characterization
- 5 Conclusions
- References
Partial Reconfiguration
Evaluating Auto-adaptation Methods for Fine-Grained Adaptable Processors
- 1 Introduction
- 2 Approach
  - 2.1 Target Processor
  - 2.2 Proposed Auto-adapting Method
- 3 Implementation
  - 3.1 Common
  - 3.2 Window-Based Monitoring
  - 3.3 BTCB
  - 3.4 Phase Change Annotations
- 4 Evaluation
  - 4.1 Experimental Setup
  - 4.2 Results
- 5 Related Work
- 6 Conclusions
- References
HLS Enabled Partially Reconfigurable Module Implementation
- 1 Introduction
- 2 Related Work
- 3 Model
- 4 Bounding Box Generation
  - 4.1 Overview
  - 4.2 Generation
- 5 Case Study
  - 5.1 Maxeler System and Dataflow
  - 5.2 Static System
  - 5.3 Implemented Modules
  - 5.4 Mitigation Strategies
- 6 Conclusion
- References
Hardware Acceleration in Genode OS Using Dynamic Partial Reconfiguration
- 1 Introduction
- 2 Genode OS
  - 2.1 Microkernel Based System Policy
  - 2.2 Component Communication
- 3 Related Work
- 4 Reconfigurable Hardware
- 5 Reconfiguration Software
  - 5.1 Loading Partial Bitstreams
  - 5.2 Accessing the Configuration Port
  - 5.3 Hardware Scheduler
  - 5.4 Hardware Acceleration
- 6 Exemplary Use Case and Evaluation
- 7 Conclusion
- References
Large Scale Computing
Do Iterative Solvers Benefit from Approximate Computing? An Evaluation Study Considering Orthogonal Approximation Methods
- 1 Introduction
  - 1.1 Current Status
  - 1.2 Methodology of the Evaluation
  - 1.3 Main Findings
- 2 Mathematical Background and Data Generation
- 3 Approximation Computing Methods
  - 3.1 Relaxed Synchronization
  - 3.2 Sampling
  - 3.3 On the Data Type Level
  - 3.4 Input Data Approximation
- 4 Experiments
  - 4.1 Evaluation Metrics
  - 4.2 Influence of Approximate Computing on the Data Type Level
  - 4.3 Analysis of Approximate Computing Loop Strategies
  - 4.4 Accuracy Degradation Caused by Relaxed Synchronization
  - 4.5 Input Approximation
  - 4.6 Putting Everything Together
  - 4.7 Discussion
- 5 Conclusion and Future Directions
- References
A Flexible FPGA-Based Inference Architecture for Pruned Deep Neural Networks
- 1 Introduction and Motivation
- 2 Related Work
- 3 Concept
- 4 Architecture
- 5 Experimental Results
- 6 Conclusions
- References
Author Index
