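Scalable Parallel Computing: Technology, Architecture, Programming (可扩展并行计算--技术、结构与编程)
Authors: Kai Hwang (黄铠), Zhiwei Xu
Category: Electronic Engineering
Binding: paperback / 16mo / 802 pages
ISBN (10-digit/13-digit): 711107176X
Publisher: China Machine Press (机械工业出版社), published 1999-11-30
List price: ¥69
Tags: Computers, Electronic Engineering |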
| Introduction: |
| Kai Hwang, Zhiwei Xu: Scalable Parallel Computing: Technology, Architecture, Programming. Copyright © 1998 by The McGraw-Hill Companies, Inc. All rights reserved. Jointly published by China Machine Press/McGraw-Hill. This edition may be sold in the People's Republic of China only. This book cannot be re-exported and is not for sale outside the People's Republic of China. |
| Contents: |
| Table of Contents
About the Authors
Foreword
Preface
Guide to Instructors/Readers
Part I Scalability and Clustering
Chapter 1 Scalable Computer Platforms and Models
1.1 Evolution of Computer Architecture
1.1.1 Computer Generations
1.1.2 Scalable Computer Architectures
1.1.3 Converging System Architectures
1.2 Dimensions of Scalability
1.2.1 Resource Scalability
1.2.2 Application Scalability
1.2.3 Technology Scalability
1.3 Parallel Computer Models
1.3.1 Semantic Attributes
1.3.2 Performance Attributes
1.3.3 Abstract Machine Model
1.3.4 Physical Machine Model
1.4 Basic Concepts of Clustering
1.4.1 Cluster Characteristics
1.4.2 Architectural Comparisons
1.4.3 Benefits and Difficulties of Clusters
1.5 Scalable Design Principles
1.5.1 Principle of Independence
1.5.2 Principle of Balanced Design
1.5.3 Design for Scalability
1.6 Bibliographic Notes and Problems
Chapter 2 Basics of Parallel Programming
2.1 Parallel Programming Overview
2.1.1 Why Is Parallel Programming Difficult?
2.1.2 Parallel Programming Environments
2.1.3 Parallel Programming Approaches
2.2 Processes, Tasks, and Threads
2.2.1 Definitions of an Abstract Process
2.2.2 Execution Mode
2.2.3 Address Space
2.2.4 Process Context
2.2.5 Process Descriptor
2.2.6 Process Control
2.2.7 Variations of Process
2.3 Parallelism Issues
2.3.1 Homogeneity in Processes
2.3.2 Static versus Dynamic Parallelism
2.3.3 Process Grouping
2.3.4 Allocation Issues
2.4 Interaction/Communication Issues
2.4.1 Interaction Operations
2.4.2 Interaction Modes
2.4.3 Interaction Patterns
2.4.4 Cooperative versus Competitive Interactions
2.5 Semantic Issues in Parallel Programs
2.5.1 Program Termination
2.5.2 Determinacy of Programs
2.6 Bibliographic Notes and Problems
Chapter 3 Performance Metrics and Benchmarks
3.1 System and Application Benchmarks
3.1.1 Micro Benchmarks
3.1.2 Parallel Computing Benchmarks
3.1.3 Business and TPC Benchmarks
3.1.4 SPEC Benchmark Family
3.2 Performance versus Cost
3.2.1 Execution Time and Throughput
3.2.2 Utilization and Cost-Effectiveness
3.3 Basic Performance Metrics
3.3.1 Workload and Speed Metrics
3.3.2 Caveats in Sequential Performance
3.4 Performance of Parallel Computers
3.4.1 Computational Characteristics
3.4.2 Parallelism and Interaction Overheads
3.4.3 Overhead Quantification
3.5 Performance of Parallel Programs
3.5.1 Performance Metrics
3.5.2 Available Parallelism in Benchmarks
3.6 Scalability and Speedup Analysis
3.6.1 Amdahl's Law: Fixed Problem Size
3.6.2 Gustafson's Law: Fixed Time
3.6.3 Sun and Ni's Law: Memory Bounding
3.6.4 Isoperformance Models
3.7 Bibliographic Notes and Problems
Part II Enabling Technologies
Chapter 4 Microprocessors as Building Blocks
4.1 System Development Trends
4.1.1 Advances in Hardware
4.1.2 Advances in Software
4.1.3 Advances in Applications
4.2 Principles of Processor Design
4.2.1 Basics of Instruction Pipeline
4.2.2 From CISC to RISC and Beyond
4.2.3 Architectural Enhancement Approaches
4.3 Microprocessor Architecture Families
4.3.1 Major Architecture Families
4.3.2 Superscalar versus Superpipelined Processors
4.3.3 Embedded Microprocessors
4.4 Case Studies of Microprocessors
4.4.1 Digital's Alpha 21164 Microprocessor
4.4.2 Intel Pentium Pro Processor
4.5 Post-RISC, Multimedia, and VLIW
4.5.1 Post-RISC Processor Features
4.5.2 Multimedia Extensions
4.5.3 The VLIW Architecture
4.6 The Future of Microprocessors
4.6.1 Hardware Trends and Physical Limits
4.6.2 Future Workloads and Challenges
4.6.3 Future Microprocessor Architectures
4.7 Bibliographic Notes and Problems
Chapter 5 Distributed Memory and Latency Tolerance
5.1 Hierarchical Memory Technology
5.1.1 Characteristics of Storage Devices
5.1.2 Memory Hierarchy Properties
5.1.3 Memory Capacity Planning
5.2 Cache Coherence Protocols
5.2.1 Cache Coherency Problem
5.2.2 Snoopy Coherency Protocols
5.2.3 The MESI Snoopy Protocol
5.3 Shared-Memory Consistency
5.3.1 Memory Event Ordering
5.3.2 Memory Consistency Models
5.3.3 Relaxed Memory Models
5.4 Distributed Cache/Memory Architecture
5.4.1 NORMA, NUMA, COMA, and DSM Models
5.4.2 Directory-Based Coherency Protocol
5.4.3 The Stanford Dash Multiprocessor
5.4.4 Directory-Based Protocol in Dash
5.5 Latency Tolerance Techniques
5.5.1 Latency Avoidance, Reduction, and Hiding
5.5.2 Distributed Coherent Caches
5.5.3 Data Prefetching Strategies
5.5.4 Effects of Relaxed Memory Consistency
5.6 Multithreaded Latency Hiding
5.6.1 Multithreaded Processor Model
5.6.2 Context-Switching Policies
5.6.3 Combining Latency Hiding Mechanisms
5.7 Bibliographic Notes and Problems
Chapter 6 System Interconnects and Gigabit Networks
6.1 Basics of Interconnection Network
6.1.1 Interconnection Environments
6.1.2 Network Components
6.1.3 Network Characteristics
6.1.4 Network Performance Metrics
6.2 Network Topologies and Properties
6.2.1 Topological and Functional Properties
6.2.2 Routing Schemes and Functions
6.2.3 Networking Topologies
6.3 Buses, Crossbar, and Multistage Switches
6.3.1 Multiprocessor Buses
6.3.2 Crossbar Switches
6.3.3 Multistage Interconnection Networks
6.3.4 Comparison of Switched Interconnects
6.4 Gigabit Network Technologies
6.4.1 Fiber Channel and FDDI Rings
6.4.2 Fast Ethernet and Gigabit Ethernet
6.4.3 Myrinet for SAN/LAN Construction
6.4.4 HiPPI and SuperHiPPI
6.5 ATM Switches and Networks
6.5.1 ATM Technology
6.5.2 ATM Network Interfaces
6.5.3 Four Layers of ATM Architecture
6.5.4 ATM Internetwork Connectivity
6.6 Scalable Coherence Interface
6.6.1 SCI Interconnects
6.6.2 Implementation Issues
6.6.3 SCI Coherence Protocol
6.7 Comparison of Network Technologies
6.7.1 Standard Networks and Perspectives
6.7.2 Network Performance and Applications
6.8 Bibliographic Notes and Problems
Chapter 7 Threading, Synchronization, and Communication
7.1 Software Multithreading
7.1.1 The Thread Concept
7.1.2 Threads Management
7.1.3 Thread Synchronization
7.2 Synchronization Mechanisms
7.2.1 Atomicity versus Mutual Exclusion
7.2.2 High-Level Synchronization Constructs
7.2.3 Low-Level Synchronization Primitives
7.2.4 Fast Locking Mechanisms
7.3 The TCP/IP Communication Protocol Suite
7.3.1 Features of The TCP/IP Suite
7.3.2 UDP, TCP, and IP
7.3.3 The Sockets Interface
7.4 Fast and Efficient Communication
7.4.1 Key Problems in Communication
7.4.2 The LogP Communication Model
7.4.3 Low-Level Communications Support
7.4.4 Communication Algorithms
7.5 Bibliographic Notes and Problems
Part III Systems Architecture
Chapter 8 Symmetric and CC-NUMA Multiprocessors
8.1 SMP and CC-NUMA Technology
8.1.1 Multiprocessor Architecture
8.1.2 Commercial SMP Servers
8.1.3 The Intel SHV Server Board
8.2 Sun Ultra Enterprise 10000 System
8.2.1 The Ultra E-10000 Architecture
8.2.2 System Board Architecture
8.2.3 Scalability and Availability Support
8.2.4 Dynamic Domains and Performance
8.3 HP/Convex Exemplar X-Class
8.3.1 The Exemplar X System Architecture
8.3.2 Exemplar Software Environment
8.4 The Sequent NUMA-Q 2000
8.4.1 The NUMA-Q 2000 Architecture
8.4.2 Software Environment of NUMA-Q
8.4.3 Performance of the NUMA-Q
8.5 The SGI/Cray Origin 2000 Superserver
8.5.1 Design Goals of Origin 2000 Series
8.5.2 The Origin 2000 Architecture
8.5.3 The Cellular IRIX Environment
8.5.4 Performance of the Origin 2000
8.6 Comparison of CC-NUMA Architectures
8.7 Bibliographic Notes and Problems
Chapter 9 Support of Clustering and Availability
9.1 Challenges in Clustering
9.1.1 Classification of Clusters
9.1.2 Cluster Architectures
9.1.3 Cluster Design Issues
9.2 Availability Support for Clustering
9.2.1 The Availability Concept
9.2.2 Availability Techniques
9.2.3 Checkpointing and Failure Recovery
9.3 Support for Single System Image
9.3.1 Single System Image Layers
9.3.2 Single Entry and Single File Hierarchy
9.3.3 Single I/O, Networking, and Memory Space
9.4 Single System Image in Solaris MC
9.4.1 Global File System
9.4.2 Global Process Management
9.4.3 Single I/O System Image
9.5 Job Management in Clusters
9.5.1 Job Management System
9.5.2 Survey of Job Management Systems
9.5.3 Load-Sharing Facility (LSF)
9.6 Bibliographic Notes and Problems
Chapter 10 Clusters of Servers and Workstations
10.1 Cluster Products and Research Projects
10.1.1 Supporting Trend of Cluster Products
10.1.2 Cluster of SMP Servers
10.1.3 Cluster Research Projects
10.2 Microsoft Wolfpack for NT Clusters
10.2.1 Microsoft Wolfpack Configurations
10.2.2 Hot Standby Multiserver Clusters
10.2.3 Active Availability Clusters
10.2.4 Fault-Tolerant Multiserver Cluster
10.3 The IBM SP System
10.3.1 Design Goals and Strategies
10.3.2 The SP2 System Architecture
10.3.3 I/O and Internetworking
10.3.4 The SP System Software
10.3.5 The SP2 and Beyond
10.4 The Digital TruCluster
10.4.1 The TruCluster Architecture
10.4.2 The Memory Channel Interconnect
10.4.3 Programming the TruCluster
10.4.4 The TruCluster System Software
10.5 The Berkeley NOW Project
10.5.1 Active Messages for Fast Communication
10.5.2 GLUnix for Global Resource Management
10.5.3 The xFS Serverless Network File System
10.6 TreadMarks: A Software-Implemented DSM Cluster
10.6.1 Boundary Conditions
10.6.2 User Interface for DSM
10.6.3 Implementation Issues
10.7 Bibliographic Notes and Problems
Chapter 11 MPP Architecture and Performance
11.1 An Overview of MPP Technology
11.1.1 MPP Characteristics and Issues
11.1.2 MPP Systems - An Overview
11.2 The Cray T3E System
11.2.1 The System Architecture of T3E
11.2.2 The System Software in T3E
11.3 New Generation of ASCI/MPPs
11.3.1 ASCI Scalable Design Strategy
11.3.2 Hardware and Software Requirements
11.3.3 Contracted ASCI/MPP Platforms
11.4 Intel/Sandia ASCI Option Red
11.4.1 The Option Red Architecture
11.4.2 Option Red System Software
11.5 Parallel NAS Benchmark Results
11.5.1 The NAS Parallel Benchmarks
11.5.2 Superstep Structure and Granularity
11.5.3 Memory, I/O, and Communications
11.6 MPI and STAP Benchmark Results
11.6.1 MPI Performance Measurements
11.6.2 MPI Latency and Aggregate Bandwidth
11.6.3 STAP Benchmark Evaluation of MPPs
11.6.4 MPP Architectural Implications
11.7 Bibliographic Notes and Problems
Part IV Parallel Programming
Chapter 12 Parallel Paradigms and Programming Models
12.1 Paradigms and Programmability
12.1.1 Algorithmic Paradigms
12.1.2 Programmability Issues
12.1.3 Parallel Programming Examples
12.2 Parallel Programming Models
12.2.1 Implicit Parallelism
12.2.2 Explicit Parallel Models
12.2.3 Comparison of Four Models
12.2.4 Other Parallel Programming Models
12.3 Shared-Memory Programming
12.3.1 The ANSI X3H5 Shared-Memory Model
12.3.2 The POSIX Threads (Pthreads) Model
12.3.3 The OpenMP Standard
12.3.4 The SGI Power C Model
12.3.5 C//: A Structured Parallel C Language
12.4 Bibliographic Notes and Problems
Chapter 13 Message-Passing Programming
13.1 The Message-Passing Paradigm
13.1.1 Message-Passing Libraries
13.1.2 Message-Passing Modes
13.2 Message-Passing Interface (MPI)
13.2.1 MPI Messages
13.2.2 Message Envelope in MPI
13.2.3 Point-to-Point Communications
13.2.4 Collective MPI Communications
13.2.5 The MPI-2 Extensions
13.3 Parallel Virtual Machine (PVM)
13.3.1 Virtual Machine Construction
13.3.2 Process Management in PVM
13.3.3 Communication with PVM
13.4 Bibliographic Notes and Problems
Chapter 14 Data-Parallel Programming
14.1 The Data-Parallel Model
14.2 The Fortran 90 Approach
14.2.1 Parallel Array Operations
14.2.2 Intrinsic Functions in Fortran 90
14.3 High-Performance Fortran
14.3.1 Support for Data Parallelism
14.3.2 Data Mapping in HPF
14.3.3 Summary of Fortran 90 and HPF
14.4 Other Data-Parallel Approaches
14.4.1 Fortran 95 and Fortran 2001
14.4.2 The pC and Nesl Approaches
14.5 Bibliographic Notes and Problems
Bibliography
Web Resources List
Subject Index
Author Index |
| Content summary: |
| We introduce the four parallel programming models listed below. Details of the models are postponed until Part IV.
The parallelizing compiler model
The data-parallel model
The message-passing model
The shared-memory model
Chapter 3
This chapter covers basic performance benchmarks and metrics. The purpose is to identify attributes toward scalable performance. We start with a comprehensive introduction of parallel benchmark suites. Then we elaborate on the tradeoffs between performance and costs. The caveats of sequential program execution are identified. Overheads in parallelism management and software interactions are analyzed with a quantitative approach. Granularity, available parallelism, parallel performance metrics, Amdahl's law, Gustafson's law, Sun and Ni's law, and various isoperformance models are quantitatively analyzed with illustrative benchmark results.
1.2 Notes to Readers
Chapter 1 must be read ahead of all remaining chapters. It is required for all four possible course offerings suggested in the Preface. Chapter 2 must be read before those software-oriented Chapters 7, 9, 12, 13, and 14. For hardware-oriented readers, these chapters can be skipped in the first reading. Chapter 3 will be helpful to understand the performance-sensitive material presented in Chapters 4, 5, 6, 8, 10, and 11. For an introductory course taken by mixed students from Computer Science and Electrical Engineering majors, Chapter 3 can be skipped in the first reading. However, research-oriented students may find Chapter 3 extremely useful, as long as the research topic chosen is related to system performance.
Scalable Computer Platforms and Models
This chapter presents basic models of parallel and cluster computers. Fundamental design issues and operational principles of scalable computer platforms are introduced. We review the computer technology over the last 50 years. Scalable and cluster computer systems are modeled with key architectural distinctions. Scalability will be introduced in three orthogonal dimensions: resource, application, and technology. Abstract and physical machine models are specified in Section 1.3. In Section 1.4, we introduce basic concepts of multicomputer clustering. The differences among symmetric multiprocessors, clusters of computers, and distributed computer systems are clarified. Three basic principles are studied in Section 1.5 to guide the design and application of scalable parallel computers.
Bits, Bytes, and Words
The following units are widely used in the computer field, but sometimes were wrongly used with confusing notations and ambiguous meanings. To cope with this problem, we present below a set of notations that will be used throughout the book. In particular, readers should not be confused with the shorthand notations for basic units of time, byte, and bit respectively. The basic unit in time is second, abbreviated as s. The two basic information units are byte and bit. One byte (1 B) is 8 bits (8 b). Byte is always abbreviated as B and bit as b. Other information units are word (16 b or 2 B), doubleword (32 b or 4 B), and quadword (64 b or 8 B). This is based on convention used by Intel, Motorola, and Digital Equipment. Mainframe vendors consider a word to have 32 b. Some supercomputer designers consider 64 b in a word. A frequently used workload unit is the number of floating-point operations, abbreviated as flop. A unit for computing speed is the number of floating-point operations per second (flop/s). A unit for information transfer rate is the number of bytes per second (B/s). The execution rate of a processor is often measured as million instructions per second (MIPS), which is equivalent to the notation Mi/s used in Europe. |
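The unit conventions in the "Bits, Bytes, and Words" passage can be made concrete with a short sketch. The Python below is only an illustration and is not taken from the book: the names `UNIT_BITS`, `size_in_bytes`, `flop_per_second`, `bytes_per_second`, and `mips` are hypothetical helpers chosen for this example, and the word sizes follow the Intel/Motorola/Digital convention quoted in the passage (other vendors, as noted there, use 32 b or 64 b words).

```python
# Illustrative helpers only; all names here are assumptions, not from the book.
BITS_PER_BYTE = 8  # 1 B = 8 b

# Unit sizes in bits, using the 16-bit-word convention quoted in the passage.
UNIT_BITS = {
    "byte": 8,
    "word": 16,
    "doubleword": 32,
    "quadword": 64,
}


def size_in_bytes(unit: str) -> int:
    """Return the size of a named unit in bytes (B)."""
    return UNIT_BITS[unit] // BITS_PER_BYTE


def flop_per_second(flop: float, seconds: float) -> float:
    """Computing speed in flop/s: floating-point operations divided by time."""
    return flop / seconds


def bytes_per_second(bits_transferred: float, seconds: float) -> float:
    """Transfer rate in B/s, given a bit count (b) and a duration in seconds (s)."""
    return bits_transferred / BITS_PER_BYTE / seconds


def mips(instructions: float, seconds: float) -> float:
    """Execution rate in million instructions per second (MIPS)."""
    return instructions / seconds / 1_000_000
```

For example, `size_in_bytes("quadword")` gives 8, and `mips(200_000_000, 2.0)` gives 100.0. Keeping bits (b) and bytes (B) in separately named parameters mirrors the passage's point that the two abbreviations should never be confused.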