XcalableMP PGAS Programming Language : From Programming Model to Applications.

Author: Sato, Mitsuhisa
Material type: Text
Publisher: Singapore : Springer Singapore Pte. Limited, 2020
Copyright date: ©2021
Description: 1 online resource (265 pages)
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9789811576836
Genre/Form: Electronic books
Additional physical formats: Print version: XcalableMP PGAS Programming Language
LOC classification: QA76.76.C65
Contents:
Intro -- Preface -- Contents -- XcalableMP Programming Model and Language -- 1 Introduction -- 1.1 Target Hardware -- 1.2 Execution Model -- 1.3 Data Model -- 1.4 Programming Models -- 1.4.1 Partitioned Global Address Space -- 1.4.2 Global-View Programming Model -- 1.4.3 Local-View Programming Model -- 1.4.4 Mixture of Global View and Local View -- 1.5 Base Languages -- 1.5.1 Array Section in XcalableMP C -- 1.5.2 Array Assignment Statement in XcalableMP C -- 1.6 Interoperability -- 2 Data Mapping -- 2.1 nodes Directive -- 2.2 template Directive -- 2.3 distribute Directive -- 2.3.1 Block Distribution -- 2.3.2 Cyclic Distribution -- 2.3.3 Block-Cyclic Distribution -- 2.3.4 Gblock Distribution -- 2.3.5 Distribution of Multi-Dimensional Templates -- 2.4 align Directive -- 2.5 Dynamic Allocation of Distributed Array -- 2.6 template_fix Construct -- 3 Work Mapping -- 3.1 task and tasks Construct -- 3.1.1 task Construct -- 3.1.2 tasks Construct -- 3.2 loop Construct -- 3.2.1 Reduction Computation -- 3.2.2 Parallelizing Nested Loop -- 3.3 array Construct -- 4 Data Communication -- 4.1 shadow Directive and reflect Construct -- 4.1.1 Declaring Shadow -- 4.1.2 Updating Shadow -- 4.2 gmove Construct -- 4.2.1 Collective Mode -- 4.2.2 In Mode -- 4.2.3 Out Mode -- 4.3 barrier Construct -- 4.4 reduction Construct -- 4.5 bcast Construct -- 4.6 wait_async Construct -- 4.7 reduce_shadow Construct -- 5 Local-View Programming -- 5.1 Introduction -- 5.2 Coarray Declaration -- 5.3 Put Communication -- 5.4 Get Communication -- 5.5 Synchronization -- 5.5.1 Sync All -- 5.5.2 Sync Images -- 5.5.3 Sync Memory -- 6 Procedure Interface -- 7 XMPT Tool Interface -- 7.1 Overview -- 7.2 Specification -- 7.2.1 Initialization -- 7.2.2 Events -- References -- Implementation and Performance Evaluation of Omni Compiler -- 1 Overview -- 2 Implementation -- 2.1 Operation Flow.
2.2 Example of Code Translation -- 2.2.1 Distributed Array -- 2.2.2 Loop Statement -- 2.2.3 Communication -- 3 Installation -- 3.1 Overview -- 3.2 Get Source Code -- 3.2.1 From GitHub -- 3.2.2 From Our Website -- 3.3 Software Dependency -- 3.4 General Installation -- 3.4.1 Build and Install -- 3.4.2 Set PATH -- 3.5 Optional Installation -- 3.5.1 OpenACC -- 3.5.2 XcalableACC -- 3.5.3 One-Sided Library -- 4 Creation of Execution Binary -- 4.1 Compile -- 4.2 Execution -- 4.2.1 XcalableMP and XcalableACC -- 4.2.2 OpenACC -- 4.3 Cooperation with Profiler -- 4.3.1 Scalasca -- 4.3.2 tlog -- 5 Performance Evaluation -- 5.1 Experimental Environment -- 5.2 EP STREAM Triad -- 5.2.1 Design -- 5.2.2 Implementation -- 5.2.3 Evaluation -- 5.3 High-Performance Linpack -- 5.3.1 Design -- 5.3.2 Implementation -- 5.3.3 Evaluation -- 5.4 Global Fast Fourier Transform -- 5.4.1 Design -- 5.4.2 Implementation -- 5.4.3 Evaluation -- 5.5 RandomAccess -- 5.5.1 Design -- 5.5.2 Implementation -- 5.5.3 Evaluation -- 5.6 Discussion -- 6 Conclusion -- References -- Coarrays in the Context of XcalableMP -- 1 Introduction -- 2 Requirements from Language Specifications -- 2.1 Images Mapped to XMP Nodes -- 2.2 Allocation of Coarrays -- 2.3 Communication -- 2.4 Synchronization -- 2.5 Subarrays and Data Contiguity -- 2.6 Coarray C Language Specifications -- 3 Implementation -- 3.1 Omni XMP Compiler Framework -- 3.2 Allocation and Registration -- 3.2.1 Three Methods of Memory Management -- 3.2.2 Initial Allocation for Static Coarrays -- 3.2.3 Runtime Allocation for Allocatable Coarrays -- 3.3 PUT/GET Communication -- 3.3.1 Determining the Possibility of DMA -- 3.3.2 Buffering Communication Methods -- 3.3.3 Non-blocking PUT Communication -- 3.3.4 Optimization of GET Communication -- 3.4 Runtime Libraries -- 3.4.1 Fortran Wrapper -- 3.4.2 Upper-layer Runtime (ULR) Library.
3.4.3 Lower-layer Runtime (LLR) Library -- 3.4.4 Communication Libraries -- 4 Evaluation -- 4.1 Fundamental Performance -- 4.2 Non-blocking Communication -- 4.3 Application Program -- 4.3.1 Coarray Version of the Himeno Benchmark -- 4.3.2 Measurement Result -- 4.3.3 Productivity -- 5 Related Work -- 6 Conclusion -- References -- XcalableACC: An Integration of XcalableMP and OpenACC -- 1 Introduction -- 1.1 Hardware Model -- 1.2 Programming Model -- 1.2.1 XMP Extensions -- 1.2.2 OpenACC Extensions -- 1.3 Execution Model -- 1.4 Data Model -- 2 XcalableACC Language -- 2.1 Data Mapping -- Example -- 2.2 Work Mapping -- Restriction -- Example 1 -- Example 2 -- 2.3 Data Communication and Synchronization -- Example -- 2.4 Coarrays -- Restriction -- Example -- 2.5 Handling Multiple Accelerators -- 2.5.1 devices Directive -- Example -- 2.5.2 on_device Clause -- 2.5.3 layout Clause -- Example -- 2.5.4 shadow Clause -- Example -- 2.5.5 barrier_device Construct -- Example -- 3 Omni XcalableACC Compiler -- 4 Performance of Lattice QCD Application -- 4.1 Overview of Lattice QCD -- 4.2 Implementation -- 5 Performance Evaluation -- 5.1 Result -- 5.2 Discussion -- 6 Productivity Improvement -- 6.1 Requirement for Productive Parallel Language -- 6.2 Quantitative Evaluation by Delta Source Lines of Codes -- 6.3 Discussion -- References -- Mixed-Language Programming with XcalableMP -- 1 Background -- 2 Translation by Omni Compiler -- 3 Functions for Mixed-Language -- 3.1 Function to Call MPI Program from XMP Program -- 3.2 Function to Call XMP Program from MPI Program -- 3.3 Function to Call XMP Program from Python Program -- 3.3.1 From Parallel Python Program -- 3.3.2 From Sequential Python Program -- 4 Application to Order/Degree Problem -- 4.1 What Is Order/Degree Program -- 4.2 Implementation -- 4.3 Evaluation -- 5 Conclusion -- References.
Three-Dimensional Fluid Code with XcalableMP -- 1 Introduction -- 2 Global-View Programming Model -- 2.1 Domain Decomposition Methods -- 2.2 Performance on the K Computer -- 2.2.1 Comparison with Hand-Coded MPI Program -- 2.2.2 Optimization for SIMD -- 2.2.3 Optimization for Allocatable Arrays -- 3 Local-View Programming Model -- 3.1 Communications Using Coarray -- 3.2 Performance on the K Computer -- 4 Summary -- References -- Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP -- 1 Introduction -- 2 Nuclear Fusion Simulation Code -- 2.1 Gyrokinetic PIC Simulation -- 2.2 GTC -- 3 Implementation of GTC-P by Hybrid-view Programming -- 3.1 Hybrid-View Programming Model -- 3.2 Implementation Based on the XMP-Localview Model: XMP-localview -- 3.3 Implementation Based on the XMP-Hybridview Model: XMP-Hybridview -- 4 Performance Evaluation -- 4.1 Experimental Setting -- 4.2 Results -- 4.3 Productivity and Performance -- 5 Related Research -- 6 Conclusion -- References -- Parallelization of Atomic Image Reconstruction from X-ray Fluorescence Holograms with XcalableMP -- 1 Introduction -- 2 X-ray Fluorescence Holography -- 2.1 Reconstruction of Atomic Images -- 2.2 Analysis Procedure of XFH -- 3 Parallelization -- 3.1 Parallelization of Reconstruction of Two-Dimensional Atomic Images by OpenMP -- 3.2 Parallelization of Reconstruction of Three-dimensional Atomic Images by XcalableMP -- 4 Performance Evaluation -- 4.1 Performance Results of Reconstruction of Two-Dimensional Atomic Images -- 4.2 Performance Results of Reconstruction of Three-dimensional Atomic Images -- 4.3 Comparison of Parallelization with MPI -- 5 Conclusion -- References -- Multi-SPMD Programming Model with YML and XcalableMP -- 1 Introduction -- 2 Background: International Collaborations for the Post-Petascale and Exascale Computing -- 3 Multi-SPMD Programming Model.
3.1 Overview -- 3.2 YML -- 3.3 OmniRPC-MPI -- 4 Application Development in the mSPMD Programming Environment -- 4.1 Task Generator -- 4.2 Workflow Development -- 4.3 Workflow Execution -- 5 Experiments -- 6 Eigen Solver on the mSPMD Programming Model -- 6.1 Implicitly Restarted Arnoldi Method (IRAM), Multiple Implicitly Restarted Arnoldi Method (MIRAM) and Their Implementations for the mSPMD Programming Model -- 6.2 Experiments -- 7 Fault-Tolerance Features in the mSPMD Programming Model -- 7.1 Overview and Implementation -- 7.2 Experiments -- 8 Runtime Correctness Check for the mSPMD Programming Model -- 8.1 Overview and Implementation -- 8.2 Experiments -- 9 Summary -- References -- XcalableMP 2.0 and Future Directions -- 1 Introduction -- 2 XcalableMP on Fugaku -- 2.1 Performance of XcalableMP Global View Programming -- 2.2 Performance of XcalableMP Local View Programming -- 3 Global Task Parallel Programming -- 3.1 OpenMP and XMP Tasklet Directive -- 3.2 A Proposal for Global Task Parallel Programming -- 3.3 Prototype Design of Code Transformation -- 3.4 Preliminary Performance -- 3.5 Communication Optimization for Manycore Clusters -- 4 Retrospectives and Challenges for Future PGAS Models -- 4.1 Low-Level Communication Layer for PGAS Model -- 4.2 XcalableMP as a DSL for Stencil Applications -- 4.3 XcalableMP API: Compiler-Free Approach -- 4.4 Global Task Parallel Programming Model for Accelerators -- References.
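
The contents above center on XcalableMP's directive-based, global-view style (the nodes, template, distribute, align, and loop directives). As a rough illustration of that style, and not an excerpt from the book, the following minimal XcalableMP C sketch distributes an array block-wise over four nodes and sums it with a reduction; the array name, sizes, and node count are arbitrary choices, and the exact directive forms should be checked against the specification chapters listed above.

    #include <stdio.h>

    #define N 16

    #pragma xmp nodes p(4)
    #pragma xmp template t(0:N-1)
    #pragma xmp distribute t(block) onto p

    double a[N];
    #pragma xmp align a[i] with t(i)

    int main(void)
    {
        double sum = 0.0;

        /* Each node touches only the elements of a[] mapped to it. */
    #pragma xmp loop on t(i)
        for (int i = 0; i < N; i++)
            a[i] = (double)i;

        /* The reduction clause combines the partial sums across nodes. */
    #pragma xmp loop on t(i) reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        /* Execute the output statement on node 1 only. */
    #pragma xmp task on p(1)
        {
            printf("sum = %f\n", sum);
        }

        return 0;
    }

With the Omni compiler covered in the implementation chapter, such a program would typically be compiled with the xmpcc driver and launched through an MPI launcher such as mpiexec; the exact commands and options are described in the installation and execution sections listed above.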
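The local-view chapters (coarray declaration, put/get communication, synchronization) use the coarray notation that XcalableMP adopts for C. The fragment below is a hedged sketch of that notation, not taken from the book: the coarray name, node count, and the use of xmp_node_num() and xmp_sync_all() from <xmp.h> are illustrative assumptions and should be verified against Chapter 5 and the "Coarrays in the Context of XcalableMP" chapter.

    #include <xmp.h>

    #define M 8

    #pragma xmp nodes q(2)

    /* Coarray: every node (image) owns its own copy of buf[]. */
    double buf[M]:[*];

    void put_example(void)
    {
        double local[M];
        int status;

        for (int i = 0; i < M; i++)
            local[i] = (double)i;

        /* Put: node 1 writes its local buffer into image 2's buf[]
           (array-section and image-index syntax as sketched here is an
           assumption to be checked against the specification). */
        if (xmp_node_num() == 1)
            buf[0:M]:[2] = local[0:M];

        /* Barrier-like synchronization across all images. */
        xmp_sync_all(&status);
    }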

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2022. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
