Overview Venue : CMSD, UoH Key-Note/Invited Talks Faculty / Speakers Proceedings Downloads Past Tech. Workshops Target Audience Benefits Organisers Accommodation Local Travel Sponsors Feedback Acknowledgements Contact Home

HeGaPa-2012 References - PGAS

Partitioned Global Address Space (PGAS) : UPC - References & Web sites

[UPC-01].	UPC Language Specification, v1.2
[UPC-02].	David E. Hudak, Ph.D., Program Director for HPC Engineering, Introduction to PGAS - UPC, Ohio Supercomputer Center
[UPC-03].	Abdullah Kayi, Prof. Tarek El-Ghazawi (GWU), PGAS Languages, HPCL, The George Washington University
[UPC-04].	Jithin Jose, PGAS Programming Models, PhD student, Computer Science Department, The Ohio State University
[UPC-05].	P. Balaji (Argonne), R. Thakur (Argonne), E. Lusk (Argonne) & James Dinan (OSU), Hybrid Parallel Programming MPI and PGAS (UPC)
[UPC-06].	James Dinan, PhD Intern from Ohio State, An Introduction to Unified Parallel C (UPC), May 2009
[UPC-07].	Kathy Yelick (LBNL and UC Berkeley), UPC Benchmarks
[UPC-08].	Ian Kirker and Adrian Jackson, Unified Parallel C - UPC on HPCx, Jan 2008
[UPC-09].	Berkeley UPC User's Guide, v2.4.0
[UPC-10].	Dan Bonachea, Christian Bell, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome, Programming in UPC (Unified Parallel C), CS267 Lecture, Spring 2006
[UPC-11].	Mike Welcome (LBNL), Evaluation of High-Performance Networks as Compilation Targets for Global Address Space Languages. In conjunction with the joint UCB and NERSC/LBL UPC compiler development project, http://upc.nersc.gov
[UPC-12].	James Dinan (The Ohio State University), Pavan Balaji (ANL), Ewing Lusk (ANL), P. Sadayappan (The Ohio State University), Rajeev Thakur (ANL), Hybrid Parallel Programming with MPI and Unified Parallel C, ACM, 2010
[UPC-13].	Berkeley UPC. Berkeley UPC user’s guide version 2.8.0, 2009.
[UPC-14].	Zhang Zhang, Steven Seidel, Benchmark Measurements of Current UPC Platforms, Dept. of Computer Science, Michigan Technological University, UPC Developer's Workshop, Washington D.C. (2004)
[UPC-15].	Zhang Zhang, Steven Seidel, Benchmark Measurements of Current UPC Platforms, Dept. of Computer Science, Michigan Technological University, Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS’05)
[UPC-16].	Bill Carlson, IDA; Tarek El-Ghazawi, GWU; Robert Numrich, University of Minnesota; Kathy Yelick, UC Berkeley; Programming in the Partitioned Global Address Space Model, SC 2003
[UPC-17].	E. Wiebel, D. Greenberg, and S. Seidel. UPC Collective Operations Specification V1.0. Technical report, May 2003.
[UPC-18].	F. Cantonnet and T. El-Ghazawi, UPC Performance and Potential: A NPB Experimental Study, In Proceedings, Supercomputing 2002: Baltimore, Maryland, Nov. 2002.
[UPC-19].	Yili Zheng, Costin Iancu, Paul Hargrove, Seung-Jai Min, Katherine Yelick (Lawrence Berkeley National Lab), Extending Unified Parallel C for GPU Computing, SIAM PP 2010, February 2010
[UPC-20].	F. Cantonnet and T. El-Ghazawi. UPC performance and potential: A NPB experimental study, 2002.
[UPC-21].	Kathy Yelick, Unified Parallel C (UPC); PSC Petascale Methods, April 2004
[UPC-22].	UC Berkeley. GASNet Home Page, 2004
[UPC-23].	UC Berkeley. Berkeley Unified Parallel C Home Page, 2004. http://upc.nersc.gov
[UPC-24].	T. El-Ghazawi and S. Chauvin. UPC Benchmarking Issues. In Proceedings of ICPP 2001.
[UPC-25].	UPC Community Forum. UPC specification v1.2, 2005.
[UPC-26].	The Berkeley UPC - Unified Parallel C A Joint Project of LBNL and UC Berkeley, 2002
[UPC-27].	MuPC portable UPC runtime system
[UPC-28].	UPC language specifications, v1.2. Technical Report, LBNL-59208, Berkeley National Lab, 2005.
[UPC-29].	T. El-Ghazawi, W. Carlson, T. Sterling, and K. Yelick. UPC: Distributed shared memory programming. John Wiley & Sons, January 2005.
[UPC-30].	Zhang Zhang and Steven R. Seidel, A Performance Model for Fine-Grain Accesses in UPC, Michigan Technological University, Dept. of Computer Science, Houghton, MI 49931-1295 USA

Partitioned Global Address Space (PGAS) : Titanium - References & Web sites

[TNM-01].	Titanium home page
[TNM-02].	Paul N. Hilfinger (ed.), Dan Oscar Bonachea, et al. "Titanium Language Reference Manual," version 2.19. UC Berkeley EECS, August 2005.
[TNM-03].	Katherine Yelick, Paul Hilfinger, et al. "Parallel Languages and Compilers: Perspective from the Titanium Experience." International Journal of High Performance Computing Applications, Vol. 21, No. 3, 266–290, 2007.
[TNM-04].	Jason Ryder, Matt Beaumont-Gay, Aravind Bappanadu, Titanium: A High-Performance Java Dialect
[TNM-05].	Kaushik Datta, Dan Bonachea and Katherine Yelick, Titanium Performance and Potential: An NPB Experimental Study, Computer Science Division, University of California at Berkeley & Lawrence Berkeley National Laboratory
[TNM-06].	K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: a high-performance Java dialect. In Proceedings of ACM 1998 Workshop on Java for High-Performance Network Computing, February 1998.
[TNM-07].	P. Hilfinger, D. Bonachea, K. Datta, D. Gay, S. Graham, B. Liblit, G. Pike, J. Su and K. Yelick. Titanium Language Reference Manual, U.C. Berkeley Tech Report
[TNM-08].	Jimmy Su and Katherine Yelick, Array Prefetching for Irregular Array Accesses in Titanium, Sixth Annual Workshop on Java for Parallel and Distributed Computing, Santa Fe, New Mexico, April 2004.
[TNM-09].	GASNet home page
[TNM-10].	Intrepid Technology Inc., GCC Unified Parallel C (GCCUPC) toolset
[TNM-11].	J. Su and K. Yelick. Array prefetching for irregular array accesses in Titanium, In Sixth Annual Workshop on Java for Parallel and Distributed Computing, Santa Fe, New Mexico, April 2004
[TNM-12].	Ben Liblit, Alex Aiken, and Katherine Yelick, Data Sharing Analysis for Titanium, University of California, Berkeley, Computer Science Division Technical Report, CSD-01-1165, November 2001.
[TNM-13].	K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken, "Titanium: A high-performance Java dialect," Concurrency Practice and Experience, vol. 10, pp. 825-836, 1998.
[TNM-14].	K. Yelick, P. Hilfinger, S. Graham, D. Bonachea, J. Su, A. Kamil, K. Datta, P. Colella, and T. Wen. Parallel languages and compilers: Perspective from the Titanium experience. The International Journal of High Performance Computing Applications, 21(3), 2007.

Partitioned Global Address Space (PGAS) : CAF - References & Web sites

[CAF-01].	Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey (Rice University), François Cantonnet, Tarek El-Ghazawi, Ashrujit Mohanty, Yiyi Yao (George Washington University), Daniel Chavarria-Miranda (Pacific Northwest National Laboratory (PNNL)); An Evaluation of Global Address Space Languages: Co-Array Fortran and Unified Parallel C; ACM, 2005
[CAF-02].	Co-Array Fortran 2.0, April, 2009-2011
[CAF-03].	R. W. Numrich and J. K. Reid, Co-Array Fortran for parallel programming. ACM Fortran Forum, 17(2):1–31, August 1998.
[CAF-04].	John Mellor-Crummey, Laksono Adhianto, Mark Krentel, Guohua Jin, William Scherer III, Chaoran Yang, 2010 HPC Challenge Class II Submission: Coarray Fortran, Center for Scalable Application Development Software 2.0, Department of Computer Science, Rice University, SC2010
[CAF-05].	Coarray Fortran (CAF) 2.0; Department of Computer Science, Rice University, Houston, TX
[CAF-06].	Robert W. Numrich and John Reid, Co-arrays in the next Fortran standard. SIGPLAN Fortran Forum, 24(2):4–17, 2005.
[CAF-07].	Robert W. Numrich Introduction to Co-Array Fortran, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis
[CAF-08].	C. Koelbel, D. Loveman, R. Schreiber, G. Steele, Jr., and M. Zosel, The High Performance Fortran Handbook. The MIT Press, Cambridge, MA, 1994.
[CAF-09].	Robert W. Numrich, A parallel numerical library for Co-Array Fortran. Springer Lecture Notes in Computer Science, LNCS 3911:960–969, 2005.
[CAF-10].	Y. Dotsenko, C. Coarfa, J. Mellor-Crummey, and D. Chavarria-Miranda. Experiences with Co-Array Fortran on Hardware Shared Memory Platforms, In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing, September 2004.
[CAF-11].	C. Coarfa, Y. Dotsenko, J. Eckhardt, and J. Mellor-Crummey, Co-array Fortran Performance and Potential: An NPB Experimental Study. In Proc. of the 16th Intl. Workshop on Languages and Compilers for Parallel Computing, Number 2958 in LNCS. Springer-Verlag, October 2-4, 2003.
[CAF-12].	Yuri Dotsenko, Cristian Coarfa, John Mellor-Crummey, A Multi-platform Co-Array Fortran Compiler, Dept. of Computer Science, Rice University
[CAF-13].	J. Reid and R. W. Numrich, Co-arrays in the next Fortran standard. Sci. Program., 15(1):9–26, 2007.
[CAF-14].	Y. Dotsenko, C. Coarfa, and J. Mellor-Crummey, A multi-platform Co-Array Fortran compiler. In Proceedings of the 13th International Conference of Parallel Architectures and Compilation Techniques (PACT 2004), Antibes Juan-les-Pins, France, October 2004.
[CAF-15].	R. Numrich and J. Reid. Co-Array Fortran for parallel programming. ACM Fortran Forum, 17(2):1–31.

Partitioned Global Address Space (PGAS) : GA - References & Web sites

[GA-01].	Global Arrays ToolKit Home Page
[GA-02].	J. Nieplocha, R. J. Harrison, and R. J. Littlefield, "Global Arrays: A Portable Shared Memory Programming Model for Distributed Memory Computers," in proceedings of Supercomputing, 1994.
[GA-03].	J. Nieplocha, R. J. Harrison, and R. J. Littlefield, "Global arrays: A nonuniform memory access programming model for high-performance computers," Journal of Supercomputing, vol. 10, pp. 169-189, 1996.
[GA-04].	J. Nieplocha, R. J. Harrison, M. Krishnan, B. Palmer, and V. Tipparaju, "Combining shared and distributed memory models: Evolution and recent advancements of the Global Array Toolkit," in proceedings of POHLL'2002 workshop of ICS-2002, NYC, 2002.
[GA-05].	J. Nieplocha, M. Krishnan, B. Palmer, V. Tipparaju, and Y. Zhang, "Exploiting Processor Groups to Extend Scalability of the GA Shared Memory Programming Model," in proceedings of ACM Computing Frontiers, Italy, 2005.
[GA-06].	J. Nieplocha, R.J. Harrison and R.J. Littlefield, The Global Array Programming Model for High Performance Scientific Computing, Pacific Northwest Laboratory, SIAM News, August/September 1995
[GA-07].	Jarek Nieplocha, Bruce Palmer, Vinod Tipparaju, Manojkumar Krishnan, Harold Trease (Computational Sciences and Mathematics Department), Edoardo Aprà (William R. Wiley Environmental Molecular Sciences Laboratory), Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit, Pacific Northwest National Laboratory, Richland, WA 99352
[GA-08].	Vinod Tipparaju, Edoardo Aprà, Weikuan Yu, Jeffrey S. Vetter, Enabling a highly-scalable global address space model for petascale computing, Computer Science & Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN 37831; Department of Computer Science, Auburn University, Auburn, AL 36849
[GA-09].	S. Goedecker and A. Hoisie, Performance Optimisation of Numerically Intensive Codes, SIAM, 2001
[GA-10].	Manojkumar Krishnan, Bruce Palmer, Abhinav Vishnu, Sriram Krishnamoorthy, Jeff Daily, Daniel Chavarria, The Global Arrays User Manual. November 12, 2010
[GA-11].	J. Nieplocha, R.J. Harrison, M.K. Kumar, B. Palmer, V. Tipparaju, H. Trease, Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit, Pacific Northwest National Laboratory
[GA-12].	Dr. Christian Halloy and Dr. Kwai Wong, "Parallel Computing Techniques to Maximize Your Megaflops," Supercomputing'99, Portland, OR, Tutorial Workshop Notes, November 15, 1999
[GA-13].	L. Huang, B. Chapman, and Z. Liu. Towards a more efficient implementation of OpenMP for clusters via translation to Global Arrays. Parallel Computing, Jan 2005.
[GA-14].	J. Nieplocha, R. J. Harrison, and R. J. Littlefield, Towards a more programming model for high-performance computers. J. Supercomputing, (10):197-220, 1996.

Partitioned Global Address Space (PGAS) : X10 - References & Web sites

[X10-01].	X10 Web Site
[X10-02].	The X10 programming language.
[X10-03].	Haichuan Wang, IBM Research, An Overview of the X10 programming language, November 2010 (X10.Tutorial-byHaichuanWang.pdf)
[X10-04].	Vijay Saraswat, Report on the Experimental Language X10, Draft v0.41, IBM USA, Feb 2006
[X10-05].	Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Bob Manchek, and Vaidy Sunderam (1994); P. Charles, C. Grothoff, C. Donawa, K. Ebcioglu, A. Kielstra, C. von Praun, V. Saraswat, and V. Sarkar, X10: An object-oriented approach to non-uniform cluster computing. In Proceedings of OOPSLA, 2005.
[X10-06].	Jose E. Moreira, Samuel P. Midkiff, Manish Gupta, Pedro V. Artigas, Marc Snir, and Richard D. Lawrence, Java programming for high-performance numerical computing. IBM Systems Journal, 39(1):21–, 2000.
[X10-07].	X10 is a new language developed in the IBM X10 PERCS project as part of the DARPA program on High Productivity Computing Systems (HPCS) (SC09 Tutorial)
[X10-08].	Vijay Saraswat, X10 tutorial (based on tutorial co-authored with Vivek Sarkar, Christoph von Praun, Nate Nystrom, Igor Peshansky), IBM Research, August 2008; http://www.saraswat.org/X10-Tutorial-Rice.ppt
[X10-09].	"X10: an Experimental Language for High Productivity Programming of Scalable Systems", P-PHEC workshop, February 2005.
[X10-10].	2011 X10 Workshop Program
[X10-11].	Shivali Agarwal (TIFR, Mumbai, India), Rajkishore Barik, & Rudrapatna K Shyamasundar (IBM India Research Lab, New Delhi, India), Vivek Sarkar (IBM T. J. Watson Research Center), “May-Happen-in-Parallel Analysis of X10 Programs”, PPoPP 2007 (ACM-2007)
[X10-12].	Jonathan P. Brezin, Thomas J. Watson Research Center, Hawthorne, NY USA, An Introduction To Programming With X10, December 2010
[X10-13].	P. Charles et al., X10: An object-oriented approach to non-uniform cluster computing, in Conference on Object Oriented Programming Systems Languages and Applications, pp. 519–538. 2005

Partitioned Global Address Space (PGAS) : Chapel - References & Web sites

[Chp-01].	Cray Inc. Chapel Language Specification April 2010
[Chp-02].	Bradford L. Chamberlain (Cray Inc., Seattle WA, USA), David Callahan (Microsoft Corporation, Redmond WA, USA), Hans P. Zima (JPL, Pasadena CA, USA, and University of Vienna, Austria), Parallel Programmability and the Chapel Language
[Chp-03].	Joe Elizondo and Samuel Palmer, Department of Computer Science, University of Texas at Austin, Using Chapel to Implement Dense Linear Algebra Libraries, May 16, 2010
[Chp-04].	Chamberlain, B. L., Choi, S.-E., and Iten, D. HPC Challenge benchmarks in Chapel. HPC Challenge Awards Competition at SC09 (November 2009). (available at http://chapel.cray.com)
[Chp-05].	Randy Dodgen, Parallelizing a Sparse Domain Distribution in Chapel, CS380P Spring 2010 - Final Project, May 17, 2010
[Chp-06].	Steven J. Deitz, Bradford L. Chamberlain, Sung-Eun Choi, David Iten, Lee Prokowich, Five Powerful Chapel Idioms, Cray Inc.
[Chp-07].	Bradford L. Chamberlain, David Callahan, and Hans P. Zima. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 21(3):291–312, August 2007.
[Chp-08].	Chapel Team. Parallel programming in Chapel: The Cascade High-Productivity Language. http://chapel.cray.com/tutorials.html November 2010.
[Chp-09].	Chapel Specification 0.4, Cray Inc, Seattle, WA 98104, February 4, 2005
[Chp-10].	A. Waheed and J. Yan, Parallelization of NAS Benchmarks for Shared Memory Multi-processors, Technical Report, NAS-98-010, March 1998, Available at http://www.nas.nasa.gov/Research/Reports/Techreports/1998/nas-98-010-abstract.html
[Chp-11].	The Chapel Parallel Programming Language
[Chp-12].	Chapel: The Cascade high productivity language.
[Chp-13].	B. Chamberlain, D. Callahan, and H. Zima, Parallel programmability and the Chapel language, International Journal of High Performance Computing, Jan 2007.
[Chp-14].	Brad Chamberlain, Chapel Team, Cray Inc., HPC Programming Models: Current Practice, Emerging Promise, SIAM Conference on Parallel Processing for Scientific Computing (PP10), February 25, 2010

Partitioned Global Address Space (PGAS) : Libraries

[shm-01].	SHMEM website
[shm-01].	Intro_shmem - Introduction to the SHMEM programming
[shm-02].	Bonachea, D. and Jeong, J. Spring 2002. GASNet: A portable high-performance communication layer for global address-space languages. CS258 Parallel Computer Architecture Project.
[shm-03].	APGAS 2009. Workshop on Asynchrony in the PGAS Programming Model.
[shm-04].	GAS Models in PModels Project, The Center for Programming Models for Scalable Parallel Programming.
[shm-05].	R. Barriuso and A. Knies, SHMEM User's Guide for C. Cray Research Inc, Jan 1994.
[shm-06].	R. A. Kendall, K. Parzyszek, J. Nieplocha, A generalized portable SHMEM library for high performance computing. pages 401–406, 2000.
[shm-07].	SGI’s SHMEM API is the baseline for OpenSHMEM Specification 1.0
[shm-08].	GPI – Global Address Space Programming Interface
[shm-09].	GASNet home page
[shm-10].	Open64 compiler tools
[shm-11].	Intrepid Technology Inc., GCC Unified Parallel C (GCCUPC) toolset
[shm-12].	D. Callahan, B. L. Chamberlain, and H. P. Zima, “The Cascade high productivity language,” in Int’l Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004, pp. 52–60.
[shm-13].	E. Allen, D. Chase, V. Luchangco, J.-W. Maessen, S. Ryu, G. Steele, and S. Tobin-Hochstadt. The Fortress language specification
[shm-14].	E. Allen, D. Chase, J. Hallett, V. Luchangco, J.W. Maessen, S. Ryu, G. Steele, and S. Tobin-Hochstadt, The Fortress language specification version 0.707. Technical report, Sun Microsystems, 2005.
[shm-15].	A. Bhatele and L. V. Kale, Benefits of Topology Aware Mapping for Mesh Interconnects. Parallel Processing Letters (Special issue on Large-Scale Parallel Processing), 18(4):549-566, 2008.
[shm-16].	A. Bhatele and L. V. Kale, An Evaluation of the Effect of Interconnect Topologies on Message Latencies in Large Supercomputers. In Proceedings of Workshop on Large-Scale Parallel Processing (IPDPS '09), May 2009.
[shm-17].	J. Su and K. Yelick. Automatic support for irregular computations in a high-level language. In 19th International Parallel and Distributed Processing Symposium (IPDPS), 2005.
[shm-18].	Brad Chamberlain, Chapel Team, Cray Inc. SIAM Conference on Parallel Processing for Scientific Computing (PP10) February 25, 2010
[shm-19].	Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface, July 1997.

Centre for Development of Advanced Computing