Programming on GPUs: GPGPUs / GPU Computing: References & Web Sites

[GPUComp-01]. Randi J. Rost, OpenGL Shading Language, Second Edition, Addison-Wesley, 2006
[GPUComp-02]. GPGPU Reference http://www.gpgpu.org
[GPUComp-03]. NVIDIA http://www.nvidia.com
[GPUComp-04]. NVIDIA Tesla http://www.nvidia.com/object/tesla_computing_solutions.html
[GPUComp-05]. CUDA sample source code: http://www.nvidia.com/object/cuda_get_samples.html
[GPUComp-06]. AMD Stream Processors http://ati.amd.com/products/streamprocessor/specs.html
[GPUComp-07]. OpenCL - The open standard for parallel programming of heterogeneous systems http://www.khronos.org/opencl
[GPUComp-08]. List of NVIDIA GPUs compatible with CUDA: http://www.nvidia.com/object/cuda_learn_products.html
[GPUComp-09]. RAPIDMIND http://www.rapidmind.net
[GPUComp-10]. PeakStream - Parallel Processing (Acquired by Google in 2007) http://www.google.com
[GPUComp-11]. guru3d.com http://www.guru3d.com/news/sandra-2009-gets-gpgpu-support/
[GPUComp-12]. NVIDIA, NVIDIA CUDA, Programming Guide, v. 2.3, NVIDIA Corporation (2009).
[GPUComp-02]. CUDA Zone - http://www.nvidia.com/object/cuda_home.html
[GPUComp-13]. J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, T. J. Purcell, A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum (2007), Vol. 26, pages 80-113
[GPUComp-14]. M. Harris, Optimizing NVIDIA CUDA, Presentation at AstroGPU conference (2007).
[GPUComp-15]. G. Ruetsch, P. Micikevicius, Optimizing Matrix Transpose in CUDA, Tech report, NVIDIA Corporation (2009).
[GPUComp-16]. GPU Gems book series (available online), GPU Gems: http://developer.nvidia.com/object/gpu_gems_home.html
http://developer.nvidia.com/object/gpu_gems_2_home.html
http://developer.nvidia.com/object/gpu-gems-3.html
[GPUComp-18]. M. Harris, Parallel Prefix Sum (Scan) with CUDA, Tech report, NVIDIA Corporation (2008).
[GPUComp-19]. N. Satish, M. Harris, M. Garland, Designing Efficient Sorting Algorithms for Many-core GPUs, Tech report, NVIDIA Corporation (2008).
[GPUComp-20]. J. Meng, K. Skadron, Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on GPUs, ICS '09: Proceedings of the 23rd International Conference on Supercomputing (2009), 256-265.
[GPUComp-21]. M. Harris, GPU Gems: Chapter 38 - Fast Fluid Dynamics Simulation on the GPU, GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics - NVIDIA Corporation (2007).
[GPUComp-22]. P. Micikevicius, 3D Finite Difference Computation on GPUs using CUDA, Tech report, NVIDIA Corporation (2009).
[GPUComp-23]. M. Bader, H-J. Bungartz, D. Mudigere, S. Narasimhan, B. Narayanan, Optimized CUDA Implementation of a Navier-Stokes based flow solver for the 2D Lid Driven Cavity, poster at the NVIDIA GPU research summit (2009).
[GPUComp-24]. J. M. Cohen, M. J. Molemaker, A Fast Double Precision CFD Code using CUDA, Tech report, NVIDIA Corporation (2009).
[GPUComp-26]. RAPIDMIND & AMD http://www.rapidmind.net/News-Aug4-08-SIGGRAPH.php
[GPUComp-27]. Merrimac - Stream Architecture, Stanford Brook for GPUs http://www-graphics.stanford.edu/projects/brookgpu/
[GPUComp-28]. Stanford: Merrimac - Stream Architecture http://merrimac.stanford.edu/
[GPUComp-29]. ATI RADEON - AMD http://www.canadacomputers.com/amd/radeon/
[GPUComp-30]. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid by Jeff Bolz, Ian Farmer, Eitan Grinspun, Peter Schröder, Caltech Report (2003); Supported in part by NSF, NVIDIA
[GPUComp-31]. Scan Primitives for GPU Computing by Shubhabrata Sengupta, Mark Harris*, Yao Zhang and John D. Owens, University of California, Davis & *NVIDIA Corporation, Graphics Hardware (2007).
[GPUComp-34]. Bolz J., Farmer I., Grinspun E., Schröder P.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2003) 22, 3 (July 2003), pp. 917-924.
[GPUComp-35]. Number Crunching with GPUs: PeakStream Math API Exploits Parallelism in Graphics Processors, October 2006; Microprocessor Report http://www.mdronline.com
[GPUComp-36]. Tom R. Halfhill, Parallel Processing with CUDA: Nvidia's High-Performance Computing Platform Uses Massive Multithreading; Microprocessor Report, Volume 22, Archive 1, January 2008 http://www.mdronline.com
[GPUComp-37]. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, P. Hanrahan, Brook for GPUs: Stream Computing on Graphics Hardware, ACM Trans. Graph. (SIGGRAPH) 2004
[GPUComp-38]. J. Krüger, R. Westermann, Linear Algebra Operators for GPU Implementation of Numerical Algorithms, ACM Trans. Graph. (SIGGRAPH) 22 (3), pp. 908-916 (2003)
[GPUComp-39]. Tutorial SC 2007 : High Performance Computing with CUDA
[GPUComp-40]. FASTRA http://www.fastra.ua.ac.be/en/faq.html
[GPUComp-41]. AMD Stream Computing software Stack http://www.amd.com
[GPUComp-42]. BrookGPU: http://graphics.stanford.edu/projects/brookgpu/index.html
[GPUComp-43]. Tom R. Halfhill, Intel's Larrabee Redefines GPUs: Fully Programmable Manycore Processor Reaches Beyond Graphics, Microprocessor Report, September 29, 2008
[GPUComp-44]. Tom R. Halfhill, AMD's Stream Becomes a River: Parallel Processing Platform for ATI GPUs Reaches More Systems, Microprocessor Report, December 2008
[GPUComp-45]. General-purpose computing on graphics processing units (GPGPU) http://en.wikipedia.org/wiki/GPGPU
[GPUComp-46]. Khronos Group, OpenGL 3, December 2008 http://www.khronos.org/opengl
[GPUComp-47]. Perry H. Wang, Jamison D. Collins, Gautham N. Chinya, Hong Jiang, Xinmin Tian, EXOCHI: Architecture and Programming Environment for a Heterogeneous Multi-core Multithreaded System, PLDI '07
[GPUComp-48]. Daniel Weiskopf, Basics of GPU-Based Programming, Institute of Visualization and Interactive Systems, Interactive Visualization of Volumetric Data on Consumer PC Hardware: Basics of Hardware-Based Programming University of Stuttgart, VIS 2003
[GPUComp-48]. GPU Programming Languages http://www.cis.upenn.edu/~suvenkat/700/
[GPUComp-49]. OpenGL design http://graphics.stanford.edu/courses/cs448a-01-fall/design_opengl.pdf
[GPUComp-50]. OpenCL - The open standard for parallel programming of heterogeneous systems http://www.khronos.org/opencl
[GPUComp-51]. Mary Fetcher and Vivek Sarkar, Introduction to GPGPUs - Seminar on Heterogeneous Processors, Dept. of Computer Science, Rice University, October 2007
[GPUComp-52]. C-DAC Technology Workshops PEEP-2008 & OPECG-2009 http://www.cdac.in
[GPUComp-53]. NVIDIA CUDA Quick Start Guide 2007-2009 http://www.nvidia.com/object/cuda_develop.html
[GPUComp-54]. NVIDIA OpenCL Best Practices Guide Version 1.0 August 2009 http://www.nvidia.com
[GPUComp-55]. NVIDIA OpenCL Getting Started Guide Version 2009 http://www.nvidia.com
[GPUComp-56]. NVIDIA OpenCL Programming Guide for the CUDA Architecture Version 2.3 August 2009 http://www.nvidia.com
[GPUComp-57]. NVIDIA OpenCL JumpStart Guide Technical Brief Version 0.9 April 2009 http://www.nvidia.com
[GPUComp-57]. The OpenCL Specification, Version 1.0, published by the Khronos OpenCL Working Group, ed.: Aftab Munshi, 2009 http://www.khronos.org/registry/cl
[GPUComp-58]. Programming Guide AMD - ATI Stream Computing - Compute Abstraction Layer (CAL) March 2010 http://www.amd.com
[GPUComp-59]. AMD - ATI Stream http://www.amd.com/stream
[GPUComp-60]. Programming Guide - AMD - ATI Stream Computing - OpenCL March 2010 http://www.amd.com/stream
[GPUComp-61]. AMD - ATI Stream Developer Forum http://www.amd.com/streamdevforum
[GPUComp-62]. OpenGL Programming Guide http://www.glprogramming.com/red/
[GPUComp-63]. GPGPU http://www.gpgpu.org     Stanford discussion forum http://www.gpgpu.org/forums/
[GPUComp-64]. Technical Notes - ATI Stream SDK V2.01 Performance and Optimization http://www.amd.com/stream
[GPUComp-65]. Microsoft DirectX Reference Web site http://www.msdn.microsoft.com/en-us/directx
[GPUComp-66]. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," ACM Trans. Graph., vol. 23, no. 3, pp. 777-786, 2004
[GPUComp-67]. Buck, Ian; Foley, Tim; Horn, Daniel; Sugerman, Jeremy; Hanrahan, Pat; Houston, Mike; Fatahalian, Kayvon. "BrookGPU" http://graphics.stanford.edu/projects/brookgpu/
[GPUComp-68]. ATI Compute Abstraction Layer (CAL) Intermediate Language (IL) Reference Manual. Published by AMD.
[GPUComp-69]. CAL Image. ATI Compute Abstraction Layer Program Binary Format Specification. Published by AMD.
[GPUComp-70]. Kernighan Brian W., and Ritchie, Dennis M., The C Programming Language, Prentice-Hall, Inc., Upper Saddle River, NJ, 1978.
[GPUComp-71]. Computational Methods for Tomography - Medical Image Processing http://www.fastra.ua.ac.be
[GPUComp-72]. GPU Gems 3: Chapter 37, Efficient Random Number Generation and Application Using CUDA, Lee Howes, David Thomas, Imperial College London
[GPUComp-73]. NVIDIA's Fermi: The First Complete GPU Computing Architecture, a white paper by Peter N. Glaskowsky (prepared under contract with NVIDIA Corporation), September 2009
[GPUComp-74]. White Paper: Looking Beyond Graphics - NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Computing, Analyst: Tom R. Halfhill, September 2009, Sponsored by NVIDIA
http://www.in.star-com
[GPUComp-76]. David Patterson, Director, Parallel Computing Research Laboratory (Par Lab), U.C. Berkeley, The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges, September 30, 2009 (NVIDIA is one of eight sponsors of the Par Lab).
[GPUComp-77]. The Portland Group - CUDA Fortran Programming Guide and Reference, published November 2009
http://www.pgroup.com/resources/accel.htm
[GPUComp-78]. The Portland Group - PGI Accelerator Compilers - CUDA-enabled NVIDIA GPUs http://www.pgroup.com/resources/accel.htm
[GPUComp-79]. GPU Computing Solutions - NVIDIA Tesla & CUDA http://www.nvidia.com/tesla
http://www.nvidia.com/cuda
[GPUComp-80]. Nvidia CUDA: Practical Uses - BeHardware, Damien Triolet, Aug 2007 http://www.behardware.com/art/lire/678/
[GPUComp-81]. Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, and Wen-mei W. Hwu, CUDA-lite: Reducing GPU Programming Complexity, Center for Reliable and High-Performance Computing, Dept. of Electrical & Comp. Engg., Univ. of Illinois at Urbana-Champaign
[GPUComp-82]. Yao Zhang, Jonathan Cohen, John D. Owens, Fast Tridiagonal Solvers on the GPU, University of California, Davis; NVIDIA
[GPUComp-83]. Bharatkumar Sharma, Rahul Thota, Naga Vydyanathan, and Amit Kale, Towards a Robust, Real-time Face Processing System using CUDA-enabled GPUs, Siemens Corporate Technology, Bangalore, India
[GPUComp-84]. Kishore Kothapalli, Rishabh Mukherjee, M. Suhail Rehman, Suryakant Patidar, P. J. Narayanan, Kannan Srinathan, A Performance Prediction Model for the CUDA GPGPU Platform, International Institute of Information Technology, Hyderabad, India
[GPUComp-85]. John Nickolls, Ian Buck, and Michael Garland (NVIDIA), Kevin Skadron, Scalable Parallel Programming with CUDA
[GPUComp-86]. N. P. Karunadasa & D. N. Ranasinghe, On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters, University of Colombo School of Computing, Sri Lanka
[GPUComp-87]. Michael Bader, Hans-Joachim Bungartz, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan, Fast GPGPU Data Rearrangement Kernels using CUDA, Technische Universität München, Munich, Germany; GE Global Research, JFWTC, Bangalore, India
[GPUComp-88]. M. Sussman, W. Crutchfield, and M. Papakipos, Pseudorandom Number Generation on the GPU, PeakStream, Inc., Redwood City, CA, USA
[GPUComp-89]. W. B. Langdon, A Fast High Quality Pseudo Random Number Generator for nVidia CUDA, Department of Computer Science, CREST Centre, King's College, London, WC2R 2LS, UK
[GPUComp-90]

Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu, An Adaptive Performance Modeling Tool for GPU Architectures, University of Illinois at Urbana-Champaign, Urbana, IL 61801

[GPUComp-91]

David B. Kirk, Wen-mei W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann Publishers, 2010

[GPUComp-92]

Dheevatsa Mudigere, Data access optimized applications on the GPU using NVIDIA CUDA, Thesis - Master of Science in Computational Science and Engineering, Technische Universität München, Germany, October 2009

[GPUComp-93]

Dheevatsa Mudigere (Technische Universität München (TUM), Munich, Germany), Fast GPGPU Data Rearrangement Kernels using CUDA, Student Research Symposium, International Conference HiPC-2009, Kochi (Kerala, India), December 2009

[GPUComp-94]

Khronos Group (2009). The OpenCL Specification Version 1.0. Beaverton, OR: Khronos Group http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf

[GPUComp-95]

Message Passing Interface Forum. (2009). MPI: A Message-Passing Interface Standard, Version 2.2. Knoxville: University of Tennessee. http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf

[GPUComp-96]

OpenMP Architecture Review Board. (2005). OpenMP Application Program Interface. http://www.openmp.org/mp-documents/spec25.pdf

[GPUComp-97]

Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., et al. (2004). Brook for GPUs: Stream computing on graphics hardware. ACM Transactions on Graphics, 23(3), 777-786 http://doi.acm.org/10.1145/1186562.1015800

[GPUComp-98]

Fernando, R. (Ed.), GPU Gems: Programming techniques, tips, and tricks for real-time graphics. Reading, MA: Addison-Wesley http://developer.nvidia.com/object/GPU_Gems_Home.html

[GPUComp-99]

Nickolls, J., Buck, I., Garland, M., & Skadron, K. (2008). Scalable parallel programming with CUDA. ACM Queue, 6(2), 40-53.

[GPUComp-100]

NVIDIA. (2007b). NVIDIA Compute PTX: Parallel thread execution, ISA Version 1.1 http://nvidia.com/object/io_1195170102263.html

[GPUComp-101]

NVIDIA. (2009). CUDA Zone http://www.nvidia.com/CUDA

[GPUComp-102]

Segal, M., & Akeley, K. (2006). The OpenGL® graphics system: A specification, Version 2.1. Mountain View, CA: Silicon Graphics
http://www.opengl.org/documentation/specs/

[GPUComp-103]

Sengupta, S., Harris, M., Zhang, Y., & Owens, J. D. (2007). Scan primitives for GPU computing. In T. Aila & M. Segal (Eds.), Graphics hardware (pp. 97-106). San Diego, CA: ACM Press.

[GPUComp-104]

Stratton, J. A., Stone, S., & Hwu, W. W. (2008). MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs. In Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC). Edmonton, Canada.

[GPUComp-105]

Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., & Stone, S. S. (2008). Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 73-82). Salt Lake City, UT.

[GPUComp-106]

Ryoo, S., Rodrigues, C. I., Stone, S. S., Baghsorkhi, S. S., Ueng, S. Z., Stratton, J. A. et al. (2008). Program pruning for a multithreaded GPU. In Code generation and optimization: Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization (pp. 195-204). Boston, MA.

[GPUComp-107]

Khronos Group (2010). OpenCL implementations, tutorials, and sample code. Beaverton, OR: Khronos Group.
http://www.khronos.org/developers/resources/opencl/

[GPUComp-108]

NVIDIA. (2010). OpenCL GPU computing support on NVIDIA's CUDA architecture GPUs. Santa Clara, CA: NVIDIA. http://www.nvidia.com/object/cuda_opencl.html

Centre for Development of Advanced Computing