|
[GPUComp-01]. |
Randi J. Rost, OpenGL Shading Language, Second Edition, Addison-Wesley, 2006
|
|
[GPUComp-02]. |
GPGPU Reference http://www.gpgpu.org
|
|
[GPUComp-03]. |
NVIDIA http://www.nvidia.com
|
|
[GPUComp-04]. |
NVIDIA Tesla
http://www.nvidia.com/object/tesla_computing_solutions.html
|
|
[GPUComp-05]. |
CUDA sample source code:
http://www.nvidia.com/object/cuda_get_samples.html
|
|
[GPUComp-06]. |
AMD Stream Processors
http://ati.amd.com/products/streamprocessor/specs.html
|
|
[GPUComp-07]. |
OpenCL - The open standard for parallel programming of heterogeneous systems
http://www.khronos.org/opencl
|
|
[GPUComp-08]. |
List of NVIDIA GPUs compatible with CUDA:
http://www.nvidia.com/object/cuda_learn_products.html
|
|
[GPUComp-09]. |
RAPIDMIND
http://www.rapidmind.net
|
|
[GPUComp-10]. |
Peak Stream - Parallel Processing (Acquired by Google in 2007)
http://www.google.com
|
|
[GPUComp-11]. |
guru3d.com
http://www.guru3d.com/news/sandra-2009-gets-gpgpu-support/
|
|
[GPUComp-12]. |
NVIDIA, NVIDIA CUDA, Programming Guide, v. 2.3, NVIDIA Corporation (2009).
|
[GPUComp-02]. |
CUDA Zone - http://www.nvidia.com/object/cuda_home.html
|
[GPUComp-13]. |
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn, T. J. Purcell,
A Survey of General-Purpose Computation on Graphics Hardware, Computer Graphics Forum (2007), Vol. 26, pages 80-113
|
[GPUComp-14]. |
M. Harris, Optimizing NVIDIA CUDA, Presentation at AstroGPU conference (2007).
|
[GPUComp-15]. |
G. Ruetsch, P. Micikevicius, Optimizing Matrix Transpose in CUDA, Tech report, NVIDIA Corporation (2009).
|
[GPUComp-16]. |
GPU Gems book series (available online), GPU Gems:
http://developer.nvidia.com/object/gpu_gems_home.html
http://developer.nvidia.com/object/gpu_gems_2_home.html
http://developer.nvidia.com/object/gpu-gems-3.html
|
[GPUComp-17]. |
G. Ruetsch, P. Micikevicius, Optimizing Matrix Transpose in CUDA, Tech report, NVIDIA Corporation (2009).
|
[GPUComp-18]. |
M. Harris, Parallel Prefix Sum (Scan) with CUDA, Tech report, NVIDIA Corporation (2008).
|
[GPUComp-19]. |
N. Satish, M. Harris, M. Garland, Designing Efficient Sorting Algorithms for Many-core GPUs, Tech report,
NVIDIA Corporation (2008).
|
[GPUComp-20]. |
J. Meng, K. Skadron, Performance Modeling and Automatic Ghost Zone Optimization for Iterative Stencil Loops on GPUs,
ICS '09: Proceedings of the 23rd International Conference on Supercomputing (2009), 256-265.
|
[GPUComp-21]. |
M. Harris, GPU Gems: Chapter 38 - Fast Fluid Dynamics Simulation on the GPU, GPU Gems: Programming Techniques, Tips
and Tricks for Real-Time Graphics - NVIDIA Corporation (2007).
|
[GPUComp-22]. |
P. Micikevicius, 3D Finite Difference Computation on GPUs using CUDA, Tech report, NVIDIA Corporation (2009).
|
[GPUComp-23]. |
M. Bader, H-J. Bungartz, D. Mudigere, S. Narasimhan, B. Narayanan,
Optimized CUDA Implementation of a Navier-Stokes based flow solver for the 2D Lid Driven Cavity,
poster at the NVIDIA GPU research summit (2009).
|
[GPUComp-24]. |
J. M. Cohen, M. J. Molemaker, A Fast Double Precision CFD Code using CUDA, Tech report, NVIDIA Corporation (2009).
|
[GPUComp-25]. |
M. Harris, Parallel Prefix Sum (Scan) with CUDA, Tech report, NVIDIA Corporation (2008).
|
[GPUComp-26]. |
RAPIDMIND & AMD
http://www.rapidmind.net/News-Aug4-08-SIGGRAPH.php
|
[GPUComp-27]. |
Merrimac - Stream Architecture, Stanford Brook for GPUs
http://www-graphics.stanford.edu/projects/brookgpu/
|
[GPUComp-28]. |
Stanford: Merrimac - Stream Architecture
http://merrimac.stanford.edu/
|
[GPUComp-29]. |
ATI RADEON - AMD
http://www.canadacomputers.com/amd/radeon/
|
[GPUComp-30]. |
Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid, by Jeff Bolz, Ian Farmer, Eitan Grinspun,
Peter Schröder, Caltech Report (2003); supported in part by NSF, NVIDIA
|
[GPUComp-31]. |
Scan Primitives for GPU Computing, by Shubhabrata Sengupta, Mark Harris*, Yao Zhang and John D. Owens,
University of California, Davis & *NVIDIA Corporation, Graphics Hardware (2007).
|
[GPUComp-32]. |
Scan Primitives for GPU Computing, by Shubhabrata Sengupta, Mark Harris*, Yao Zhang and John D. Owens,
University of California, Davis & *NVIDIA Corporation, Graphics Hardware (2007).
|
[GPUComp-33]. |
Scan Primitives for GPU Computing, by Shubhabrata Sengupta, Mark Harris*, Yao Zhang and John D. Owens,
University of California, Davis & *NVIDIA Corporation, Graphics Hardware (2007).
|
[GPUComp-34]. |
Bolz J., Farmer I., Grinspun E., Schröder P.: Sparse Matrix Solvers on the GPU:
Conjugate Gradients and Multigrid. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2003) 22, 3 (July 2003),
pp. 917-924.
|
[GPUComp-35]. |
Number Crunching with GPUs: PeakStream Math API Exploits Parallelism in Graphics Processors, October 2006;
Microprocessor Report
http://www.mdronline.com
|
[GPUComp-36]. |
Tom R. Halfhill, Parallel Processing with CUDA: Nvidia's High-Performance Computing Platform Uses Massive Multithreading;
Microprocessor Report, Volume 22, Archive 1, January 2008
http://www.mdronline.com
|
[GPUComp-37]. |
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, P. Hanrahan, Brook for GPUs: Stream Computing
on Graphics Hardware, ACM Trans. Graph. (SIGGRAPH) 2004
|
[GPUComp-38]. |
J. Krüger, R. Westermann, Linear Algebra Operators for GPU Implementation of Numerical Algorithms,
ACM Trans. Graph. (SIGGRAPH) 22 (3), pp. 908-916 (2003)
|
[GPUComp-39]. |
Tutorial SC 2007 : High Performance Computing with CUDA
|
[GPUComp-40]. |
FASTRA
http://www.fastra.ua.ac.be/en/faq.html
|
[GPUComp-41]. |
AMD Stream Computing software Stack
http://www.amd.com
|
[GPUComp-42]. |
BrookGPU :
http://graphics.stanford.edu/projects/brookgpu/index.html
|
[GPUComp-43]. |
Tom R. Halfhill, Intel's Larrabee Redefines GPUs - Fully Programmable Many-core Processor
Reaches Beyond Graphics, Microprocessor Report, September 29, 2008
|
[GPUComp-44]. |
Tom R. Halfhill, AMD's Stream Becomes a River - Parallel Processing Platform for ATI GPUs Reaches More Systems,
Microprocessor Report, December 2008
|
[GPUComp-45]. |
General-purpose computing on graphics processing units (GPGPU)
http://en.wikipedia.org/wiki/GPGPU
|
[GPUComp-46]. |
Khronos Group, OpenGL 3, December 2008
http://www.khronos.org/opengl
|
[GPUComp-47]. |
Perry H. Wang, Jamison D. Collins, Gautham N. Chinya, Hong Jiang, Xinmin Tian,
EXOCHI: Architecture and Programming Environment for a Heterogeneous Multi-core Multithreaded System,
PLDI '07
|
[GPUComp-48]. |
Daniel Weiskopf, Basics of GPU-Based Programming, Institute of Visualization and Interactive Systems,
Interactive Visualization of Volumetric Data on Consumer PC Hardware: Basics of Hardware-Based Programming
University of Stuttgart, VIS 2003
|
[GPUComp-48]. |
GPU Programming Languages
http://www.cis.upenn.edu/~suvenkat/700/
|
[GPUComp-49]. |
OpenGL design
http://graphics.stanford.edu/courses/cs448a-01-fall/design_opengl.pdf
|
[GPUComp-50]. |
OpenCL - The open standard for parallel programming of heterogeneous systems
http://www.khronos.org/opencl
|
[GPUComp-51]. |
Mary Fetcher and Vivek Sarkar, Introduction to GPGPUs - Seminar on Heterogeneous Processors,
Dept. of Computer Science, Rice University, October 2007
|
[GPUComp-52]. |
C-DAC Technology Workshops PEEP-2008 & OPECG-2009
http://www.cdac.in
|
[GPUComp-53]. |
NVIDIA CUDA Quick Start Guide 2007-2009
http://www.nvidia.com/object/cuda_develop.html
|
[GPUComp-54]. |
NVIDIA OpenCL Best Practices Guide Version 1.0 August 2009
http://www.nvidia.com
|
[GPUComp-55]. |
NVIDIA OpenCL Getting Started Guide Version 2009
http://www.nvidia.com
|
[GPUComp-56]. |
NVIDIA OpenCL Programming Guide for the CUDA Architecture Version 2.3 August 2009
http://www.nvidia.com
|
[GPUComp-57]. |
NVIDIA OpenCL JumpStart Guide Technical Brief Version 0.9 April 2009
http://www.nvidia.com
|
[GPUComp-57]. |
The OpenCL Specification, Version 1.0, published by Khronos OpenCL Working Group, ed.: Aaftab Munshi, 2009
http://www.khronos.org/registry/cl
|
[GPUComp-58]. |
Programming Guide
AMD - ATI Stream Computing - Compute Abstraction Layer (CAL) March 2010
http://www.amd.com
|
[GPUComp-59]. |
AMD - ATI Stream
http://www.amd.com/stream
|
[GPUComp-60]. |
Programming Guide -
AMD - ATI Stream Computing - OpenCL March 2010
http://www.amd.com/stream
|
[GPUComp-61]. |
AMD - ATI Stream Developer Forum
http://www.amd.com/streamdevforum
|
[GPUComp-62]. |
OpenGL Programming Guide
http://www.glprogramming.com/red/
|
[GPUComp-63]. |
GPGPU
http://www.gpgpu.org
Stanford discussion forum
http://www.gpgpu.org/forums/
|
[GPUComp-64]. |
Technical Notes - ATI Stream SDK v2.01 Performance and Optimization
http://www.amd.com/stream
|
[GPUComp-65]. |
Microsoft DirectX Reference Web site
http://www.msdn.microsoft.com/en-us/directx
|
[GPUComp-66]. |
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P.
Hanrahan, "Brook for GPUs: stream computing on graphics hardware," ACM
Trans. Graph., vol. 23, no. 3, pp. 777-786, 2004
|
[GPUComp-67]. |
Buck, Ian; Foley, Tim; Horn, Daniel; Sugerman, Jeremy; Hanrahan, Pat;
Houston, Mike; Fatahalian, Kayvon. "BrookGPU"
http://graphics.stanford.edu/projects/brookgpu/
|
[GPUComp-68]. |
ATI Compute Abstraction Layer (CAL) Intermediate Language (IL) Reference
Manual. Published by AMD.
|
[GPUComp-69]. |
CAL Image. ATI Compute Abstraction Layer Program Binary Format
Specification. Published by AMD.
|
[GPUComp-70]. |
Kernighan, Brian W., and Ritchie, Dennis M., The C Programming Language,
Prentice-Hall, Inc., Upper Saddle River, NJ, 1978.
|
[GPUComp-71]. |
Computational Methods for Tomography - Medical Image Processing
http://www.fastra.ua.ac.be
|
[GPUComp-72]. |
GPU Gems 3: Chapter 37, Efficient Random Number Generation and Application Using CUDA,
Lee Howes, David Thomas, Imperial College London
|
[GPUComp-73]. |
NVIDIA's Fermi: The First Complete GPU Computing Architecture, a white paper by Peter N. Glaskowsky (prepared under
contract with NVIDIA Corporation), September 2009
|
[GPUComp-74]. |
White Paper: Looking Beyond Graphics - NVIDIA's Next-Generation CUDA Compute and Graphics Architecture,
Code-Named Fermi, Adds Muscle for Parallel Computing. Analyst: Tom R. Halfhill, September 2009
Sponsored by NVIDIA
http://www.in.star-com
|
[GPUComp-75]. |
White Paper: Looking Beyond Graphics - NVIDIA's Next-Generation CUDA Compute and Graphics Architecture,
Code-Named Fermi, Adds Muscle for Parallel Computing. Analyst: Tom R. Halfhill, September 2009
Sponsored by NVIDIA
http://www.in.star-com
|
[GPUComp-76]. |
David Patterson, Director, Parallel Computing Research Laboratory (Par Lab), U.C. Berkeley,
The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges,
September 30, 2009 (NVIDIA is one of eight sponsors of the Par Lab).
|
[GPUComp-77]. |
The Portland Group - CUDA Fortran Programming Guide and Reference, published November 2009
http://www.pgroup.com/resources/accel.htm
|
[GPUComp-78]. |
The Portland Group - PGI Accelerator Compilers - CUDA-enabled NVIDIA GPUs
http://www.pgroup.com/resources/accel.htm
|
[GPUComp-79]. |
GPU Computing Solutions - NVIDIA Tesla & CUDA
http://www.nvidia.com/tesla
http://www.nvidia.com/cuda
|
[GPUComp-80]. |
NVIDIA CUDA: Practical Uses - BeHardware, Damien Triolet, Aug 2007
http://www.behardware.com/art/lire/678/
|
[GPUComp-81]. |
Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, and Wen-mei W. Hwu
CUDA-lite: Reducing GPU Programming Complexity, Center for Reliable and High-Performance Computing
Dept. of Electrical & Comp. Engg., Univ. of Illinois at Urbana-Champaign
|
[GPUComp-82]. |
Yao Zhang, Jonathan Cohen, John D. Owens
Fast Tridiagonal Solvers on the GPU
University of California, Davis, Nvidia
|
[GPUComp-83] |
Bharatkumar Sharma, Rahul Thota, Naga Vydyanathan, and Amit Kale
Towards a Robust, Real-time Face Processing System using CUDA-enabled GPUs
Siemens Corporate Technology, Bangalore, India
|
[GPUComp-84] |
Kishore Kothapalli, Rishabh Mukherjee, M. Suhail Rehman, Suryakant Patidar, P. J. Narayanan, Kannan Srinathan
A Performance Prediction Model for the CUDA GPGPU Platform
International Institute of Information Technology, Hyderabad, India
|
[GPUComp-85] |
John Nickolls, Ian Buck, and Michael Garland, NVIDIA; Kevin Skadron
Scalable Parallel Programming with CUDA
|
[GPUComp-86] |
N. P. Karunadasa & D. N. Ranasinghe
On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
University of Colombo School of Computing, Sri Lanka
|
[GPUComp-87] |
Michael Bader, Hans-Joachim Bungartz, Dheevatsa Mudigere, Srihari Narasimhan, Babu Narayanan
Fast GPGPU Data Rearrangement Kernels using CUDA
Technische Universität München, Munich, Germany; GE Global Research, JFWTC, Bangalore, India
|
[GPUComp-88] |
M. Sussman, W. Crutchfield and M. Papakipos
Pseudorandom Number Generation on the GPU
PeakStream, Inc., Redwood City, CA, USA
|
[GPUComp-89] |
W. B. Langdon
A Fast High Quality Pseudo Random Number Generator for nVidia CUDA
Department of Computer Science, CREST Centre, King's College, London, WC2R 2LS, UK
|
[GPUComp-90] |
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. Patel, William D. Gropp, Wen-mei W. Hwu
An Adaptive Performance Modeling Tool for GPU Architectures
University of Illinois at Urbana-Champaign, Urbana, IL 61801
|
[GPUComp-91] |
David B. Kirk, Wen-mei W. Hwu
Programming Massively Parallel Processors - A Hands-on Approach
Morgan Kaufmann Publishers, 2010
|
[GPUComp-92] |
Dheevatsa Mudigere, Data access optimized applications on the GPU using NVIDIA CUDA,
Thesis - Master of Science in Computational Science and Engineering, Technische Universität München, Germany, October 2009
|
[GPUComp-93] |
Dheevatsa Mudigere (Technische Universität München (TUM), Munich, Germany)
Fast GPGPU Data Rearrangement Kernels using CUDA,
Student Research Symposium, International Conference HiPC-2009,
HiPC, Kochi (Kerala, India), December 2009
|
[GPUComp-94] |
Khronos Group (2009). The OpenCL Specification Version 1.0. Beaverton, OR: Khronos Group
http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
|
[GPUComp-95] |
Message Passing Interface Forum. (2009). MPI: A Message-Passing Interface Standard, Version 2.2. Knoxville: University of Tennessee.
http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
|
[GPUComp-96] |
OpenMP Architecture Review Board. (2005). OpenMP Application Program Interface.
http://www.openmp.org/mp-documents/spec25.pdf
|
[GPUComp-97] |
Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., et al. (2004). Brook for GPUs: Stream computing on graphics hardware.
ACM Transactions on Graphics, 23(3), 777-786
http://doi.acm.org/10.1145/1186562.1015800
|
[GPUComp-98] |
Fernando, R. (Ed.), GPU gems: Programming techniques, tips, and tricks for real-time graphics. Reading, MA: Addison-Wesley
http://developer.nvidia.com/object/GPU_Gems_Home.html
|
[GPUComp-99] |
Nickolls, J., Buck, I., Garland M., & Skadron, K. (2008). Scalable parallel programming with CUDA. ACM Queue, 6(2), 40-53.
|
[GPUComp-100] |
NVIDIA. (2007b). NVIDIA Compute PTX: Parallel thread execution, ISA Version 1.1
http://nvidia.com/object/io_1195170102263.html
|
[GPUComp-101] |
NVIDIA. (2009). CUDA Zone
http://www.nvidia.com/CUDA
|
[GPUComp-102] |
Segal, M., & Akeley, K. (2006). The OpenGL® graphics system: A specification, Version 2.1. Mountain View, CA: Silicon Graphics
http://www.opengl.org/documentation/specs/
|
[GPUComp-103] |
Sengupta, S., Harris M., Zhang, Y., & Owens, J. D. (2007). Scan primitives for GPU computing.
In T. Aila & M. Segal (Eds.), Graphics hardware (pp. 97-106). San Diego, CA: ACM Press.
|
[GPUComp-104] |
Stratton, J. A., Stone, S., & Hwu, W. W. (2008). MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs.
In Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC). Edmonton, Canada.
|
[GPUComp-105] |
Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., & Stone, S. S. (2008). Optimization principles and application performance evaluation of a multithreaded GPU using CUDA.
In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 73-82). Salt Lake City, UT.
|
[GPUComp-106] |
Ryoo, S., Rodrigues, C. I., Stone, S. S., Baghsorkhi, S. S., Ueng, S. Z., Stratton, J. A. et al. (2008).
Program pruning for a multithreaded GPU. In Code generation and optimization:
Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization (pp. 195-204). Boston, MA.
|
[GPUComp-107] |
Khronos Group (2010). OpenCL implementations, tutorials, and sample code. Beaverton, OR: Khronos Group.
http://www.khronos.org/developers/resources/opencl/
|
[GPUComp-108] |
NVIDIA. (2010). OpenCL GPU computing support on NVIDIA's CUDA architecture GPUs. Santa Clara, CA: NVIDIA.
http://www.nvidia.com/object/cuda_opencl.html
|